matchms.Scores module

class matchms.Scores.Scores(references: List[object] | Tuple[object] | ndarray, queries: List[object] | Tuple[object] | ndarray, is_symmetric: bool = False)

Bases: object

Contains reference and query spectrums and the scores between them.

The scores can be retrieved as a matrix with the Scores.scores attribute. The (reference spectrum, query spectrum, score) triples can also be iterated over, in query then reference order.

Example that calculates scores between two reference and two query spectrums and iterates over the scores:

import numpy as np
from matchms import calculate_scores
from matchms import Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2]
queries = [spectrum_3, spectrum_4]

similarity_measure = CosineGreedy()
scores = calculate_scores(references, queries, similarity_measure)

for (reference, query, score) in scores:
    print(f"Cosine score between {reference.get('id')} and {query.get('id')}" +
          f" is {score[0]:.2f} with {score[1]} matched peaks")

Should output

Cosine score between spectrum1 and spectrum4 is 0.80 with 3 matched peaks
Cosine score between spectrum2 and spectrum3 is 0.14 with 1 matched peaks
Cosine score between spectrum2 and spectrum4 is 0.61 with 1 matched peaks

__init__(references: List[object] | Tuple[object] | ndarray, queries: List[object] | Tuple[object] | ndarray, is_symmetric: bool = False)
Parameters:
  • references – List of reference objects

  • queries – List of query objects

  • is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster. Default is False.
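The speed-up described for is_symmetric can be illustrated with a minimal sketch. It does not use matchms internals: toy_similarity and all_vs_all_symmetric are hypothetical helpers, and the sketch only shows why evaluating pairs with i <= j and mirroring the result roughly halves the number of similarity evaluations.

```python
import numpy as np


def toy_similarity(a: float, b: float) -> float:
    """Hypothetical symmetric score: 1 / (1 + |a - b|)."""
    return 1.0 / (1.0 + abs(a - b))


def all_vs_all_symmetric(items):
    """Fill an n x n score matrix by evaluating only pairs with i <= j."""
    n = len(items)
    scores = np.zeros((n, n))
    n_calls = 0
    for i in range(n):
        for j in range(i, n):
            scores[i, j] = toy_similarity(items[i], items[j])
            scores[j, i] = scores[i, j]  # mirror instead of recomputing
            n_calls += 1
    return scores, n_calls


items = [100.0, 140.0, 190.0, 200.0]
matrix, n_calls = all_vs_all_symmetric(items)
print(n_calls)  # 10 evaluations instead of 4 * 4 = 16
```

For n items this needs n(n + 1)/2 evaluations rather than n², which approaches the factor of two mentioned above as n grows.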

calculate(similarity_function: BaseSimilarity, name: str | None = None, array_type: str = 'numpy', join_type='left') → Scores

Calculate the similarity between all reference objects and all query objects using the most suitable available implementation of the given similarity_function. If the Scores object already contains similarity scores, the newly computed scores are added as a new layer whose name is given by name. The new scores are merged with the existing ones as specified by join_type (default: ‘left’).

Parameters:
  • similarity_function – Function which accepts a reference + query object and returns a score or tuple of scores

  • name – Label of the new scores layer. If None, the name of the similarity_function class will be used.

  • array_type – Specify the type of array to store and compute the scores. Choose from “numpy” or “sparse”.

  • join_type – Choose from left, right, outer, inner to specify the merge type.
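As a rough illustration of the four join_type options, the sketch below assumes they follow the usual database-style meaning applied to the (row, column) positions that hold a stored score. This is an assumption, not a statement about the matchms implementation, and join_positions is a hypothetical helper.

```python
def join_positions(existing, new, join_type="left"):
    """Return the (row, col) positions kept when merging two sparse layers.

    Assumes database-style join semantics over stored positions.
    """
    existing, new = set(existing), set(new)
    if join_type == "left":
        return existing            # keep positions of the existing layer
    if join_type == "right":
        return new                 # keep positions of the new layer
    if join_type == "inner":
        return existing & new      # keep positions present in both
    if join_type == "outer":
        return existing | new      # keep positions present in either
    raise ValueError(f"Unknown join_type: {join_type!r}")


existing = {(0, 1), (1, 0), (1, 1)}
new = {(0, 0), (1, 1)}
for join_type in ("left", "right", "inner", "outer"):
    print(join_type, sorted(join_positions(existing, new, join_type)))
```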

filter_by_range(**kwargs)

Remove all scores for which the score with the given name lies outside the given range.

Parameters:

kwargs – See “Keyword arguments” section below.

Keyword Arguments:
  • name – Name of the score which is used for filtering. Run .score_names to see all scores stored in the sparse array.

  • low – Lower threshold below which all scores will be removed.

  • high – Upper threshold above which all scores will be removed.

  • above_operator – Define operator to be used to compare against low. Default is ‘>’. Possible choices are ‘>’, ‘<’, ‘>=’, ‘<=’.

  • below_operator – Define operator to be used to compare against high. Default is ‘<’. Possible choices are ‘>’, ‘<’, ‘>=’, ‘<=’.
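How low, high and the two operators interact can be sketched with plain NumPy. The helper keep_mask below is hypothetical and only mimics the comparison logic described above: a score is kept when "score above_operator low" and "score below_operator high" both hold. Treating None as "no bound" is an assumption of this sketch.

```python
import operator

import numpy as np

# Map the operator strings listed above to Python comparison functions.
OPERATORS = {">": operator.gt, "<": operator.lt,
             ">=": operator.ge, "<=": operator.le}


def keep_mask(values, low=None, high=None,
              above_operator=">", below_operator="<"):
    """Boolean mask of values passing both comparisons (None skips a bound)."""
    values = np.asarray(values, dtype=float)
    mask = np.ones(values.shape, dtype=bool)
    if low is not None:
        mask &= OPERATORS[above_operator](values, low)
    if high is not None:
        mask &= OPERATORS[below_operator](values, high)
    return mask


scores = np.array([0.10, 0.50, 0.61, 0.80, 1.00])

# Default operators: strictly between low and high.
print(scores[keep_mask(scores, low=0.5, high=1.0)].tolist())
# [0.61, 0.8]

# Inclusive bounds.
print(scores[keep_mask(scores, low=0.5, high=1.0,
                       above_operator=">=", below_operator="<=")].tolist())
# [0.5, 0.61, 0.8, 1.0]
```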

scores_by_query(query: List[object] | Tuple[object] | ndarray, name: str | None = None, sort: bool = False) → ndarray

Return all scores for the given query spectrum.

For example

import numpy as np
from matchms import calculate_scores, Scores, Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2, spectrum_3]
queries = [spectrum_2, spectrum_3, spectrum_4]

scores = calculate_scores(references, queries, CosineGreedy())
selected_scores = scores.scores_by_query(spectrum_4, 'CosineGreedy_score', sort=True)
print([x[1][0].round(3) for x in selected_scores])

Should output

[0.796, 0.613]

Parameters:
  • query – Single query Spectrum.

  • name – Name of the score that should be returned (if multiple scores are stored).

  • sort – Set to True to return the scores in sorted order (using the sort() function of the given similarity_function).

scores_by_reference(reference: List[object] | Tuple[object] | ndarray, name: str | None = None, sort: bool = False) → ndarray

Return all scores of given name for the given reference spectrum.

Parameters:
  • reference – Single reference Spectrum.

  • name – Name of the score that should be returned (if multiple scores are stored).

  • sort – Set to True to return the scores in sorted order (using the sort() function of the given similarity_function).

to_array(name=None) → ndarray

Scores as numpy array

For example

import numpy as np
from matchms import calculate_scores, Scores, Spectrum
from matchms.similarity import IntersectMz

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]))
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]))
spectrums = [spectrum_1, spectrum_2]

scores = calculate_scores(spectrums, spectrums, IntersectMz()).to_array()

print(scores.shape)
print(scores)

Should output

(2, 2)
[[1.  0.2]
 [0.2 1. ]]

Parameters:

name – Name of the score that should be returned (if multiple scores are stored).

to_coo(name=None) → coo_matrix

Scores as scipy sparse COO matrix

Parameters:

name – Name of the score that should be returned (if multiple scores are stored).
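For readers unfamiliar with the COO format, here is a small sketch using scipy.sparse directly rather than a Scores object. The dense values are made up to resemble the first example on this page, under the assumption that rows index references and columns index queries.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Illustrative scores; the zero entry is not stored in the sparse form.
dense = np.array([[0.0, 0.80],
                  [0.14, 0.61]])
sparse = coo_matrix(dense)

# A COO matrix holds three parallel arrays: row index, column index, value.
for r, c, v in zip(sparse.row, sparse.col, sparse.data):
    print(r, c, v)

print(sparse.nnz)  # number of stored (non-zero) entries: 3

# Converting back restores the dense matrix, zeros included.
restored = sparse.toarray()
```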

to_dict() → dict

Return a dictionary representation of scores.

to_json(filename: str)

Export Scores to a JSON file.

Parameters:

filename – Path to file to write to

to_pickle(filename: str)

Export Scores to a Pickle file.

Parameters:

filename – Path to file to write to

class matchms.Scores.ScoresBuilder

Bases: object

Builder class for Scores.

__init__()

build() → Scores

Build scores object

from_json(file_path: str)

Import scores data from a JSON file.

Parameters:

file_path – Path to the scores file.

class matchms.Scores.ScoresJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

__init__(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Constructor for JSONEncoder, with sensible defaults.

If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.

If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.

If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause an OverflowError). Otherwise, no such check takes place.

If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.

If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.

If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.

If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.
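A few of these constructor options can be tried out with the standard-library json.JSONEncoder, from which this class inherits them. The data dictionary is made up for illustration.

```python
import json

data = {"b": 1, "a": [1, 2]}

# Sorted keys and the most compact separators.
compact = json.JSONEncoder(sort_keys=True, separators=(",", ":")).encode(data)
print(compact)  # {"a":[1,2],"b":1}

# Pretty-printed with an indent of two spaces.
pretty = json.JSONEncoder(sort_keys=True, indent=2).encode(data)
print(pretty)

# With allow_nan=False, NaN is rejected instead of being written as NaN.
try:
    json.JSONEncoder(allow_nan=False).encode(float("nan"))
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)  # True
```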

default(o)

JSON Encoder for a matchms.Scores.Scores object
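The general pattern of overriding default() can be sketched as follows. ToyScores and ToyScoresEncoder are hypothetical stand-ins, not the matchms classes, and the sketch assumes the encoder simply delegates to a to_dict()-style method on the object.

```python
import json

import numpy as np


class ToyScores:
    """Hypothetical stand-in for a scores container; not the matchms class."""

    def __init__(self, names, values):
        self.names = names
        self.values = np.asarray(values)

    def to_dict(self):
        # NumPy arrays are not JSON serializable, so convert to nested lists.
        return {"names": self.names, "values": self.values.tolist()}


class ToyScoresEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, ToyScores):
            return o.to_dict()
        # Fall back to the base class, which raises TypeError.
        return super().default(o)


toy = ToyScores(["score"], [[1.0, 0.2], [0.2, 1.0]])
text = json.dumps(toy, cls=ToyScoresEncoder)
print(text)
```

json.dumps calls default() only for objects it cannot serialize on its own, so built-in types such as dict, list, str and float never reach the override.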

encode(o)

Return a JSON string representation of a Python data structure.

>>> from json.encoder import JSONEncoder
>>> JSONEncoder().encode({"foo": ["bar", "baz"]})
'{"foo": ["bar", "baz"]}'

iterencode(o, _one_shot=False)

Encode the given object and yield each string representation as available.

For example:

for chunk in JSONEncoder().iterencode(bigobject):
    mysocket.write(chunk)
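A self-contained variant of the snippet above, with an in-memory io.StringIO buffer standing in for the socket and a small made-up object in place of bigobject:

```python
import io
import json

# Made-up payload for illustration.
big_object = {"scores": [[1.0, 0.2], [0.2, 1.0]], "names": ["a", "b"]}
encoder = json.JSONEncoder()

buffer = io.StringIO()  # stands in for a socket or file
for chunk in encoder.iterencode(big_object):
    buffer.write(chunk)

streamed = buffer.getvalue()
# The concatenated chunks match what encode() returns in one piece.
print(streamed == encoder.encode(big_object))  # True
```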