matchms.Scores module

class matchms.Scores.Scores(references: List[object] | Tuple[object] | ndarray, queries: List[object] | Tuple[object] | ndarray, is_symmetric: bool = False)[source]

Bases: object

Contains reference and query spectrums and the scores between them.

The scores can be retrieved as a matrix with the Scores.scores attribute. The (reference spectrum, query spectrum, score) triples can also be iterated over, in query then reference order.

Example: calculate scores between two reference spectrums and two query spectrums, then iterate over the scores

import numpy as np
from matchms import calculate_scores
from matchms import Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2]
queries = [spectrum_3, spectrum_4]

similarity_measure = CosineGreedy()
scores = calculate_scores(references, queries, similarity_measure)

for (reference, query, score) in scores:
    print(f"Cosine score between {reference.get('id')} and {query.get('id')}" +
          f" is {score[0]:.2f} with {score[1]} matched peaks")

Should output

Cosine score between spectrum1 and spectrum4 is 0.80 with 3 matched peaks
Cosine score between spectrum2 and spectrum3 is 0.14 with 1 matched peaks
Cosine score between spectrum2 and spectrum4 is 0.61 with 1 matched peaks
__init__(references: List[object] | Tuple[object] | ndarray, queries: List[object] | Tuple[object] | ndarray, is_symmetric: bool = False)[source]
Parameters:
  • references – List of reference objects

  • queries – List of query objects

  • is_symmetric – Set to True when references and queries are identical (for instance, in an all-vs-all comparison). Because score[i,j] = score[j,i], only about half of the pairs need to be computed, which makes the calculation about 2x faster. Default is False.
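
The saving described for is_symmetric can be illustrated with a minimal, library-independent sketch. It is not the matchms implementation; the helper all_vs_all_symmetric and the toy jaccard similarity are hypothetical names introduced here only for illustration.

```python
from typing import Callable, List, Sequence


def all_vs_all_symmetric(items: Sequence[object],
                         similarity: Callable[[object, object], float]
                         ) -> List[List[float]]:
    """Fill an n x n score matrix by evaluating each unordered pair once."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # upper triangle, including the diagonal
            value = similarity(items[i], items[j])
            matrix[i][j] = value
            matrix[j][i] = value  # mirror, since score[i, j] == score[j, i]
    return matrix


def jaccard(a, b):
    """Hypothetical toy similarity: fraction of shared m/z values."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Toy "spectrums" represented only by their m/z values (an assumption).
peaks = [(100, 150, 200), (100, 140, 190), (110, 140, 195)]
symmetric_scores = all_vs_all_symmetric(peaks, jaccard)
```

For n items this evaluates n(n+1)/2 pairs instead of n², which approaches the 2x saving as n grows.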

calculate(similarity_function: BaseSimilarity, name: str | None = None, array_type: str = 'numpy', join_type='left') Scores[source]

Calculate the similarity between all reference objects and all query objects using the most suitable available implementation of the given similarity_function. If the Scores object already contains similarity scores, the newly computed scores are added as a new layer whose label is given by name. The new scores are merged with the existing ones as specified by join_type; the default is ‘left’.

Parameters:
  • similarity_function – Function that accepts a reference object and a query object and returns a score or a tuple of scores.

  • name – Label of the new scores layer. If None, the name of the similarity_function class will be used.

  • array_type – Specify the type of array to store and compute the scores. Choose from “numpy” or “sparse”.

  • join_type – Choose from left, right, outer, inner to specify the merge type.
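
The effect of join_type can be pictured with a small sketch. It assumes that the four options follow the usual relational merge semantics applied to the (reference index, query index) positions of stored scores; this is an assumption, not a statement about the library's internals. The function join_layers and the dictionary-based layers are hypothetical.

```python
from typing import Dict, Optional, Tuple

Index = Tuple[int, int]  # (reference index, query index)


def join_layers(existing: Dict[Index, float],
                new: Dict[Index, float],
                join_type: str = "left"
                ) -> Dict[Index, Tuple[Optional[float], Optional[float]]]:
    """Combine two sparse score layers; missing values become None."""
    if join_type == "left":      # keep positions of the existing layer
        keys = set(existing)
    elif join_type == "right":   # keep positions of the new layer
        keys = set(new)
    elif join_type == "inner":   # keep positions present in both layers
        keys = set(existing) & set(new)
    elif join_type == "outer":   # keep positions present in either layer
        keys = set(existing) | set(new)
    else:
        raise ValueError(f"Unknown join_type: {join_type!r}")
    return {key: (existing.get(key), new.get(key)) for key in sorted(keys)}


layer_a = {(0, 0): 0.9, (0, 1): 0.4}  # scores already stored
layer_b = {(0, 1): 0.5, (1, 1): 0.7}  # newly computed scores

left = join_layers(layer_a, layer_b, "left")
inner = join_layers(layer_a, layer_b, "inner")
outer = join_layers(layer_a, layer_b, "outer")
```

Under this reading, ‘left’ never adds positions that the existing layer lacks, which keeps a previously filtered sparse array small.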

filter_by_range(**kwargs)[source]

Remove all entries for which the score given by name lies outside the given range.

Parameters:

kwargs – See “Keyword arguments” section below.

Keyword Arguments:
  • name – Name of the score that is used for filtering. Check .score_names to see all scores stored in the sparse array.

  • low – Lower threshold below which all scores will be removed.

  • high – Upper threshold above which all scores will be removed.

  • above_operator – Define operator to be used to compare against low. Default is ‘>’. Possible choices are ‘>’, ‘<’, ‘>=’, ‘<=’.

  • below_operator – Define operator to be used to compare against high. Default is ‘<’. Possible choices are ‘>’, ‘<’, ‘>=’, ‘<=’.
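
How the thresholds and the two operators interact can be sketched in plain Python. This is an illustrative sketch only, not the library code: filter_by_range_sketch is a hypothetical helper, the scores are held in a plain dictionary, and the unbounded defaults for low and high are an assumption.

```python
import operator
from typing import Dict, Tuple

# Map the operator strings from the keyword arguments to functions.
OPERATORS = {">": operator.gt, "<": operator.lt,
             ">=": operator.ge, "<=": operator.le}


def filter_by_range_sketch(scores: Dict[Tuple[int, int], float],
                           low: float = float("-inf"),
                           high: float = float("inf"),
                           above_operator: str = ">",
                           below_operator: str = "<"
                           ) -> Dict[Tuple[int, int], float]:
    """Keep entries where (score above_operator low) and (score below_operator high)."""
    above = OPERATORS[above_operator]
    below = OPERATORS[below_operator]
    return {key: value for key, value in scores.items()
            if above(value, low) and below(value, high)}


scores = {(0, 1): 0.80, (1, 0): 0.14, (1, 1): 0.61}

strict = filter_by_range_sketch(scores, low=0.61)
inclusive = filter_by_range_sketch(scores, low=0.61, above_operator=">=")
```

With the default ‘>’ a score exactly equal to low is removed; switching above_operator to ‘>=’ keeps it.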

scores_by_query(query: List[object] | Tuple[object] | ndarray, name: str | None = None, sort: bool = False) ndarray[source]

Return all scores for the given query spectrum.

For example

import numpy as np
from matchms import calculate_scores, Scores, Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2, spectrum_3]
queries = [spectrum_2, spectrum_3, spectrum_4]

scores = calculate_scores(references, queries, CosineGreedy())
selected_scores = scores.scores_by_query(spectrum_4, 'CosineGreedy_score', sort=True)
print([x[1][0].round(3) for x in selected_scores])

Should output

[0.796, 0.613]
Parameters:
  • query – Single query Spectrum.

  • name – Name of the score that should be returned (if multiple scores are stored).

  • sort – Set to True to return the scores sorted (using the sort() function of the given similarity_function).

scores_by_reference(reference: List[object] | Tuple[object] | ndarray, name: str | None = None, sort: bool = False) ndarray[source]

Return all scores of given name for the given reference spectrum.

Parameters:
  • reference – Single reference Spectrum.

  • name – Name of the score that should be returned (if multiple scores are stored).

  • sort – Set to True to return the scores sorted (using the sort() function of the given similarity_function).

to_array(name=None) ndarray[source]

Return the scores as a numpy array.

For example

import numpy as np
from matchms import calculate_scores, Scores, Spectrum
from matchms.similarity import IntersectMz

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]))
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]))
spectrums = [spectrum_1, spectrum_2]

scores = calculate_scores(spectrums, spectrums, IntersectMz()).to_array()

print(scores.shape)
print(scores)

Should output

(2, 2)
[[1.  0.2]
 [0.2 1. ]]
Parameters:

name – Name of the score that should be returned (if multiple scores are stored).

to_coo(name=None) coo_matrix[source]

Return the scores as a scipy sparse COO matrix.

Parameters:

name – Name of the score that should be returned (if multiple scores are stored).
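
For readers unfamiliar with the COO (coordinate) layout, the sketch below shows what such a matrix stores: one row index, one column index, and one value per non-zero entry. It uses plain lists rather than scipy, and dense_to_coo is a hypothetical helper. The values are taken from the three cosine scores printed in the class-level example, with references as rows and queries as columns.

```python
from typing import List, Sequence, Tuple


def dense_to_coo(matrix: Sequence[Sequence[float]]
                 ) -> Tuple[List[int], List[int], List[float]]:
    """Return (row, col, data) lists holding only the non-zero entries."""
    rows, cols, data = [], [], []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:  # zero scores are simply not stored
                rows.append(i)
                cols.append(j)
                data.append(value)
    return rows, cols, data


# Rows: spectrum1, spectrum2 (references); columns: spectrum3, spectrum4 (queries).
dense = [[0.0, 0.80],
         [0.14, 0.61]]

row, col, data = dense_to_coo(dense)
```

A scipy coo_matrix exposes the same three pieces of information through its row, col, and data attributes.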

to_dict() dict[source]

Return a dictionary representation of scores.

to_json(filename: str)[source]

Export Scores to a JSON file.

Parameters:

filename – Path to file to write to

to_pickle(filename: str)[source]

Export Scores to a Pickle file.

Parameters:

filename – Path to file to write to

class matchms.Scores.ScoresBuilder[source]

Bases: object

Builder class for Scores.

__init__()[source]
build() Scores[source]

Build the Scores object.

from_json(file_path: str)[source]

Import scores data from a JSON file.

Parameters:

file_path – Path to the scores file.