matchms.Scores module
- class matchms.Scores.Scores(references: List[object] | Tuple[object] | ndarray, queries: List[object] | Tuple[object] | ndarray, is_symmetric: bool = False)
Bases: object
Contains reference and query spectrums and the scores between them.
The scores can be retrieved as a matrix with the Scores.scores attribute. The (reference spectrum, query spectrum, score) tuples can also be iterated over in query then reference order.
Example to calculate scores between two reference and two query spectrums and iterate over the scores:
import numpy as np
from matchms import calculate_scores
from matchms import Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2]
queries = [spectrum_3, spectrum_4]

# Score every reference against every query
similarity_measure = CosineGreedy()
scores = calculate_scores(references, queries, similarity_measure)

# Each score holds the cosine score and the number of matched peaks
for (reference, query, score) in scores:
    print(f"Cosine score between {reference.get('id')} and {query.get('id')}" +
          f" is {score[0]:.2f} with {score[1]} matched peaks")
Should output
Cosine score between spectrum1 and spectrum4 is 0.80 with 3 matched peaks
Cosine score between spectrum2 and spectrum3 is 0.14 with 1 matched peaks
Cosine score between spectrum2 and spectrum4 is 0.61 with 1 matched peaks
- __init__(references: List[object] | Tuple[object] | ndarray, queries: List[object] | Tuple[object] | ndarray, is_symmetric: bool = False)
- Parameters:
references – List of reference objects
queries – List of query objects
is_symmetric – Set to True when references and queries are identical (for instance, in an all-vs-all comparison). Because score[i,j] = score[j,i] in that case, only about half of the pairs need to be computed, which makes the calculation about 2x faster. Default is False.
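The speed-up from is_symmetric can be illustrated with a minimal sketch. This is not the matchms implementation; the helper names pairwise_symmetric and shared_fraction are hypothetical, and the toy score (overlap of two sets of peak positions) only stands in for a real symmetric similarity measure.

```python
from itertools import combinations_with_replacement


def pairwise_symmetric(items, score_fn):
    """Fill an n x n score matrix by evaluating score_fn only for i <= j."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    calls = 0
    for i, j in combinations_with_replacement(range(n), 2):
        value = score_fn(items[i], items[j])
        calls += 1
        matrix[i][j] = value
        matrix[j][i] = value  # mirror, since score[i, j] == score[j, i]
    return matrix, calls


def shared_fraction(a, b):
    """Toy symmetric score (hypothetical stand-in): overlap of two sets."""
    return len(a & b) / len(a | b)


peaks = [{100, 150, 200}, {100, 140, 190}, {110, 140, 195}]
matrix, calls = pairwise_symmetric(peaks, shared_fraction)
print(f"{calls} evaluations instead of {len(peaks) ** 2}")
```

For n items the sketch evaluates n(n+1)/2 pairs rather than n², which approaches half the work as n grows.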
- calculate(similarity_function: BaseSimilarity, name: str | None = None, array_type: str = 'numpy', join_type='left') → Scores
Calculate the similarity between all reference objects and all query objects using the most suitable available implementation of the given similarity_function. If the Scores object already contains similarity scores, the newly computed scores are added as a new layer (name is used as the layer name). The new scores are merged with the existing ones as specified by join_type; the default is ‘left’.
- Parameters:
similarity_function – Function which accepts a reference + query object and returns a score or tuple of scores
name – Label of the new scores layer. If None, the name of the similarity_function class will be used.
array_type – Specify the type of array to store and compute the scores. Choose from “numpy” or “sparse”.
join_type – Choose from left, right, outer, inner to specify the merge type.
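To make the four merge types concrete, here is a minimal sketch that assumes join_type follows the usual database-style join semantics on the (reference index, query index) positions of the stored scores. The function join_layers and the dict-based layers are hypothetical illustrations, not the matchms data structure.

```python
def join_layers(left, right, join_type="left"):
    """Merge two sparse score layers stored as {(row, col): score} dicts."""
    if join_type == "left":
        keys = set(left)                 # keep positions of the existing layer
    elif join_type == "right":
        keys = set(right)                # keep positions of the new layer
    elif join_type == "inner":
        keys = set(left) & set(right)    # keep positions present in both
    elif join_type == "outer":
        keys = set(left) | set(right)    # keep positions present in either
    else:
        raise ValueError(f"Unknown join_type: {join_type!r}")
    # Positions missing from one layer are filled with 0.0 (assumption)
    return {key: (left.get(key, 0.0), right.get(key, 0.0)) for key in sorted(keys)}


existing = {(0, 1): 0.80, (1, 0): 0.14, (1, 1): 0.61}
new = {(0, 0): 0.30, (0, 1): 0.75}
for join_type in ("left", "right", "inner", "outer"):
    print(join_type, sorted(join_layers(existing, new, join_type)))
```

Under this assumption, 'left' never grows the set of stored positions, which keeps a sparse Scores object sparse when further layers are added.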
- filter_by_range(**kwargs)
Remove all scores for which the score given by name lies outside the given range.
- Parameters:
kwargs – See “Keyword arguments” section below.
- Keyword Arguments:
name – Name of the score which is used for filtering. Run .score_names to see all scores stored in the sparse array.
low – Lower threshold below which all scores will be removed.
high – Upper threshold above which all scores will be removed.
above_operator – Define operator to be used to compare against low. Default is ‘>’. Possible choices are ‘>’, ‘<’, ‘>=’, ‘<=’.
below_operator – Define operator to be used to compare against high. Default is ‘<’. Possible choices are ‘>’, ‘<’, ‘>=’, ‘<=’.
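How low, high and the two operators interact can be sketched as follows. This is an illustration of the documented keyword arguments only, assuming a score is kept when both comparisons hold; filter_scores_sketch and the dict of scores are hypothetical and not part of matchms.

```python
import operator

# Map the documented operator strings to comparison functions
OPERATORS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def filter_scores_sketch(scores, low=None, high=None,
                         above_operator=">", below_operator="<"):
    """Keep entries where `score above_operator low` and `score below_operator high`."""
    above = OPERATORS[above_operator]
    below = OPERATORS[below_operator]
    kept = {}
    for key, value in scores.items():
        if low is not None and not above(value, low):
            continue  # fails the comparison against the lower threshold
        if high is not None and not below(value, high):
            continue  # fails the comparison against the upper threshold
        kept[key] = value
    return kept


scores = {(0, 1): 0.80, (1, 0): 0.14, (1, 1): 0.61}
strict = filter_scores_sketch(scores, low=0.5)
inclusive = filter_scores_sketch(scores, low=0.14, high=0.80,
                                 above_operator=">=", below_operator="<=")
print(strict)
print(inclusive)
```

With the default strict operators a score exactly equal to a threshold is removed; switching to ‘>=’ or ‘<=’ keeps it.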
- scores_by_query(query: List[object] | Tuple[object] | ndarray, name: str | None = None, sort: bool = False) → ndarray
Return all scores for the given query spectrum.
For example
import numpy as np
from matchms import calculate_scores, Scores, Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2, spectrum_3]
queries = [spectrum_2, spectrum_3, spectrum_4]

scores = calculate_scores(references, queries, CosineGreedy())

# Select the scores for one query, sorted by the named score
selected_scores = scores.scores_by_query(spectrum_4, 'CosineGreedy_score', sort=True)
print([x[1][0].round(3) for x in selected_scores])
Should output
[0.796, 0.613]
- Parameters:
query – Single query Spectrum.
name – Name of the score that should be returned (if multiple scores are stored).
sort – Set to True to obtain the scores in sorted order (relying on the sort() function of the given similarity_function).
- scores_by_reference(reference: List[object] | Tuple[object] | ndarray, name: str | None = None, sort: bool = False) → ndarray
Return all scores of given name for the given reference spectrum.
- Parameters:
reference – Single reference Spectrum.
name – Name of the score that should be returned (if multiple scores are stored).
sort – Set to True to obtain the scores in sorted order (relying on the sort() function of the given similarity_function).
- to_array(name=None) → ndarray
Return the scores as a numpy array.
For example
import numpy as np
from matchms import calculate_scores, Scores, Spectrum
from matchms.similarity import IntersectMz

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]))
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]))
spectrums = [spectrum_1, spectrum_2]

# All-vs-all comparison, converted to a dense numpy array
scores = calculate_scores(spectrums, spectrums, IntersectMz()).to_array()
print(scores.shape)
print(scores)
Should output
(2, 2)
[[1.  0.2]
 [0.2 1. ]]
- Parameters:
name – Name of the score that should be returned (if multiple scores are stored).
- to_coo(name=None) → coo_matrix
Return the scores as a scipy sparse COO matrix.
- Parameters:
name – Name of the score that should be returned (if multiple scores are stored).
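For readers unfamiliar with the COO format, the sketch below shows how a scipy coo_matrix stores only the non-zero entries as (row, column, value) triplets. It builds the matrix directly from a small dense array rather than calling to_coo, and it assumes that rows index references and columns index queries; the values mirror the CosineGreedy example output at the top of this page.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Dense 2 x 2 score matrix (assumed layout: rows = references, columns = queries)
dense = np.array([[0.0, 0.80],
                  [0.14, 0.61]])

# COO keeps only the non-zero entries as parallel row / col / data arrays
sparse = coo_matrix(dense)
for row, col, value in zip(sparse.row, sparse.col, sparse.data):
    print(f"reference {row}, query {col}: {value:.2f}")

# Converting back restores the zeros
restored = sparse.toarray()
```

The zero score is not stored at all, which is why a sparse representation pays off when most reference and query pairs have no score.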