matchms.similarity.CosineGreedy module
- class matchms.similarity.CosineGreedy.CosineGreedy(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)
Bases: matchms.similarity.BaseSimilarity.BaseSimilarity
Calculate ‘cosine similarity score’ between two spectra.
The cosine score quantifies the similarity between two mass spectra. It is calculated by finding the best possible matches between the peaks of the two spectra. Two peaks are considered a potential match if their m/z ratios lie within the given ‘tolerance’. Here, the underlying peak assignment problem is solved in a ‘greedy’ way. This can be notably faster than an exact solution, but it occasionally deviates slightly from the fully correct assignment (as obtained with the Hungarian algorithm, see CosineHungarian). In practice this rarely changes the similarity scores noticeably, in particular for smaller tolerances.
For example:
    import numpy as np
    from matchms import Spectrum
    from matchms.similarity import CosineGreedy

    reference = Spectrum(mz=np.array([100, 150, 200.]),
                         intensities=np.array([0.7, 0.2, 0.1]))
    query = Spectrum(mz=np.array([100, 140, 190.]),
                     intensities=np.array([0.4, 0.2, 0.1]))

    # Use factory to construct a similarity function
    cosine_greedy = CosineGreedy(tolerance=0.2)

    score = cosine_greedy.pair(reference, query)

    print(f"Cosine score is {score['score']:.2f} with {score['matches']} matched peaks")
Should output:
    Cosine score is 0.83 with 1 matched peaks
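The greedy assignment described above can be sketched in plain Python. This is an illustrative sketch only, not the matchms implementation: the function name `greedy_cosine` and the representation of a spectrum as a list of `(mz, intensity)` tuples are assumptions made for this example, and it uses the default weighting (intensity only).

```python
import math


def greedy_cosine(ref_peaks, query_peaks, tolerance=0.1):
    """Greedy cosine sketch; peaks are lists of (mz, intensity) tuples (hypothetical format)."""
    # Collect every candidate pair whose m/z values differ by at most `tolerance`.
    candidates = []
    for i, (mz_r, int_r) in enumerate(ref_peaks):
        for j, (mz_q, int_q) in enumerate(query_peaks):
            if abs(mz_r - mz_q) <= tolerance:
                candidates.append((int_r * int_q, i, j))

    # Greedy step: accept the largest intensity products first,
    # using each peak in at most one accepted pair.
    candidates.sort(reverse=True)
    used_ref, used_query = set(), set()
    total, n_matches = 0.0, 0
    for product, i, j in candidates:
        if i not in used_ref and j not in used_query:
            used_ref.add(i)
            used_query.add(j)
            total += product
            n_matches += 1

    # Normalise by the Euclidean norms of all intensities in each spectrum.
    norm_ref = math.sqrt(sum(inten ** 2 for _, inten in ref_peaks))
    norm_query = math.sqrt(sum(inten ** 2 for _, inten in query_peaks))
    if norm_ref == 0.0 or norm_query == 0.0:
        return 0.0, 0
    return total / (norm_ref * norm_query), n_matches


# Same peaks as in the documented example.
reference_peaks = [(100.0, 0.7), (150.0, 0.2), (200.0, 0.1)]
query_peaks = [(100.0, 0.4), (140.0, 0.2), (190.0, 0.1)]
score, n_matches = greedy_cosine(reference_peaks, query_peaks, tolerance=0.2)
print(f"{score:.2f} with {n_matches} matched peaks")  # 0.83 with 1 matched peaks
```

Only the peaks at m/z 100 lie within the tolerance, so the score reduces to 0.7 × 0.4 divided by the product of the two intensity norms.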
- __init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)
- Parameters
tolerance – Two peaks are considered a match when their m/z values are <= tolerance apart. Default is 0.1.
mz_power – The power to raise m/z to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.
intensity_power – The power to raise intensity to in the cosine function. The default is 1.
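To illustrate how the two power parameters interact, the sketch below computes a per-peak weight as m/z raised to `mz_power` times intensity raised to `intensity_power`. The helper name `peak_weight` is hypothetical, and the exact form of the weighting is an assumption based on the parameter descriptions above.

```python
import math


def peak_weight(mz, intensity, mz_power=0.0, intensity_power=1.0):
    """Hypothetical helper: weight of a single peak in the cosine calculation."""
    return (mz ** mz_power) * (intensity ** intensity_power)


# With the defaults, m/z has no influence: the weight is the intensity itself.
default_weight = peak_weight(100.0, 0.7)

# With mz_power=1 and intensity_power=0.5, peaks at higher m/z gain weight
# and differences between intensities are dampened by the square root.
scaled_weight = peak_weight(100.0, 0.25, mz_power=1.0, intensity_power=0.5)

print(default_weight, scaled_weight)
```

Under this assumption, the products summed over matched peaks are products of these weights rather than of raw intensities.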
- matrix(references: List[Spectrum], queries: List[Spectrum], is_symmetric: bool = False) → numpy.ndarray
Optional: Provide an optimized method to calculate a numpy.array of similarity scores for the given reference and query spectra. If a subclass does not provide its own method, a naive implementation (i.e. a double for-loop over all reference/query pairs) is used.
- Parameters
references – List of reference objects
queries – List of query objects
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
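The naive double for-loop and the `is_symmetric` shortcut can be sketched as follows. This is a simplified stand-in rather than the library code: `naive_matrix` and `toy_similarity` are hypothetical names, the result is a nested list instead of a numpy array, and plain numbers take the place of spectra.

```python
def naive_matrix(references, queries, pair, is_symmetric=False):
    """Fill a score table with a double for-loop; `pair` is any two-argument score function."""
    n_rows, n_cols = len(references), len(queries)
    if is_symmetric and n_rows != n_cols:
        raise ValueError("is_symmetric requires references and queries of equal length")
    scores = [[None] * n_cols for _ in range(n_rows)]
    for i, reference in enumerate(references):
        # For a symmetric problem only the upper triangle is computed.
        start = i if is_symmetric else 0
        for j in range(start, n_cols):
            scores[i][j] = pair(reference, queries[j])
            if is_symmetric:
                scores[j][i] = scores[i][j]  # mirror into the lower triangle
    return scores


call_log = []  # records every call so the saving can be counted


def toy_similarity(a, b):
    """Toy symmetric score on numbers, standing in for a spectrum similarity."""
    call_log.append((a, b))
    return 1.0 / (1.0 + abs(a - b))


items = [1.0, 2.0, 4.0]
full = naive_matrix(items, items, toy_similarity)
calls_full = len(call_log)

call_log.clear()
symmetric = naive_matrix(items, items, toy_similarity, is_symmetric=True)
calls_symmetric = len(call_log)

print(calls_full, calls_symmetric)  # 9 6
```

For n spectra the symmetric variant evaluates n(n+1)/2 pairs instead of n², which approaches the factor of two mentioned above as n grows.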
- pair(reference: Spectrum, query: Spectrum) → Tuple[float, int]
Calculate cosine score between two spectra.
- Parameters
reference – Single reference spectrum.
query – Single query spectrum.
- Returns
Tuple with cosine score and number of matched peaks.
- Return type
Score
- sort(scores: numpy.ndarray)
Return array of indexes for sorted list of scores. This method can be adapted for different styles of scores.
- Parameters
scores – 1D Array of scores.
- Returns
Indexes of sorted scores.
- Return type
idx_sorted
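A minimal plain-Python analogue of such an index sort is shown below. The description above does not state the sort order, so the descending order (best score first) used here is an assumption, and `sorted_indexes` is a hypothetical name.

```python
def sorted_indexes(scores):
    """Return the positions of `scores`, ordered from highest to lowest score (assumed order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


scores = [0.2, 0.9, 0.5]
idx_sorted = sorted_indexes(scores)
print(idx_sorted)  # [1, 2, 0]

# The indexes can then be used to reorder the scores themselves.
ranked = [scores[i] for i in idx_sorted]
```

Returning indexes rather than sorted values lets the caller apply the same ordering to the spectra that the scores belong to.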