matchms.similarity.ModifiedCosine module

class matchms.similarity.ModifiedCosine.ModifiedCosine(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)

Bases: matchms.similarity.BaseSimilarity.BaseSimilarity

Calculate ‘modified cosine score’ between mass spectra.

The modified cosine score quantifies the similarity between two mass spectra. It is calculated by finding the best possible matches between the peaks of the two spectra. Two peaks are considered a potential match if their m/z ratios lie within the given ‘tolerance’, either directly or once a mass shift is applied. The mass shift is the difference in precursor m/z between the two spectra. See Watrous et al. [PNAS, 2012, https://www.pnas.org/content/109/26/E1743] for further details.

For example

import numpy as np
from matchms import Spectrum
from matchms.similarity import ModifiedCosine

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={"precursor_mz": 100.0})
spectrum_2 = Spectrum(mz=np.array([104.9, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={"precursor_mz": 105.0})

# Use factory to construct a similarity function
modified_cosine = ModifiedCosine(tolerance=0.2)

score = modified_cosine.pair(spectrum_1, spectrum_2)

print(f"Modified cosine score is {score['score']:.2f} with {score['matches']} matched peaks")

Should output

Modified cosine score is 0.83 with 1 matched peaks
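
To make the matching rule concrete, here is a minimal plain-Python sketch of the calculation described above. It is not the matchms implementation: the function name modified_cosine_sketch is hypothetical, the greedy assignment (strongest intensity products first) is an assumption of this sketch, and it covers only the default weighting (mz_power=0, intensity_power=1).

```python
import math


def modified_cosine_sketch(mz1, int1, precursor1, mz2, int2, precursor2, tolerance=0.1):
    """Illustrative modified cosine score (hypothetical helper, not matchms code)."""
    # Mass shift: difference in precursor m/z between the two spectra.
    shift = precursor1 - precursor2

    # Collect candidate pairs: peaks that agree directly, or agree after the shift.
    candidates = []
    for i, (a, ia) in enumerate(zip(mz1, int1)):
        for j, (b, ib) in enumerate(zip(mz2, int2)):
            if abs(a - b) <= tolerance or abs(a - shift - b) <= tolerance:
                candidates.append((ia * ib, i, j))

    # Greedy assignment (assumption): largest products first, each peak used once.
    candidates.sort(reverse=True)
    used1, used2 = set(), set()
    total, matches = 0.0, 0
    for product, i, j in candidates:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            total += product
            matches += 1

    # Normalise by the intensity norms of both spectra.
    norm = math.sqrt(sum(x * x for x in int1)) * math.sqrt(sum(x * x for x in int2))
    return (total / norm if norm > 0 else 0.0), matches


score, matches = modified_cosine_sketch(
    [100.0, 150.0, 200.0], [0.7, 0.2, 0.1], 100.0,
    [104.9, 140.0, 190.0], [0.4, 0.2, 0.1], 105.0,
    tolerance=0.2,
)
print(f"{score:.2f} {matches}")  # prints "0.83 1"
```

With the two spectra from the example, only the peak at m/z 100 matches the peak at m/z 104.9, and only after the shift of 5 is applied, which gives 0.28 / sqrt(0.54 * 0.21), or about 0.83.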
__init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)
Parameters
  • tolerance – Two peaks are considered a match when their m/z ratios are <= tolerance apart. Default is 0.1.

  • mz_power – The power to raise mz to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.

  • intensity_power – The power to raise intensity to in the cosine function. The default is 1.
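
As a rough illustration of how the two powers could enter the score, the sketch below assumes each peak is weighted as mz ** mz_power * intensity ** intensity_power before the products are formed. The helper name peak_weights is hypothetical and the weighting formula is an assumption based on the parameter descriptions above, not a copy of the library code.

```python
def peak_weights(mz, intensities, mz_power=0.0, intensity_power=1.0):
    """Per-peak weights under the assumed form mz**mz_power * intensity**intensity_power."""
    return [(m ** mz_power) * (i ** intensity_power) for m, i in zip(mz, intensities)]


# Defaults: mz_power=0 makes the m/z factor equal to 1, so weights equal the intensities.
print(peak_weights([100.0, 200.0], [0.5, 0.25]))

# With mz_power=1, peaks at higher m/z contribute more for the same intensity.
print(peak_weights([100.0, 200.0], [0.5, 0.25], mz_power=1.0))
```

Under this assumption, the default mz_power=0 removes any dependence on m/z, consistent with the parameter description.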

matrix(references: List[Spectrum], queries: List[Spectrum], is_symmetric: bool = False) → numpy.ndarray

Optional: Provide an optimized method to calculate a numpy.array of similarity scores for the given reference and query spectra. If no optimized method is provided, a naive implementation (i.e. a double for-loop) is used.

Parameters
  • references – List of reference objects

  • queries – List of query objects

  • is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
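
The naive double for-loop and the is_symmetric shortcut can be sketched as follows. This is an illustrative stand-in, not the library code: naive_matrix and its pair argument are hypothetical names, and nested lists are used instead of a numpy.ndarray to keep the example dependency-free.

```python
def naive_matrix(references, queries, pair, is_symmetric=False):
    """Score every reference against every query with a double for-loop."""
    n_rows, n_cols = len(references), len(queries)
    if is_symmetric and n_rows != n_cols:
        raise ValueError("is_symmetric requires references and queries of equal length")

    scores = [[None] * n_cols for _ in range(n_rows)]
    for i, reference in enumerate(references):
        # For symmetric input only the upper triangle is computed ...
        start = i if is_symmetric else 0
        for j in range(start, n_cols):
            scores[i][j] = pair(reference, queries[j])
            if is_symmetric:
                # ... and mirrored, using score[i][j] == score[j][i].
                scores[j][i] = scores[i][j]
    return scores


# Toy usage with numbers in place of spectra and a symmetric toy score function.
items = [1.0, 2.0, 4.0]
print(naive_matrix(items, items, lambda a, b: abs(a - b), is_symmetric=True))
```

For n items the symmetric variant evaluates n * (n + 1) / 2 pairs instead of n * n, which is where the roughly 2x speed-up comes from.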

pair(reference: Spectrum, query: Spectrum) → Tuple[float, int]

Calculate modified cosine score between two spectra.

Parameters
  • reference – Single reference spectrum.

  • query – Single query spectrum.

Return type

Tuple with cosine score and number of matched peaks.

sort(scores: numpy.ndarray)

Return array of indexes for sorted list of scores. This method can be adapted for different styles of scores.

Parameters

scores – 1D Array of scores.

Returns

Indexes of sorted scores.

Return type

idx_sorted
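
A minimal sketch of what such a sort could look like, assuming that higher scores should come first (the description above does not state the order, so descending order is an assumption). The function name sort_descending is hypothetical.

```python
def sort_descending(scores):
    """Return the indexes that order a 1D sequence of scores from high to low."""
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


idx_sorted = sort_descending([0.2, 0.9, 0.5])
print(idx_sorted)  # prints [1, 2, 0]
```

For NumPy arrays, numpy.argsort(scores)[::-1] gives a comparable ordering, although ties may be resolved differently.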