matchms.similarity.ModifiedCosineGreedy module

class matchms.similarity.ModifiedCosineGreedy.ModifiedCosineGreedy(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)

Bases: BaseSimilarity

Calculate an approximate modified cosine score between mass spectra.

This implementation solves the peak assignment greedily and is therefore an approximation. See ModifiedCosineHungarian for the exact assignment variant.

The modified cosine score quantifies the similarity between two mass spectra. Two peaks are considered a potential match if their m/z values lie within the given tolerance, or if they lie within the tolerance once a mass shift is applied. The mass shift is the difference in precursor m/z between the two spectra.

See Watrous et al. [PNAS, 2012, https://www.pnas.org/content/109/26/E1743] for further details.
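The matching rule described above can be illustrated with a minimal pure-Python sketch. It is not the matchms implementation: the function name `greedy_modified_cosine`, the peak representation as `(mz, intensity)` tuples, and the sign convention of the mass shift (reference precursor minus query precursor) are assumptions made for illustration, and the sketch uses the default weighting only (mz_power = 0, intensity_power = 1).

```python
from math import sqrt


def greedy_modified_cosine(peaks_ref, peaks_query, precursor_ref,
                           precursor_query, tolerance=0.1):
    """Illustrative sketch; returns (score, number_of_matched_peaks).

    peaks_ref / peaks_query are lists of (mz, intensity) tuples.
    """
    # Assumed convention: shift = reference precursor - query precursor.
    mass_shift = precursor_ref - precursor_query

    # Collect every candidate pair: direct match or mass-shifted match.
    candidates = []
    for i, (mz_r, int_r) in enumerate(peaks_ref):
        for j, (mz_q, int_q) in enumerate(peaks_query):
            direct = abs(mz_r - mz_q) <= tolerance
            shifted = abs(mz_r - (mz_q + mass_shift)) <= tolerance
            if direct or shifted:
                candidates.append((int_r * int_q, i, j))

    # Greedy assignment: take the largest products first and use each
    # peak at most once. This is fast but not guaranteed to be optimal.
    candidates.sort(key=lambda c: c[0], reverse=True)
    used_ref, used_query = set(), set()
    total, n_matches = 0.0, 0
    for product, i, j in candidates:
        if i in used_ref or j in used_query:
            continue
        used_ref.add(i)
        used_query.add(j)
        total += product
        n_matches += 1

    # Normalize by the intensity vector lengths of both spectra.
    norm = (sqrt(sum(x ** 2 for _, x in peaks_ref))
            * sqrt(sum(x ** 2 for _, x in peaks_query)))
    score = total / norm if norm > 0 else 0.0
    return score, n_matches


# Two toy spectra whose second peak differs by the precursor difference.
ref_peaks = [(100.0, 1.0), (150.0, 0.5)]
query_peaks = [(100.0, 1.0), (160.0, 0.5)]
shifted_score, shifted_matches = greedy_modified_cosine(
    ref_peaks, query_peaks, precursor_ref=200.0, precursor_query=210.0)
plain_score, plain_matches = greedy_modified_cosine(
    ref_peaks, query_peaks, precursor_ref=200.0, precursor_query=200.0)
```

In the toy data, the peak at 160 only pairs with the peak at 150 when the precursor difference of 10 is taken into account, so the first call finds two matches and the second only one.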

__init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)

Initialize approximate modified cosine.

Parameters:

tolerance – Peaks will be considered a match when <= tolerance apart. Default is 0.1.
mz_power – The power to raise mz to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.
intensity_power – The power to raise intensity to in the cosine function. The default is 1.
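As a rough illustration of how the two power parameters could enter the score, the sketch below weights each peak as mz ** mz_power * intensity ** intensity_power before multiplying matched peaks. The helper name `weighted_product` is hypothetical, and the exact weighting used inside the library is assumed here rather than taken from its source.

```python
def weighted_product(mz_a, int_a, mz_b, int_b,
                     mz_power=0.0, intensity_power=1.0):
    """Product of two weighted peaks (hypothetical helper for illustration)."""
    weight_a = (mz_a ** mz_power) * (int_a ** intensity_power)
    weight_b = (mz_b ** mz_power) * (int_b ** intensity_power)
    return weight_a * weight_b


# With the defaults, m/z has no influence: only the intensities count.
default_product = weighted_product(100.0, 0.5, 100.0, 0.4)

# With mz_power=1, peaks at higher m/z contribute more to the score.
mz_weighted_product = weighted_product(100.0, 0.5, 100.0, 0.4, mz_power=1.0)
```

With mz_power left at 0, any m/z raised to that power equals 1, which is why the default intensity products do not depend on m/z.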

keep_score(score): In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.

matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False, progress_bar: bool = True) → ndarray

Optional: Provide an optimized method to calculate an np.array of similarity scores for the given reference and query spectra. If a subclass does not provide one, a naive implementation (a double for-loop over references and queries) is used.

Parameters:

references – List of reference objects
queries – List of query objects
array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array.
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
progress_bar – When True a progress bar is shown. Default is True.
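The naive double for-loop and the is_symmetric shortcut can be sketched as follows. This is an illustration only: `naive_score_matrix` and its `score_fn` argument are hypothetical names, and nested lists stand in for the NumPy or sparse array that the real method returns.

```python
def naive_score_matrix(references, queries, score_fn, is_symmetric=False):
    """Fill a len(references) x len(queries) table with score_fn results."""
    n_rows, n_cols = len(references), len(queries)
    if is_symmetric and n_rows != n_cols:
        raise ValueError("is_symmetric requires equally sized inputs")

    scores = [[0.0] * n_cols for _ in range(n_rows)]
    for i, reference in enumerate(references):
        # In the symmetric case only the upper triangle is computed.
        start = i if is_symmetric else 0
        for j in range(start, n_cols):
            value = score_fn(reference, queries[j])
            scores[i][j] = value
            if is_symmetric:
                scores[j][i] = value  # mirror: score[i,j] == score[j,i]
    return scores


# Count how often the (toy) score function is actually evaluated.
calls = []


def toy_score(a, b):
    calls.append((a, b))
    return a * b


items = [1, 2, 3]
symmetric_scores = naive_score_matrix(items, items, toy_score, is_symmetric=True)
symmetric_calls = len(calls)

calls.clear()
full_scores = naive_score_matrix(items, items, toy_score)
full_calls = len(calls)
```

For n spectra the symmetric variant evaluates n(n+1)/2 pairs instead of n², which is where the speed-up of roughly a factor of two comes from.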

pair(reference: Spectrum, query: Spectrum) → Tuple[float, int]: Calculate the approximate modified cosine score between two spectra. The returned tuple holds the score and the number of matched peaks.

sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False, progress_bar: bool = True)

Optional: Provide an optimized method to calculate a sparse matrix of similarity scores.

Compute similarity scores for pairs of reference and query spectra as given by the indices idx_row (references) and idx_col (queries). If a subclass does not provide one, a naive implementation (a single for-loop over the index pairs) is used.

Parameters:

references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
progress_bar – When True a progress bar is shown. Default is True.
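Scoring only selected index pairs might look like the sketch below. The names `scores_for_pairs` and `score_fn` are hypothetical, and a plain list of values stands in for the sparse result of the real method.

```python
def scores_for_pairs(references, queries, idx_row, idx_col, score_fn):
    """Score only the pairs (references[r], queries[c]) named by the indices."""
    if len(idx_row) != len(idx_col):
        raise ValueError("idx_row and idx_col must have the same length")
    # One loop over the requested pairs instead of the full matrix.
    return [score_fn(references[r], queries[c])
            for r, c in zip(idx_row, idx_col)]


refs = [10, 20, 30]
qrys = [1, 2]
# Request three of the six possible pairs: (0, 1), (2, 0) and (2, 1).
pair_scores = scores_for_pairs(refs, qrys, idx_row=[0, 2, 2], idx_col=[1, 0, 1],
                               score_fn=lambda a, b: a * b)
```

The k-th returned value belongs to position (idx_row[k], idx_col[k]), so the three lists together describe a sparse matrix in coordinate form.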

to_dict() → dict: Return a dictionary representation of a similarity function.