matchms.similarity.CosineLinear module

class matchms.similarity.CosineLinear.CosineLinear(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)[source]

Bases: BaseSimilarity

Calculate ‘linear cosine similarity score’ between two spectra.

This implements the CosineLinear similarity from SIRIUS (BOECKER lab), which achieves O(n+m) time complexity by requiring spectra to be “well-separated” (consecutive peaks more than 2x tolerance apart). A preprocessing step (sirius_merge_close_peaks) enforces this invariant by greedily merging close peaks in descending intensity order.

For example

import numpy as np
from matchms import Spectrum
from matchms.similarity import CosineLinear

reference = Spectrum(mz=np.array([100, 150, 200.]),
                     intensities=np.array([0.7, 0.2, 0.1]))
query = Spectrum(mz=np.array([100, 140, 190.]),
                 intensities=np.array([0.4, 0.2, 0.1]))

cosine_linear = CosineLinear(tolerance=0.2)
score = cosine_linear.pair(reference, query)

print(f"CosineLinear score is {score['score']:.2f} with {score['matches']} matched peaks")

Should output

CosineLinear score is 0.83 with 1 matched peaks

__init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)[source]

Parameters:

tolerance – Peaks will be considered a match when <= tolerance apart. Default is 0.1. Peaks closer than 2 * tolerance are merged before scoring.
mz_power – The power to raise m/z to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.
intensity_power – The power to raise intensity to in the cosine function. The default is 1.

keep_score(score): In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.

matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False, progress_bar: bool = True) → ndarray[source]

Optimized matrix computation that precomputes merged spectra.

Each spectrum is merged once (N+M calls to sirius_merge_close_peaks) instead of 2*N*M times in the naive double-loop approach.

pair(reference: Spectrum, query: Spectrum) → ndarray[source]

Calculate linear cosine score between two spectra.

Parameters:

reference – Single reference spectrum.
query – Single query spectrum.

Returns:

Tuple with cosine score and number of matched peaks.

Return type:

Score

sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False, progress_bar: bool = True)

Optional: Provide optimized method to calculate an sparse matrix of similarity scores.

Compute similarity scores for pairs of reference and query spectra as given by the indices idx_row (references) and idx_col (queries). If no method is added here, the following naive implementation (i.e. a for-loop) is used.

Parameters:

references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
progress_bar – When True a progress bar is shown. Default is True.

to_dict() → dict: Return a dictionary representation of a similarity function.