matchms.similarity.ModifiedCosineGreedy module

class matchms.similarity.ModifiedCosineGreedy.ModifiedCosineGreedy(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0, noise_cutoff: float = 0.01)[source]

Bases: BaseSimilarityWithSparse

Calculate an approximate modified cosine score between mass spectra.

This implementation solves the peak assignment in a greedy way and is therefore an approximation. See ModifiedCosineHungarian for the exact assignment variant.

The modified cosine score aims at quantifying the similarity between two mass spectra. Two peaks are considered a potential match if their m/z ratios lie within the given tolerance, or if their m/z ratios lie within the tolerance once a mass-shift is applied. The mass shift is the difference in precursor m/z between the two spectra.

See Watrous et al. [PNAS, 2012, https://www.pnas.org/content/109/26/E1743] for further details.

Unlike in matchms < 1.0, this method also applies a noise filter by default, which removes peaks with intensity below a certain cutoff. This is typically highly beneficial for the performance of the greedy algorithm, and for most applications the results are very similar to the exact assignment variant. If you want to disable this noise filtering, you can set noise_cutoff to 0 or None.

__init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0, noise_cutoff: float = 0.01)[source]

Initialize approximate modified cosine.

Parameters:

tolerance – Peaks will be considered a match when <= tolerance apart. Default is 0.1.
mz_power – The power to raise mz to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.
intensity_power – The power to raise intensity to in the cosine function. The default is 1.
noise_cutoff – Minimum relative intensity for a peak to be considered. Default is 0.01.

property is_structured_score: bool: Return True if this similarity uses a structured score dtype.

keep_score(score) → bool

Return whether a score should be retained in sparse outputs.

This defines the default sparse retention behavior. Users can override it per call via score_filter=....

Default behavior: - scalar score: keep if score != 0 - structured score: keep if all fields are non-zero

matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True)

Calculate a dense similarity matrix.

Parameters:

spectra_1 – First collection of spectra.
spectra_2 – Second collection of spectra. If None, compare spectra_1 against itself. For commutative similarities this automatically uses a symmetric optimization.
score_fields – Score fields to return. - None means return all available fields. - For scalar scores, only ("score",) is valid. - For structured scores, this can be a subset such as ("score",).
progress_bar – When True, show a progress bar. Default is True.

Returns:

Dense score result wrapped in a Scores container.

Return type:

Scores

pair(spectrum_1: Spectrum, spectrum_2: Spectrum) → tuple[float, int][source]: Calculate approximate modified cosine score between two spectra.

sparse_matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, idx_row: ArrayLike | None = None, idx_col: ArrayLike | None = None, score_fields: Sequence[str] | None = None, score_filter: Callable[[ndarray], bool] | None = None, progress_bar: bool = True)

Calculate sparse similarity results.

Filtering is applied to the full score before score field projection.

Parameters:

spectra_1 – First collection of spectra.
spectra_2 – Second collection of spectra. If None, compare spectra_1 against itself.
idx_row – Row indices of pairs to compute. If None and idx_col is also None, all pairwise comparisons are considered and only retained scores are stored.
idx_col – Column indices of pairs to compute. Must have the same shape as idx_row.
score_fields – Score fields to return. - None means return all available fields. - For scalar scores, only ("score",) is valid. - For structured scores, this can be a subset such as ("score",).
score_filter – Optional callable receiving the full score and returning whether it should be retained. If None, keep_score() is used.
progress_bar – When True, show a progress bar.

Returns:

Sparse score result wrapped in a Scores container.

Return type:

Scores

to_dict() → dict: Return a dictionary representation of the similarity function.