matchms.similarity.ModifiedCosineGreedy module
- class matchms.similarity.ModifiedCosineGreedy.ModifiedCosineGreedy(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0, noise_cutoff: float = 0.01)[source]
Bases: BaseSimilarityWithSparse
Calculate an approximate modified cosine score between mass spectra.
This implementation solves the peak assignment in a greedy way and is therefore an approximation. See ModifiedCosineHungarian for the exact assignment variant.
The modified cosine score aims at quantifying the similarity between two mass spectra. Two peaks are considered a potential match if their m/z ratios lie within the given tolerance, or if their m/z ratios lie within the tolerance once a mass-shift is applied. The mass shift is the difference in precursor m/z between the two spectra.
See Watrous et al. [PNAS, 2012, https://www.pnas.org/content/109/26/E1743] for further details.
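The matching rule described above can be illustrated with a minimal, library-independent sketch. It is not the matchms implementation: the function name `greedy_modified_cosine`, the representation of peaks as `(mz, intensity)` tuples, and the choice to rank candidate pairs by intensity product are assumptions made for illustration only.

```python
import math


def greedy_modified_cosine(peaks_1, peaks_2, precursor_1, precursor_2, tolerance=0.1):
    """Hypothetical sketch: return (score, n_matches) for two peak lists.

    Each peak list holds (mz, intensity) tuples. This is an illustration of
    the idea, not the matchms code.
    """
    # The mass shift is the difference in precursor m/z between the two spectra.
    shift = precursor_1 - precursor_2

    # Collect every candidate pair that matches directly or after the shift.
    candidates = []
    for i, (mz_1, int_1) in enumerate(peaks_1):
        for j, (mz_2, int_2) in enumerate(peaks_2):
            direct = abs(mz_1 - mz_2) <= tolerance
            shifted = abs(mz_1 - (mz_2 + shift)) <= tolerance
            if direct or shifted:
                candidates.append((int_1 * int_2, i, j))

    # Greedy assignment (assumed ranking): take the largest intensity product
    # first and never reuse a peak from either spectrum.
    candidates.sort(key=lambda c: c[0], reverse=True)
    used_1, used_2 = set(), set()
    total = 0.0
    for product, i, j in candidates:
        if i not in used_1 and j not in used_2:
            used_1.add(i)
            used_2.add(j)
            total += product

    # Normalize by the Euclidean norms of both intensity vectors.
    norm_1 = math.sqrt(sum(x * x for _, x in peaks_1))
    norm_2 = math.sqrt(sum(x * x for _, x in peaks_2))
    norm = norm_1 * norm_2
    score = total / norm if norm > 0 else 0.0
    return score, len(used_1)


# Two toy spectra whose second peak differs by exactly the precursor difference.
spec_a = [(100.0, 1.0), (210.0, 0.5)]
spec_b = [(100.0, 1.0), (200.0, 0.5)]
with_shift = greedy_modified_cosine(spec_a, spec_b, 310.0, 300.0)
without_shift = greedy_modified_cosine(spec_a, spec_b, 300.0, 300.0)
```

In this toy case the shifted peak pair only matches when the precursor difference is taken into account, which is the situation the modified cosine score is designed for. Because the assignment is greedy, the result can differ from an optimal assignment when several peaks compete for the same partner.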
Unlike in matchms < 1.0, this method also applies a noise filter by default, which removes peaks whose relative intensity falls below noise_cutoff (default 0.01). This is typically highly beneficial for the performance of the greedy algorithm, and for most applications the results are very similar to the exact assignment variant. To disable the noise filter, set noise_cutoff to 0 or None.
- __init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0, noise_cutoff: float = 0.01)[source]
Initialize approximate modified cosine.
- Parameters:
tolerance – Peaks are considered a match when their m/z values differ by at most tolerance. Default is 0.1.
mz_power – The power to raise mz to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.
intensity_power – The power to raise intensity to in the cosine function. The default is 1.
noise_cutoff – Minimum relative intensity for a peak to be considered. Default is 0.01.
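As a rough illustration of how these parameters could interact, the sketch below filters and weights a peak list before scoring. It is an assumption-laden example, not the library code: the helper name `filter_and_weight` is hypothetical, "relative intensity" is assumed to mean intensity divided by the most intense peak, and a peak exactly at the cutoff is assumed to be kept.

```python
def filter_and_weight(peaks, mz_power=0.0, intensity_power=1.0, noise_cutoff=0.01):
    """Hypothetical sketch: drop low-intensity peaks, then weight the rest.

    peaks is a list of (mz, intensity) tuples. Returns (mz, weight) tuples
    where weight = mz ** mz_power * intensity ** intensity_power.
    """
    if not peaks:
        return []

    kept = peaks
    # A cutoff of 0 or None disables the noise filter.
    if noise_cutoff:
        # Assumption: the cutoff is relative to the most intense peak.
        max_intensity = max(intensity for _, intensity in peaks)
        threshold = noise_cutoff * max_intensity
        kept = [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]

    # With mz_power = 0 the weight does not depend on m/z at all.
    return [
        (mz, (mz ** mz_power) * (intensity ** intensity_power))
        for mz, intensity in kept
    ]


toy_peaks = [(100.0, 1000.0), (150.0, 5.0), (200.0, 500.0)]
default_weights = filter_and_weight(toy_peaks)
unfiltered = filter_and_weight(toy_peaks, noise_cutoff=None)
mz_weighted = filter_and_weight(toy_peaks, mz_power=1.0)
```

With the defaults, the peak at m/z 150 is removed because its intensity is below 1% of the base peak, and the remaining weights equal the raw intensities.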
- keep_score(score) → bool
Return whether a score should be retained in sparse outputs.
This defines the default sparse retention behavior. Users can override it per call via score_filter=....
Default behavior:
- scalar score: keep if score != 0
- structured score: keep if all fields are non-zero
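The default retention rule can be sketched in a few lines. This example is only an illustration: the function name `keep_score_sketch` is hypothetical, and a structured score is modeled here as a plain dict of field names to values, which is an assumption about representation rather than the type the library uses.

```python
def keep_score_sketch(score):
    """Hypothetical sketch of the default retention rule.

    A scalar score is kept when it is non-zero. A structured score, modeled
    here as a dict, is kept only when every field is non-zero.
    """
    if isinstance(score, dict):
        return all(value != 0 for value in score.values())
    return score != 0


# A custom filter in the spirit of score_filter=...: keep only strong scores.
def strong_scores_only(score):
    return score["score"] >= 0.7


scalar_kept = keep_score_sketch(0.42)
scalar_dropped = keep_score_sketch(0.0)
structured_kept = keep_score_sketch({"score": 0.8, "matches": 3})
structured_dropped = keep_score_sketch({"score": 0.8, "matches": 0})
custom_dropped = strong_scores_only({"score": 0.5, "matches": 4})
```

Note that under the default rule a structured score with a non-zero score but zero matches is dropped, whereas a custom filter can apply any other criterion.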
- matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True)
Calculate a dense similarity matrix.
- Parameters:
spectra_1 – First collection of spectra.
spectra_2 – Second collection of spectra. If None, compare spectra_1 against itself. For commutative similarities this automatically uses a symmetric optimization.
score_fields – Score fields to return.
- None means return all available fields.
- For scalar scores, only ("score",) is valid.
- For structured scores, this can be a subset such as ("score",).
progress_bar – When True, show a progress bar. Default is True.
- Returns:
Dense score result wrapped in a Scores container.
- Return type:
- pair(spectrum_1: Spectrum, spectrum_2: Spectrum) → tuple[float, int][source]
Calculate approximate modified cosine score between two spectra.
- sparse_matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, idx_row: ArrayLike | None = None, idx_col: ArrayLike | None = None, score_fields: Sequence[str] | None = None, score_filter: Callable[[ndarray], bool] | None = None, progress_bar: bool = True)
Calculate sparse similarity results.
Filtering is applied to the full score before score field projection.
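To make the order of operations concrete, here is a small library-independent sketch of computing selected pairs, filtering on the full score, and only then projecting to the requested fields. All names (`sparse_scores_sketch`, `toy_pair_score`) are hypothetical, scores are modeled as dicts, and the result is a plain dict rather than a Scores container.

```python
def toy_pair_score(a, b):
    """Stand-in for a similarity function that returns a structured score."""
    return {"score": 1.0 / (1.0 + abs(a - b)), "matches": min(a, b)}


def sparse_scores_sketch(pair_score, spectra_1, spectra_2,
                         idx_row=None, idx_col=None,
                         score_fields=None, score_filter=None):
    """Hypothetical sketch: return {(row, col): score} for retained pairs."""
    if idx_row is None and idx_col is None:
        # No indices given: consider all pairwise comparisons.
        pairs = [(r, c) for r in range(len(spectra_1)) for c in range(len(spectra_2))]
    elif idx_row is None or idx_col is None or len(idx_row) != len(idx_col):
        raise ValueError("idx_row and idx_col must have the same shape")
    else:
        pairs = list(zip(idx_row, idx_col))

    if score_filter is None:
        # Default rule for structured scores: keep if all fields are non-zero.
        score_filter = lambda full: all(v != 0 for v in full.values())

    results = {}
    for r, c in pairs:
        full = pair_score(spectra_1[r], spectra_2[c])
        # The filter sees the full score, before any field projection.
        if not score_filter(full):
            continue
        if score_fields is None:
            results[(r, c)] = full
        else:
            results[(r, c)] = {field: full[field] for field in score_fields}
    return results


items = [0, 1, 2]  # toy "spectra"
all_pairs = sparse_scores_sketch(toy_pair_score, items, items, score_fields=("score",))
selected = sparse_scores_sketch(toy_pair_score, items, items,
                                idx_row=[0, 1], idx_col=[0, 2],
                                score_fields=("score",))
```

In the toy data, the pair (0, 0) has a non-zero score but zero matches, so it is dropped even though only the "score" field was requested. This is the practical consequence of filtering before projection.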
- Parameters:
spectra_1 – First collection of spectra.
spectra_2 – Second collection of spectra. If None, compare spectra_1 against itself.
idx_row – Row indices of pairs to compute. If None and idx_col is also None, all pairwise comparisons are considered and only retained scores are stored.
idx_col – Column indices of pairs to compute. Must have the same shape as idx_row.
score_fields – Score fields to return.
- None means return all available fields.
- For scalar scores, only ("score",) is valid.
- For structured scores, this can be a subset such as ("score",).
score_filter – Optional callable receiving the full score and returning whether it should be retained. If None, keep_score() is used.
progress_bar – When True, show a progress bar.
- Returns:
Sparse score result wrapped in a Scores container.
- Return type: