matchms.similarity.CosineHungarian module
- class matchms.similarity.CosineHungarian.CosineHungarian(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)[source]
Bases: BaseSimilarityWithSparse
Calculate ‘cosine similarity score’ between two spectra using the Hungarian algorithm.
The cosine score quantifies the similarity between two mass spectra by finding the optimal one-to-one matching between their peaks. Two peaks are considered a potential match if their m/z ratios lie within the given tolerance.
The peak assignment is solved using the Hungarian algorithm (scipy.optimize.linear_sum_assignment), which finds the assignment that maximises the sum of intensity products. This is mathematically optimal but can be notably slower than the greedy heuristic in CosineGreedy.
- __init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)[source]
- Parameters:
tolerance – Two peaks are considered a match when their m/z values differ by at most tolerance. Default is 0.1.
mz_power – The power to raise m/z to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.
intensity_power – The power to raise intensity to in the cosine function. The default is 1.
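To illustrate why an optimal assignment can differ from a greedy one, here is a minimal sketch in plain Python. It is not the matchms implementation. It assumes spectra are given as lists of (m/z, intensity) tuples, it assumes the score is normalised by the Euclidean norms of all weighted peaks, and it swaps scipy.optimize.linear_sum_assignment for an exhaustive search, which reaches the same optimum on small inputs. All helper names (`weights`, `candidate_pairs`, `greedy_assignment`, `optimal_assignment`, `cosine_score`) are hypothetical.

```python
from math import sqrt


def weights(peaks, mz_power=0.0, intensity_power=1.0):
    """Weight each (mz, intensity) peak as mz**mz_power * intensity**intensity_power."""
    return [(mz ** mz_power) * (intensity ** intensity_power) for mz, intensity in peaks]


def candidate_pairs(peaks_1, peaks_2, tolerance, w1, w2):
    """List (product, i, j) for every peak pair whose m/z values are within tolerance."""
    return [
        (w1[i] * w2[j], i, j)
        for i, (mz_1, _) in enumerate(peaks_1)
        for j, (mz_2, _) in enumerate(peaks_2)
        if abs(mz_1 - mz_2) <= tolerance
    ]


def greedy_assignment(pairs):
    """Take pairs in order of decreasing product, skipping peaks already used."""
    used_1, used_2, total, matches = set(), set(), 0.0, 0
    for product, i, j in sorted(pairs, reverse=True):
        if i not in used_1 and j not in used_2:
            used_1.add(i)
            used_2.add(j)
            total += product
            matches += 1
    return total, matches


def optimal_assignment(pairs, used_1=frozenset(), used_2=frozenset()):
    """Exhaustively search one-to-one assignments for the maximum product sum.

    Stand-in for the Hungarian algorithm; exponential, so only for tiny inputs.
    """
    if not pairs:
        return 0.0, 0
    (product, i, j), rest = pairs[0], pairs[1:]
    best = optimal_assignment(rest, used_1, used_2)  # option 1: skip this pair
    if i not in used_1 and j not in used_2:          # option 2: take this pair
        total, matches = optimal_assignment(rest, used_1 | {i}, used_2 | {j})
        best = max(best, (total + product, matches + 1))
    return best


def cosine_score(peaks_1, peaks_2, assign, tolerance=0.1, mz_power=0.0, intensity_power=1.0):
    """Return (score, number of matched peaks) using the given assignment strategy."""
    w1 = weights(peaks_1, mz_power, intensity_power)
    w2 = weights(peaks_2, mz_power, intensity_power)
    total, matches = assign(candidate_pairs(peaks_1, peaks_2, tolerance, w1, w2))
    # Assumed normalisation: product of the Euclidean norms of all weighted peaks.
    norm = sqrt(sum(w * w for w in w1)) * sqrt(sum(w * w for w in w2))
    return (total / norm if norm > 0 else 0.0), matches


spectrum_a = [(100.00, 1.0), (100.10, 0.9)]
spectrum_b = [(99.95, 0.8), (100.05, 1.0)]

greedy = cosine_score(spectrum_a, spectrum_b, greedy_assignment)
optimal = cosine_score(spectrum_a, spectrum_b, optimal_assignment)
print("greedy: ", greedy)
print("optimal:", optimal)
```

In this toy case the greedy pass takes the single largest product first and ends with one matched pair, while the exhaustive search finds two matched pairs and a higher score.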
- keep_score(score) bool
Return whether a score should be retained in sparse outputs.
This defines the default sparse retention behavior. Users can override it per call via the score_filter=... argument.
Default behavior:
- scalar score: keep if score != 0
- structured score: keep if all fields are non-zero
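The default rule can be sketched as follows. This is an illustration only, not the library code: it assumes a structured score can be modelled by a plain tuple of field values, whereas the library's score_filter signature indicates NumPy values are passed. The name `default_keep_score` is hypothetical.

```python
def default_keep_score(score):
    """Return True if a score should be kept under the default rule."""
    if isinstance(score, tuple):
        # Structured score (modelled as a tuple): keep only if every field is non-zero.
        return all(field != 0 for field in score)
    # Scalar score: keep if it is non-zero.
    return score != 0


print(default_keep_score(0.0))        # False
print(default_keep_score(0.42))       # True
print(default_keep_score((0.42, 3)))  # True
print(default_keep_score((0.42, 0)))  # False
```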
- matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True)
Calculate a dense similarity matrix.
- Parameters:
spectra_1 – First collection of spectra.
spectra_2 – Second collection of spectra. If None, compare spectra_1 against itself. For commutative similarities this automatically uses a symmetric optimization.
score_fields – Score fields to return.
- None means return all available fields.
- For scalar scores, only ("score",) is valid.
- For structured scores, this can be a subset such as ("score",).
progress_bar – When True, show a progress bar. Default is True.
- Returns:
Dense score result wrapped in a Scores container.
- Return type:
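The symmetric optimization mentioned for spectra_2=None can be pictured as evaluating each unordered pair once and mirroring the result. The sketch below is an assumption about what such an optimization looks like in general, not the matchms code. `symmetric_matrix`, `score_fn` and `toy_score` are hypothetical names, and the toy score on plain numbers merely stands in for a commutative spectrum similarity.

```python
def symmetric_matrix(items, score_fn):
    """Fill an n x n score matrix by evaluating score_fn only for i <= j."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    calls = 0
    for i in range(n):
        for j in range(i, n):
            value = score_fn(items[i], items[j])
            calls += 1
            matrix[i][j] = value
            matrix[j][i] = value  # valid only because score_fn(a, b) == score_fn(b, a)
    return matrix, calls


def toy_score(a, b):
    """Toy commutative score on numbers, standing in for a spectrum similarity."""
    return 1.0 / (1.0 + abs(a - b))


matrix, calls = symmetric_matrix([1.0, 2.0, 4.0], toy_score)
print(calls)  # 6 evaluations instead of 9
```

For n spectra this needs n(n+1)/2 evaluations rather than n², which matters when each evaluation solves an assignment problem.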
- pair(spectrum_1: Spectrum, spectrum_2: Spectrum) tuple[float, int][source]
Calculate cosine score between two spectra.
- Parameters:
spectrum_1 – Single spectrum.
spectrum_2 – Single spectrum.
- Return type:
Tuple with cosine score and number of matched peaks.
- sparse_matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, idx_row: ArrayLike | None = None, idx_col: ArrayLike | None = None, score_fields: Sequence[str] | None = None, score_filter: Callable[[ndarray], bool] | None = None, progress_bar: bool = True)
Calculate sparse similarity results.
Filtering is applied to the full score before score field projection.
- Parameters:
spectra_1 – First collection of spectra.
spectra_2 – Second collection of spectra. If None, compare spectra_1 against itself.
idx_row – Row indices of pairs to compute. If None and idx_col is also None, all pairwise comparisons are considered and only retained scores are stored.
idx_col – Column indices of pairs to compute. Must have the same shape as idx_row.
score_fields – Score fields to return.
- None means return all available fields.
- For scalar scores, only ("score",) is valid.
- For structured scores, this can be a subset such as ("score",).
score_filter – Optional callable receiving the full score and returning whether it should be retained. If None, keep_score() is used.
progress_bar – When True, show a progress bar.
- Returns:
Sparse score result wrapped in a Scores container.
- Return type:
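The order of operations stated for this method, filtering on the full score and only then projecting onto score_fields, can be sketched as below. This is a plain-Python illustration under assumptions, not the matchms implementation: full scores are modelled as dicts, the field name "matches" is assumed for the matched-peak count, and `sparse_scores` and `all_fields_nonzero` are made-up helpers.

```python
def all_fields_nonzero(score):
    """Default rule for structured scores: keep only if every field is non-zero."""
    return all(value != 0 for value in score.values())


def sparse_scores(full_scores, idx_row, idx_col, score_fields=None, score_filter=None):
    """Return {(row, col): projected score} for requested pairs that pass the filter.

    full_scores maps (row, col) to a dict of score fields (a stand-in for
    computing the similarity of that pair).
    """
    if len(idx_row) != len(idx_col):
        raise ValueError("idx_row and idx_col must have the same shape")
    if score_filter is None:
        score_filter = all_fields_nonzero

    kept = {}
    for row, col in zip(idx_row, idx_col):
        score = full_scores[(row, col)]
        # Step 1: decide retention using the full score, with all fields visible.
        if not score_filter(score):
            continue
        # Step 2: only now project onto the requested fields.
        fields = score_fields if score_fields is not None else tuple(score)
        kept[(row, col)] = {name: score[name] for name in fields}
    return kept


full_scores = {
    (0, 0): {"score": 0.9, "matches": 5},
    (0, 1): {"score": 0.4, "matches": 1},
    (1, 1): {"score": 0.0, "matches": 0},
}

# The filter inspects "matches" even though only "score" is returned.
result = sparse_scores(
    full_scores,
    idx_row=[0, 0, 1],
    idx_col=[0, 1, 1],
    score_fields=("score",),
    score_filter=lambda score: score["matches"] >= 2,
)
print(result)  # {(0, 0): {'score': 0.9}}
```

Because the filter runs before projection, a caller can threshold on one field while storing only another.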