matchms.similarity.FlashSimilarity module
- class matchms.similarity.FlashSimilarity.CosineFlash(*args, dtype: dtype = <class 'numpy.float64'>, **kwargs)
Bases: _BaseFlashSimilarity
Flash Cosine similarity following the original Flash Entropy approach (Li & Fiehn, 2023), with a fast .matrix() that builds a library-wide index over the ‘queries’ and streams all ‘references’ through it. The scoring logic corresponds to “CosineGreedy”, but it runs on the same fast Flash path as Flash Entropy.
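The “CosineGreedy”-style score that CosineFlash reproduces can be illustrated with a minimal pure-Python sketch: peaks are matched within a fixed Da tolerance and pairs are assigned greedily by descending intensity product. The function name greedy_cosine and the (m/z, intensity) tuple layout are hypothetical; this is not the matchms implementation, and it deliberately omits the Flash index.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical layout for this sketch


def greedy_cosine(peaks_a: List[Peak], peaks_b: List[Peak],
                  tolerance: float = 0.02, intensity_power: float = 1.0) -> float:
    """Greedy cosine between two centroided peak lists (illustrative sketch only)."""
    # Collect every peak pair whose m/z values lie within the tolerance (in Da).
    candidates = []
    for i, (mz_a, int_a) in enumerate(peaks_a):
        for j, (mz_b, int_b) in enumerate(peaks_b):
            if abs(mz_a - mz_b) <= tolerance:
                product = (int_a ** intensity_power) * (int_b ** intensity_power)
                candidates.append((product, i, j))
    # Greedy assignment: largest products first, each peak used at most once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product
    # Normalize by the Euclidean norms of the (power-weighted) intensity vectors.
    norm_a = sum((inten ** intensity_power) ** 2 for _, inten in peaks_a) ** 0.5
    norm_b = sum((inten ** intensity_power) ** 2 for _, inten in peaks_b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, [(100.0, 1.0), (200.0, 1.0)] against [(100.01, 1.0), (300.0, 1.0)] shares one matched peak out of two on each side, giving a score of 0.5.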
- Key options:
matching_mode: ‘fragment’, ‘neutral_loss’, or ‘hybrid’ (fragment-priority).
tolerance in Da or symmetric ppm (use_ppm=True).
cleanup: removal of the precursor peak and of all peaks with m/z > (precursor_mz - 1.6), 1% noise removal, entropy weighting, normalization to ∑I’ = 0.5, and optional merging of nearby peaks within a spectrum.
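The precursor removal, noise removal, and normalization steps listed above can be sketched as follows, assuming peaks are (m/z, intensity) tuples. The helper clean_peaks and the order of the steps are assumptions for illustration; entropy weighting and peak merging are left out.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical layout for this sketch


def clean_peaks(peaks: List[Peak], precursor_mz: float,
                precursor_window: float = 1.6, noise_cutoff: float = 0.01,
                normalize_to_half: bool = True) -> List[Peak]:
    """Hypothetical cleanup helper in the spirit of Li & Fiehn (2023); not matchms code."""
    # 1) Precursor removal: drop every peak with m/z > precursor_mz - precursor_window.
    kept = [(mz, inten) for mz, inten in peaks if mz <= precursor_mz - precursor_window]
    if not kept:
        return []
    # 2) Noise removal: drop peaks below noise_cutoff * (maximum intensity).
    max_intensity = max(inten for _, inten in kept)
    kept = [(mz, inten) for mz, inten in kept if inten >= noise_cutoff * max_intensity]
    # 3) Normalization: scale intensities so they sum to 0.5 (or to 1.0 otherwise).
    total = sum(inten for _, inten in kept)
    target = 0.5 if normalize_to_half else 1.0
    return [(mz, inten * target / total) for mz, inten in kept]
```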
- Notes:
.pair() works but is not the fast path. Use .matrix().
For identity-search behavior, pass identity_precursor_tolerance (Da or ppm).
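With identity_precursor_tolerance set, only pairs whose precursor m/z values agree within that tolerance qualify for identity search. A minimal sketch of such a gate follows; the function name and the use of the reference precursor m/z as the ppm base are assumptions, not documented behavior.

```python
def passes_identity_filter(query_precursor_mz: float, reference_precursor_mz: float,
                           tolerance: float, use_ppm: bool = False) -> bool:
    """Hypothetical precursor gate for identity search (illustration only)."""
    difference = abs(query_precursor_mz - reference_precursor_mz)
    if use_ppm:
        # Assumption: the ppm tolerance is taken relative to the reference precursor m/z.
        return difference <= tolerance * 1e-6 * reference_precursor_mz
    # Otherwise the tolerance is an absolute window in Da.
    return difference <= tolerance
```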
- Parameters:
matching_mode – Matching mode: ‘fragment’, ‘neutral_loss’, or ‘hybrid’ (default is ‘fragment’).
tolerance – Matching tolerance in Da or ppm (use_ppm=True). Default is 0.02.
use_ppm – If True, interpret tolerance as parts-per-million. Default is False.
intensity_power – The power to raise intensity to in the cosine function. The default is 1 (no weighting).
remove_precursor – If True, remove precursor peak and peaks within precursor_window. Default is False.
precursor_window – If remove_precursor is True, remove peaks within this window around the precursor m/z. Default is 1.6 Da (as suggested by Li & Fiehn, 2023).
noise_cutoff – If > 0, remove peaks with intensities below this fraction of the maximum intensity. Default is 0.01 (1%).
normalize_to_half – If True, normalize intensities such that the sum of intensities is 0.5. Default is False.
merge_within – If > 0, merge peaks within this distance (in Da) to a single peak. Default is 0.
identity_precursor_tolerance – If not None, enforce identity search behavior by requiring the precursor m/z of the query to be within this tolerance of the reference precursor m/z.
identity_use_ppm – If True, interpret identity_precursor_tolerance as ppm. Default is False.
dtype – Data type for the output scores. Default is np.float64, which is precise enough for the highest-resolution MS/MS data (even far beyond current MS/MS capabilities). To save memory, np.float32 can be used instead, which is sufficient for peak resolutions up to about 8,000,000.
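How tolerance and use_ppm translate into an absolute matching window, and how matching_mode=‘neutral_loss’ re-expresses peaks relative to the precursor (neutral loss = precursor m/z minus fragment m/z), can be sketched with two hypothetical helpers. The exact ppm base used by the “symmetric ppm” mode is not stated here, so using the peak’s own m/z is an assumption.

```python
from typing import List, Tuple


def tolerance_in_da(mz: float, tolerance: float, use_ppm: bool = False) -> float:
    """Half-width of the matching window around `mz`, in Da (hypothetical helper)."""
    # ppm: parts-per-million of the m/z value (assumed base); otherwise already in Da.
    return tolerance * 1e-6 * mz if use_ppm else tolerance


def to_neutral_losses(peaks: List[Tuple[float, float]],
                      precursor_mz: float) -> List[Tuple[float, float]]:
    """Re-express (m/z, intensity) peaks as (neutral loss, intensity), sorted by loss."""
    return sorted((precursor_mz - mz, intensity) for mz, intensity in peaks)
```

At m/z 500, a 10 ppm tolerance corresponds to a window of ±0.005 Da, well below the default of 0.02 Da.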
- matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True, n_jobs: int = -1)
Calculate matrix of Flash Cosine scores.
- Parameters:
spectra_1 – First collection of input spectra.
spectra_2 – Second collection of input spectra. If None, compare spectra_1 against itself.
score_fields – Requested score fields. Only ("score",) is supported.
progress_bar – When True, show a progress bar.
n_jobs – Number of parallel jobs to run. Default is -1, which means that all available CPUs minus one will be used.
- Returns:
Dense score matrix as a Scores object.
- Return type:
Scores
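The “library-wide index over queries” idea behind .matrix() can be illustrated with a sorted m/z array and binary search: each reference peak only visits the indexed query peaks inside its tolerance window, so query spectra that share no peaks with a reference are never touched. This is a simplified sketch with hypothetical names (build_index, candidate_matches); it only collects candidate peak pairs and leaves out conflict resolution between peaks, neutral-loss lookups, and parallelism.

```python
import bisect
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical layout for this sketch


def build_index(queries: List[List[Peak]]) -> Tuple[List[float], List[Tuple[int, float]]]:
    """Flatten all query peaks into one m/z-sorted index."""
    entries = sorted(
        (mz, q_idx, intensity)
        for q_idx, peaks in enumerate(queries)
        for mz, intensity in peaks
    )
    mzs = [mz for mz, _, _ in entries]  # sorted m/z values for binary search
    owners = [(q_idx, intensity) for _, q_idx, intensity in entries]  # same order as mzs
    return mzs, owners


def candidate_matches(reference: List[Peak], mzs: List[float],
                      owners: List[Tuple[int, float]],
                      tolerance: float = 0.02) -> Dict[int, List[Tuple[float, float]]]:
    """Stream one reference spectrum through the index; group hits by query index."""
    hits: Dict[int, List[Tuple[float, float]]] = {}
    for mz, ref_intensity in reference:
        # All indexed query peaks within [mz - tolerance, mz + tolerance].
        lo = bisect.bisect_left(mzs, mz - tolerance)
        hi = bisect.bisect_right(mzs, mz + tolerance)
        for k in range(lo, hi):
            q_idx, q_intensity = owners[k]
            hits.setdefault(q_idx, []).append((ref_intensity, q_intensity))
    return hits
```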
- pair(spectrum_1: Spectrum, spectrum_2: Spectrum) → ndarray
Calculate the similarity for one pair of spectra.
- Parameters:
spectrum_1 – First spectrum.
spectrum_2 – Second spectrum.
- Returns:
Similarity result for one pair. The returned value should be compatible with self.score_datatype.
- Return type:
score
Examples
- Scalar score:
return np.asarray(score, dtype=self.score_datatype)
- Structured score:
return np.asarray((score, matches), dtype=self.score_datatype)
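The two return conventions in the examples above can be tried directly with NumPy. The field names score and matches in the structured dtype are illustrative assumptions; CosineFlash itself uses a plain float64 score_datatype.

```python
import numpy as np

# Scalar convention: a 0-d array of the score datatype (here float64).
scalar = np.asarray(0.87, dtype=np.float64)

# Structured convention: a hypothetical record holding a score and a match count.
structured_dtype = np.dtype([("score", np.float64), ("matches", np.int64)])
structured = np.asarray((0.87, 12), dtype=structured_dtype)

print(scalar.dtype, structured["score"], structured["matches"])
```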
- score_datatype
alias of float64
- class matchms.similarity.FlashSimilarity.FlashEntropy(*args, normalize_to_half: bool = True, **kwargs)
Bases: _BaseFlashSimilarity
Flash Entropy similarity (Li & Fiehn, 2023) with a fast .matrix() that builds a library-wide index over the ‘queries’ and streams all ‘references’ through it.
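When both spectra are normalized to ∑I’ = 0.5, the entropy similarity of Li & Fiehn (2023) reduces to a sum over matched peak pairs only, with f(x) = x·log2(x): each matched pair (a, b) contributes f(a + b) − f(a) − f(b), and unmatched peaks contribute nothing. That locality is what makes an index-based search possible. A minimal sketch of the sum, assuming the matched intensity pairs have already been found; the helper names are hypothetical.

```python
import math
from typing import List, Tuple


def _xlog2x(x: float) -> float:
    # x * log2(x), using the convention 0 * log2(0) = 0.
    return x * math.log2(x) if x > 0 else 0.0


def flash_entropy_score(matched: List[Tuple[float, float]]) -> float:
    """Entropy similarity from matched intensity pairs (a, b).

    Assumes both spectra were cleaned and normalized so that their intensities
    each sum to 0.5; unmatched peaks contribute nothing to the score.
    """
    return sum(_xlog2x(a + b) - _xlog2x(a) - _xlog2x(b) for a, b in matched)
```

For two identical spectra every peak is matched and the contributions add up to 1.0; for spectra with no matched peaks the sum is 0.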
- Key options:
matching_mode: ‘fragment’, ‘neutral_loss’, or ‘hybrid’ (fragment-priority).
tolerance in Da or symmetric ppm (use_ppm=True).
cleanup: removal of the precursor peak and of all peaks with m/z > (precursor_mz - 1.6), 1% noise removal, entropy weighting, normalization to ∑I’ = 0.5, and optional merging of nearby peaks within a spectrum.
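The entropy-weighting step can be sketched with the rule commonly used for entropy similarity: if a spectrum’s Shannon entropy S (natural log) is below 3, each intensity is raised to the power 0.25 + 0.25·S, which flattens spectra dominated by a few peaks. That FlashEntropy uses exactly this rule is an assumption here, and both function names are hypothetical; the final scaling to ∑I’ = 0.5 would be a separate step.

```python
import math
from typing import List


def spectral_entropy(intensities: List[float]) -> float:
    """Shannon entropy (natural log) of intensities normalized to sum 1."""
    total = sum(intensities)
    probs = [i / total for i in intensities if i > 0]
    return -sum(p * math.log(p) for p in probs)


def entropy_weight(intensities: List[float]) -> List[float]:
    """Hypothetical entropy weighting; returns intensities normalized to sum 1."""
    entropy = spectral_entropy(intensities)
    if entropy < 3:
        # Low-entropy spectra get a power < 1, boosting small peaks relative to large ones.
        power = 0.25 + 0.25 * entropy
        intensities = [i ** power for i in intensities]
    total = sum(intensities)
    return [i / total for i in intensities]
```

A two-peak spectrum [0.9, 0.1] has low entropy, so after weighting the small peak carries roughly a third of the total intensity; a spectrum of 30 equal peaks (entropy ln 30 ≈ 3.4) is left unchanged.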
- Notes:
.pair() works but is not the fast path. Use .matrix().
For identity-search behavior, pass identity_precursor_tolerance (Da or ppm).
- Parameters:
matching_mode – Matching mode: ‘fragment’, ‘neutral_loss’, or ‘hybrid’ (default is ‘fragment’).
tolerance – Matching tolerance in Da or ppm (use_ppm=True). Default is 0.02.
use_ppm – If True, interpret tolerance as parts-per-million. Default is False.
remove_precursor – If True, remove precursor peak and peaks within precursor_window. Default is False.
precursor_window – If remove_precursor is True, remove peaks within this window around the precursor m/z. Default is 1.6 Da (as suggested by Li & Fiehn, 2023).
noise_cutoff – If > 0, remove peaks with intensities below this fraction of the maximum intensity. Default is 0.01 (1%).
normalize_to_half – If True, normalize intensities such that the sum of intensities is 0.5. Default is True.
merge_within – If > 0, merge peaks within this distance (in Da) to a single peak. Default is 0.
identity_precursor_tolerance – If not None, enforce identity search behavior by requiring the precursor m/z of the query to be within this tolerance of the reference precursor m/z.
identity_use_ppm – If True, interpret identity_precursor_tolerance as ppm. Default is False.
dtype – Data type for the output scores. Default is np.float64, which is precise enough for the highest-resolution MS/MS data (even far beyond current MS/MS capabilities). To save memory, np.float32 can be used instead, which is sufficient for peak resolutions up to about 8,000,000.
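The figure of about 8,000,000 quoted for np.float32 is consistent with its machine epsilon: adjacent float32 values are at most 2^-23 apart relative to their magnitude, which corresponds to a worst-case resolving power m/Δm of 2^23 ≈ 8.4 million. That this is where the figure comes from is an assumption; a quick check with NumPy:

```python
import numpy as np

for dtype in (np.float32, np.float64):
    eps = np.finfo(dtype).eps  # relative spacing of representable values near 1.0
    # Worst-case resolving power m/Δm that values of this dtype can still distinguish.
    print(dtype.__name__, 1.0 / eps)
```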
- matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True, n_jobs: int = -1)
Calculate matrix of Flash Entropy scores.
- Parameters:
spectra_1 – First collection of input spectra.
spectra_2 – Second collection of input spectra. If None, compare spectra_1 against itself.
score_fields – Requested score fields. Only ("score",) is supported.
progress_bar – When True, show a progress bar.
n_jobs – Number of parallel jobs to run. Default is -1, which means that all available CPUs minus one will be used.
- Returns:
Dense score matrix as a Scores object.
- Return type:
Scores
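The documented n_jobs convention (-1 means all available CPUs minus one) can be expressed as a small helper. The name resolve_n_jobs is hypothetical, and rejecting other non-positive values is an assumption of this sketch rather than documented behavior.

```python
import os


def resolve_n_jobs(n_jobs: int = -1) -> int:
    """Map an n_jobs value to a worker count (hypothetical helper).

    -1 means all available CPUs minus one, but always at least one worker.
    """
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    if n_jobs == -1:
        return max(1, available - 1)
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs
```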
- pair(spectrum_1: Spectrum, spectrum_2: Spectrum) → ndarray
Compute Flash Entropy for a single (reference, query) pair. Uses the same preprocessing and scoring logic as the matrix path, but builds a tiny 1-spectrum library from the query.
Careful: this is not the intended fast path; use .matrix() instead.
- score_datatype
alias of float32