
matchms.similarity.CosineBlink module

class matchms.similarity.CosineBlink.CosineBlink(tolerance: float = 0.01, bin_width: float = 0.001, mz_power: float = 0.0, intensity_power: float = 1.0, clip_to_one: bool = True, use_numba: bool = True, prefilter: bool = True, min_relative_intensity: float = 0.01, crop_above_precursor: bool = True, remove_zero_intensities: bool = True, top_k: int | None = None, batch_size: int = 1024, sparse_score_min: float = 0.0)[source]

Bases: BaseSimilarity

BLINK-style approximate cosine similarity for mass spectra, with fast .pair() and .matrix() methods. The score is based on the BLINK method proposed by Harwood et al. (2023, https://www.nature.com/articles/s41598-023-40496-9).

  • Integer binning with bin_width (Da); tolerance window is ± floor(tolerance/bin_width) bins.

  • Per-spectrum L2 normalization (after optional mz/intensity weighting).

  • Blur only one side (spectra_2 in .matrix(), smaller spectrum in .pair()).
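The three points above can be illustrated with a minimal pure-Python sketch. It is not the matchms implementation: the helper names (`bin_spectrum`, `blur`, `blink_like_pair`), the use of rounding to assign bins, and normalizing after binning are all assumptions made for illustration. Peaks are given as plain `(mz, intensity)` tuples rather than `Spectrum` objects.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.001, mz_power=0.0, intensity_power=1.0):
    """Weight peaks, sum them into integer m/z bins, then L2-normalize."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        weighted = (mz ** mz_power) * (intensity ** intensity_power)
        # Assumption: bins are assigned by rounding mz / bin_width.
        binned[int(round(mz / bin_width))] += weighted
    norm = math.sqrt(sum(v * v for v in binned.values()))
    return {b: v / norm for b, v in binned.items()} if norm > 0 else {}


def blur(binned, radius):
    """Copy each nonzero bin to its +/- radius neighbours."""
    blurred = defaultdict(float)
    for b, v in binned.items():
        for offset in range(-radius, radius + 1):
            blurred[b + offset] += v
    return blurred


def blink_like_pair(peaks_1, peaks_2, tolerance=0.01, bin_width=0.001,
                    clip_to_one=True):
    """Dot product of one binned spectrum with the blurred other spectrum."""
    radius = math.floor(tolerance / bin_width)
    a = bin_spectrum(peaks_1, bin_width)
    b = bin_spectrum(peaks_2, bin_width)
    # Blur only one side: here, the spectrum with fewer occupied bins.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    blurred = blur(small, radius)
    score = sum(v * blurred.get(bin_id, 0.0) for bin_id, v in large.items())
    return min(score, 1.0) if clip_to_one else score


reference = [(100.0, 1.0), (200.0, 0.5)]
shifted = [(100.005, 1.0), (200.005, 0.5)]   # within the 0.01 Da tolerance
far = [(100.05, 1.0), (200.05, 0.5)]         # outside the tolerance

print(blink_like_pair(reference, shifted))
print(blink_like_pair(reference, far))
```

Because blurring copies intensity into neighbouring bins, several close peaks can contribute to the same bin and push the raw dot product above 1, which is presumably why a `clip_to_one` option exists.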

Parameters:
  • tolerance – True m/z tolerance (Da). Peaks within +/- tolerance are considered matches. Default 0.01.

  • bin_width – Discretization width (Da). Default 0.001 (1 mDa). Effective radius R=floor(tolerance/bin_width).

  • mz_power – Power for mz weighting (intensity *= mz**mz_power). Default 0.0.

  • intensity_power – Power for intensity weighting before normalization. Default 1.0 (set 0.5 for sqrt scaling).

  • clip_to_one – Clip score to [0,1]. Default True.

  • use_numba (bool) – Use numba-accelerated pairwise kernel when available. Default True.

  • prefilter (bool) – Apply BLINK-like pre-filtering: remove peaks below 1% of the base peak, peaks above the precursor m/z, and zero-intensity peaks. Default True.

  • min_relative_intensity (float) – Relative base-peak threshold for prefilter. Default 0.01 (1%).

  • crop_above_precursor (bool) – Drop fragments > precursor m/z if available in metadata. Default True.

  • remove_zero_intensities (bool) – Remove peaks with intensity <= 0. Default True.

  • top_k (Optional[int]) – Keep only top-K most intense fragments after other filters (per spectrum). Default None.

Batching (matrix path):

  • batch_size (int) – Number of query spectra per batch in .matrix(). Default 1024.

  • sparse_score_min (float) – When array_type='sparse', drop scores < sparse_score_min. Default 0.0.
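The pre-filtering parameters above can be sketched as a small standalone function. This is an illustration only, not the matchms code: the function name `prefilter_peaks`, the order in which the filters run, and computing the base peak after cropping are assumptions. The only ordering the parameter list states is that `top_k` is applied after the other filters.

```python
def prefilter_peaks(peaks, precursor_mz=None, min_relative_intensity=0.01,
                    crop_above_precursor=True, remove_zero_intensities=True,
                    top_k=None):
    """Filter a list of (mz, intensity) tuples; returns peaks sorted by m/z."""
    kept = list(peaks)
    if remove_zero_intensities:
        # Drop peaks with intensity <= 0.
        kept = [(mz, i) for mz, i in kept if i > 0]
    if crop_above_precursor and precursor_mz is not None:
        # Drop fragments above the precursor m/z, if one is known.
        kept = [(mz, i) for mz, i in kept if mz <= precursor_mz]
    if kept:
        # Drop peaks below the relative base-peak threshold.
        base_peak = max(i for _, i in kept)
        kept = [(mz, i) for mz, i in kept
                if i >= min_relative_intensity * base_peak]
    if top_k is not None:
        # Keep only the top_k most intense remaining peaks.
        kept = sorted(kept, key=lambda p: p[1], reverse=True)[:top_k]
    return sorted(kept)


peaks = [(50.0, 0.0), (100.0, 5.0), (150.0, 1000.0), (200.0, 300.0), (400.0, 900.0)]
print(prefilter_peaks(peaks, precursor_mz=250.0))
# [(150.0, 1000.0), (200.0, 300.0)]
```

In this example the zero-intensity peak at 50.0 is removed, the peak at 400.0 is cropped because it lies above the precursor, and the peak at 100.0 falls below 1% of the remaining base peak (1000.0).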

__init__(tolerance: float = 0.01, bin_width: float = 0.001, mz_power: float = 0.0, intensity_power: float = 1.0, clip_to_one: bool = True, use_numba: bool = True, prefilter: bool = True, min_relative_intensity: float = 0.01, crop_above_precursor: bool = True, remove_zero_intensities: bool = True, top_k: int | None = None, batch_size: int = 1024, sparse_score_min: float = 0.0)[source]
property is_structured_score: bool

Return True if this similarity uses a structured score dtype.

matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True) → Scores[source]

All-vs-all BLINK-style cosine scores.

Implementation:

  • Build a global dense bin axis in integer bins from min to max across refs+queries (rows ~ (max_bin - min_bin + 1)), which keeps matrices sparse.

  • Build a CSR intensity matrix for refs (rows=bins, cols=ref spectra) after per-spectrum L2 normalization.

  • For spectra_2, build per-batch blurred CSR by expanding each nonzero to its ±R neighbors.

  • Multiply: scores_batch = (I_ref.T @ I_qry_blur), accumulate into the final output.
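The steps above can be approximated in plain Python. The sketch below swaps the CSR matrices and the matrix product for a dictionary-based inverted index, so it shows the logic (normalize, blur the query side only, accumulate per batch) and not the performance characteristics. The names `to_bins` and `score_matrix` are hypothetical, and bin assignment by rounding is an assumption.

```python
import math
from collections import defaultdict


def to_bins(peaks, bin_width):
    """Sum (mz, intensity) peaks into integer bins and L2-normalize."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(round(mz / bin_width))] += intensity
    norm = math.sqrt(sum(v * v for v in binned.values()))
    return {b: v / norm for b, v in binned.items()} if norm > 0 else {}


def score_matrix(refs, queries, tolerance=0.01, bin_width=0.001, batch_size=1024):
    """All-vs-all scores; rows are reference spectra, columns are queries."""
    radius = math.floor(tolerance / bin_width)
    # Inverted index bin -> [(ref index, intensity)]; stands in for I_ref.
    ref_index = defaultdict(list)
    for r, peaks in enumerate(refs):
        for b, v in to_bins(peaks, bin_width).items():
            ref_index[b].append((r, v))

    scores = [[0.0] * len(queries) for _ in refs]
    # Process the queries in batches, as the matrix path does.
    for start in range(0, len(queries), batch_size):
        stop = min(start + batch_size, len(queries))
        for q in range(start, stop):
            for b, v in to_bins(queries[q], bin_width).items():
                # Blur the query side: each nonzero bin reaches its +/- radius neighbours.
                for offset in range(-radius, radius + 1):
                    for r, ref_v in ref_index.get(b + offset, ()):
                        scores[r][q] += ref_v * v
    # Clip to [0, 1], mirroring the clip_to_one option.
    return [[min(1.0, s) for s in row] for row in scores]


refs = [[(100.0, 1.0), (200.0, 1.0)], [(300.0, 1.0)]]
queries = [[(100.004, 1.0), (200.0, 1.0)], [(300.0, 2.0)], [(500.0, 1.0)]]
for row in score_matrix(refs, queries, batch_size=2):
    print(row)
```

In a sparse-matrix formulation, the inner loops collapse into a single product per batch, which is where the speed of the matrix path would come from.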

Parameters:
  • spectra_1 – List of reference spectra (not blurred).

  • spectra_2 – List of query spectra; this is the side that is blurred.

  • score_fields – Requested score fields.

Returns:

Dense Scores object.

Return type:

Scores

pair(spectrum_1: Spectrum, spectrum_2: Spectrum) → tuple[float, int][source]

Calculate BLINK-style cosine between two spectra.

Parameters:
  • spectrum_1 – Single reference spectrum.

  • spectrum_2 – Single query spectrum.

score_datatype

alias of float32

sparse_matrix(spectra_1, spectra_2=None, idx_row=None, idx_col=None, score_fields=None, score_filter=None, progress_bar: bool = True)

Sparse score computation is not available for this similarity.

to_dict() → dict

Return a dictionary representation of the similarity function.

© Copyright 2023, Düsseldorf University of Applied Sciences & Netherlands eScience Center.