matchms.similarity.cosine_linear_functions module

matchms.similarity.cosine_linear_functions.linear_cosine_score(spec1, spec2, tolerance, mz_power, intensity_power)

Compute the CosineLinear similarity between two well-separated spectra.

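Within each spectrum, consecutive peaks must be separated by an m/z gap > 2 * tolerance (as ensured by sirius_merge_close_peaks). Under that condition each peak can match at most one peak in the other spectrum, so matching reduces to a two-pointer sweep that runs in O(N + M) time.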

Parameters:
  • spec1 – 2D array (N, 2) with columns [mz, intensity], sorted ascending m/z.

  • spec2 – 2D array (M, 2) with columns [mz, intensity], sorted ascending m/z.

  • tolerance – Maximum allowed difference between m/z values for a match.

  • mz_power – Power to raise m/z values to.

  • intensity_power – Power to raise intensity values to.

Returns:

  • score (float) – Cosine similarity score.

  • matches (int) – Number of matched peak pairs.
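The two-pointer sweep described above can be sketched in plain Python. This is an illustration only, not the matchms implementation: the name linear_cosine_sketch is hypothetical, peaks are passed as lists of (mz, intensity) tuples rather than NumPy arrays, and the sketch assumes that each peak is weighted as mz ** mz_power * intensity ** intensity_power and that the score is normalized by the weight norms of all peaks in both spectra.

```python
import math


def linear_cosine_sketch(spec1, spec2, tolerance, mz_power=0.0, intensity_power=1.0):
    """Hypothetical two-pointer cosine score (illustration, not matchms code).

    spec1, spec2: lists of (mz, intensity) tuples sorted by ascending m/z,
    each with consecutive m/z gaps > 2 * tolerance.
    """
    def weight(mz, intensity):
        # Assumed weighting: mz^mz_power * intensity^intensity_power.
        return (mz ** mz_power) * (intensity ** intensity_power)

    i = j = matches = 0
    dot = 0.0
    while i < len(spec1) and j < len(spec2):
        mz1, int1 = spec1[i]
        mz2, int2 = spec2[j]
        if abs(mz1 - mz2) <= tolerance:
            # Well-separated peaks match at most once, so both pointers advance.
            dot += weight(mz1, int1) * weight(mz2, int2)
            matches += 1
            i += 1
            j += 1
        elif mz1 < mz2:
            i += 1  # spec1[i] is too low to match spec2[j] or anything after it
        else:
            j += 1  # spec2[j] is too low to match spec1[i] or anything after it

    norm1 = math.sqrt(sum(weight(mz, inten) ** 2 for mz, inten in spec1))
    norm2 = math.sqrt(sum(weight(mz, inten) ** 2 for mz, inten in spec2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0, matches
    return dot / (norm1 * norm2), matches


spec_a = [(100.0, 1.0), (200.0, 2.0)]
spec_b = [(100.005, 1.0), (200.0, 2.0), (300.0, 1.0)]
score, matches = linear_cosine_sketch(spec_a, spec_b, tolerance=0.01)
# matches is 2; score is 5 / sqrt(30), about 0.913
```

Each pointer only moves forward, so the loop body runs at most N + M times. The unmatched peak at m/z 300 still contributes to the norm of the second spectrum, which is why the score stays below 1.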

matchms.similarity.cosine_linear_functions.sirius_merge_close_peaks(spec, mz_tolerance)

Merge close peaks following the SIRIUS (Böcker lab) algorithm.

Peaks are processed greedily in descending intensity order. Each peak that has not yet been consumed keeps its own m/z and absorbs the intensities of all unconsumed neighbors that fall within the merge window of 2 * mz_tolerance around it; those neighbors are then marked as consumed. The result is guaranteed to have consecutive m/z gaps > 2 * mz_tolerance. When multiple peaks share the same intensity, the lower-m/z peak is processed first, so the choice of representative peak is deterministic across NumPy and Numba sort implementations.

Parameters:
  • spec – 2D array (N, 2) with columns [mz, intensity], sorted by ascending m/z.

  • mz_tolerance – m/z tolerance that will later be used for scoring. The merge window is 2 * mz_tolerance.

Returns:

(M, 2) array of merged peaks sorted by ascending m/z.

Return type:

numpy.ndarray
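The greedy merge described above can be sketched in plain Python. This is an illustration only, not the matchms implementation: the name merge_close_peaks_sketch is hypothetical, peaks are passed as a list of (mz, intensity) tuples rather than a NumPy array, and the sketch assumes the window boundary is inclusive (a neighbor exactly 2 * mz_tolerance away is merged).

```python
def merge_close_peaks_sketch(spec, mz_tolerance):
    """Hypothetical greedy peak merge (illustration, not matchms code).

    spec: list of (mz, intensity) tuples sorted by ascending m/z.
    Returns a list of merged (mz, intensity) tuples sorted by ascending m/z.
    """
    window = 2 * mz_tolerance
    # Descending intensity; ties are broken by lower m/z first.
    order = sorted(range(len(spec)), key=lambda k: (-spec[k][1], spec[k][0]))
    consumed = [False] * len(spec)
    merged = []

    for k in order:
        if consumed[k]:
            continue
        mz, total = spec[k]
        consumed[k] = True

        # Absorb unconsumed neighbors to the left that lie inside the window.
        left = k - 1
        while left >= 0 and mz - spec[left][0] <= window:
            if not consumed[left]:
                total += spec[left][1]
                consumed[left] = True
            left -= 1

        # Absorb unconsumed neighbors to the right that lie inside the window.
        right = k + 1
        while right < len(spec) and spec[right][0] - mz <= window:
            if not consumed[right]:
                total += spec[right][1]
                consumed[right] = True
            right += 1

        merged.append((mz, total))

    merged.sort()  # restore ascending m/z order
    return merged


peaks = [(100.0, 1.0), (100.01, 5.0), (100.02, 2.0), (200.0, 3.0)]
print(merge_close_peaks_sketch(peaks, mz_tolerance=0.01))
# prints [(100.01, 8.0), (200.0, 3.0)]
```

Any two representative peaks end up more than 2 * mz_tolerance apart, because the later one would otherwise have been consumed when the earlier one was processed. That is the separation property the scoring function relies on.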