matchms.similarity.CosineHungarian module

class matchms.similarity.CosineHungarian.CosineHungarian(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)

Bases: BaseSimilarity

Calculate ‘cosine similarity score’ between two spectra using the Hungarian algorithm.

The cosine score quantifies the similarity between two mass spectra by finding the optimal one-to-one matching between their peaks. Two peaks are considered a potential match if their m/z ratios lie within the given tolerance.

The peak assignment is solved using the Hungarian algorithm (scipy.optimize.linear_sum_assignment), which finds the assignment that maximises the sum of intensity products. This is mathematically optimal but can be notably slower than the greedy heuristic in CosineGreedy.
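The assignment step described above can be sketched with plain NumPy arrays. This is a minimal illustration, not the matchms implementation: the function name `cosine_hungarian_sketch` is hypothetical, and mz_power and intensity_power are left at their defaults. The example peaks are chosen so that picking the single largest intensity product first would block the better overall pairing.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cosine_hungarian_sketch(mz_a, int_a, mz_b, int_b, tolerance=0.1):
    """Illustrative cosine score with an optimal one-to-one peak assignment."""
    mz_a, int_a = np.asarray(mz_a, float), np.asarray(int_a, float)
    mz_b, int_b = np.asarray(mz_b, float), np.asarray(int_b, float)

    # Candidate matches: peak pairs whose m/z values differ by <= tolerance.
    within_tol = np.abs(mz_a[:, None] - mz_b[None, :]) <= tolerance

    # Weight of a candidate pair is the product of the two intensities;
    # pairs outside the tolerance get weight 0 and cannot contribute.
    weights = np.where(within_tol, np.outer(int_a, int_b), 0.0)

    # Optimal assignment: maximise the summed weight of the chosen pairs.
    rows, cols = linear_sum_assignment(weights, maximize=True)
    chosen = within_tol[rows, cols]  # ignore assignments outside tolerance

    matched_sum = weights[rows, cols][chosen].sum()
    norm = np.sqrt(np.sum(int_a ** 2)) * np.sqrt(np.sum(int_b ** 2))
    score = float(matched_sum / norm) if norm > 0 else 0.0
    return score, int(chosen.sum())


# Taking the largest product first (1.0 * 1.0) would leave the second
# peak of spectrum A unmatched; the optimal assignment pairs both peaks.
score, n_matches = cosine_hungarian_sketch(
    mz_a=[100.00, 100.10], int_a=[1.0, 0.9],
    mz_b=[100.05, 99.95], int_b=[1.0, 0.8],
    tolerance=0.1,
)
```

Assigning weight 0 to out-of-tolerance pairs keeps the weight matrix dense and rectangular, which is the form `linear_sum_assignment` expects. Such pairs are filtered out afterwards so they are not counted as matches.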

__init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)
Parameters:
  • tolerance – Peaks will be considered a match when their m/z values are <= tolerance apart. Default is 0.1.

  • mz_power – The power to raise m/z to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.

  • intensity_power – The power to raise intensity to in the cosine function. The default is 1.
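As a rough illustration of how the two exponents enter the score, each peak can be thought of as contributing a weight of mz ** mz_power * intensity ** intensity_power, and a matched pair contributes the product of its two weights. The helper names below are hypothetical and the formula is an assumption based on the parameter descriptions above, not a copy of the library code.

```python
def peak_weight(mz, intensity, mz_power=0.0, intensity_power=1.0):
    """Weight of a single peak under the given exponents (illustrative)."""
    return (mz ** mz_power) * (intensity ** intensity_power)


def pair_product(peak_a, peak_b, mz_power=0.0, intensity_power=1.0):
    """Contribution of one matched peak pair; each peak is (mz, intensity)."""
    return (peak_weight(*peak_a, mz_power, intensity_power)
            * peak_weight(*peak_b, mz_power, intensity_power))


# With the defaults (mz_power=0, intensity_power=1) only intensities matter.
default_product = pair_product((100.0, 0.5), (100.05, 0.4))

# With mz_power=1 the same pair is weighted up by both m/z values.
mz_weighted_product = pair_product((100.0, 0.5), (100.05, 0.4), mz_power=1.0)
```

Raising mz_power above 0 gives peaks at higher m/z more influence on the score, while an intensity_power below 1 dampens the dominance of the most intense peaks.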

keep_score(score)

In the .matrix method, scores are collected in a sparse way. Override this method if values other than False or 0 should also be excluded from the final collection.
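The filtering idea can be sketched without any library code. The classes and the `collect_sparse` helper below are hypothetical stand-ins, written under the assumption that keep_score returns a truthy value for every score that should be stored.

```python
class SketchSimilarity:
    """Hypothetical stand-in for a similarity class (not the matchms base class)."""

    def keep_score(self, score):
        # Default rule described above: drop scores equal to False or 0.
        return score != 0


class ThresholdSimilarity(SketchSimilarity):
    """Variant that also discards low but non-zero scores."""

    def __init__(self, min_score=0.5):
        self.min_score = min_score

    def keep_score(self, score):
        return score >= self.min_score


def collect_sparse(similarity, dense_scores):
    """Return (row, col, score) triplets for the scores that are kept."""
    kept = []
    for i, row in enumerate(dense_scores):
        for j, score in enumerate(row):
            if similarity.keep_score(score):
                kept.append((i, j, score))
    return kept


dense = [[0.0, 0.9],
         [0.3, 0.0]]
default_kept = collect_sparse(SketchSimilarity(), dense)
strict_kept = collect_sparse(ThresholdSimilarity(min_score=0.5), dense)
```

A stricter keep_score reduces the number of stored entries, which matters mainly for large, mostly dissimilar spectrum collections.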

matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False, progress_bar: bool = True) → ndarray

Optional: Provide an optimized method to calculate an np.array of similarity scores for the given reference and query spectra. If no method is added here, a naive implementation (i.e. a double for-loop) is used.

Parameters:
  • references – List of reference objects

  • queries – List of query objects

  • array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array.

  • is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.

  • progress_bar – When True a progress bar is shown. Default is True.
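The saving from is_symmetric can be seen in a small sketch of the double for-loop. Everything here is illustrative: `score_matrix` and `toy_score` are hypothetical names, and the toy score stands in for any symmetric pairwise similarity.

```python
def score_matrix(references, queries, score_fn, is_symmetric=False):
    """Fill a dense score matrix; returns (matrix, number of score_fn calls)."""
    n_rows, n_cols = len(references), len(queries)
    if is_symmetric and n_rows != n_cols:
        raise ValueError("is_symmetric requires equally sized inputs")

    scores = [[0.0] * n_cols for _ in range(n_rows)]
    calls = 0
    for i in range(n_rows):
        # In the symmetric case only the upper triangle is computed.
        start = i if is_symmetric else 0
        for j in range(start, n_cols):
            scores[i][j] = score_fn(references[i], queries[j])
            calls += 1
            if is_symmetric:
                scores[j][i] = scores[i][j]  # mirror into the lower triangle
    return scores, calls


def toy_score(a, b):
    """Symmetric stand-in for a similarity score between two items."""
    return 1.0 / (1.0 + abs(a - b))


items = [1, 2, 4]
full, full_calls = score_matrix(items, items, toy_score)
half, half_calls = score_matrix(items, items, toy_score, is_symmetric=True)
```

For n items the symmetric variant evaluates n * (n + 1) / 2 pairs instead of n * n, which approaches a factor of two as n grows.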

pair(reference: Spectrum, query: Spectrum) → Tuple[float, int]

Calculate cosine score between two spectra.

Parameters:
  • reference – Single reference spectrum.

  • query – Single query spectrum.

Return type:

Tuple with cosine score and number of matched peaks.

sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False, progress_bar: bool = True)

Optional: Provide an optimized method to calculate a sparse matrix of similarity scores.

Compute similarity scores for pairs of reference and query spectra as given by the indices idx_row (references) and idx_col (queries). If no method is added here, a naive implementation (i.e. a for-loop) is used.

Parameters:
  • references – List of reference objects

  • queries – List of query objects

  • idx_row – List/array of row indices

  • idx_col – List/array of column indices

  • is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.

  • progress_bar – When True a progress bar is shown. Default is True.
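The index-driven evaluation can be sketched as a single loop over the index pairs. The function `sparse_scores` and the toy score are hypothetical, and the sketch assumes that idx_row and idx_col have equal length and that position k in both lists describes one (reference, query) pair.

```python
def sparse_scores(references, queries, idx_row, idx_col, score_fn):
    """Score only the (reference, query) pairs named by the index lists."""
    if len(idx_row) != len(idx_col):
        raise ValueError("idx_row and idx_col must have the same length")
    return [
        (row, col, score_fn(references[row], queries[col]))
        for row, col in zip(idx_row, idx_col)
    ]


def toy_score(a, b):
    """Stand-in for a similarity score between two items."""
    return 1.0 / (1.0 + abs(a - b))


references = [1, 2, 4]
queries = [1, 3]
# Only two of the six possible pairs are evaluated.
triplets = sparse_scores(references, queries, idx_row=[0, 2], idx_col=[1, 0],
                         score_fn=toy_score)
```

This pattern is useful when a cheaper pre-filter has already selected the candidate pairs, so the more expensive score is computed only where it is needed.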

to_dict() → dict

Return a dictionary representation of a similarity function.