matchms.hashing module

Helper functions related to hashing.

matchms.hashing.compute_combined_hashes(fragment_hashes: list[str] | ndarray, metadata_hashes: list[str]) → list[int]

Combine fragment and metadata hashes into a single hash value per entry, returned as a list of integers.
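
The sketch below shows one way such a pairwise combination could work. It is not the matchms implementation: the helper name `combine_hashes_sketch`, the `|` separator, the use of SHA-256, and the choice to keep the first 8 bytes as an integer are all assumptions made for illustration.

```python
import hashlib


def combine_hashes_sketch(fragment_hashes, metadata_hashes):
    """Hypothetical helper: derive one integer per (fragment, metadata) hash pair."""
    if len(fragment_hashes) != len(metadata_hashes):
        raise ValueError("Both inputs must have the same length.")
    combined = []
    for frag, meta in zip(fragment_hashes, metadata_hashes):
        # Assumption: join the two hash strings with a separator and hash again.
        digest = hashlib.sha256(f"{frag}|{meta}".encode("utf-8")).digest()
        # Assumption: keep the first 8 bytes as an unsigned 64-bit integer.
        combined.append(int.from_bytes(digest[:8], "big"))
    return combined


hashes = combine_hashes_sketch(["a1b2", "c3d4"], ["m001", "m002"])
```

Two entries share a combined value only if both their fragment hash and their metadata hash agree, which is the property a combined hash is usually wanted for.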

matchms.hashing.metadata_hash(metadata: dict, hash_length: int = 20)

Compute a hash from a metadata dictionary.
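
As a rough illustration of hashing a dictionary deterministically, the sketch below serializes the metadata with sorted keys before hashing. The helper name `metadata_hash_sketch`, the JSON serialization, and the use of SHA-256 are assumptions; the actual matchms function may serialize and hash differently.

```python
import hashlib
import json


def metadata_hash_sketch(metadata, hash_length=20):
    """Hypothetical stand-in for a metadata hash; not the matchms implementation."""
    # Assumption: sort keys so that insertion order does not change the hash.
    # default=str is a fallback for values JSON cannot encode directly.
    serialized = json.dumps(metadata, sort_keys=True, default=str)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()[:hash_length]


a = metadata_hash_sketch({"precursor_mz": 301.2, "ionmode": "positive"})
b = metadata_hash_sketch({"ionmode": "positive", "precursor_mz": 301.2})
c = metadata_hash_sketch({"ionmode": "negative", "precursor_mz": 301.2})
```

Here `a` and `b` are equal because only the key order differs, while `c` differs because one value changed.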

matchms.hashing.spectra_hashes(fragments: csr_array, bin_to_mz_func, hash_length: int = 20, **kwargs) → ndarray

Compute hashes for a collection of spectra stored in a sparse (CSR) matrix.

Parameters:
  • fragments (csr_array) – A SciPy sparse array in CSR format in which each row represents a spectrum and each column an m/z bin. Cell values are intensities.

  • bin_to_mz_func (callable) – A function or method that accepts an array of bin indices (column indices) and returns an array of the corresponding m/z values (floats). SpectraCollection provides such a function.

  • hash_length (int, optional) – The desired length of the resulting hash strings. Defaults to 20.

  • **kwargs – Additional keyword arguments passed on to spectrum_hash_arrays, such as mz_precision and intensity_precision.

Returns:

A NumPy array of dtype 'U<hash_length>' (fixed-width Unicode strings) containing the hashes for all rows of the input matrix, in their original order.

Return type:

np.ndarray
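
The row-by-row traversal described above can be sketched in plain Python. The three lists below play the role of the `indptr`, `indices`, and `data` attributes of a SciPy CSR array. The helpers `hash_peaks`, `csr_row_hashes`, and `bin_to_mz`, as well as the binning scheme and the string format fed to SHA-256, are hypothetical and only illustrate the idea; they are not the matchms implementation.

```python
import hashlib


def hash_peaks(mz_values, intensities, hash_length=20):
    """Hypothetical per-spectrum hash over 'mz:intensity' pairs."""
    text = ";".join(f"{mz:.5f}:{i:.2f}" for mz, i in zip(mz_values, intensities))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:hash_length]


def csr_row_hashes(indptr, indices, data, bin_to_mz_func, hash_length=20):
    """Hash every row of a CSR structure, keeping the original row order."""
    hashes = []
    for row in range(len(indptr) - 1):
        # indptr[row]:indptr[row + 1] delimits the stored entries of this row.
        start, end = indptr[row], indptr[row + 1]
        mz_values = bin_to_mz_func(indices[start:end])  # column index -> m/z
        hashes.append(hash_peaks(mz_values, data[start:end], hash_length))
    return hashes


def bin_to_mz(bins):
    # Hypothetical binning: bin k corresponds to m/z 100 + 0.5 * k.
    return [100.0 + 0.5 * b for b in bins]


# Three spectra (rows); the third repeats the first.
indptr = [0, 2, 3, 5]
indices = [0, 4, 2, 0, 4]
data = [0.5, 1.0, 0.8, 0.5, 1.0]
row_hashes = csr_row_hashes(indptr, indices, data, bin_to_mz)
```

Because rows 0 and 2 hold the same bins and intensities, they receive the same hash, while row 1 receives a different one.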

matchms.hashing.spectrum_hash(peaks: Fragments, hash_length: int = 20, mz_precision: int = 5, intensity_precision: int = 2) → str

Compute a hash from the m/z-intensity pairs of all peaks in a spectrum.

Parameters:
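  • peaks – The Fragments object containing the m/z values and intensities.

  • hash_length – The length of the hash to be computed. Defaults to 20.

  • mz_precision – The number of decimal places of the m/z values taken into account. Defaults to 5.

  • intensity_precision – The number of decimal places of the intensities taken into account. Defaults to 2.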

Returns:

The hash of the spectrum.

Return type:

str
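
To illustrate how the two precision parameters can affect a peak hash, the sketch below formats each value to a fixed number of decimals before hashing. The helper name `peak_hash_sketch`, the rounding behavior, the `mz:intensity` string layout, and the use of SHA-256 are assumptions; matchms may reduce precision and encode peaks differently.

```python
import hashlib


def peak_hash_sketch(mz_values, intensities, hash_length=20,
                     mz_precision=5, intensity_precision=2):
    """Hypothetical stand-in for a peak hash; not the matchms implementation."""
    pairs = [
        # Assumption: fixed-decimal formatting discards digits beyond the precision.
        f"{mz:.{mz_precision}f}:{intensity:.{intensity_precision}f}"
        for mz, intensity in zip(mz_values, intensities)
    ]
    encoded = ";".join(pairs).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:hash_length]


reference = peak_hash_sketch([100.000001, 250.5], [0.501, 1.0])
# Differences beyond 5 (m/z) and 2 (intensity) decimals do not change the hash.
same = peak_hash_sketch([100.000002, 250.5], [0.502, 1.0])
# A difference in the fourth m/z decimal does change it.
different = peak_hash_sketch([100.0001, 250.5], [0.501, 1.0])
```

With this scheme, lowering the precision makes the hash more tolerant of numerical noise, at the cost of mapping slightly different spectra to the same value.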

matchms.hashing.spectrum_hash_arrays(mz: ndarray, intensities: ndarray, hash_length: int = 20, mz_precision: int = 5, intensity_precision: int = 2) → str

Compute a hash from the m/z-intensity pairs of all peaks in a spectrum, given as separate m/z and intensity arrays.

Parameters:
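  • mz – The m/z values as an ndarray.

  • intensities – The intensities as an ndarray.

  • hash_length – The length of the hash to be computed. Defaults to 20.

  • mz_precision – The number of decimal places of the m/z values taken into account. Defaults to 5.

  • intensity_precision – The number of decimal places of the intensities taken into account. Defaults to 2.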

Returns:

The hash of the spectrum.

Return type:

str