matchms.hashing module
Helper functions related to hashing.
- matchms.hashing.compute_combined_hashes(fragment_hashes: list[str] | ndarray, metadata_hashes: list[str]) → list[int]
Combine fragment and metadata hashes into a single hash value.
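The signature suggests one combined integer per spectrum, built from the fragment hash and the metadata hash at the same position. The following is a minimal sketch of that idea and not the matchms implementation: the helper name `combine_hash_pairs`, the use of SHA-256 over the joined strings, and the 64-bit output width are all assumptions made for illustration.

```python
import hashlib


def combine_hash_pairs(fragment_hashes, metadata_hashes):
    """Hypothetical helper: derive one integer per (fragment, metadata) hash pair."""
    if len(fragment_hashes) != len(metadata_hashes):
        raise ValueError("Both inputs must contain one hash per spectrum.")
    combined = []
    for frag, meta in zip(fragment_hashes, metadata_hashes):
        # Join with a separator so ("ab", "c") and ("a", "bc") do not collide.
        digest = hashlib.sha256(f"{frag}|{meta}".encode("utf-8")).digest()
        # Keep the first 8 bytes as an unsigned 64-bit integer (assumed width).
        combined.append(int.from_bytes(digest[:8], "big"))
    return combined
```

A combined value of this kind changes whenever either the peaks or the metadata change, which is what makes it usable as a single identity check for a spectrum.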
- matchms.hashing.metadata_hash(metadata: dict, hash_length: int = 20)
Compute hash from metadata dictionary.
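To illustrate what hashing a metadata dictionary involves, here is a hedged sketch. It is not the algorithm matchms uses: the function name `sketch_metadata_hash`, the JSON serialization with sorted keys, and the choice of SHA-256 are assumptions. Only the `hash_length` parameter and its default of 20 come from the signature above.

```python
import hashlib
import json


def sketch_metadata_hash(metadata: dict, hash_length: int = 20) -> str:
    """Hypothetical stand-in for a metadata hash; not the matchms algorithm."""
    # Sort keys so that dictionaries with equal content hash identically,
    # regardless of insertion order. default=str covers non-JSON values.
    encoded = json.dumps(metadata, sort_keys=True, default=str).encode("utf-8")
    # Truncate the hex digest to the requested length.
    return hashlib.sha256(encoded).hexdigest()[:hash_length]


a = sketch_metadata_hash({"precursor_mz": 222.2, "charge": 1})
b = sketch_metadata_hash({"charge": 1, "precursor_mz": 222.2})
```

In this sketch `a` and `b` are equal because the keys are sorted before encoding; any change to a value produces a different hash.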
- matchms.hashing.spectra_hashes(fragments: csr_array, bin_to_mz_func, hash_length: int = 20, **kwargs) → ndarray
Compute hashes for a collection of spectra stored in a sparse (CSR) matrix.
- Parameters:
fragments (csr_array) – A SciPy sparse matrix in CSR format where each row represents a spectrum and each column represents an m/z bin. Cell values are intensities.
bin_to_mz_func (callable) – A function or method that accepts an array of bin indices (column indices) and returns an array of the corresponding m/z values (floats). SpectraCollection provides such a function.
hash_length (int, optional) – The desired length of the resulting hash strings. Defaults to 20.
**kwargs – Additional parameters passed to spectrum_hash_arrays, such as mz_precision and intensity_precision.
- Returns:
A NumPy array of dtype 'U<hash_length>' containing the hashes for all rows of the input matrix, in their original order.
- Return type:
np.ndarray
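The row-wise traversal described above can be sketched without any dependencies by working directly on the three arrays that define a CSR matrix (`indptr`, `indices`, `data`). This is an illustration under stated assumptions, not the matchms implementation: `sketch_spectra_hashes`, the text encoding of the peaks, the fixed 5 and 2 decimal places, and the 0.1-wide bins in the usage lines are all hypothetical.

```python
import hashlib


def sketch_spectra_hashes(indptr, indices, data, bin_to_mz_func, hash_length=20):
    """Hypothetical sketch: one hash per CSR row, returned in row order."""
    hashes = []
    for row in range(len(indptr) - 1):
        # indptr[row]:indptr[row + 1] delimits the stored entries of this row.
        start, stop = indptr[row], indptr[row + 1]
        # Column indices are m/z bins; convert them back to m/z values.
        mz_values = bin_to_mz_func(indices[start:stop])
        intensities = data[start:stop]
        # Fixed-precision text encoding of the peaks (5 and 2 decimals assumed).
        encoded = ";".join(
            f"{mz:.5f}:{intensity:.2f}" for mz, intensity in zip(mz_values, intensities)
        )
        hashes.append(hashlib.sha256(encoded.encode("utf-8")).hexdigest()[:hash_length])
    return hashes


# Two spectra over bins of width 0.1 (assumed binning, for illustration only).
indptr = [0, 2, 3]
indices = [1000, 2000, 1000]
data = [0.5, 1.0, 0.5]
hashes = sketch_spectra_hashes(indptr, indices, data, lambda bins: [b * 0.1 for b in bins])
```

Passing the bin-to-m/z conversion in as a callable keeps the hashing code independent of how the bins were defined, which matches the role of `bin_to_mz_func` in the documented signature.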
- matchms.hashing.spectrum_hash(peaks: Fragments, hash_length: int = 20, mz_precision: int = 5, intensity_precision: int = 2) → str
Compute hash from mz-intensity pairs of all peaks in spectrum.
- Parameters:
peaks – The Fragments object containing mz and intensities.
hash_length – The length of the hash to be computed.
mz_precision – The precision of the mz values.
intensity_precision – The precision of the intensities.
- Returns:
The hash of the spectrum.
- Return type:
str
- matchms.hashing.spectrum_hash_arrays(mz: ndarray, intensities: ndarray, hash_length: int = 20, mz_precision: int = 5, intensity_precision: int = 2) → str
Compute hash from mz-intensity pairs of all peaks in spectrum.
- Parameters:
mz – mz values as ndarray.
intensities – intensities as ndarray.
hash_length – The length of the hash to be computed.
mz_precision – The precision of the mz values.
intensity_precision – The precision of the intensities.
- Returns:
The hash of the spectrum.
- Return type:
str
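How the two precision parameters can affect a peak hash is easiest to see in a small example. The sketch below is not the matchms algorithm: it assumes that "precision" means the number of decimal places kept, and the function name `sketch_spectrum_hash`, the peak sorting, and the SHA-256 text encoding are hypothetical choices. The parameter names and defaults mirror the signature above.

```python
import hashlib


def sketch_spectrum_hash(mz, intensities, hash_length=20, mz_precision=5, intensity_precision=2):
    """Hypothetical peak hash; assumes precision means number of decimal places."""
    if len(mz) != len(intensities):
        raise ValueError("mz and intensities must have the same length.")
    # Sort peaks so the hash does not depend on input order.
    peaks = sorted(zip(mz, intensities))
    # Encode every peak at fixed precision before hashing.
    encoded = ";".join(
        f"{m:.{mz_precision}f}:{i:.{intensity_precision}f}" for m, i in peaks
    )
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()[:hash_length]


h1 = sketch_spectrum_hash([100.0, 200.0], [0.5, 1.0])
h2 = sketch_spectrum_hash([100.000001, 200.0], [0.5, 1.0])  # differs beyond 5 decimals
h3 = sketch_spectrum_hash([100.001, 200.0], [0.5, 1.0])     # differs within 5 decimals
```

Under these assumptions `h1` and `h2` are identical, because the difference disappears at five decimal places, while `h3` differs. Lowering the precision makes a hash more tolerant of floating-point noise at the cost of distinguishing fewer spectra.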