matchms.similarity.MetadataMatch module
- class matchms.similarity.MetadataMatch.MetadataMatch(field: str, matching_type: str = 'equal_match', tolerance: float = 0.1, tolerance_type: str = 'Dalton')[source]
Bases: BaseSimilarityWithSparse
Return True if metadata entries of a specified field match between two spectra.
This class is intended for comparing a wide range of metadata entries, so that the results can later be used to select related or similar spectra.
Matching can be done by:
- exact equality (matching_type="equal_match")
- numerical difference within a tolerance (matching_type="difference")
For numerical differences, the tolerance can be interpreted as:
- absolute difference in Dalton / raw units (tolerance_type="Dalton")
- relative difference in ppm (tolerance_type="ppm")
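The two tolerance interpretations can be illustrated with a small standalone sketch. It does not call matchms: `within_tolerance` is a hypothetical helper, and taking the mean of both values as the ppm reference is an assumption that the library may handle differently.

```python
def within_tolerance(value_1, value_2, tolerance=0.1, tolerance_type="Dalton"):
    """Hypothetical helper: True if two numbers differ by at most `tolerance`."""
    difference = abs(value_1 - value_2)
    if tolerance_type == "Dalton":
        # Absolute difference in the raw units of the field.
        return difference <= tolerance
    if tolerance_type == "ppm":
        # Relative difference in parts per million.
        # Assumption: the reference value is the mean of both entries.
        mean_value = (value_1 + value_2) / 2
        return difference / mean_value * 1e6 <= tolerance
    raise ValueError(f"Unknown tolerance_type: {tolerance_type!r}")


# 500.000 vs 500.004 differ by 0.004 absolute, which is roughly 8 ppm.
print(within_tolerance(500.000, 500.004, tolerance=0.01, tolerance_type="Dalton"))  # True
print(within_tolerance(500.000, 500.004, tolerance=5, tolerance_type="ppm"))        # False
print(within_tolerance(500.000, 500.004, tolerance=10, tolerance_type="ppm"))       # True
```

An absolute tolerance treats every value the same, while a ppm tolerance scales with the magnitude of the compared values, which is usually the more natural choice for m/z-like fields.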
Example: calculate scores between two lists of spectra and inspect the score matrix:
import numpy as np
from matchms import Spectrum
from matchms.similarity import MetadataMatch

spectrum_1 = Spectrum(
    mz=np.array([]),
    intensities=np.array([]),
    metadata={"instrument_type": "orbitrap", "id": 1},
)
spectrum_2 = Spectrum(
    mz=np.array([]),
    intensities=np.array([]),
    metadata={"instrument_type": "qtof", "id": 2},
)
spectrum_3 = Spectrum(
    mz=np.array([]),
    intensities=np.array([]),
    metadata={"instrument_type": "qtof", "id": 3},
)
spectrum_4 = Spectrum(
    mz=np.array([]),
    intensities=np.array([]),
    metadata={"instrument_type": "orbitrap", "id": 4},
)
spectra_1 = [spectrum_1, spectrum_2]
spectra_2 = [spectrum_3, spectrum_4]

similarity = MetadataMatch(field="instrument_type")
scores = similarity.matrix(spectra_1, spectra_2)
score_array = scores.to_array()

for i, spectrum_1 in enumerate(spectra_1):
    for j, spectrum_2 in enumerate(spectra_2):
        print(
            f"Metadata match between {spectrum_1.get('id')} and "
            f"{spectrum_2.get('id')} is {bool(score_array[i, j])}"
        )
Should output
Metadata match between 1 and 3 is False
Metadata match between 1 and 4 is True
Metadata match between 2 and 3 is True
Metadata match between 2 and 4 is False
- __init__(field: str, matching_type: str = 'equal_match', tolerance: float = 0.1, tolerance_type: str = 'Dalton')[source]
- Parameters:
field – Specify field name for metadata that should be compared.
matching_type – Specify how field entries should be matched. Can be one of ["equal_match", "difference"].
- "equal_match": entries must be exactly equal (default).
- "difference": entries are considered a match if their numerical difference is less than or equal to tolerance.
tolerance – Specify the tolerance at or below which two values are counted as a match. This only applies to numerical values.
tolerance_type – Choose between a fixed tolerance in Dalton / raw units ("Dalton") or a relative difference in ppm ("ppm"). This only applies when matching_type="difference".
- keep_score(score) → bool
Return whether a score should be retained in sparse outputs.
This defines the default sparse retention behavior. Users can override it per call via score_filter=....
Default behavior:
- scalar score: keep if score != 0
- structured score: keep if all fields are non-zero
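The default retention rule described above can be sketched in plain Python. This is only an illustration of the rule, not the library's code: `default_keep_score` is a hypothetical stand-in, a structured score is modeled here as a dict, and the field names `score` and `matches` are assumptions.

```python
def default_keep_score(score):
    """Hypothetical stand-in for the default sparse retention rule."""
    if isinstance(score, dict):
        # Structured score, modeled as a mapping of field name to value:
        # keep only if every field is non-zero.
        return all(value != 0 for value in score.values())
    # Scalar score: keep if it is non-zero.
    return score != 0


print(default_keep_score(True))                           # True
print(default_keep_score(0.0))                            # False
print(default_keep_score({"score": 0.8, "matches": 3}))   # True
print(default_keep_score({"score": 0.8, "matches": 0}))   # False
```

A custom filter passed as score_filter would follow the same shape: a callable that receives a score and returns a bool.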
- matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True) → Scores[source]
Compare metadata entries between all spectra in spectra_1 and spectra_2.
- Parameters:
spectra_1 – First collection of input spectra.
spectra_2 – Second collection of input spectra. If None, compare spectra_1 against itself.
score_fields – Requested score fields. Only ("score",) is supported.
progress_bar – Included for API compatibility. Not used here because this optimized implementation does not iterate pairwise in Python.
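To illustrate what a dense all-versus-all comparison produces, and what comparing spectra_1 against itself means when spectra_2 is None, here is a minimal sketch on plain numbers. It does not use matchms: `difference_matrix` and the retention-time values are hypothetical, and the explicit nested comparison is chosen for readability, not to mirror the optimized implementation.

```python
def difference_matrix(values_1, values_2=None, tolerance=0.1):
    """Hypothetical helper: boolean matrix of |a - b| <= tolerance."""
    if values_2 is None:
        # Mirror the documented default: compare the first collection with itself.
        values_2 = values_1
    return [[abs(a - b) <= tolerance for b in values_2] for a in values_1]


# Made-up retention times standing in for one numerical metadata field.
retention_times = [100.0, 100.05, 250.0]
for row in difference_matrix(retention_times, tolerance=0.1):
    print(row)
# [True, True, False]
# [True, True, False]
# [False, False, True]
```

A self-comparison with a symmetric criterion gives a symmetric matrix with True on the diagonal, as seen here.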
- pair(spectrum_1: Spectrum, spectrum_2: Spectrum)[source]
Compare metadata entries between two spectra.
- Parameters:
spectrum_1 – First spectrum.
spectrum_2 – Second spectrum.
- sparse_matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, idx_row=None, idx_col=None, score_fields: Sequence[str] | None = None, score_filter: Callable[[ndarray], bool] | None = None, progress_bar: bool = True) → Scores[source]
Compare metadata entries and return sparse scores.
This method uses optimized metadata matching when no explicit indices are provided. If explicit idx_row and idx_col are given, it falls back to the generic sparse implementation from BaseSimilarityWithSparse.
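One way to see why exact matching does not need a pairwise loop, and why the result is naturally sparse, is to group entries by value. The sketch below is a hypothetical illustration on plain lists; it is not taken from matchms and the library's optimized path may work differently. `sparse_equal_matches` is an assumed name.

```python
from collections import defaultdict


def sparse_equal_matches(values_1, values_2):
    """Hypothetical helper: (row, col) pairs where values_1[row] == values_2[col]."""
    # Index the second collection once: value -> list of column positions.
    columns_by_value = defaultdict(list)
    for col, value in enumerate(values_2):
        columns_by_value[value].append(col)

    # Look up each entry of the first collection; only matches are stored.
    matches = []
    for row, value in enumerate(values_1):
        for col in columns_by_value.get(value, []):
            matches.append((row, col))
    return matches


instrument_types_1 = ["orbitrap", "qtof"]
instrument_types_2 = ["qtof", "orbitrap"]
print(sparse_equal_matches(instrument_types_1, instrument_types_2))
# [(0, 1), (1, 0)]
```

The work scales with the number of entries plus the number of matches, and non-matching pairs are never stored. This sketch requires hashable values and does not treat missing entries specially.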