matchms.filtering package

Processing (or: filtering) mass spectra

Provided functions will usually only perform a single action on a spectrum. This can be changes or corrections of metadata, or peak filtering. More complicated processing pipelines can be built by stacking several of the provided filters.

Because there are numerous filter functions in matchms and because they often need to be applied in a specific order, the most feasible workflow for users is to use the SpectrumProcessor class to define a spectrum processing pipeline. Here is an example:

import numpy as np
from matchms import Spectrum
from matchms import SpectrumProcessor

spectrum = Spectrum(mz=np.array([100, 120, 150, 200.]),
                    intensities=np.array([200.0, 300.0, 50.0, 1.0]),
                    metadata={'id': 'spectrum1'})

# Users can pick a predefined pipeline from default pipelines, or specify a list of filters
processing = SpectrumProcessor(["normalize_intensities"])

# Run the processing pipeline:
spectrum_filtered = processing.process_spectrum(spectrum)
max_intensity = spectrum_filtered.peaks.intensities.max()
print(f"Maximum intensity is {max_intensity:.2f}")

Should output

Maximum intensity is 1.00

It is also possible to run each filter function individually. This for instance makes sense if users want to develop a highly customized spectrum processing routine. Example of how to use a single filter function:

import numpy as np
from matchms import Spectrum
from matchms.filtering import normalize_intensities

spectrum = Spectrum(mz=np.array([100, 120, 150, 200.]),
                    intensities=np.array([200.0, 300.0, 50.0, 1.0]),
                    metadata={'id': 'spectrum1'})
spectrum_filtered = normalize_intensities(spectrum)

max_intensity = spectrum_filtered.peaks.intensities.max()
print(f"Maximum intensity is {max_intensity:.2f}")

Should output

Maximum intensity is 1.00
[Figure: Sketch of matchms spectrum processing.]

class matchms.filtering.SpeciesString(dirty: str)[source]

Bases: object

A class to process and clean different types of chemical structure strings including InChI, InChIKey, and SMILES.

The class takes a raw input string, determines the intended structure type, and then cleans the string based on its type.

dirty

Raw input string representing a chemical structure.

Type:

str

target

The intended structure type determined from the input string. Could be ‘inchi’, ‘inchikey’, ‘smiles’, or None if no valid type was identified.

Type:

str

cleaned

The cleaned structure string.

Type:

str

__init__(dirty: str)[source]

Constructs a new instance of the SpeciesString class.

Parameters:

dirty (str) – The raw input string representing a chemical structure.

clean()[source]

Clean the input string based on its determined structure type.

clean_as_inchi()[source]

Search for valid inchi and harmonize it.

clean_as_inchikey()[source]

Search for valid inchikey and harmonize it.

clean_as_smiles()[source]

Search for valid smiles and harmonize it.

guess_target()[source]

Determine the intended structure type of the input string.

looks_like_a_smiles()[source]

Return True if string is made of allowed characters for smiles.

looks_like_an_inchi()[source]

Search for first piece of InChI.

looks_like_an_inchikey()[source]

Return True if string has format of inchikey.
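As an illustration of what a format check for an InChIKey can look like, here is a minimal standalone sketch. It does not call matchms and is not the SpeciesString implementation; the helper name has_inchikey_format is hypothetical. It relies only on the standard InChIKey layout of 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final uppercase letter.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def has_inchikey_format(text: str) -> bool:
    """Hypothetical helper: True if the stripped text is shaped like an InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(text.strip()) is not None


print(has_inchikey_format("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
print(has_inchikey_format("CC(=O)O"))
```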

matchms.filtering.add_compound_name(spectrum_in, *, clone: bool | None = True) dict

Add compound name to the compound_name metadata field.

If compound_name is missing, this filter tries to copy the value from name first and then from title.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with added compound_name metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None
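The fallback order described above (name first, then title) can be sketched on a plain metadata dictionary. This is a standalone illustration that does not call matchms; the function name fill_compound_name is hypothetical.

```python
def fill_compound_name(metadata: dict) -> dict:
    """Hypothetical helper: copy 'name' or 'title' into 'compound_name' if missing."""
    result = dict(metadata)  # work on a copy, similar in spirit to clone=True
    if not result.get("compound_name"):
        for fallback_key in ("name", "title"):  # 'name' is tried before 'title'
            if result.get(fallback_key):
                result["compound_name"] = result[fallback_key]
                break
    return result


print(fill_compound_name({"name": "Caffeine", "title": "Scan 12"}))
```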

matchms.filtering.add_parent_mass(spectrum_in, estimate_from_adduct: bool = True, overwrite_existing_entry: bool = False, estimate_from_charge: bool = True, *, clone: bool | None = True) dict

Add estimated parent mass to metadata if not present yet.

Method to calculate the parent mass from given precursor m/z together with charge and/or adduct. Will take precursor m/z from precursor_mz as provided by running add_precursor_mz.

For estimate_from_adduct=True this function estimates the parent mass based on the mass and charge of known adducts. The table of known adduct properties can be found in matchms/data/known_adducts_table.csv.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • estimate_from_adduct – When set to True, use adduct to estimate actual molecular mass (parent_mass). Switches back to charge-based estimate if adduct does not match a known adduct. Default is True.

  • overwrite_existing_entry – If False, an existing parent-mass entry is kept. If True, a newly computed value will replace existing ones. Default is False.

  • estimate_from_charge – If True, charge will be used to estimate the parent mass when adduct information is insufficient. Adducts of the form [M+H]+, [M+H]2+, [M-H]- etc. are assumed. Default is True.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with added or updated parent_mass metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None
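To make the charge-based estimate more concrete, the following standalone sketch shows the arithmetic under the assumption stated above, namely that the ion is of the form [M+H]+, [M+2H]2+, [M-H]- and so on. It does not call matchms; the function name and the rounded proton mass constant are choices made for this illustration.

```python
PROTON_MASS = 1.007276  # Da, rounded value used for this illustration


def parent_mass_from_charge(precursor_mz: float, charge: int) -> float:
    """Hypothetical helper: neutral mass for ions of the form [M+nH]n+ or [M-nH]n-."""
    if charge == 0:
        raise ValueError("A non-zero charge is needed for a charge-based estimate.")
    # precursor_mz = (M + charge * PROTON_MASS) / |charge|, solved for M
    return precursor_mz * abs(charge) - charge * PROTON_MASS


print(round(parent_mass_from_charge(301.007276, 1), 4))
print(round(parent_mass_from_charge(298.992724, -1), 4))
```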

matchms.filtering.add_precursor_formula(spectrum_in, *, clone: bool | None = True) dict

Derive and set precursor_formula from neutral formula and adduct.

Requirements

  • Input metadata must contain formula and adduct.

  • formula must be a simple concatenation of element symbols and counts, without parentheses, hydrates, or isotope notation.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with added precursor_formula metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.add_precursor_mz(spectrum_in, *, clone: bool | None = True) dict

Add precursor_mz to correct field and make it a float.

For missing precursor_mz field: check if there is a pepmass entry instead. For strings parsed as precursor m/z, convert to float.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with added precursor m/z metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.add_retention_index(spectrum_in, *, clone: bool | None = True) dict

Add retention index information to the retention_index key as float.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with harmonized retention index metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.add_retention_time(spectrum_in, *, clone: bool | None = True) dict

Add retention time information to the retention_time key as float.

Negative values and values that cannot be converted to float result in no update for retention_time.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with harmonized retention time metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None
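The rule that negative or non-convertible values leave retention_time untouched can be sketched as follows. This is a standalone illustration, not the matchms code; parse_retention_time is a hypothetical name, and returning None stands for "no update".

```python
def parse_retention_time(raw):
    """Hypothetical helper: return a non-negative float, or None for 'no update'."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None  # value cannot be converted to float
    return value if value >= 0 else None  # negative values are rejected


for raw in ("12.5", "-3", "n/a", None):
    print(repr(raw), "->", parse_retention_time(raw))
```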

matchms.filtering.clean_adduct(spectrum_in, *, clone: bool | None = True) dict

Clean adduct and make it consistent in style.

Will transform adduct strings of type M+H+ to [M+H]+.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with cleaned adduct metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None
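The transformation from M+H+ to [M+H]+ can be illustrated with a small regular expression. This is only a rough standalone sketch and covers far fewer cases than a real adduct cleaner would need; the function name bracket_adduct and the pattern are assumptions made for this example.

```python
import re


def bracket_adduct(adduct: str) -> str:
    """Hypothetical helper: wrap an unbracketed adduct and keep its trailing charge."""
    adduct = adduct.strip()
    if adduct.startswith("["):
        return adduct  # already in bracket style
    # Shortest possible body, followed by an optional count and a charge sign.
    match = re.fullmatch(r"(.+?)(\d*[+-])", adduct)
    if match is None:
        return adduct  # no trailing charge found, leave unchanged
    body, charge = match.groups()
    return f"[{body}]{charge}"


print(bracket_adduct("M+H+"))
print(bracket_adduct("M+2H2+"))
```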

matchms.filtering.clean_compound_name(spectrum_in, *, clone: bool | None = True) dict

Clean compound name.

A list of frequently seen name additions that do not belong to the compound name will be removed.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with cleaned compound_name metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.correct_charge(spectrum_in, *, clone: bool | None = True) dict

Correct charge values based on given ionmode.

For some spectra, the charge value is either undefined or inconsistent with its ionmode, which is corrected by this filter.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with corrected charge metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None
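One plausible way to make a charge consistent with the ionmode is sketched below. The exact rules used by matchms are not spelled out in this section, so the behavior here (an undefined or zero charge becomes +1 or -1, and a wrong sign is flipped) is an assumption, and align_charge_with_ionmode is a hypothetical name.

```python
def align_charge_with_ionmode(charge, ionmode):
    """Hypothetical helper: return a charge whose sign agrees with the ionmode."""
    charge = 0 if charge is None else int(charge)
    sign = {"positive": 1, "negative": -1}.get(ionmode)
    if sign is None:
        return charge  # unknown ionmode: nothing to correct against
    if charge == 0:
        return sign  # assumption: undefined charge defaults to a single charge
    return sign * abs(charge)  # keep the magnitude, fix the sign


print(align_charge_with_ionmode(None, "positive"))
print(align_charge_with_ionmode(2, "negative"))
```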

matchms.filtering.default_filters(spectrum: Spectrum) Spectrum[source]

Collection of filters that are considered default and that do not require any (factory) arguments.

Collection is

  1. make_charge_int()

  2. add_compound_name()

  3. derive_adduct_from_name()

  4. derive_formula_from_name()

  5. clean_compound_name()

  6. interpret_pepmass()

  7. add_precursor_mz()

  8. derive_ionmode()

  9. correct_charge()
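The idea of applying filters one after another in a fixed order can be sketched with plain functions on a metadata dictionary. This standalone example does not call matchms; run_pipeline and the three toy filters are hypothetical stand-ins that only mimic the pattern, including the convention that a filter may return None to discard a spectrum.

```python
def charge_to_int(metadata):
    """Toy filter: convert 'charge' to int where possible."""
    result = dict(metadata)
    try:
        result["charge"] = int(result["charge"])
    except (KeyError, TypeError, ValueError):
        pass
    return result


def copy_name(metadata):
    """Toy filter: fill 'compound_name' from 'name' if it is missing."""
    result = dict(metadata)
    result.setdefault("compound_name", result.get("name"))
    return result


def require_name(metadata):
    """Toy filter: discard entries without a compound name."""
    return metadata if metadata.get("compound_name") else None


def run_pipeline(metadata, filters):
    """Apply filters in order and stop as soon as one returns None."""
    for apply_filter in filters:
        metadata = apply_filter(metadata)
        if metadata is None:
            return None
    return metadata


print(run_pipeline({"charge": "1", "name": "X"}, [charge_to_int, copy_name, require_name]))
```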

matchms.filtering.derive_adduct_from_name(spectrum_in, remove_adduct_from_name: bool = True, *, clone: bool | None = True) dict

Find adduct in compound name and add it to metadata if not present yet.

Method to interpret the given compound name to find the adduct.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • remove_adduct_from_name – Remove found adducts from compound name if set to True. Default is True.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with added adduct metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.derive_annotation_from_compound_name(spectrum_in, annotated_compound_names_file: str | None = None, mass_tolerance: float = 0.1, *, clone: bool | None = True) dict

Add molecular annotations based on compound name by searching PubChem.

This filter adds smiles, inchi, and/or inchikey metadata based on a PubChem compound-name lookup. SMILES lookup is not supported directly by pubchempy anymore, see https://github.com/matchms/matchms/issues/823. SMILES can alternatively be derived from InChI by running derive_smiles_from_inchi.

The filter is only run if there is not yet a valid SMILES or InChI entry in the metadata. The annotation is only added if the PubChem result has a monoisotopic mass close enough to the spectrum’s parent_mass.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • annotated_compound_names_file

    Optional CSV file used as a persistent cache. Any compound name searched on PubChem will be added to this file. If a compound name is already present in the file, the cached annotation is used instead of querying PubChem again.

    The CSV file should contain the columns compound_name, smiles, inchi, inchikey, and monoisotopic_mass.

  • mass_tolerance – Acceptable mass difference between query compound and PubChem result. Default is 0.1.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with added annotation metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.derive_formula_from_name(spectrum_in, remove_formula_from_name: bool = True, *, clone: bool | None = True) dict

Detect and remove misplaced formula in compound name and add to metadata.

Method to find misplaced formulas in compound name based on regular expression. This will not chemically test the detected formula, so the search is limited to frequently occurring types of shape C47H83N1O8P1.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • remove_formula_from_name – Remove found formula from compound name if set to True. Default is True.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with added formula metadata and optionally cleaned compound_name, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None
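A regular-expression search for formulas of the shape C47H83N1O8P1 might look like the sketch below. The pattern and the choice to inspect only the last whitespace-separated token are assumptions for illustration, not the expression used by matchms; split_trailing_formula is a hypothetical name. As noted above, such a pattern does not test whether the formula is chemically meaningful.

```python
import re

# Two or more element symbols, each followed by an explicit count.
FORMULA_TOKEN = re.compile(r"(?:[A-Z][a-z]?\d+){2,}")


def split_trailing_formula(name: str):
    """Hypothetical helper: return (cleaned_name, formula) or (name, None)."""
    tokens = name.split()
    if len(tokens) > 1 and FORMULA_TOKEN.fullmatch(tokens[-1]):
        return " ".join(tokens[:-1]), tokens[-1]
    return name, None


print(split_trailing_formula("Lipid X C47H83N1O8P1"))
print(split_trailing_formula("Vitamin B12"))
```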

matchms.filtering.derive_formula_from_smiles(spectrum_in, overwrite: bool = True, *, clone: bool | None = True) dict

Add molecular formula metadata derived from SMILES.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • overwrite – If True, an existing formula entry will be replaced when the formula derived from SMILES differs from the current value. If False, an existing formula entry will be kept unchanged. Default is True.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with updated formula metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.derive_inchi_from_smiles(spectrum_in, *, clone: bool | None = True) dict

Find missing InChI and derive from smiles where possible.

matchms.filtering.derive_inchikey_from_inchi(spectrum_in, *, clone: bool | None = True) dict

Find missing InChIKey and derive from InChI where possible.

matchms.filtering.derive_ionmode(spectrum_in, *, clone: bool | None = True) dict

Derive missing ionmode based on charge and/or adduct.

Some input formats, for example MGF files, do not always provide a correct ionmode. This filter reads charge and adduct metadata and uses them to fill in the ionmode where missing.

matchms.filtering.derive_smiles_from_inchi(spectrum_in, *, clone: bool | None = True) dict

Find missing smiles and derive from InChI where possible.

matchms.filtering.harmonize_missing_entries(spectrum_in, keys: str | Iterable[str] | None = None, undefined=None, aliases: Iterable | None = None, *, clone: bool | None = True) dict

Replace aliases for missing metadata entries.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • keys – Metadata key or keys to harmonize. If None, all existing metadata keys are harmonized.

  • undefined – Replacement value for missing entries. Default is None.

  • aliases – Values that should be interpreted as missing. If None, ALIASES_FOR_NONE is used.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with harmonized missing metadata entries, or None if input was None.

Return type:

Spectrum, SpectraCollection, or None
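The interplay of keys, undefined, and aliases can be sketched on a plain dictionary. This standalone example does not call matchms; the alias set below is a made-up placeholder and not the content of ALIASES_FOR_NONE, and harmonize_missing is a hypothetical name.

```python
# Placeholder aliases for this illustration only (not matchms' ALIASES_FOR_NONE).
HYPOTHETICAL_ALIASES = frozenset({"", "n/a", "na", "none", "null", "unknown"})


def harmonize_missing(metadata, keys=None, undefined=None, aliases=HYPOTHETICAL_ALIASES):
    """Hypothetical helper: replace alias strings for 'missing' by `undefined`."""
    if keys is None:
        selected = list(metadata.keys())  # harmonize every existing key
    elif isinstance(keys, str):
        selected = [keys]
    else:
        selected = list(keys)
    result = dict(metadata)
    for key in selected:
        value = result.get(key)
        if isinstance(value, str) and value.strip().lower() in aliases:
            result[key] = undefined
    return result


print(harmonize_missing({"inchi": "N/A", "smiles": "CCO"}))
```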

matchms.filtering.harmonize_undefined_inchi(spectrum_in: Spectrum, undefined: str = '', aliases: list[str] = None, clone: bool | None = True) Spectrum | None[source]

Replace all aliases for empty/undefined inchi entries by value of undefined argument.

matchms.filtering.harmonize_undefined_inchikey(spectrum_in: Spectrum, undefined: str = '', aliases: list[str] = None, clone: bool | None = True) Spectrum | None[source]

Replace all aliases for empty/undefined inchikey entries by undefined.

matchms.filtering.harmonize_undefined_smiles(spectrum_in: Spectrum, undefined: str = '', aliases: list[str] = None, clone: bool | None = True) Spectrum | None[source]

Replace all aliases for empty/undefined smiles entries by undefined.

matchms.filtering.interpret_pepmass(spectrum_in, clone: bool | None = True) Spectrum | None

Reads pepmass field, if present, and adds values to correct fields.

The field pepmass or PEPMASS is often used to describe the precursor ion. This function interprets the values as (mz, intensity, charge) and stores them in precursor_mz, precursor_intensity, and charge.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with interpreted pepmass metadata, or None if not present.

Return type:

Spectrum or None
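How a pepmass value is split into (mz, intensity, charge) can be sketched as follows. This is a standalone illustration, not the matchms parser; split_pepmass is a hypothetical name, and the sketch assumes the value is already a number or a tuple/list of up to three entries.

```python
def split_pepmass(pepmass):
    """Hypothetical helper: map a pepmass value onto the three target fields."""
    values = list(pepmass) if isinstance(pepmass, (tuple, list)) else [pepmass]
    values += [None] * (3 - len(values))  # pad missing intensity and charge
    mz, intensity, charge = values[:3]
    return {
        "precursor_mz": None if mz is None else float(mz),
        "precursor_intensity": None if intensity is None else float(intensity),
        "charge": None if charge is None else int(charge),
    }


print(split_pepmass((445.12, 1500.0, 1)))
print(split_pepmass(445.12))
```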

matchms.filtering.make_charge_int(spectrum_in, *, clone: bool | None = True) dict

Convert charge field to integer if possible.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with converted charge metadata, or None if the input was None.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.normalize_intensities(spectrum_in: Spectrum, clone: bool | None = True, scale_to_max: float = 1.0) Spectrum | None

Normalize peak intensities relative to the maximum peak intensity.

Intensities are divided by the maximum intensity of the spectrum and then multiplied by scale_to_max. By default, this normalizes spectra to unit height, i.e. the most intense peak receives intensity 1.0.

Peaks with zero intensity are removed. Negative peak intensities are not allowed and raise a ValueError.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

  • scale_to_max – Desired intensity of the most intense peak after normalization. Default is 1.0. For example, scale_to_max=1000.0 scales the base peak to intensity 1000.

Returns:

Spectrum with normalized intensities, or None if input is None.

Return type:

Spectrum or None
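The arithmetic described above can be written out without matchms. The following standalone sketch works on plain lists; the function name normalize is hypothetical, and returning empty lists for a spectrum without positive peaks is a choice made for this illustration.

```python
def normalize(mz, intensities, scale_to_max=1.0):
    """Hypothetical helper: scale intensities so that the base peak equals scale_to_max."""
    if any(value < 0 for value in intensities):
        raise ValueError("Negative peak intensities are not allowed.")
    # Peaks with zero intensity are dropped.
    peaks = [(m, i) for m, i in zip(mz, intensities) if i > 0]
    if not peaks:
        return [], []
    max_intensity = max(i for _, i in peaks)
    return (
        [m for m, _ in peaks],
        [i / max_intensity * scale_to_max for _, i in peaks],
    )


print(normalize([100, 120, 150], [150.0, 300.0, 0.0], scale_to_max=1000.0))
```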

matchms.filtering.reduce_to_number_of_peaks(spectrum_in: Spectrum, n_required: int = 0, n_max: int = inf, ratio_desired: float | None = None, clone: bool | None = True) Spectrum | None

Lowest intensity peaks will be removed when the spectrum has more peaks than desired.

Parameters:
  • spectrum_in – Input spectrum.

  • n_required – Number of minimum required peaks. Spectra with fewer peaks will be set to ‘None’. Default is 0.

  • n_max – Maximum number of peaks. Remove peaks if more peaks are found. Default is inf.

  • ratio_desired – Set desired ratio between maximum number of peaks and parent mass. For spectra without parent mass (e.g. GCMS spectra) this will raise an error when ratio_desired is used. Default is None.

  • clone – Optionally clone the Spectrum.
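The combination of n_required and n_max can be sketched on plain lists as shown below. This standalone example does not call matchms and leaves out ratio_desired; keep_most_intense is a hypothetical name.

```python
def keep_most_intense(mz, intensities, n_required=0, n_max=float("inf")):
    """Hypothetical helper: keep at most n_max peaks, preferring high intensities."""
    peaks = list(zip(mz, intensities))
    if len(peaks) < n_required:
        return None  # too few peaks: the spectrum is discarded
    if len(peaks) > n_max:
        # Keep the n_max most intense peaks, then restore m/z order.
        peaks = sorted(peaks, key=lambda peak: peak[1], reverse=True)[: int(n_max)]
        peaks.sort(key=lambda peak: peak[0])
    return peaks


print(keep_most_intense([100, 120, 150, 200], [200.0, 300.0, 50.0, 1.0], n_max=2))
```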

matchms.filtering.remove_noise_below_frequent_intensities(spectrum_in: Spectrum, min_count_of_frequent_intensities: int = 5, noise_level_multiplier: float = 2.0, clone: bool | None = True) Spectrum | None

Removes noise peaks based on intensity values that repeat frequently within a spectrum.

When no noise filtering has been applied to a spectrum, many spectra show repeating intensities. From all intensities that repeat more than min_count_of_frequent_intensities the highest is selected. The noise level is set to this intensity * noise_level_multiplier. All fragments with an intensity below the noise level are removed.

This filter was suggested by Tytus Mak.

Parameters:
  • spectrum_in – Input spectrum.

  • min_count_of_frequent_intensities – Minimum number of repeating intensities.

  • noise_level_multiplier – From all intensities that repeat more than min_count_of_frequent_intensities the highest is selected. The noise level is set to this intensity * noise_level_multiplier.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with removed intensities, or None if not present.

Return type:

Spectrum or None
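The noise-level rule described above can be sketched with a counter over intensity values. This is a standalone illustration that does not call matchms; remove_noise is a hypothetical name, and treating an intensity as "frequent" when it occurs at least min_count times is an assumption about the exact comparison.

```python
from collections import Counter


def remove_noise(mz, intensities, min_count=5, multiplier=2.0):
    """Hypothetical helper: drop peaks below (highest frequent intensity * multiplier)."""
    counts = Counter(intensities)
    frequent = [value for value, count in counts.items() if count >= min_count]
    if not frequent:
        return list(zip(mz, intensities))  # no repeating intensities: keep everything
    noise_level = max(frequent) * multiplier
    return [(m, i) for m, i in zip(mz, intensities) if i >= noise_level]


mz_values = list(range(100, 108))
intensity_values = [3.0] * 5 + [5.0, 6.0, 40.0]
print(remove_noise(mz_values, intensity_values))
```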

matchms.filtering.remove_peaks_around_precursor_mz(spectrum_in: Spectrum, mz_tolerance: float = 17, clone: bool | None = True) Spectrum | None

Remove peaks that are within mz_tolerance (in Da) of the precursor mz, excluding the precursor peak.

Parameters:
  • spectrum_in – Input spectrum.

  • mz_tolerance – Tolerance of mz values that are not allowed to lie within the precursor mz. Default is 17 Da.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with removed peaks, or None if not present.

Return type:

Spectrum or None

matchms.filtering.remove_peaks_outside_top_k(spectrum_in: Spectrum, k: int = 6, mz_window: float = 50, clone: bool | None = True) Spectrum | None

Remove all peaks which are not within mz_window of at least one of the k highest intensity peaks of the spectrum.

Parameters:
  • spectrum_in – Input spectrum.

  • k – The number of most intense peaks to compare to. Default is 6.

  • mz_window – Window of mz values (in Da) that are allowed to lie within the top k peaks. Default is 50 Da.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with removed peaks, or None if not present.

Return type:

Spectrum or None
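The selection rule (keep a peak only if it lies within mz_window of one of the k most intense peaks) can be sketched on plain lists. This standalone example does not call matchms; keep_near_top_k is a hypothetical name, and treating the window boundary as inclusive is an assumption.

```python
def keep_near_top_k(mz, intensities, k=6, mz_window=50.0):
    """Hypothetical helper: keep peaks close to at least one of the k highest peaks."""
    peaks = list(zip(mz, intensities))
    top_mz = [m for m, _ in sorted(peaks, key=lambda peak: peak[1], reverse=True)[:k]]
    return [
        (m, i)
        for m, i in peaks
        if any(abs(m - reference) <= mz_window for reference in top_mz)
    ]


print(keep_near_top_k([100, 120, 300, 500], [10, 100, 5, 90], k=2, mz_window=50))
```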

matchms.filtering.remove_peaks_relative_to_precursor_mz(spectrum_in: Spectrum, offset_to_precursor: float = -1.6, clone: bool | None = True) Spectrum | None

Remove all peaks with m/z values > precursor-m/z + offset_to_precursor.

If offset_to_precursor is negative, this means that all peaks with m/z values greater than (precursor_mz - |offset_to_precursor|) will be removed. If offset_to_precursor is positive, the precursor_mz peak itself will remain.

Parameters:
  • spectrum_in – Input spectrum.

  • offset_to_precursor – All peaks with mz values > precursor_mz + offset_to_precursor will be removed. Default is -1.6 Da, based on the Flash Entropy article by Li and Fiehn, 2023, Nat. Methods (see https://www.nature.com/articles/s41592-023-02012-9).

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with removed peaks, or None if not present.

Return type:

Spectrum or None
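The cutoff rule can be written out in a few lines. This standalone sketch works on plain lists and does not call matchms; keep_below_precursor is a hypothetical name.

```python
def keep_below_precursor(mz, intensities, precursor_mz, offset_to_precursor=-1.6):
    """Hypothetical helper: drop peaks with m/z > precursor_mz + offset_to_precursor."""
    cutoff = precursor_mz + offset_to_precursor
    return [(m, i) for m, i in zip(mz, intensities) if m <= cutoff]


mz_values = [100.0, 198.0, 199.0, 200.0]
intensity_values = [1.0, 1.0, 1.0, 1.0]
# Negative offset: the precursor peak and peaks just below it are removed.
print(keep_below_precursor(mz_values, intensity_values, precursor_mz=200.0))
# Positive offset: the precursor peak itself remains.
print(keep_below_precursor(mz_values, intensity_values, 200.0, offset_to_precursor=1.0))
```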

matchms.filtering.remove_profiled_spectra(spectrum_in: Spectrum, mz_window=0.5, clone: bool | None = True) Spectrum | None

Remove profiled spectra

A spectrum is removed if, within mz_window (default 0.5) of its highest peak, at least 2 peaks next to the main peak have an intensity > max_intensity/2.

Reproduced from MZmine. https://github.com/mzmine/mzmine3/blob/master/src/main/java/io/github/mzmine/util/scans/ScanUtils.java#L609

Parameters:
  • spectrum_in – Input spectrum.

  • mz_window – Window of mz values (in Da) around the highest peak in which neighboring peaks are inspected. Default is 0.5 Da.

  • clone – Optionally clone the Spectrum.

Returns:

None if the spectrum is likely profile data, else the input spectrum.

Return type:

Spectrum or None

matchms.filtering.repair_adduct_and_parent_mass_based_on_smiles(spectrum_in, mass_tolerance: float, *, clone: bool | None = True) dict

Correct adduct and parent mass based on smiles and precursor_mz.

matchms.filtering.repair_adduct_based_on_parent_mass(spectrum_in, mass_tolerance: float, *, clone: bool | None = True) dict

Correct adduct based on parent_mass and precursor_mz.

matchms.filtering.repair_inchi_inchikey_smiles(spectrum_in, *, clone: bool | None = True) dict[str, str]

Check if inchi, inchikey, and smiles entries seem correct.

Detect and correct if any of those entries clearly belongs into one of the other two fields, for example if an inchikey is found in the inchi field.

matchms.filtering.repair_not_matching_annotation(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None

Repairs mismatches in a spectrum’s annotations related to SMILES, InChI, and InChIKey.

Given a spectrum, this function ensures that the provided SMILES, InChI, and InChIKey annotations are consistent with one another. If there are discrepancies, they are resolved as follows:

  1. If the SMILES and InChI do not match:
    • Both SMILES and InChI are checked against the parent mass.

    • The annotation that matches the parent mass is retained, and the other is regenerated.

  2. If the InChIKey does not match the InChI:
    • A new InChIKey is generated from the InChI and replaces the old one.

Warnings and information logs are generated to track changes and potential issues. For correctness of InChIKey entries, only the first 14 characters are considered.

Parameters:
  • spectrum_in – The input spectrum containing annotations to be checked and repaired.

  • clone – Optionally clone the Spectrum.

Returns:

A cloned version of the input spectrum with corrected annotations, or None if the input spectrum is None.

Return type:

Spectrum or None

matchms.filtering.repair_parent_mass_from_smiles(spectrum_in, mass_tolerance: float = 0.1, *, clone: bool | None = True) dict

Set parent mass to match smiles mass if not already close.

matchms.filtering.repair_parent_mass_is_molar_mass(spectrum_in, mass_tolerance: float, *, clone: bool | None = True) dict

Change parent mass from molar mass into monoisotopic mass where applicable.

matchms.filtering.repair_parent_mass_match_smiles_wrapper(spectrum_in: Spectrum, mass_tolerance: float = 0.2, clone: bool | None = True) Spectrum | None

Repair a mismatch between parent mass and smiles mass.

The filter tries several increasingly involved repair steps: first salt removal from SMILES, then correction of molar mass to monoisotopic mass, then adduct/parent mass repair based on SMILES.

matchms.filtering.repair_smiles_of_salts(spectrum_in, mass_tolerance: float, *, clone: bool | None = True) dict

Repair salt SMILES to match parent mass.

matchms.filtering.require_compound_name(spectrum_in) bool

Ensure that the compound name is present in the spectrum metadata.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Spectrum input is returned unchanged if it contains a compound name, otherwise None. SpectraCollection input is returned with rows lacking compound names removed.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.require_correct_ionmode(spectrum_in, ion_mode_to_keep) bool

Validate that the spectrum ionmode matches the requested ionmode.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • ion_mode_to_keep – Desired ionmode: "positive", "negative", or "both". If "both", spectra are kept when ionmode is either "positive" or "negative".

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Spectrum input is returned unchanged if its ionmode matches the requirement, otherwise None. SpectraCollection input is returned with non-matching rows removed.

Return type:

Spectrum, SpectraCollection, or None
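The meaning of the three accepted values of ion_mode_to_keep can be sketched as a simple predicate. This standalone example does not call matchms; ionmode_is_accepted is a hypothetical name, and raising a ValueError for other settings is a choice made for this illustration.

```python
def ionmode_is_accepted(ionmode, ion_mode_to_keep):
    """Hypothetical helper: True if a spectrum with this ionmode would be kept."""
    if ion_mode_to_keep not in ("positive", "negative", "both"):
        raise ValueError("ion_mode_to_keep must be 'positive', 'negative', or 'both'.")
    if ion_mode_to_keep == "both":
        return ionmode in ("positive", "negative")
    return ionmode == ion_mode_to_keep


print(ionmode_is_accepted("negative", "both"))
print(ionmode_is_accepted("negative", "positive"))
```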

matchms.filtering.require_correct_ms_level(spectrum_in, required_ms_level: int = 2) bool

Remove spectra where the ms_level does not match the required_ms_level.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • required_ms_level – Required MS level. Default is 2.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Spectrum input is returned unchanged if the MS level matches, otherwise None. SpectraCollection input is returned with non-matching rows removed.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.require_formula(spectrum_in) bool

Ensure that the molecular formula is present and looks valid.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Spectrum input is returned unchanged if it contains a valid molecular formula, otherwise None. SpectraCollection input is returned with rows lacking a valid formula removed.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.require_matching_adduct_and_ionmode(spectrum_in) bool

Remove spectra where the adduct and ionmode do not match.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Spectrum input is returned unchanged if adduct and ionmode match, otherwise None. SpectraCollection input is returned with non-matching rows removed.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.require_matching_adduct_precursor_mz_parent_mass(spectrum_in, tolerance=0.1) bool

Check if adduct, precursor m/z, and parent mass match within tolerance.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • tolerance – Absolute tolerance used to compare the given parent mass to the parent mass implied by precursor_mz and adduct. Default is 0.1.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Spectrum input is returned unchanged if adduct, precursor m/z, and parent mass match, otherwise None. SpectraCollection input is returned with non-matching rows removed.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.require_maximum_number_of_peaks(spectrum_in: Spectrum, maximum_number_of_fragments: int = 1000, clone: bool | None = True) Spectrum | None

Spectrum will be removed when it has more peaks than maximum_number_of_fragments.

For single Spectrum input this will return ‘None’ when the number of peaks exceeds the maximum_number_of_fragments. For SpectraCollection input, spectra with more peaks than maximum_number_of_fragments will be removed from the collection.

Parameters:
  • spectrum_in – Input spectrum.

  • maximum_number_of_fragments – Maximum number of peaks allowed. Spectra with more peaks will be set to ‘None’. Default is 1000.

  • clone – Optionally clone the Spectrum.

Returns:

Untouched Spectrum or ‘None’.

Return type:

Spectrum or None

matchms.filtering.require_minimum_number_of_high_peaks(spectrum_in: Spectrum, no_peaks: int = 5, intensity_percent: float = 2.0, clone: bool | None = True) Spectrum | None

Removes spectra if the number of peaks with relative intensity above or equal to intensity_percent is less than no_peaks.

For single Spectrum input this will return ‘None’ when the number of peaks with relative intensity above or equal to intensity_percent is less than no_peaks. For SpectraCollection input, spectra with fewer peaks with relative intensity above or equal to intensity_percent than no_peaks will be removed from the collection.

Parameters:
  • spectrum_in – Input spectrum.

  • no_peaks – Minimum number of peaks required to have a relative intensity above or equal to intensity_percent. Spectra with fewer such peaks will return None. Default is 5.

  • intensity_percent – Minimum relative intensity (as a percentage between 0-100) for peaks that are searched. Default is 2.

  • clone – Optionally clone the Spectrum.

Returns:

Untouched Spectrum or ‘None’.

Return type:

Spectrum or None
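The counting rule can be sketched on a plain list of intensities. This standalone example does not call matchms; has_enough_high_peaks is a hypothetical name, and "relative intensity" is taken here to mean the percentage of the most intense peak.

```python
def has_enough_high_peaks(intensities, no_peaks=5, intensity_percent=2.0):
    """Hypothetical helper: True if at least no_peaks peaks reach the relative threshold."""
    if not intensities:
        return False
    threshold = max(intensities) * intensity_percent / 100.0
    high_peaks = sum(1 for value in intensities if value >= threshold)
    return high_peaks >= no_peaks


values = [100.0, 50.0, 3.0, 2.0, 1.0, 0.5]
print(has_enough_high_peaks(values))
print(has_enough_high_peaks(values, no_peaks=4))
```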

matchms.filtering.require_minimum_number_of_peaks(spectrum_in: Spectrum, n_required: int = 10, ratio_required: float | None = None, clone: bool | None = True) Spectrum | None

Spectrum will be set to None when it has fewer peaks than required.

Parameters:
  • spectrum_in – Input spectrum.

  • n_required – Number of minimum required peaks. Spectra with fewer peaks will be set to ‘None’.

  • ratio_required – Set desired ratio between minimum number of peaks and parent mass. Default is None.

  • clone – Optionally clone the Spectrum.

Returns:

Untouched Spectrum or ‘None’.

Return type:

Spectrum or None

matchms.filtering.require_parent_mass_match_smiles(spectrum_in, mass_tolerance) bool

Validate that parent mass matches the mass calculated from SMILES.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • mass_tolerance – Allowed absolute mass difference between parent_mass and the monoisotopic neutral mass calculated from smiles.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Spectrum input is returned unchanged if parent_mass matches the SMILES-derived mass, otherwise None. SpectraCollection input is returned with non-matching rows removed.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.require_precursor_mz(spectrum_in, minimum_accepted_mz: float | None = 10.0, maximum_mz: float | None = None) bool

Require precursor m/z to be present and within optional bounds.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • minimum_accepted_mz – Minimum accepted precursor m/z. Default is 10.0. Use None to disable the lower bound.

  • maximum_mz – Maximum accepted precursor m/z. Default is None.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Spectrum input is returned unchanged if precursor m/z passes the checks, otherwise None. SpectraCollection input is returned with failing rows removed.

Return type:

Spectrum, SpectraCollection, or None
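
The check described above amounts to a presence test followed by two optional bounds. The sketch below is a plain-Python illustration, not the matchms implementation: the helper name precursor_mz_is_acceptable is hypothetical, and treating a value exactly equal to a bound as acceptable is an assumption, since the text above does not say how boundary values are handled.

```python
def precursor_mz_is_acceptable(precursor_mz, minimum_accepted_mz=10.0, maximum_mz=None):
    """Return True if precursor_mz is present and inside the optional bounds.

    Assumption: both bounds are inclusive.
    """
    if precursor_mz is None:
        return False  # missing precursor m/z always fails
    if minimum_accepted_mz is not None and precursor_mz < minimum_accepted_mz:
        return False
    if maximum_mz is not None and precursor_mz > maximum_mz:
        return False
    return True


print(precursor_mz_is_acceptable(None))                          # False
print(precursor_mz_is_acceptable(5.0))                           # False
print(precursor_mz_is_acceptable(250.0))                         # True
print(precursor_mz_is_acceptable(250.0, maximum_mz=200.0))       # False
print(precursor_mz_is_acceptable(5.0, minimum_accepted_mz=None)) # True
```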

matchms.filtering.require_retention_index(spectrum_in) bool

Require retention index to be present.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Spectrum input is returned unchanged if retention_index is present, otherwise None. SpectraCollection input is returned with rows lacking retention index removed.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.require_valid_annotation(spectrum_in) bool

Require valid and matching SMILES, InChI, and InChIKey annotations.

Parameters:
  • spectrum_in – Input spectrum or spectra collection.

  • clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Spectrum input is returned unchanged if annotations are valid and matching, otherwise None. SpectraCollection input is returned with invalid rows removed.

Return type:

Spectrum, SpectraCollection, or None

matchms.filtering.select_by_intensity(spectrum_in: Spectrum, intensity_from: float = 0.01, intensity_to: float = 1.0, clone: bool | None = True) Spectrum | None

Keep only peaks within the set intensity range (keep if intensity_from <= intensity <= intensity_to). In most cases it is advised to use the select_by_relative_intensity() function instead.

Parameters:
  • spectrum_in – Input spectrum.

  • intensity_from – Set lower threshold for peak intensity. Default is 0.01.

  • intensity_to – Set upper threshold for peak intensity. Default is 1.0.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum containing only the peaks within the specified intensity range, or None if no input spectrum is present.

Return type:

Spectrum or None

matchms.filtering.select_by_mz(spectrum_in: Spectrum, mz_from: float = 0.0, mz_to: float = 1000.0, clone: bool | None = True) Spectrum | None

Keep only peaks between mz_from and mz_to.

Peaks are kept if mz_from <= m/z <= mz_to.

Parameters:
  • spectrum_in – Input spectrum.

  • mz_from – Set lower threshold for m/z peak positions. Default is 0.0.

  • mz_to – Set upper threshold for m/z peak positions. Default is 1000.0.

  • clone – Optionally clone the Spectrum.
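
The inclusive range rule mz_from <= m/z <= mz_to can be sketched in plain Python. This is an illustration of the documented rule only, not the matchms source; the helper name select_peaks_by_mz and the use of plain lists for m/z values and intensities are assumptions made for the example.

```python
def select_peaks_by_mz(mz, intensities, mz_from=0.0, mz_to=1000.0):
    """Keep peaks with mz_from <= m/z <= mz_to (both bounds inclusive)."""
    kept = [(m, i) for m, i in zip(mz, intensities) if mz_from <= m <= mz_to]
    kept_mz = [m for m, _ in kept]
    kept_intensities = [i for _, i in kept]
    return kept_mz, kept_intensities


mz = [100.0, 120.0, 150.0, 200.0]
intensities = [200.0, 300.0, 50.0, 1.0]

# Peaks sitting exactly on a bound are kept.
print(select_peaks_by_mz(mz, intensities, mz_from=120.0, mz_to=150.0))
# ([120.0, 150.0], [300.0, 50.0])
```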

matchms.filtering.select_by_relative_intensity(spectrum_in: Spectrum, intensity_from: float = 0.0, intensity_to: float = 1.0, clone: bool | None = True) Spectrum | None

Keep only peaks within the set relative intensity range (keep if intensity_from <= relative intensity <= intensity_to).

Parameters:
  • spectrum_in – Input spectrum.

  • intensity_from – Set lower threshold for relative peak intensity. Default is 0.0.

  • intensity_to – Set upper threshold for relative peak intensity. Default is 1.0.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum containing only the peaks within the relative intensity range, or None if no input spectrum is present.

Return type:

Spectrum or None
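
Selection by relative intensity can be sketched as below. This is a plain-Python illustration rather than the matchms implementation: the helper name select_peaks_by_relative_intensity is hypothetical, and relative intensity is assumed to be each intensity divided by the highest intensity in the spectrum, so that values fall between 0 and 1.

```python
def select_peaks_by_relative_intensity(mz, intensities,
                                       intensity_from=0.0, intensity_to=1.0):
    """Keep peaks with intensity_from <= intensity / max intensity <= intensity_to."""
    if not intensities:
        return [], []
    max_intensity = max(intensities)
    if max_intensity <= 0:
        return [], []
    kept_mz, kept_intensities = [], []
    for m, i in zip(mz, intensities):
        relative = i / max_intensity  # assumed definition of relative intensity
        if intensity_from <= relative <= intensity_to:
            kept_mz.append(m)
            kept_intensities.append(i)
    return kept_mz, kept_intensities


mz = [100.0, 120.0, 150.0, 200.0]
intensities = [200.0, 300.0, 50.0, 1.0]

# Drop peaks below 10 % of the base peak; only the peak at m/z 200 is removed.
print(select_peaks_by_relative_intensity(mz, intensities, intensity_from=0.1))
# ([100.0, 120.0, 150.0], [200.0, 300.0, 50.0])
```

Unlike an absolute threshold, this rule gives the same selection regardless of the overall intensity scale of the spectrum.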

Subpackages

Submodules