matchms.filtering package

Processing (or: filtering) mass spectra

The provided functions usually perform a single action on a spectrum. This can be a change or correction of metadata, or peak filtering. More complicated processing pipelines can be built by stacking several of the provided filters.

Because there are numerous filter functions in matchms and because they often need to be applied in a specific order, the most practical workflow for users is to use the SpectrumProcessor class to define a spectrum processing pipeline. Here is an example:

import numpy as np
from matchms import Spectrum
from matchms import SpectrumProcessor

spectrum = Spectrum(mz=np.array([100, 120, 150, 200.]),
                    intensities=np.array([200.0, 300.0, 50.0, 1.0]),
                    metadata={'id': 'spectrum1'})

# Users can pick a predefined pipeline from default pipelines, or specify a list of filters
processing = SpectrumProcessor(["normalize_intensities"])

# Run the processing pipeline:
spectrum_filtered = processing.process_spectrum(spectrum)
max_intensity = spectrum_filtered.peaks.intensities.max()
print(f"Maximum intensity is {max_intensity:.2f}")

Should output

Maximum intensity is 1.00

It is also possible to run each filter function individually. This makes sense, for instance, if users want to develop a highly customized spectrum processing routine. Example of how to use a single filter function:

import numpy as np
from matchms import Spectrum
from matchms.filtering import normalize_intensities

spectrum = Spectrum(mz=np.array([100, 120, 150, 200.]),
                    intensities=np.array([200.0, 300.0, 50.0, 1.0]),
                    metadata={'id': 'spectrum1'})
spectrum_filtered = normalize_intensities(spectrum)

max_intensity = spectrum_filtered.peaks.intensities.max()
print(f"Maximum intensity is {max_intensity:.2f}")

Should output

Maximum intensity is 1.00

Sketch of matchms spectrum processing.

class matchms.filtering.SpeciesString(dirty: str)[source]

Bases: object

A class to process and clean different types of chemical structure strings including InChI, InChIKey, and SMILES.

The class takes a raw input string, determines the intended structure type, and then cleans the string based on its type.

dirty

Raw input string representing a chemical structure.

Type:

str

target

The intended structure type determined from the input string. Could be ‘inchi’, ‘inchikey’, ‘smiles’, or None if no valid type was identified.

Type:

str

cleaned

The cleaned structure string.

Type:

str

__init__(dirty: str)[source]

Constructs a new instance of the SpeciesString class.

Parameters:

dirty (str) – The raw input string representing a chemical structure.

clean()[source]

Clean the input string based on its determined structure type.

clean_as_inchi()[source]

Search for valid inchi and harmonize it.

clean_as_inchikey()[source]

Search for valid inchikey and harmonize it.

clean_as_smiles()[source]

Search for valid smiles and harmonize it.

guess_target()[source]

Determine the intended structure type of the input string.

looks_like_a_smiles()[source]

Return True if string is made of allowed characters for smiles.

looks_like_an_inchi()[source]

Search for first piece of InChI.

looks_like_an_inchikey()[source]

Return True if string has format of inchikey.

matchms.filtering.add_compound_name(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Add compound_name to correct field: “compound_name” in metadata.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added compound name, or None if not present.

Return type:

Spectrum or None

matchms.filtering.add_fingerprint(spectrum_in: Spectrum | None, fingerprint_type: str = 'daylight', nbits: int = 2048, clone: bool | None = True) Spectrum | None[source]

Add molecular fingerprint to spectrum.

If smiles or inchi is present in the metadata, derive a molecular fingerprint and add it to the spectrum.

Parameters:
  • spectrum_in – Input spectrum.

  • fingerprint_type – Determine method for deriving molecular fingerprints. Supported choices are “daylight”, “morgan1”, “morgan2”, “morgan3”. Default is “daylight”.

  • nbits – Dimension or number of bits of generated fingerprint. Default is 2048.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added fingerprint derived from SMILES or INCHI, or None if not present.

Return type:

Spectrum or None

matchms.filtering.add_parent_mass(spectrum_in: Spectrum, estimate_from_adduct: bool = True, overwrite_existing_entry: bool = False, estimate_from_charge: bool = True, clone: bool | None = True) Spectrum | None[source]

Add estimated parent mass to metadata (if not present yet).

Method to calculate the parent mass from given precursor m/z together with charge and/or adduct. Will take precursor m/z from “precursor_mz” as provided by running add_precursor_mz. For estimate_from_adduct=True this function will estimate the parent mass based on the mass and charge of known adducts. The table of known adduct properties can be found under matchms/data/known_adducts_table.csv.

Parameters:
  • spectrum_in – Input spectrum.

  • estimate_from_adduct – When set to True, use adduct to estimate actual molecular mass (“parent mass”). Default is True. Switches back to charge-based estimate if adduct does not match a known adduct.

  • overwrite_existing_entry – Default is False. If set to True, a newly computed value will replace existing ones.

  • estimate_from_charge – Default is True. If set to True, the charge will be used to estimate the parent mass. Adduct of the form [M+H]+, [M+H]2+, [M-H]- etc are assumed.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added parent mass, or None if not present.

Return type:

Spectrum or None
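
The charge-based estimate described above (estimate_from_charge=True) can be sketched in a few lines. This is a minimal illustration, not the matchms implementation: it assumes adducts of the form [M+H]+, [M+2H]2+ or [M-H]-, so that each unit of charge corresponds to one gained or lost proton. The function name and the rounded proton mass constant are assumptions made for this example.

```python
PROTON_MASS = 1.007276  # approximate proton mass in Da (assumed, rounded constant)


def estimate_parent_mass_from_charge(precursor_mz: float, charge: int) -> float:
    """Estimate the neutral mass, assuming one proton gained or lost per unit charge."""
    if charge == 0:
        raise ValueError("charge must be non-zero for a charge-based estimate")
    # precursor_mz * |charge| is the total ion mass. Subtracting charge * PROTON_MASS
    # removes added protons (charge > 0) or restores lost protons (charge < 0).
    return precursor_mz * abs(charge) - charge * PROTON_MASS


print(round(estimate_parent_mass_from_charge(301.0, 1), 4))
print(round(estimate_parent_mass_from_charge(301.0, -1), 4))
```

For other adducts (for example sodium adducts) this simple rule does not hold, which is why the adduct-based estimate is tried first when estimate_from_adduct=True.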

matchms.filtering.add_precursor_formula(spectrum_in, clone: bool | None = True)[source]

Derive and set ‘precursor_formula’ from neutral ‘formula’ and ‘adduct’.

Requirements:
  • spectrum_in must have metadata keys: ‘formula’ (neutral) and ‘adduct’.

  • ‘formula’ must be a simple concatenation of element symbols and counts (no parentheses/hydrates/isotopes).

matchms.filtering.add_precursor_mz(spectrum_in, clone: bool | None = True) Spectrum | None[source]

Add precursor_mz to correct field and make it a float.

For missing precursor_mz field: check if there is a “pepmass” entry instead. For string parsed as precursor_mz: convert to float.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added precursor mz metadata, or None if not present.

Return type:

Spectrum or None

matchms.filtering.add_retention_index(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Add retention index into ‘retention_index’ key if present.

Parameters:
  • spectrum_in – Spectrum with RI information.

  • clone – Optionally clone the Spectrum.

Return type:

Spectrum with RI info stored under ‘retention_index’.

matchms.filtering.add_retention_time(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Add retention time information to the ‘retention_time’ key as float. Negative values and those not convertible to a float result in ‘retention_time’ being ‘None’.

Parameters:
  • spectrum_in – Spectrum with retention time information.

  • clone – Optionally clone the Spectrum.

Return type:

Spectrum with harmonized retention time information.

matchms.filtering.clean_adduct(spectrum_in, clone: bool | None = True) Spectrum | None[source]

Clean adduct and make it consistent in style. Will transform adduct strings of type ‘M+H+’ to ‘[M+H]+’.

Parameters:
  • spectrum_in – Matchms Spectrum object.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with cleaned adduct, or None if not present.

Return type:

Spectrum or None
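
The style conversion from ‘M+H+’ to ‘[M+H]+’ can be illustrated with a small regular expression. This is a deliberately naive sketch under stated assumptions: the helper name bracket_adduct is hypothetical, it only handles a trailing charge such as ‘+’, ‘-’ or ‘2+’, and it does not reproduce the full set of cases the matchms filter covers.

```python
import re


def bracket_adduct(adduct: str) -> str:
    """Wrap an unbracketed adduct such as 'M+H+' as '[M+H]+'."""
    adduct = adduct.strip()
    if adduct.startswith("["):
        return adduct  # already written in bracket style
    # Split into the adduct body and a trailing charge such as '+', '-' or '2+'.
    match = re.fullmatch(r"(.+?)(\d*[+-])", adduct)
    if match is None:
        return adduct  # no trailing charge found, leave unchanged
    body, charge = match.groups()
    return f"[{body}]{charge}"


print(bracket_adduct("M+H+"))
print(bracket_adduct("M-H-"))
```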

matchms.filtering.clean_compound_name(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Clean compound name.

A list of frequently seen name additions that do not belong to the compound name will be removed.

Parameters:
  • spectrum_in – Matchms Spectrum object.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with cleaned compound name, or None if not present.

Return type:

Spectrum or None

matchms.filtering.correct_charge(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Correct charge values based on given ionmode.

For some spectra, the charge value is either undefined or inconsistent with its ionmode, which is corrected by this filter.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with corrected charge derived from ionmode, or None if not present.

Return type:

Spectrum or None

matchms.filtering.default_filters(spectrum: Spectrum) Spectrum[source]

Collection of filters that are considered default and that do not require any (factory) arguments.

Collection is

  1. make_charge_int()

  2. add_compound_name()

  3. derive_adduct_from_name()

  4. derive_formula_from_name()

  5. clean_compound_name()

  6. interpret_pepmass()

  7. add_precursor_mz()

  8. derive_ionmode()

  9. correct_charge()
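
The idea of applying a fixed sequence of filters, where each filter returns either a spectrum or None, can be sketched without matchms. The example below is only an illustration: it uses plain dictionaries instead of Spectrum objects, and the two toy filters and the apply_filters helper are hypothetical names, not part of the library.

```python
from typing import Callable, List, Optional

Metadata = dict
Filter = Callable[[Metadata], Optional[Metadata]]


def make_charge_int_toy(meta: Metadata) -> Optional[Metadata]:
    """Toy stand-in: convert a charge such as '1+' or '-1' to an int if possible."""
    meta = dict(meta)  # work on a copy, similar in spirit to clone=True
    charge = str(meta.get("charge", "")).strip("+")
    if charge.lstrip("-").isdigit():
        meta["charge"] = int(charge)
    return meta


def require_name_toy(meta: Metadata) -> Optional[Metadata]:
    """Toy stand-in: drop the entry (return None) if no compound name is present."""
    return meta if meta.get("compound_name") else None


def apply_filters(meta: Optional[Metadata], filters: List[Filter]) -> Optional[Metadata]:
    """Apply filters in order and stop as soon as one of them returns None."""
    for filter_function in filters:
        if meta is None:
            break
        meta = filter_function(meta)
    return meta


result = apply_filters({"charge": "1+", "compound_name": "caffeine"},
                       [make_charge_int_toy, require_name_toy])
print(result)
```

Because a filter may return None, the order matters: a "require" style filter placed early stops the remaining filters from running on that spectrum.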

matchms.filtering.derive_adduct_from_name(spectrum_in: Spectrum, remove_adduct_from_name: bool = True, clone: bool | None = True) Spectrum | None[source]

Find adduct in compound name and add to metadata (if not present yet).

Method to interpret the given compound name to find the adduct.

Parameters:
  • spectrum_in – Input spectrum.

  • remove_adduct_from_name – Remove found adducts from compound name if set to True. Default is True.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added adduct, or None if not present.

Return type:

Spectrum or None

matchms.filtering.derive_annotation_from_compound_name(spectrum_in: Spectrum, annotated_compound_names_file: str | None = None, mass_tolerance: float = 0.1, clone: bool | None = True) Spectrum | None[source]

Adds inchi and inchikey based on the compound name by searching PubChem. Looking up smiles is no longer supported by pubchempy, see https://github.com/matchms/matchms/issues/823. Smiles can be derived from inchi by running the filter derive_smiles_from_inchi.

This filter is only run if there is not yet a valid smiles or inchi in the metadata. The inchi and inchikey are only added if the found annotation is close enough to the parent mass.

Parameters:
  • spectrum_in – The input spectrum.

  • annotated_compound_names_file (Optional[str]) – Any compound name that was searched for on PubChem will be added to this file. If a compound name is already in this file, it will be used instead of looking it up on PubChem. This file can be reused for future runs, speeding up the process. If None, the compound names found will still be cached for this run, but won’t be reusable for future runs. The csv file should contain the columns [“compound_name”, “smiles”, “inchi”, “inchikey”, “monoisotopic_mass”].

  • mass_tolerance – Acceptable mass difference between query compound and pubchem result.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added annotation, or None if not present.

Return type:

Spectrum or None

matchms.filtering.derive_formula_from_name(spectrum_in: Spectrum, remove_formula_from_name: bool = True, clone: bool | None = True) Spectrum | None[source]

Detect and remove misplaced formula in compound name and add to metadata.

Method to find misplaced formulas in compound name based on regular expression. This will not chemically test the detected formula, so the search is limited to frequently occurring types of shape ‘C47H83N1O8P1’.

Parameters:
  • spectrum_in – Input spectrum.

  • remove_formula_from_name – Remove found formula from compound name if set to True. Default is True.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with cleaned formula, or None if not present.

Return type:

Spectrum or None

matchms.filtering.derive_formula_from_smiles(spectrum_in, overwrite=True, clone: bool | None = True) Spectrum | None[source]

Adds the molecule’s formula from SMILES.

Parameters:
  • spectrum_in – Input spectrum.

  • overwrite – If True, will overwrite the formula. Default is True.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added molecular formula, or None if not present.

Return type:

Spectrum or None

matchms.filtering.derive_inchi_from_smiles(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Find missing Inchi and derive from smiles where possible.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added INCHI, or None if not present.

Return type:

Spectrum or None

matchms.filtering.derive_inchikey_from_inchi(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Find missing InchiKey and derive from Inchi where possible.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added INCHIKEY, or None if not present.

Return type:

Spectrum or None

matchms.filtering.derive_ionmode(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Derive missing ionmode based on adduct.

Some input formats (e.g. MGF files) do not always provide a correct ionmode. This function reads the adduct from the metadata and uses this to fill in the correct ionmode where missing.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Return type:

Spectrum object with ionmode attribute set.

matchms.filtering.derive_smiles_from_inchi(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Find missing smiles and derive from Inchi where possible.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added SMILES, or None if not present.

Return type:

Spectrum or None

matchms.filtering.harmonize_undefined_inchi(spectrum_in: Spectrum, undefined: str = '', aliases: List[str] = None, clone: bool | None = True) Spectrum | None[source]

Replace all aliases for empty/undefined inchi entries by value of undefined argument.

Parameters:
  • spectrum_in – Input spectrum.

  • undefined – Give desired entry for undefined inchi fields. Default is “”.

  • aliases – Enter list of strings that are expected to represent undefined entries. Default is [“”, “N/A”, “NA”, “n/a”].

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with undefined INCHI if not present or N/A, or None if not present.

Return type:

Spectrum or None

matchms.filtering.harmonize_undefined_inchikey(spectrum_in: Spectrum, undefined: str = '', aliases: List[str] = None, clone: bool | None = True) Spectrum | None[source]

Replace all aliases for empty/undefined inchikey entries by undefined.

Parameters:
  • spectrum_in – Input spectrum.

  • undefined – Give desired entry for undefined inchikey fields. Default is “”.

  • aliases – Enter list of strings that are expected to represent undefined entries. Default is [“”, “N/A”, “NA”, “n/a”, “no data”].

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with undefined INCHIKEY if not present or N/A, or None if not present.

Return type:

Spectrum or None

matchms.filtering.harmonize_undefined_smiles(spectrum_in: Spectrum, undefined: str = '', aliases: List[str] = None, clone: bool | None = True) Spectrum | None[source]

Replace all aliases for empty/undefined smiles entries by undefined.

Parameters:
  • spectrum_in – Input spectrum.

  • undefined – Give desired entry for undefined smiles fields. Default is “”.

  • aliases – Enter list of strings that are expected to represent undefined entries. Default is [“”, “N/A”, “NA”, “n/a”, “no data”].

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with undefined SMILES if not present or N/A, or None if not present.

Return type:

Spectrum or None

matchms.filtering.interpret_pepmass(spectrum_in, clone: bool | None = True) Spectrum | None[source]

Reads pepmass field (if present) and adds values to correct field(s).

The field “pepmass” or “PEPMASS” is often used to describe the precursor ion. This function will interpret the values as a (mz, intensity, charge) tuple. Those values will be split and (if present) added to the fields “precursor_mz”, “precursor_intensity”, and “charge”.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with added pepmass, or None if not present.

Return type:

Spectrum or None
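
The splitting of a pepmass entry into separate fields can be sketched as follows. This is a simplified illustration on plain Python values: the helper name split_pepmass is hypothetical, and the real filter additionally has to deal with string parsing and already existing metadata entries.

```python
def split_pepmass(pepmass):
    """Map a pepmass value onto precursor_mz, precursor_intensity and charge."""
    # Accept either a single number or a tuple/list of up to three values.
    values = tuple(pepmass) if isinstance(pepmass, (tuple, list)) else (pepmass,)
    keys = ("precursor_mz", "precursor_intensity", "charge")
    # zip stops at the shorter sequence, so missing trailing values are skipped.
    return {key: value for key, value in zip(keys, values) if value is not None}


print(split_pepmass((445.12, 1000.0, 1)))
print(split_pepmass((445.12, None)))
```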

matchms.filtering.make_charge_int(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Convert charge field to integer (if possible).

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with converted charge, or None if not present.

Return type:

Spectrum or None

matchms.filtering.normalize_intensities(spectrum_in: Spectrum, clone: bool | None = True, scaling: tuple[float, float] | None = None) Spectrum | None[source]

Normalize intensities of peaks to unit height.

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

  • scaling – Optional tuple (min, max) to scale intensities to specific range. If None, normalizes to 0-1 range.

Returns:

Spectrum with normalized intensities, or None if not present.

Return type:

Spectrum or None

matchms.filtering.reduce_to_number_of_peaks(spectrum_in: Spectrum, n_required: int = 0, n_max: int = inf, ratio_desired: float | None = None, clone: bool | None = True) Spectrum | None[source]

The lowest-intensity peaks will be removed when the spectrum has more peaks than desired.

Parameters:
  • spectrum_in – Input spectrum.

  • n_required – Number of minimum required peaks. Spectra with fewer peaks will be set to ‘None’. Default is 0.

  • n_max – Maximum number of peaks. Remove peaks if more peaks are found. Default is inf.

  • ratio_desired – Set desired ratio between maximum number of peaks and parent mass. For spectra without parent mass (e.g. GCMS spectra) this will raise an error when ratio_desired is used. Default is None.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with reduced lowest peaks, or None if not present.

Return type:

Spectrum or None
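
The effect of n_max can be illustrated on a plain list of (mz, intensity) pairs. This is a minimal sketch, not the matchms implementation: the helper name keep_most_intense is hypothetical, and the n_required and ratio_desired options are left out.

```python
def keep_most_intense(peaks, n_max):
    """Keep at most n_max peaks with the highest intensity, returned sorted by m/z."""
    if len(peaks) <= n_max:
        return sorted(peaks)
    # Rank by intensity (second tuple element) and keep the n_max strongest peaks.
    strongest = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:n_max]
    return sorted(strongest)


peaks = [(100.0, 200.0), (120.0, 300.0), (150.0, 50.0), (200.0, 1.0)]
print(keep_most_intense(peaks, n_max=2))
```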

matchms.filtering.remove_noise_below_frequent_intensities(spectrum_in: Spectrum, min_count_of_frequent_intensities: int = 5, noise_level_multiplier: float = 2.0, clone: bool | None = True) Spectrum | None[source]

Removes noise peaks whose intensities fall below a noise level derived from frequently repeating intensities.

When no noise filtering has been applied to a spectrum, many spectra show repeating intensities. From all intensities that repeat more than min_count_of_frequent_intensities the highest is selected. The noise level is set to this intensity * noise_level_multiplier. All fragments with an intensity below the noise level are removed.

This filter was suggested by Tytus Mak.

Parameters:
  • spectrum_in – Input spectrum.

  • min_count_of_frequent_intensities – Minimum number of repeating intensities.

  • noise_level_multiplier – From all intensities that repeat more than min_count_of_frequent_intensities the highest is selected. The noise level is set to this intensity * noise_level_multiplier.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with removed intensities, or None if not present.

Return type:

Spectrum or None
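
The rule described above can be sketched on a plain list of (mz, intensity) pairs. This is an illustration of the description only, not the matchms code: the function name is hypothetical, and the choices to treat "more than" as a strict comparison and to keep peaks exactly at the noise level are assumptions.

```python
from collections import Counter


def remove_noise_sketch(peaks, min_count=5, multiplier=2.0):
    """Drop peaks below (highest frequently repeated intensity) * multiplier."""
    counts = Counter(intensity for _, intensity in peaks)
    # Intensities that repeat more than min_count times are treated as noise candidates.
    frequent = [value for value, count in counts.items() if count > min_count]
    if not frequent:
        return list(peaks)  # no repeating intensities, nothing to remove
    noise_level = max(frequent) * multiplier
    return [(mz, intensity) for mz, intensity in peaks if intensity >= noise_level]


peaks = ([(100.0 + i, 1.0) for i in range(6)]
         + [(200.0 + i, 2.0) for i in range(6)]
         + [(300.0, 3.0), (310.0, 5.0), (320.0, 10.0)])
print(remove_noise_sketch(peaks))
```

In this toy spectrum the intensities 1.0 and 2.0 each repeat six times, so the noise level becomes 2.0 * 2.0 = 4.0 and only the peaks with intensity 5.0 and 10.0 remain.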

matchms.filtering.remove_peaks_around_precursor_mz(spectrum_in: Spectrum, mz_tolerance: float = 17, clone: bool | None = True) Spectrum | None[source]
Remove peaks that are within mz_tolerance (in Da) of the precursor mz, excluding the precursor peak.

Parameters:
  • spectrum_in – Input spectrum.

  • mz_tolerance – Tolerance (in Da) around the precursor mz within which peaks are removed. Default is 17 Da.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with removed peaks, or None if not present.

Return type:

Spectrum or None
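
A plain-Python sketch of this rule is given below. It is an illustration under assumptions, not the matchms implementation: the function name is hypothetical, the precursor peak is identified as a peak whose m/z equals precursor_mz up to a tiny numerical tolerance, and peaks lying exactly at mz_tolerance are treated as being inside the window.

```python
import math


def drop_peaks_near_precursor(peaks, precursor_mz, mz_tolerance=17.0):
    """Remove peaks within mz_tolerance of precursor_mz, but keep the precursor peak."""
    kept = []
    for mz, intensity in peaks:
        is_precursor = math.isclose(mz, precursor_mz, abs_tol=1e-6)
        if is_precursor or abs(mz - precursor_mz) > mz_tolerance:
            kept.append((mz, intensity))
    return kept


peaks = [(100.0, 1.0), (290.0, 2.0), (300.0, 5.0), (310.0, 3.0), (320.0, 1.0)]
print(drop_peaks_near_precursor(peaks, precursor_mz=300.0))
```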

matchms.filtering.remove_peaks_outside_top_k(spectrum_in: Spectrum, k: int = 6, mz_window: float = 50, clone: bool | None = True) Spectrum | None[source]
Remove all peaks which are not within mz_window of at least one of the k highest intensity peaks of the spectrum.

Parameters:
  • spectrum_in – Input spectrum.

  • k – The number of most intense peaks to compare to. Default is 6.

  • mz_window – Window of mz values (in Da) around each of the top k peaks within which peaks are kept. Default is 50 Da.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with removed peaks, or None if not present.

Return type:

Spectrum or None
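
The selection logic can be sketched on a list of (mz, intensity) pairs. This is a minimal illustration with a hypothetical function name, and the choice to keep peaks lying exactly at mz_window is an assumption, not taken from the matchms source.

```python
def keep_peaks_near_top_k(peaks, k=6, mz_window=50.0):
    """Keep only peaks within mz_window of at least one of the k most intense peaks."""
    by_intensity = sorted(peaks, key=lambda peak: peak[1], reverse=True)
    top_mzs = [mz for mz, _ in by_intensity[:k]]
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if any(abs(mz - top_mz) <= mz_window for top_mz in top_mzs)
    ]


peaks = [(100.0, 10.0), (120.0, 1.0), (300.0, 8.0), (500.0, 0.5)]
print(keep_peaks_near_top_k(peaks, k=2, mz_window=50.0))
```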

matchms.filtering.remove_peaks_relative_to_precursor_mz(spectrum_in: Spectrum, offset_to_precursor: float = -1.6, clone: bool | None = True) Spectrum | None[source]

Remove all peaks with m/z values > precursor-m/z + offset_to_precursor.

If offset_to_precursor is negative, all peaks with m/z values greater than (precursor_mz - |offset_to_precursor|) are removed. If offset_to_precursor is positive, the precursor_mz peak itself will remain.

Parameters:
  • spectrum_in – Input spectrum.

  • offset_to_precursor – All peaks with mz values > precursor_mz + offset_to_precursor will be removed. Default is -1.6 Da, based on the Flash Entropy article by Li and Fiehn, 2023, Nat. Methods (see https://www.nature.com/articles/s41592-023-02012-9).

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with removed peaks, or None if not present.

Return type:

Spectrum or None
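
The cutoff rule can be written out directly. The sketch below works on plain (mz, intensity) pairs and uses a hypothetical function name; it only illustrates the comparison stated above.

```python
def drop_peaks_above_offset(peaks, precursor_mz, offset_to_precursor=-1.6):
    """Remove all peaks with m/z > precursor_mz + offset_to_precursor."""
    cutoff = precursor_mz + offset_to_precursor
    return [(mz, intensity) for mz, intensity in peaks if mz <= cutoff]


peaks = [(100.0, 1.0), (298.0, 2.0), (299.0, 3.0), (300.0, 4.0)]
# Negative offset: the cutoff lies below the precursor, so the precursor peak is removed.
print(drop_peaks_above_offset(peaks, precursor_mz=300.0))
# Positive offset: the cutoff lies above the precursor, so the precursor peak remains.
print(drop_peaks_above_offset(peaks, precursor_mz=300.0, offset_to_precursor=1.0))
```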

matchms.filtering.remove_profiled_spectra(spectrum_in: Spectrum, mz_window=0.5, clone: bool | None = True) Spectrum | None[source]

Remove profiled spectra

Spectra are removed if, within mz_window (default 0.5) of the highest peak, at least 2 peaks next to the main peak have an intensity > max_intensity/2.

Reproduced from MZmine. https://github.com/mzmine/mzmine3/blob/master/src/main/java/io/github/mzmine/util/scans/ScanUtils.java#L609

Parameters:
  • spectrum_in – Input spectrum.

  • mz_window – Window of mz values (in Da) around the highest peak in which neighboring peaks are inspected. Default is 0.5 Da.

  • clone – Optionally clone the Spectrum.

Returns:

None if the spectrum is likely profile data, else the input spectrum.

Return type:

Spectrum or None

matchms.filtering.repair_adduct_and_parent_mass_based_on_smiles(spectrum_in: Spectrum, mass_tolerance: float, clone: bool | None = True) Spectrum | None[source]

Corrects the adduct and parent mass of a spectrum based on its SMILES representation and the precursor m/z.

Given a spectrum, this function tries to match the spectrum’s parent mass, derived from its precursor m/z and known adducts, to the neutral monoisotopic mass of the molecule derived from its SMILES representation. If a match is found within a given mass tolerance, the adduct and parent mass of the spectrum are updated.

Parameters:
  • spectrum_in (Spectrum) – The input spectrum whose adduct needs to be repaired.

  • mass_tolerance (float) – Maximum allowed mass difference between the calculated parent mass and the neutral monoisotopic mass derived from the SMILES.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with repaired parent mass, or None if not present.

Return type:

Spectrum or None

matchms.filtering.repair_adduct_based_on_parent_mass(spectrum_in: Spectrum, mass_tolerance: float, clone: bool | None = True) Spectrum | None[source]

Corrects the adduct of a spectrum based on its parent_mass representation and the precursor m/z.

Parameters:
  • spectrum_in (Spectrum) – The input spectrum whose adduct needs to be repaired.

  • mass_tolerance (float) – Maximum allowed mass difference between the parent mass and the parent mass based on the adduct.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with repaired parent adduct, or None if not present.

Return type:

Spectrum or None

matchms.filtering.repair_inchi_inchikey_smiles(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Check if inchi, inchikey, and smiles entries seem correct. Detect and correct if any of those entries clearly belongs into one of the other two fields (e.g. inchikey found in inchi field).

Parameters:
  • spectrum_in – Input spectrum.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with repaired INCHI, INCHIKEY and SMILES, or None if not present.

Return type:

Spectrum or None

matchms.filtering.repair_not_matching_annotation(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

Repairs mismatches in a spectrum’s annotations related to SMILES, InChI, and InChIKey.

Given a spectrum, this function ensures that the provided SMILES, InChI, and InChIKey annotations are consistent with one another. If there are discrepancies, they are resolved as follows:

  1. If the SMILES and InChI do not match:
    • Both SMILES and InChI are checked against the parent mass.

    • The annotation that matches the parent mass is retained, and the other is regenerated.

  2. If the InChIKey does not match the InChI:
    • A new InChIKey is generated from the InChI and replaces the old one.

Warnings and information logs are generated to track changes and potential issues. For correctness of InChIKey entries, only the first 14 characters are considered.

Parameters:
  • spectrum_in (Spectrum) – The input spectrum containing annotations to be checked and repaired.

  • clone – Optionally clone the Spectrum.

Returns:

A cloned version of the input spectrum with corrected annotations. If the input spectrum is None, it returns None.

Return type:

Spectrum or None
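
The InChIKey comparison on the first 14 characters can be sketched as below. The first block of an InChIKey (14 characters) is a hash of the connectivity layer, so comparing only this block ignores differences in stereochemistry and protonation. The helper name is hypothetical, and the second key in the example is made up for illustration (only its first block is taken from a real key).

```python
def inchikeys_match(inchikey_a: str, inchikey_b: str) -> bool:
    """Compare two InChIKeys on their first 14 characters only."""
    return inchikey_a[:14] == inchikey_b[:14]


caffeine = "RYYVLZVUVIJVGH-UHFFFAOYSA-N"
same_skeleton = "RYYVLZVUVIJVGH-XXXXXXXXXX-N"  # made-up second block, for illustration only
ethanol = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

print(inchikeys_match(caffeine, same_skeleton))
print(inchikeys_match(caffeine, ethanol))
```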

matchms.filtering.repair_parent_mass_from_smiles(spectrum_in: Spectrum, mass_tolerance: float = 0.1, clone: bool | None = True) Spectrum | None[source]

Sets the parent mass to match the smiles mass, if not already close to smiles mass

Parameters:
  • spectrum_in (Spectrum) – The input spectrum containing annotations to be checked and repaired.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with repaired parent mass, or None if not present.

Return type:

Spectrum or None

matchms.filtering.repair_parent_mass_is_molar_mass(spectrum_in: Spectrum, mass_tolerance: float, clone: bool | None = True) Spectrum | None[source]

Changes the parent mass from molar mass into monoisotopic mass.

A manually entered parent mass is sometimes wrongly given as the molar mass instead of the monoisotopic mass. We check if the given parent mass is equal to the molar mass (based on the smiles) and correct it to the monoisotopic mass in these cases.

The molar mass is an average mass based on the average of all common isotopes and will therefore differ from what is measured in mass spectrometry.

Parameters:
  • spectrum_in (Spectrum) – The input spectrum containing annotations to be checked and repaired.

  • mass_tolerance – Maximum allowed mass difference between the calculated parent mass and the neutral monoisotopic mass derived from the SMILES.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with repaired parent mass, or None if not present.

Return type:

Spectrum or None
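
The difference between molar (average) mass and monoisotopic mass can be made concrete with a small worked example for glucose (C6H12O6). This is a self-contained sketch with hard-coded, rounded element masses for C, H and O only; the tables and the helper name are assumptions for this illustration and not part of matchms, which derives masses from the SMILES.

```python
# Mass of the most abundant isotope of each element (Da), rounded.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
# Standard atomic weights (isotope-averaged, g/mol), rounded.
AVERAGE = {"C": 12.011, "H": 1.008, "O": 15.999}


def mass_from_counts(element_counts, table):
    """Sum element masses from the given table, weighted by the atom counts."""
    return sum(table[element] * count for element, count in element_counts.items())


glucose = {"C": 6, "H": 12, "O": 6}
mono = mass_from_counts(glucose, MONOISOTOPIC)
molar = mass_from_counts(glucose, AVERAGE)
print(f"monoisotopic: {mono:.4f}, molar: {molar:.3f}, difference: {molar - mono:.4f}")
```

Even for a small molecule the two values differ by roughly 0.09 Da, and the gap tends to grow with the number of atoms.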

matchms.filtering.repair_parent_mass_match_smiles_wrapper(spectrum_in: Spectrum, mass_tolerance: float = 0.2, clone: bool | None = True) Spectrum | None[source]

Wrapper function for repairing a mismatch between parent mass and smiles mass

Parameters:
  • spectrum_in (Spectrum) – The input spectrum containing annotations to be checked and repaired.

  • mass_tolerance – Maximum allowed mass difference between the calculated parent mass and the neutral monoisotopic mass derived from the SMILES. Defaults to 0.2.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with repaired parent mass, or None if not present.

Return type:

Spectrum or None

matchms.filtering.repair_smiles_of_salts(spectrum_in, mass_tolerance, clone: bool | None = True) Spectrum | None[source]

Repairs the smiles of a salt to match the parent mass. E.g. C1=NC2=NC=NC(=C2N1)N.Cl is converted to C1=NC2=NC=NC(=C2N1)N if this matches the parent mass. Checks if the parent mass matches one of the ions.

Parameters:
  • spectrum_in – Input spectrum.

  • mass_tolerance – Maximum allowed mass difference between the calculated parent mass and the neutral monoisotopic mass derived from the SMILES.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with repaired SMILES, or None if not present.

Return type:

Spectrum or None

matchms.filtering.require_compound_name(spectrum: Spectrum) Spectrum | None[source]

Ensure that the compound name is present in the spectrum metadata.

matchms.filtering.require_correct_ionmode(spectrum_in: Spectrum, ion_mode_to_keep) Spectrum | None[source]

Validates the ion mode of a given spectrum. If the spectrum’s ion mode doesn’t match the ion_mode_to_keep, it will be removed and a log message will be generated.

Parameters:
  • spectrum_in (Spectrum) – The input spectrum to be validated. If None, the function will return None.

  • ion_mode_to_keep (str) – Desired ion mode (‘positive’, ‘negative’, or ‘both’). If not one of these, a ValueError is raised.

Returns:

The validated spectrum if its ion mode matches the desired one, or None otherwise.

Return type:

Spectrum or None

matchms.filtering.require_formula(spectrum: Spectrum) Spectrum | None[source]

Ensure that the molecular formula is present and looks like a valid formula.

matchms.filtering.require_matching_adduct_precursor_mz_parent_mass(spectrum, tolerance=0.1) Spectrum | None[source]

Checks if the adduct precursor mz and parent mass match within the tolerance

matchms.filtering.require_maximum_number_of_peaks(spectrum_in: Spectrum, maximum_number_of_fragments: int = 1000, clone: bool | None = True) Spectrum | None[source]

Spectrum will be set to None when it has more peaks than maximum_number_of_fragments.

Parameters:
  • spectrum_in – Input spectrum.

  • maximum_number_of_fragments – Maximum number of peaks allowed. Spectra with more peaks will be set to ‘None’. Default is 1000.

  • clone – Optionally clone the Spectrum.

Returns:

Untouched Spectrum or ‘None’.

Return type:

Spectrum or None

matchms.filtering.require_minimum_number_of_high_peaks(spectrum_in: Spectrum, no_peaks: int = 5, intensity_percent: float = 2.0, clone: bool | None = True) Spectrum | None[source]
Returns None if the number of peaks with relative intensity above or equal to intensity_percent is less than no_peaks.

Parameters:
  • spectrum_in – Input spectrum.

  • no_peaks – Minimum number of peaks required to have a relative intensity above or equal to intensity_percent. Fewer peaks will return None. Default is 5.

  • intensity_percent – Minimum relative intensity (as a percentage between 0-100) for peaks that are searched. Default is 2.

  • clone – Optionally clone the Spectrum.

Returns:

Untouched Spectrum or ‘None’.

Return type:

Spectrum or None
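
The check can be sketched on a plain list of intensities. This is an illustration with a hypothetical function name that returns a boolean instead of a Spectrum or None; relative intensity is taken here as a percentage of the most intense peak, which is an assumption based on the parameter description.

```python
def has_enough_high_peaks(intensities, no_peaks=5, intensity_percent=2.0):
    """Return True if at least no_peaks peaks reach intensity_percent of the maximum."""
    if not intensities:
        return False
    threshold = max(intensities) * intensity_percent / 100.0
    n_high = sum(1 for intensity in intensities if intensity >= threshold)
    return n_high >= no_peaks


intensities = [100.0, 50.0, 3.0, 2.0, 1.0, 0.5]
print(has_enough_high_peaks(intensities))
print(has_enough_high_peaks(intensities, no_peaks=4))
```

With the default 2 percent threshold, four of the six peaks count as high peaks, so the spectrum fails the default requirement of five but passes a requirement of four.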

matchms.filtering.require_minimum_number_of_peaks(spectrum_in: Spectrum, n_required: int = 10, ratio_required: float | None = None, clone: bool | None = True) Spectrum | None[source]

Spectrum will be set to None when it has fewer peaks than required.

Parameters:
  • spectrum_in – Input spectrum.

  • n_required – Number of minimum required peaks. Spectra with fewer peaks will be set to ‘None’.

  • ratio_required – Set desired ratio between minimum number of peaks and parent mass. Default is None.

  • clone – Optionally clone the Spectrum.

Returns:

Untouched Spectrum or ‘None’.

Return type:

Spectrum or None

matchms.filtering.require_parent_mass_match_smiles(spectrum_in: Spectrum, mass_tolerance) Spectrum | None[source]

Validates if the parent mass of the given spectrum matches the mass calculated from its associated SMILES string within a specified tolerance.

Parameters:
  • spectrum_in (Spectrum) – The input spectrum to be validated. If None, the function will return None.

  • mass_tolerance (float) – The tolerance for the mass difference between the spectrum’s parent mass and the mass calculated from its SMILES string.

Returns:

The validated spectrum if its parent mass matches the SMILES mass within the specified tolerance, or None otherwise.

Return type:

Spectrum or None

matchms.filtering.require_precursor_below_mz(spectrum_in: Spectrum, max_mz: float = 1000) Spectrum[source]
Returns None if the precursor_mz of a spectrum is above max_mz.

Parameters:
  • spectrum_in – Input spectrum.

  • max_mz – Maximum m/z value for the precursor m/z of a spectrum. All precursor m/z values greater than or equal to this will return None. Default is 1000.

matchms.filtering.require_precursor_mz(spectrum_in: Spectrum, minimum_accepted_mz: float | None = 10.0, maximum_mz: float | None = None, clone: bool | None = True) Spectrum | None[source]

Returns None if there is no precursor_mz or if it is <= minimum_accepted_mz.

Parameters:
  • spectrum_in – Input spectrum.

  • minimum_accepted_mz – Set to minimum acceptable value for precursor m/z. Default is set to 10.0.

  • maximum_mz – Set the maximum value for precursor m/z.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with precursor_mz, or None if not present.

Return type:

Spectrum or None

matchms.filtering.require_retention_index(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None[source]

This function checks if the input spectrum has a ‘retention_index’ in its metadata. If the input spectrum is None or doesn’t have a ‘retention_index’, the function returns None. Otherwise, it returns a clone of the input spectrum.

Parameters:
  • spectrum_in – The input spectrum to check.

  • clone – Optionally clone the Spectrum.

Returns:

A clone of the input spectrum if it has a 'retention_index', None otherwise.

Return type:

Spectrum or None

matchms.filtering.require_retention_time(spectrum_in: Spectrum, minimum_rt=None, maximum_rt=None, clone: bool | None = True) Spectrum | None[source]

This function checks if the input spectrum has a ‘retention_time’ in its metadata. If the input spectrum is None or doesn’t have a ‘retention_time’, the function returns None. Otherwise, it returns a clone of the input spectrum.

Parameters:
  • spectrum_in – The input spectrum to check.

  • clone – Optionally clone the Spectrum.

Returns:

A clone of the input spectrum if it has a 'retention_time', None otherwise.

Return type:

Spectrum or None

matchms.filtering.require_valid_annotation(spectrum: Spectrum) Spectrum | None[source]

Removes spectra that are not fully annotated (correct and matching SMILES, InChI and InChIKey).

matchms.filtering.select_by_intensity(spectrum_in: Spectrum, intensity_from: float = 10.0, intensity_to: float = 200.0, clone: bool | None = True) Spectrum | None[source]

Keep only peaks within the set intensity range (keep if intensity_from <= intensity <= intensity_to). In most cases it is advised to use the select_by_relative_intensity() function instead.

Parameters:
  • spectrum_in – Input spectrum.

  • intensity_from – Set lower threshold for peak intensity. Default is 10.0.

  • intensity_to – Set upper threshold for peak intensity. Default is 200.0.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with peaks within the specified intensity range, or None if not present.

Return type:

Spectrum or None

matchms.filtering.select_by_mz(spectrum_in: Spectrum, mz_from: float = 0.0, mz_to: float = 1000.0, clone: bool | None = True) Spectrum | None[source]

Keep only peaks between mz_from and mz_to (keep if mz_from <= m/z <= mz_to).

Parameters:
  • spectrum_in – Input spectrum.

  • mz_from – Set lower threshold for m/z peak positions. Default is 0.0.

  • mz_to – Set upper threshold for m/z peak positions. Default is 1000.0.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with peaks within the specified mz range, or None if not present.

Return type:

Spectrum or None

matchms.filtering.select_by_relative_intensity(spectrum_in: Spectrum, intensity_from: float = 0.0, intensity_to: float = 1.0, clone: bool | None = True) Spectrum | None[source]

Keep only peaks within the set relative intensity range (keep if intensity_from <= intensity <= intensity_to).

Parameters:
  • spectrum_in – Input spectrum.

  • intensity_from – Set lower threshold for relative peak intensity. Default is 0.0.

  • intensity_to – Set upper threshold for relative peak intensity. Default is 1.0.

  • clone – Optionally clone the Spectrum.

Returns:

Spectrum with peaks within the relative intensity range, or None if not present.

Return type:

Spectrum or None

Subpackages

Submodules