matchms.filtering package

Functions for processing mass spectra

Each provided function usually performs a single action on a spectrum, such as changing or correcting metadata, or filtering peaks. More complicated processing pipelines can be built by stacking several of the provided filters.

Example of how to use a single filter function:

import numpy as np
from matchms import Spectrum
from matchms.filtering import normalize_intensities

spectrum = Spectrum(mz=np.array([100, 120, 150, 200.]),
                    intensities=np.array([200.0, 300.0, 50.0, 1.0]),
                    metadata={'id': 'spectrum1'})
spectrum_filtered = normalize_intensities(spectrum)

max_intensity = spectrum_filtered.peaks.intensities.max()
print(f"Maximum intensity is {max_intensity:.2f}")

Should output

Maximum intensity is 1.00

Sketch of matchms spectrum processing.
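Stacking filters can be sketched in plain Python. This is a minimal illustration and not part of matchms: apply_filters is a hypothetical helper, and the two stand-in filters work on a plain list of (m/z, intensity) pairs rather than on a Spectrum. It assumes the convention used by the require_* filters on this page, where a rejected spectrum is returned as None.

```python
def apply_filters(spectrum, filters):
    """Apply filters in order; stop early once a filter returns None."""
    for filter_function in filters:
        if spectrum is None:
            break
        spectrum = filter_function(spectrum)
    return spectrum


# Stand-in filters on a list of (mz, intensity) pairs (hypothetical, not matchms).
def scale_to_unit_height(peaks):
    highest = max(intensity for _, intensity in peaks)
    return [(mz, intensity / highest) for mz, intensity in peaks]


def require_two_peaks(peaks):
    return peaks if len(peaks) >= 2 else None


peaks = [(100.0, 200.0), (120.0, 300.0), (150.0, 50.0)]
processed = apply_filters(peaks, [require_two_peaks, scale_to_unit_height])
rejected = apply_filters(peaks[:1], [require_two_peaks, scale_to_unit_height])
```

With matchms, the list would instead hold filter functions from this package, for example normalize_intensities, and the input would be a Spectrum object.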

matchms.filtering.add_compound_name(spectrum_in: Spectrum) Spectrum[source]

Add compound_name to correct field: “compound_name” in metadata.

matchms.filtering.add_fingerprint(spectrum_in: Spectrum, fingerprint_type: str = 'daylight', nbits: int = 2048) Spectrum[source]

Add molecular fingerprint to spectrum.

If smiles or inchi is present in metadata, derive a molecular fingerprint and add it to the spectrum.

Parameters
  • spectrum_in – Input spectrum.

  • fingerprint_type – Determine method for deriving molecular fingerprints. Supported choices are “daylight”, “morgan1”, “morgan2”, “morgan3”. Default is “daylight”.

  • nbits – Dimension or number of bits of generated fingerprint. Default is 2048.

matchms.filtering.add_losses(spectrum_in: Spectrum, loss_mz_from=0.0, loss_mz_to=1000.0) Spectrum[source]

Derive losses based on precursor mass.

Parameters
  • spectrum_in – Input spectrum.

  • loss_mz_from – Minimum allowed m/z value for losses. Default is 0.0.

  • loss_mz_to – Maximum allowed m/z value for losses. Default is 1000.0.
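A plain-Python sketch of how losses might be derived. It assumes that a loss is the difference between the precursor m/z and a peak m/z, and that both range bounds are inclusive. The helper derive_losses is hypothetical and does not call matchms.

```python
def derive_losses(precursor_mz, peak_mzs, loss_mz_from=0.0, loss_mz_to=1000.0):
    """Return losses (precursor m/z minus peak m/z) inside the allowed range."""
    losses = [precursor_mz - mz for mz in peak_mzs]
    return sorted(loss for loss in losses if loss_mz_from <= loss <= loss_mz_to)


# The loss of 345.0 (from the peak at 100.0) exceeds loss_mz_to and is dropped.
losses = derive_losses(445.0, [100.0, 150.0, 200.0, 300.0], loss_mz_to=300.0)
```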

matchms.filtering.add_parent_mass(spectrum_in: Spectrum, estimate_from_adduct: bool = True, overwrite_existing_entry: bool = False) Spectrum[source]

Add estimated parent mass to metadata (if not present yet).

Method to calculate the parent mass from given precursor m/z together with charge and/or adduct. Will take precursor m/z from “precursor_mz” as provided by running add_precursor_mz. For estimate_from_adduct=True this function will estimate the parent mass based on the mass and charge of known adducts. The table of known adduct properties can be found under matchms/data/known_adducts_table.csv.

Parameters
  • spectrum_in – Input spectrum.

  • estimate_from_adduct – When set to True, use adduct to estimate actual molecular mass (“parent mass”). Default is True. Switches back to charge-based estimate if adduct does not match a known adduct.

  • overwrite_existing_entry – Default is False. If set to True, a newly computed value will replace existing ones.
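The two estimation routes can be sketched as follows. This is an illustration under stated assumptions, not the matchms implementation: the proton mass is a rounded constant, the one-entry adduct table with its correction_mass and mass_multiplier keys is a hypothetical stand-in for known_adducts_table.csv, and the function name is invented.

```python
PROTON_MASS = 1.007276  # rounded value, assumed for this sketch

# Hypothetical stand-in for the table of known adducts.
KNOWN_ADDUCTS = {"[M+H]+": {"correction_mass": 1.007276, "mass_multiplier": 1.0}}


def estimate_parent_mass(precursor_mz, charge=None, adduct=None):
    """Estimate the neutral mass from the adduct if known, else from the charge."""
    if adduct in KNOWN_ADDUCTS:
        entry = KNOWN_ADDUCTS[adduct]
        return (precursor_mz - entry["correction_mass"]) / entry["mass_multiplier"]
    if charge:
        # Unknown adduct: fall back to the charge-based estimate.
        return precursor_mz * abs(charge) - charge * PROTON_MASS
    return None


from_adduct = estimate_parent_mass(301.007276, adduct="[M+H]+")
from_charge = estimate_parent_mass(298.992724, charge=-1, adduct="[M+X]-")
```

Both calls give a value close to 300.0. The second one uses an adduct that is not in the table, so the charge-based route is taken.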

matchms.filtering.add_precursor_mz(spectrum_in)[source]

Add precursor_mz to correct field and make it a float.

For a missing precursor_mz field: check if there is a “pepmass” entry instead. For a string parsed as precursor_mz: convert to float.

matchms.filtering.add_retention_index(spectrum_in: Spectrum) Spectrum[source]

Add retention index into ‘retention_index’ key if present.

Parameters

spectrum_in – Spectrum with RI information.

Return type

Spectrum with RI info stored under ‘retention_index’.

matchms.filtering.add_retention_time(spectrum_in: Spectrum) Spectrum[source]

Add retention time information to the ‘retention_time’ key as float. Negative values and those not convertible to a float result in ‘retention_time’ being ‘None’.

Parameters

spectrum_in – Spectrum with retention time information.

Return type

Spectrum with harmonized retention time information.
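The conversion rule stated above (float if possible, otherwise None, and None for negative values) can be sketched in plain Python. The helper to_retention_time is hypothetical and only mirrors that rule. It does not reproduce how matchms locates the retention time in the metadata.

```python
def to_retention_time(value):
    """Return value as a non-negative float, or None if that is not possible."""
    try:
        retention_time = float(value)
    except (TypeError, ValueError):
        return None
    return retention_time if retention_time >= 0 else None


parsed = to_retention_time("12.5")
negative = to_retention_time(-3)
unparsable = to_retention_time("n/a")
```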

matchms.filtering.clean_compound_name(spectrum_in: Spectrum) Spectrum[source]

Clean compound name.

A list of frequently seen name additions that do not belong to the compound name will be removed.

matchms.filtering.correct_charge(spectrum_in: Spectrum) Spectrum[source]

Correct charge values based on given ionmode.

For some spectra, the charge value is either undefined or inconsistent with the ionmode. This filter corrects such charge values.

Parameters

spectrum_in – Input spectrum.

matchms.filtering.default_filters(spectrum: Spectrum) Spectrum[source]

Collection of filters that are considered default and that do not require any (factory) arguments.

Collection is

  1. make_charge_int()

  2. make_ionmode_lowercase()

  3. set_ionmode_na_when_missing()

  4. add_compound_name()

  5. derive_adduct_from_name()

  6. derive_formula_from_name()

  7. clean_compound_name()

  8. interpret_pepmass()

  9. add_precursor_mz()

  10. derive_ionmode()

  11. correct_charge()

matchms.filtering.derive_adduct_from_name(spectrum_in: Spectrum, remove_adduct_from_name: bool = True) Spectrum[source]

Find adduct in compound name and add to metadata (if not present yet).

Method to interpret the given compound name to find the adduct.

Parameters
  • spectrum_in – Input spectrum.

  • remove_adduct_from_name – Remove found adducts from compound name if set to True. Default is True.

matchms.filtering.derive_formula_from_name(spectrum_in: Spectrum, remove_formula_from_name: bool = True) Spectrum[source]

Detect and remove misplaced formula in compound name and add to metadata.

Method to find misplaced formulas in compound name based on regular expression. This will not chemically test the detected formula, so the search is limited to frequently occurring types of shape ‘C47H83N1O8P1’.

Parameters
  • spectrum_in – Input spectrum.

  • remove_formula_from_name – Remove found formula from compound name if set to True. Default is True.
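A regular-expression search of this kind can be sketched as follows. The pattern and the helper split_formula_from_name are hypothetical and chosen only for illustration. They are not the expression used by matchms. The sketch assumes the formula appears as the last whitespace-separated token and consists of at least three element-count pairs.

```python
import re

# Hypothetical pattern: a trailing token of three or more element-count pairs.
FORMULA_AT_END = re.compile(r"\s+((?:[A-Z][a-z]?\d+){3,})\s*$")


def split_formula_from_name(name):
    """Return (name_without_formula, formula); formula is None if none is found."""
    match = FORMULA_AT_END.search(name)
    if match is None:
        return name, None
    return name[:match.start()].strip(), match.group(1)


cleaned, formula = split_formula_from_name("example lipid C47H83N1O8P1")
unchanged, missing = split_formula_from_name("example lipid")
```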

matchms.filtering.derive_inchi_from_smiles(spectrum_in: Spectrum) Spectrum[source]

Find missing Inchi and derive from smiles where possible.

matchms.filtering.derive_inchikey_from_inchi(spectrum_in: Spectrum) Spectrum[source]

Find missing InchiKey and derive from Inchi where possible.

matchms.filtering.derive_ionmode(spectrum_in: Spectrum) Spectrum[source]

Derive missing ionmode based on adduct.

Some input formats (e.g. MGF files) do not always provide a correct ionmode. This function reads the adduct from the metadata and uses it to fill in the correct ionmode where missing.

Parameters

spectrum_in – Input spectrum.

Return type

Spectrum object with ionmode attribute set.

matchms.filtering.derive_smiles_from_inchi(spectrum_in: Spectrum) Spectrum[source]

Find missing smiles and derive from Inchi where possible.

matchms.filtering.harmonize_undefined_inchi(spectrum_in: Spectrum, undefined: str = '', aliases: Optional[List[str]] = None) Spectrum[source]

Replace all aliases for empty/undefined inchi entries by value of undefined argument.

Parameters
  • undefined – Give desired entry for undefined inchi fields. Default is “”.

  • aliases – Enter list of strings that are expected to represent undefined entries. Default is [“”, “N/A”, “NA”, “n/a”].
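The replacement rule shared by the harmonize_undefined_* filters can be sketched on a single metadata value. The helper below is hypothetical and works on a plain string rather than a Spectrum. Its default alias list is copied from the inchi variant above.

```python
def harmonize_undefined(value, undefined="", aliases=None):
    """Replace value by undefined if it is one of the aliases for a missing entry."""
    if aliases is None:
        aliases = ["", "N/A", "NA", "n/a"]
    return undefined if value in aliases else value


replaced = harmonize_undefined("N/A")
custom = harmonize_undefined("unknown", undefined="n/a", aliases=["unknown"])
kept = harmonize_undefined("InChI=1S/H2O/h1H2")
```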

matchms.filtering.harmonize_undefined_inchikey(spectrum_in: Spectrum, undefined: str = '', aliases: Optional[List[str]] = None) Spectrum[source]

Replace all aliases for empty/undefined inchikey entries by undefined.

Parameters
  • undefined – Give desired entry for undefined inchikey fields. Default is “”.

  • aliases – Enter list of strings that are expected to represent undefined entries. Default is [“”, “N/A”, “NA”, “n/a”, “no data”].

matchms.filtering.harmonize_undefined_smiles(spectrum_in: Spectrum, undefined: str = '', aliases: Optional[List[str]] = None) Spectrum[source]

Replace all aliases for empty/undefined smiles entries by undefined.

Parameters
  • undefined – Give desired entry for undefined smiles fields. Default is “”.

  • aliases – Enter list of strings that are expected to represent undefined entries. Default is [“”, “N/A”, “NA”, “n/a”, “no data”].

matchms.filtering.interpret_pepmass(spectrum_in)[source]

Reads pepmass field (if present) and adds values to correct field(s).

The field “pepmass” or “PEPMASS” is often used to describe the precursor ion. This function interprets the values as an (mz, intensity, charge) tuple. The values that are present are split and added to the fields “precursor_mz”, “precursor_intensity”, and “charge”.
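The splitting step can be sketched in plain Python. The helper split_pepmass is hypothetical and assumes the pepmass entry has already been parsed into a number or a tuple of numbers. Missing positions are filled with None.

```python
def split_pepmass(pepmass):
    """Split a pepmass entry into (precursor_mz, precursor_intensity, charge)."""
    if pepmass is None:
        return None, None, None
    if isinstance(pepmass, (int, float)):
        pepmass = (pepmass,)  # a bare number is taken as the m/z value
    values = list(pepmass)[:3]
    values += [None] * (3 - len(values))
    return tuple(values)


mz_only = split_pepmass(445.2)
mz_and_intensity = split_pepmass((445.2, 1200.0))
```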

matchms.filtering.load_adducts_dict() Dict[str, dict][source]

Load dictionary of known adducts containing the adduct mass and charge. Makes sure that file loading is cached.

Adduct information is based on information from https://fiehnlab.ucdavis.edu/staff/kind/metabolomics/ms-adduct-calculator/ and was extended by F.Huber and JJJ.v.d.Hooft.

The full table can be found at https://github.com/matchms/matchms/blob/expand_adducts/matchms/data/known_adducts_table.csv


matchms.filtering.load_known_adduct_conversions() Dict[str, dict][source]

Load dictionary of known adduct conversions. Makes sure that file loading is cached.

matchms.filtering.make_charge_int(spectrum_in: Spectrum) Spectrum[source]

Convert charge field to integer (if possible).

matchms.filtering.make_charge_scalar(spectrum_in: Spectrum) Spectrum[source]

Convert charge field to scalar (if necessary).

Deprecated function, now replaced by make_charge_int().

Deprecated since version 0.8.2: Use expanded make_charge_int() instead.

matchms.filtering.make_ionmode_lowercase(spectrum_in: Spectrum) Spectrum[source]

Convert entry in field “ionmode” to lower case.

matchms.filtering.normalize_intensities(spectrum_in: Spectrum) Spectrum[source]

Normalize intensities of peaks (and losses) to unit height.

matchms.filtering.reduce_to_number_of_peaks(spectrum_in: Spectrum, n_required: int = 1, n_max: int = inf, ratio_desired: Optional[float] = None) Spectrum[source]

The lowest-intensity peaks are removed when the spectrum has more peaks than desired.

Parameters
  • spectrum_in – Input spectrum.

  • n_required – Number of minimum required peaks. Spectra with fewer peaks will be set to ‘None’. Default is 1.

  • n_max – Maximum number of peaks. Remove peaks if more peaks are found. Default is inf.

  • ratio_desired – Set desired ratio between maximum number of peaks and parent mass. For spectra without parent mass (e.g. GCMS spectra) this will raise an error when ratio_desired is used. Default is None.
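A plain-Python sketch of the peak reduction for the n_required and n_max parameters only. The ratio_desired option is left out. The helper reduce_to_n_peaks is hypothetical and works on a list of (m/z, intensity) pairs, assuming n_max is either an integer or infinity.

```python
def reduce_to_n_peaks(peaks, n_required=1, n_max=float("inf")):
    """Keep at most n_max of the most intense peaks; None if fewer than n_required."""
    if len(peaks) < n_required:
        return None
    if len(peaks) <= n_max:
        return list(peaks)
    strongest = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:n_max]
    return sorted(strongest)  # restore m/z order


peaks = [(100.0, 5.0), (120.0, 300.0), (150.0, 50.0), (200.0, 1.0)]
reduced = reduce_to_n_peaks(peaks, n_max=2)
too_few = reduce_to_n_peaks(peaks, n_required=5)
```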

matchms.filtering.remove_peaks_around_precursor_mz(spectrum_in: Spectrum, mz_tolerance: float = 17) Spectrum[source]
Remove peaks that are within mz_tolerance (in Da) of the precursor mz, excluding the precursor peak.

Parameters
  • spectrum_in – Input spectrum.

  • mz_tolerance – Peaks within this distance (in Da) of the precursor mz are removed. Default is 17 Da.
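The removal rule can be sketched on a list of (m/z, intensity) pairs. The helper remove_around_precursor is hypothetical. Treating a peak exactly mz_tolerance away as inside the window is an assumption of this sketch.

```python
def remove_around_precursor(peaks, precursor_mz, mz_tolerance=17):
    """Drop peaks within mz_tolerance of precursor_mz, except the precursor peak."""
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if abs(mz - precursor_mz) > mz_tolerance or mz == precursor_mz
    ]


peaks = [(100.0, 1.0), (190.0, 2.0), (200.0, 3.0), (210.0, 4.0)]
kept = remove_around_precursor(peaks, precursor_mz=200.0)
```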

matchms.filtering.remove_peaks_outside_top_k(spectrum_in: Spectrum, k: int = 6, mz_window: float = 50) Spectrum[source]
Remove all peaks which are not within mz_window of at least one of the k highest intensity peaks of the spectrum.

Parameters
  • spectrum_in – Input spectrum.

  • k – The number of most intense peaks to compare to. Default is 6.

  • mz_window – Maximum mz distance (in Da) that a peak may have from one of the top k peaks in order to be kept. Default is 50 Da.
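A plain-Python sketch of this selection on a list of (m/z, intensity) pairs. The helper keep_near_top_k is hypothetical, and the inclusive window boundary is an assumption.

```python
def keep_near_top_k(peaks, k=6, mz_window=50):
    """Keep peaks lying within mz_window of any of the k most intense peaks."""
    by_intensity = sorted(peaks, key=lambda peak: peak[1], reverse=True)
    top_mzs = [mz for mz, _ in by_intensity[:k]]
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if any(abs(mz - top_mz) <= mz_window for top_mz in top_mzs)
    ]


peaks = [(100.0, 10.0), (140.0, 1.0), (300.0, 2.0), (500.0, 20.0)]
# With k=2 the reference peaks are at 500.0 and 100.0; 300.0 is far from both.
kept = keep_near_top_k(peaks, k=2)
```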

matchms.filtering.repair_inchi_inchikey_smiles(spectrum_in: Spectrum) Spectrum[source]

Check if inchi, inchikey, and smiles entries seem correct. Detect and correct cases where one of those entries clearly belongs in one of the other two fields (e.g. inchikey found in inchi field).

matchms.filtering.require_minimum_number_of_peaks(spectrum_in: Spectrum, n_required: int = 10, ratio_required: Optional[float] = None) Spectrum[source]

Spectrum will be set to None when it has fewer peaks than required.

Parameters
  • spectrum_in – Input spectrum.

  • n_required – Number of minimum required peaks. Spectra with fewer peaks will be set to ‘None’.

  • ratio_required – Set desired ratio between minimum number of peaks and parent mass. Default is None.

matchms.filtering.require_minimum_of_high_peaks(spectrum_in: Spectrum, no_peaks: int = 5, intensity_percent: float = 2.0) Spectrum[source]
Returns None if the number of peaks with relative intensity above or equal to intensity_percent is less than no_peaks.

Parameters
  • spectrum_in – Input spectrum.

  • no_peaks – Minimum number of peaks required to have relative intensity above or equal to intensity_percent. Spectra with fewer such peaks will return None. Default is 5.

  • intensity_percent – Minimum relative intensity (as a percentage between 0 and 100) for peaks that are counted. Default is 2.0.
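The counting rule can be sketched on a plain list of intensities. The helper has_enough_high_peaks is hypothetical and returns a boolean instead of a spectrum or None. Relative intensity is taken as a percentage of the highest peak, which is an assumption of this sketch.

```python
def has_enough_high_peaks(intensities, no_peaks=5, intensity_percent=2.0):
    """True if at least no_peaks intensities reach intensity_percent of the maximum."""
    if not intensities:
        return False
    threshold = max(intensities) * intensity_percent / 100
    high_peaks = sum(1 for intensity in intensities if intensity >= threshold)
    return high_peaks >= no_peaks


intensities = [100.0, 50.0, 3.0, 2.0, 1.0, 0.5]
# Four intensities reach 2 % of the maximum (100.0, 50.0, 3.0 and 2.0).
passes_default = has_enough_high_peaks(intensities)
passes_relaxed = has_enough_high_peaks(intensities, no_peaks=4)
```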

matchms.filtering.require_precursor_below_mz(spectrum_in: Spectrum, max_mz: float = 1000) Spectrum[source]
Returns None if the precursor_mz of a spectrum is greater than or equal to max_mz.

Parameters
  • spectrum_in – Input spectrum.

  • max_mz – Maximum mz value for the precursor mz of a spectrum. All precursor mz values greater than or equal to this will return None. Default is 1000.

matchms.filtering.require_precursor_mz(spectrum_in: Spectrum, minimum_accepted_mz: float = 10.0) Optional[Spectrum][source]

Returns None if there is no precursor_mz or if it is <= minimum_accepted_mz.

Parameters
  • spectrum_in – Input spectrum.

  • minimum_accepted_mz – Set to minimum acceptable value for precursor m/z. Default is set to 10.0.

matchms.filtering.select_by_intensity(spectrum_in: Spectrum, intensity_from: float = 10.0, intensity_to: float = 200.0) Spectrum[source]

Keep only peaks within set intensity range (keep if intensity_from <= intensity <= intensity_to). In most cases it is advised to use the select_by_relative_intensity() function instead.

Parameters
  • intensity_from – Set lower threshold for peak intensity. Default is 10.0.

  • intensity_to – Set upper threshold for peak intensity. Default is 200.0.

matchms.filtering.select_by_mz(spectrum_in: Spectrum, mz_from: float = 0.0, mz_to: float = 1000.0) Spectrum[source]

Keep only peaks between mz_from and mz_to (keep if mz_from <= m/z <= mz_to).

Parameters
  • mz_from – Set lower threshold for m/z peak positions. Default is 0.0.

  • mz_to – Set upper threshold for m/z peak positions. Default is 1000.0.

matchms.filtering.select_by_relative_intensity(spectrum_in: Spectrum, intensity_from: float = 0.0, intensity_to: float = 1.0) Spectrum[source]

Keep only peaks within set relative intensity range (keep if intensity_from <= intensity <= intensity_to).

Parameters
  • intensity_from – Set lower threshold for relative peak intensity. Default is 0.0.

  • intensity_to – Set upper threshold for relative peak intensity. Default is 1.0.
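A plain-Python sketch of range selection on relative intensities. The helper is hypothetical and works on a list of (m/z, intensity) pairs. It assumes that relative intensity means intensity divided by the highest intensity in the spectrum, and that a peak is kept when its relative intensity lies between the lower and upper thresholds, both inclusive.

```python
def select_by_relative_intensity_sketch(peaks, intensity_from=0.0, intensity_to=1.0):
    """Keep peaks whose intensity relative to the maximum lies in the given range."""
    highest = max(intensity for _, intensity in peaks)
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if intensity_from <= intensity / highest <= intensity_to
    ]


peaks = [(100.0, 200.0), (120.0, 300.0), (150.0, 50.0), (200.0, 1.0)]
selected = select_by_relative_intensity_sketch(peaks, intensity_from=0.5)
```

The same comparison, applied to m/z values or absolute intensities instead of relative intensities, describes select_by_mz and select_by_intensity.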

matchms.filtering.set_ionmode_na_when_missing(spectrum_in: Spectrum) Spectrum[source]

Create “ionmode” entry of “n/a” if field does not yet exist in metadata.

Submodules