matchms.filtering package

Processing (or: filtering) mass spectra

The provided functions usually perform only a single action on a spectrum, such as changing or correcting metadata, or filtering peaks. More complicated processing pipelines can be built by stacking several of the provided filters.

Because matchms contains numerous filter functions, and because they often need to be applied in a specific order, the most practical workflow is to define a spectrum processing pipeline with the SpectrumProcessor class. Here is an example:

import numpy as np
from matchms import Spectrum
from matchms import SpectrumProcessor

spectrum = Spectrum(mz=np.array([100, 120, 150, 200.]),
                    intensities=np.array([200.0, 300.0, 50.0, 1.0]),
                    metadata={'id': 'spectrum1'})

# Users can pick a predefined pipeline from default pipelines, or specify a list of filters
processing = SpectrumProcessor(["normalize_intensities"])

# Run the processing pipeline:
spectrum_filtered = processing.process_spectrum(spectrum)
max_intensity = spectrum_filtered.peaks.intensities.max()
print(f"Maximum intensity is {max_intensity:.2f}")

Should output

Maximum intensity is 1.00

It is also possible to run each filter function individually. This makes sense, for instance, when users want to develop a highly customized spectrum processing routine. Example of how to use a single filter function:

import numpy as np
from matchms import Spectrum
from matchms.filtering import normalize_intensities

spectrum = Spectrum(mz=np.array([100, 120, 150, 200.]),
                    intensities=np.array([200.0, 300.0, 50.0, 1.0]),
                    metadata={'id': 'spectrum1'})
spectrum_filtered = normalize_intensities(spectrum)

max_intensity = spectrum_filtered.peaks.intensities.max()
print(f"Maximum intensity is {max_intensity:.2f}")

Should output

Maximum intensity is 1.00

Figure: Sketch of matchms spectrum processing.

class matchms.filtering.SpeciesString(dirty: str)[source]

Bases: object

A class to process and clean different types of chemical structure strings including InChI, InChIKey, and SMILES.

The class takes a raw input string, determines the intended structure type, and then cleans the string based on its type.

dirty

Raw input string representing a chemical structure.

Type: str

target

The intended structure type determined from the input string. Could be ‘inchi’, ‘inchikey’, ‘smiles’, or None if no valid type was identified.

Type: str

cleaned

The cleaned structure string.

Type: str

__init__(dirty: str)[source]

Constructs a new instance of the SpeciesString class.

Parameters: dirty (str) – The raw input string representing a chemical structure.

clean()[source]: Clean the input string based on its determined structure type.

clean_as_inchi()[source]: Search for valid inchi and harmonize it.

clean_as_inchikey()[source]: Search for valid inchikey and harmonize it.

clean_as_smiles()[source]: Search for valid smiles and harmonize it.

guess_target()[source]: Determine the intended structure type of the input string.

looks_like_a_smiles()[source]: Return True if string is made of allowed characters for smiles.

looks_like_an_inchi()[source]: Search for first piece of InChI.

looks_like_an_inchikey()[source]: Return True if string has format of inchikey.
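
The type-guessing step described for this class can be pictured with a small standalone sketch. This is not the SpeciesString implementation: the function guess_structure_type and all three regular expressions are hypothetical and written only for illustration. The sketch relies on the standard InChIKey layout (14, 10, and 1 uppercase letters separated by hyphens) and on InChI strings starting with "InChI=".

```python
import re

# Hypothetical patterns for illustration only; the real class may use
# different and more thorough checks.
INCHI_PATTERN = re.compile(r"InChI=1S?/", re.IGNORECASE)
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
SMILES_PATTERN = re.compile(r"^[A-Za-z0-9@+\-\[\]()\\/%=#$.:*]+$")


def guess_structure_type(raw):
    """Return 'inchi', 'inchikey', 'smiles', or None for a raw string."""
    text = raw.strip()
    if INCHI_PATTERN.search(text):
        return "inchi"
    # Check InChIKey before SMILES: an InChIKey contains only uppercase
    # letters and hyphens, so it would also pass the SMILES character test.
    if INCHIKEY_PATTERN.match(text):
        return "inchikey"
    if SMILES_PATTERN.match(text):
        return "smiles"
    return None


for candidate in ["InChI=1S/CH4/h1H4",
                  "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
                  "CC(=O)OC1=CC=CC=C1C(=O)O",
                  "not a structure!"]:
    print(candidate, "->", guess_structure_type(candidate))
```

The order of the checks matters in this sketch, because the character-based SMILES test is the most permissive of the three and is therefore tried last.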

matchms.filtering.add_compound_name(spectrum_in, *, clone: bool | None = True) → dict

Add compound name to the compound_name metadata field.

If compound_name is missing, this filter tries to copy the value from name first and then from title.
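
The fallback order described above can be sketched on a plain metadata dictionary. This is an illustration under stated assumptions rather than the matchms implementation: the helper add_compound_name_sketch is hypothetical, it operates on a dict instead of a Spectrum, and it treats a key that is absent or set to None as "missing".

```python
def add_compound_name_sketch(metadata, clone=True):
    """Hypothetical helper: fill 'compound_name' from 'name', then 'title'."""
    if metadata is None:
        return None
    # With clone=True the caller's dict is left untouched;
    # with clone=False the dict is modified in place.
    result = dict(metadata) if clone else metadata
    if result.get("compound_name") is None:
        for key in ("name", "title"):
            if result.get(key) is not None:
                result["compound_name"] = result[key]
                break
    return result


original = {"title": "Aspirin MS2", "name": "Aspirin"}
updated = add_compound_name_sketch(original)
print(updated["compound_name"])      # 'name' takes precedence over 'title'
print("compound_name" in original)   # the clone leaves the input unchanged
```

Calling the helper with clone=False would instead add the compound_name key to the dictionary that was passed in.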

Parameters:

spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.

Returns:

Input object with added compound_name metadata, or None if the input was None.

Return type:
