matchms.filtering package
Processing (or: filtering) mass spectra
Each provided function usually performs a single action on a spectrum, such as changing or correcting metadata or filtering peaks. More complicated processing pipelines can be built by stacking several of the provided filters.
Because matchms offers numerous filter functions, which often need to be applied in a specific order, the most practical workflow is to use the SpectrumProcessor class to define a spectrum processing pipeline. Here is an example:
import numpy as np
from matchms import Spectrum
from matchms import SpectrumProcessor
spectrum = Spectrum(mz=np.array([100, 120, 150, 200.]),
intensities=np.array([200.0, 300.0, 50.0, 1.0]),
metadata={'id': 'spectrum1'})
# Users can pick a predefined pipeline from default pipelines, or specify a list of filters
processing = SpectrumProcessor(["normalize_intensities"])
# Run the processing pipeline:
spectrum_filtered = processing.process_spectrum(spectrum)
max_intensity = spectrum_filtered.peaks.intensities.max()
print(f"Maximum intensity is {max_intensity:.2f}")
Should output
Maximum intensity is 1.00
It is also possible to run each filter function individually. This makes sense, for instance, if users want to develop a highly customized spectrum processing routine. Example of how to use a single filter function:
import numpy as np
from matchms import Spectrum
from matchms.filtering import normalize_intensities
spectrum = Spectrum(mz=np.array([100, 120, 150, 200.]),
intensities=np.array([200.0, 300.0, 50.0, 1.0]),
metadata={'id': 'spectrum1'})
spectrum_filtered = normalize_intensities(spectrum)
max_intensity = spectrum_filtered.peaks.intensities.max()
print(f"Maximum intensity is {max_intensity:.2f}")
Should output
Maximum intensity is 1.00
Sketch of matchms spectrum processing.
- class matchms.filtering.SpeciesString(dirty: str)[source]
Bases: object
A class to process and clean different types of chemical structure strings including InChI, InChIKey, and SMILES.
The class takes a raw input string, determines the intended structure type, and then cleans the string based on its type.
- target
The intended structure type determined from the input string. Could be ‘inchi’, ‘inchikey’, ‘smiles’, or None if no valid type was identified.
- Type:
- matchms.filtering.add_compound_name(spectrum_in, *, clone: bool | None = True) dict
Add compound name to the compound_name metadata field.
If compound_name is missing, this filter tries to copy the value from name first and then from title.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with added compound_name metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.add_parent_mass(spectrum_in, estimate_from_adduct: bool = True, overwrite_existing_entry: bool = False, estimate_from_charge: bool = True, *, clone: bool | None = True) dict
Add estimated parent mass to metadata if not present yet.
Method to calculate the parent mass from given precursor m/z together with charge and/or adduct. Will take precursor m/z from precursor_mz as provided by running add_precursor_mz.
For estimate_from_adduct=True this function estimates the parent mass based on the mass and charge of known adducts. The table of known adduct properties can be found in matchms/data/known_adducts_table.csv.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
estimate_from_adduct – When set to True, use adduct to estimate actual molecular mass (parent_mass). Switches back to charge-based estimate if adduct does not match a known adduct. Default is True.
overwrite_existing_entry – If False, an existing parent-mass entry is kept. If True, a newly computed value will replace existing ones. Default is False.
estimate_from_charge – If True, charge will be used to estimate the parent mass when adduct information is insufficient. Adducts of the form [M+H]+, [M+H]2+, [M-H]- etc. are assumed. Default is True.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with added or updated parent_mass metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
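The charge-based fallback described above can be illustrated with a small standalone sketch. It assumes protonated or deprotonated ions and a proton mass of roughly 1.00727646 Da. The helper name estimate_parent_mass_from_charge is hypothetical, and this is not the matchms implementation, which additionally consults the table of known adducts.

```python
PROTON_MASS = 1.00727646  # Da, approximate proton mass (assumed constant)


def estimate_parent_mass_from_charge(precursor_mz: float, charge: int) -> float:
    """Hypothetical helper: neutral mass for ions that gained or lost |charge| protons."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    # m/z = (M + charge * proton) / |charge|  =>  M = m/z * |charge| - charge * proton
    return precursor_mz * abs(charge) - charge * PROTON_MASS


# Positive mode subtracts one proton mass, negative mode adds one back.
print(estimate_parent_mass_from_charge(301.0, 1))
print(estimate_parent_mass_from_charge(299.0, -1))
```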
- matchms.filtering.add_precursor_formula(spectrum_in, *, clone: bool | None = True) dict
Derive and set precursor_formula from neutral formula and adduct.
Requirements
Input metadata must contain formula and adduct.
formula must be a simple concatenation of element symbols and counts, without parentheses, hydrates, or isotope notation.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with added precursor_formula metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.add_precursor_mz(spectrum_in, *, clone: bool | None = True) dict
Add precursor_mz to correct field and make it a float.
For missing precursor_mz field: check if there is a pepmass entry instead. For strings parsed as precursor m/z, convert to float.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with added precursor m/z metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.add_retention_index(spectrum_in, *, clone: bool | None = True) dict
Add retention index information to the retention_index key as float.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with harmonized retention index metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.add_retention_time(spectrum_in, *, clone: bool | None = True) dict
Add retention time information to the retention_time key as float.
Negative values and values that cannot be converted to float result in no update for retention_time.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with harmonized retention time metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
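The conversion rule stated above (negative or unconvertible values lead to no update) can be sketched as follows. The helper to_retention_time is a hypothetical name used only for illustration. How matchms handles units or alternative metadata keys is not covered here.

```python
from typing import Optional


def to_retention_time(value) -> Optional[float]:
    """Hypothetical helper: return a non-negative float, or None if the value is unusable."""
    try:
        retention_time = float(value)
    except (TypeError, ValueError):
        # Covers None, non-numeric strings, and other unconvertible objects.
        return None
    # Negative values (and NaN, which fails this comparison) are rejected.
    return retention_time if retention_time >= 0 else None


print(to_retention_time("12.5"))
print(to_retention_time("unknown"))
```

A caller would only update the metadata entry when the helper returns a value other than None.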
- matchms.filtering.clean_adduct(spectrum_in, *, clone: bool | None = True) dict
Clean adduct and make it consistent in style.
Will transform adduct strings of type M+H+ to [M+H]+.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with cleaned adduct metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
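As a rough illustration of the M+H+ to [M+H]+ transformation, the sketch below wraps the adduct body in brackets and keeps the trailing charge. The function bracket_adduct is hypothetical and handles only this simple pattern; the actual filter likely covers more notations.

```python
import re


def bracket_adduct(adduct: str) -> str:
    """Hypothetical helper: 'M+H+' -> '[M+H]+'; other input is returned unchanged."""
    adduct = adduct.strip()
    if adduct.startswith("["):
        # Already in bracket notation.
        return adduct
    # Shortest possible body, followed by an optional digit count and a final sign.
    match = re.fullmatch(r"(.+?)(\d*[+-])", adduct)
    if match is None:
        return adduct
    body, charge = match.groups()
    return f"[{body}]{charge}"


print(bracket_adduct("M+H+"))
print(bracket_adduct("M-H-"))
```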
- matchms.filtering.clean_compound_name(spectrum_in, *, clone: bool | None = True) dict
Clean compound name.
A list of frequently seen name additions that do not belong to the compound name will be removed.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If
False, the input object may be modified in place.
- Returns:
Input object with cleaned compound_name metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.correct_charge(spectrum_in, *, clone: bool | None = True) dict
Correct charge values based on given ionmode.
For some spectra, the charge value is either undefined or inconsistent with its ionmode, which is corrected by this filter.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If
False, the input object may be modified in place.
- Returns:
Input object with corrected charge metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.default_filters(spectrum: Spectrum) Spectrum[source]
Collection of filters that are considered default and that do not require any (factory) arguments.
Collection is
- matchms.filtering.derive_adduct_from_name(spectrum_in, remove_adduct_from_name: bool = True, *, clone: bool | None = True) dict
Find adduct in compound name and add it to metadata if not present yet.
Method to interpret the given compound name to find the adduct.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
remove_adduct_from_name – Remove found adducts from compound name if set to True. Default is True.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with added adduct metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.derive_annotation_from_compound_name(spectrum_in, annotated_compound_names_file: str | None = None, mass_tolerance: float = 0.1, *, clone: bool | None = True) dict
Add molecular annotations based on compound name by searching PubChem.
This filter adds smiles, inchi, and/or inchikey metadata based on a PubChem compound-name lookup. SMILES lookup is not supported directly by pubchempy anymore, see https://github.com/matchms/matchms/issues/823. SMILES can alternatively be derived from InChI by running derive_smiles_from_inchi.
The filter is only run if there is not yet a valid SMILES or InChI entry in the metadata. The annotation is only added if the PubChem result has a monoisotopic mass close enough to the spectrum’s parent_mass.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
annotated_compound_names_file – Optional CSV file used as a persistent cache. Any compound name searched on PubChem will be added to this file. If a compound name is already present in the file, the cached annotation is used instead of querying PubChem again. The CSV file should contain the columns compound_name, smiles, inchi, inchikey, and monoisotopic_mass.
mass_tolerance – Acceptable mass difference between query compound and PubChem result. Default is 0.1.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with added annotation metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.derive_formula_from_name(spectrum_in, remove_formula_from_name: bool = True, *, clone: bool | None = True) dict
Detect and remove misplaced formula in compound name and add to metadata.
Method to find misplaced formulas in compound name based on regular expression. This will not chemically test the detected formula, so the search is limited to frequently occurring types of shape C47H83N1O8P1.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
remove_formula_from_name – Remove found formula from compound name if set to True. Default is True.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with added formula metadata and optionally cleaned compound_name, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.derive_formula_from_smiles(spectrum_in, overwrite: bool = True, *, clone: bool | None = True) dict
Add molecular formula metadata derived from SMILES.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
overwrite – If True, an existing formula entry will be replaced when the formula derived from SMILES differs from the current value. If False, an existing formula entry will be kept unchanged. Default is True.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with updated formula metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.derive_inchi_from_smiles(spectrum_in, *, clone: bool | None = True) dict
Find missing InChI and derive from smiles where possible.
- matchms.filtering.derive_inchikey_from_inchi(spectrum_in, *, clone: bool | None = True) dict
Find missing InChIKey and derive from InChI where possible.
- matchms.filtering.derive_ionmode(spectrum_in, *, clone: bool | None = True) dict
Derive missing ionmode based on charge and/or adduct.
Some input formats, for example MGF files, do not always provide a correct ionmode. This filter reads charge and adduct metadata and uses them to fill in the ionmode where missing.
- matchms.filtering.derive_smiles_from_inchi(spectrum_in, *, clone: bool | None = True) dict
Find missing smiles and derive from InChI where possible.
- matchms.filtering.harmonize_missing_entries(spectrum_in, keys: str | Iterable[str] | None = None, undefined=None, aliases: Iterable | None = None, *, clone: bool | None = True) dict
Replace aliases for missing metadata entries.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
keys – Metadata key or keys to harmonize. If None, all existing metadata keys are harmonized.
undefined – Replacement value for missing entries. Default is None.
aliases – Values that should be interpreted as missing. If None, ALIASES_FOR_NONE is used.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Input object with harmonized missing metadata entries, or None if input was None.
- Return type:
Spectrum, SpectraCollection, or None
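The alias replacement can be pictured on a plain metadata dictionary. In the sketch below, ASSUMED_ALIASES is an illustrative set and does not reproduce the contents of ALIASES_FOR_NONE. The function harmonize_missing is likewise hypothetical and only mirrors the keys, undefined, and aliases parameters described above.

```python
ASSUMED_ALIASES = {"", "n/a", "na", "none", "null"}  # illustrative, not matchms' ALIASES_FOR_NONE


def harmonize_missing(metadata: dict, keys=None, undefined=None, aliases=ASSUMED_ALIASES) -> dict:
    """Hypothetical helper: replace alias strings for 'missing' by a single value."""
    if keys is None:
        keys = list(metadata.keys())  # harmonize every existing key
    elif isinstance(keys, str):
        keys = [keys]  # allow a single key name
    cleaned = dict(metadata)  # work on a copy, similar in spirit to clone=True
    for key in keys:
        value = cleaned.get(key)
        if isinstance(value, str) and value.strip().lower() in aliases:
            cleaned[key] = undefined
    return cleaned


print(harmonize_missing({"smiles": "N/A", "compound_name": "caffeine"}))
```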
- matchms.filtering.harmonize_undefined_inchi(spectrum_in: Spectrum, undefined: str = '', aliases: list[str] = None, clone: bool | None = True) Spectrum | None[source]
Replace all aliases for empty/undefined inchi entries by the value of the undefined argument.
- matchms.filtering.harmonize_undefined_inchikey(spectrum_in: Spectrum, undefined: str = '', aliases: list[str] = None, clone: bool | None = True) Spectrum | None[source]
Replace all aliases for empty/undefined inchikey entries by
undefined.
- matchms.filtering.harmonize_undefined_smiles(spectrum_in: Spectrum, undefined: str = '', aliases: list[str] = None, clone: bool | None = True) Spectrum | None[source]
Replace all aliases for empty/undefined smiles entries by
undefined.
- matchms.filtering.interpret_pepmass(spectrum_in, clone: bool | None = True) Spectrum | None
Reads pepmass field, if present, and adds values to correct fields.
The field pepmass or PEPMASS is often used to describe the precursor ion. This function interprets the values as (mz, intensity, charge) and stores them in precursor_mz, precursor_intensity, and charge.
- Parameters:
spectrum_in – Input spectrum.
clone – Optionally clone the Spectrum.
- Returns:
Spectrum with interpreted pepmass metadata, or None if not present.
- Return type:
Spectrum or None
- matchms.filtering.make_charge_int(spectrum_in, *, clone: bool | None = True) dict
Convert charge field to integer if possible.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If
False, the input object may be modified in place.
- Returns:
Input object with converted charge metadata, or None if the input was None.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.normalize_intensities(spectrum_in: Spectrum, clone: bool | None = True, scale_to_max: float = 1.0) Spectrum | None
Normalize peak intensities relative to the maximum peak intensity.
Intensities are divided by the maximum intensity of the spectrum and then multiplied by scale_to_max. By default, this normalizes spectra to unit height, i.e. the most intense peak receives intensity 1.0.
Peaks with zero intensity are removed. Negative peak intensities are not allowed and raise a ValueError.
- Parameters:
spectrum_in – Input spectrum.
clone – Optionally clone the Spectrum.
scale_to_max – Desired intensity of the most intense peak after normalization. Default is 1.0. For example, scale_to_max=1000.0 scales the base peak to intensity 1000.
- Returns:
Spectrum with normalized intensities, or None if input is None.
- Return type:
Spectrum or None
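The arithmetic behind the normalization can be sketched on a plain list of (m/z, intensity) pairs. The function normalize_peaks is a hypothetical stand-in that follows the rules stated above (divide by the maximum, multiply by scale_to_max, drop zero-intensity peaks, reject negative intensities). It does not operate on Spectrum objects.

```python
def normalize_peaks(peaks, scale_to_max=1.0):
    """Hypothetical helper: peaks is a list of (mz, intensity) tuples."""
    if any(intensity < 0 for _, intensity in peaks):
        raise ValueError("Negative peak intensities are not allowed.")
    # Peaks with zero intensity are dropped before scaling.
    kept = [(mz, intensity) for mz, intensity in peaks if intensity > 0]
    if not kept:
        return []
    max_intensity = max(intensity for _, intensity in kept)
    return [(mz, intensity / max_intensity * scale_to_max) for mz, intensity in kept]


peaks = [(100.0, 200.0), (120.0, 300.0), (150.0, 50.0), (200.0, 0.0)]
print(normalize_peaks(peaks))
print(normalize_peaks(peaks, scale_to_max=1000.0))
```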
- matchms.filtering.reduce_to_number_of_peaks(spectrum_in: Spectrum, n_required: int = 0, n_max: int = inf, ratio_desired: float | None = None, clone: bool | None = True) Spectrum | None
The lowest-intensity peaks will be removed when the spectrum has more peaks than desired.
- Parameters:
spectrum_in – Input spectrum.
n_required – Number of minimum required peaks. Spectra with fewer peaks will be set to ‘None’. Default is 0.
n_max – Maximum number of peaks. Remove peaks if more peaks are found. Default is inf.
ratio_desired – Set desired ratio between maximum number of peaks and parent mass. For spectra without parent mass (e.g. GCMS spectra) this will raise an error when ratio_desired is used. Default is None.
clone – Optionally clone the Spectrum.
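A minimal sketch of the peak reduction, assuming only the n_max rule: keep the n_max most intense peaks and restore m/z order. The helper keep_most_intense is hypothetical, and the n_required and ratio_desired logic is deliberately left out.

```python
def keep_most_intense(peaks, n_max):
    """Hypothetical helper: peaks is a list of (mz, intensity) tuples."""
    if len(peaks) <= n_max:
        return list(peaks)
    # Rank by intensity (highest first) and keep the first n_max peaks.
    top = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:n_max]
    # Return the survivors sorted by m/z again.
    return sorted(top)


peaks = [(100.0, 200.0), (120.0, 300.0), (150.0, 50.0), (200.0, 1.0)]
print(keep_most_intense(peaks, n_max=2))
```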
- matchms.filtering.remove_noise_below_frequent_intensities(spectrum_in: Spectrum, min_count_of_frequent_intensities: int = 5, noise_level_multiplier: float = 2.0, clone: bool | None = True) Spectrum | None
Removes noise peaks whose intensities fall below a noise level derived from frequently repeated intensity values.
When no noise filtering has been applied to a spectrum, many spectra show repeating intensities. From all intensities that repeat more than min_count_of_frequent_intensities the highest is selected. The noise level is set to this intensity * noise_level_multiplier. All fragments with an intensity below the noise level are removed.
This filter was suggested by Tytus Mak.
- Parameters:
spectrum_in – Input spectrum.
min_count_of_frequent_intensities – Minimum number of repeating intensities.
noise_level_multiplier – From all intensities that repeat more than min_count_of_frequent_intensities the highest is selected. The noise level is set to this intensity * noise_level_multiplier.
clone – Optionally clone the Spectrum.
- Returns:
Spectrum with removed intensities, or None if not present.
- Return type:
Spectrum or None
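The noise-level rule can be sketched with a counter over intensity values. The function below is a hypothetical illustration. It assumes that an intensity counts as frequent when it occurs at least min_count times; the wording above ("more than") leaves the exact boundary open.

```python
from collections import Counter


def remove_frequent_intensity_noise(peaks, min_count=5, multiplier=2.0):
    """Hypothetical helper: peaks is a list of (mz, intensity) tuples."""
    counts = Counter(intensity for _, intensity in peaks)
    # Assumption: "frequent" means occurring at least min_count times.
    frequent = [value for value, count in counts.items() if count >= min_count]
    if not frequent:
        return list(peaks)  # no repeated intensities, nothing to remove
    noise_level = max(frequent) * multiplier
    # Peaks below the noise level are removed.
    return [(mz, intensity) for mz, intensity in peaks if intensity >= noise_level]


noisy = [(50.0 + k, 3.0) for k in range(5)] + [(100.0, 5.0), (120.0, 6.0), (150.0, 40.0)]
print(remove_frequent_intensity_noise(noisy))
```

In this example the value 3.0 repeats five times, so the noise level becomes 6.0 and only the peaks at m/z 120 and 150 remain.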
- matchms.filtering.remove_peaks_around_precursor_mz(spectrum_in: Spectrum, mz_tolerance: float = 17, clone: bool | None = True) Spectrum | None
Remove peaks that are within mz_tolerance (in Da) of the precursor mz, excluding the precursor peak.
- Parameters:
spectrum_in – Input spectrum.
mz_tolerance – Tolerance of mz values that are not allowed to lie within the precursor mz. Default is 17 Da.
clone – Optionally clone the Spectrum.
- Returns:
Spectrum with removed peaks, or None if not present.
- Return type:
Spectrum or None
- matchms.filtering.remove_peaks_outside_top_k(spectrum_in: Spectrum, k: int = 6, mz_window: float = 50, clone: bool | None = True) Spectrum | None
Remove all peaks which are not within mz_window of at least one of the k highest intensity peaks of the spectrum.
- Parameters:
spectrum_in – Input spectrum.
k – The number of most intense peaks to compare to. Default is 6.
mz_window – Window of mz values (in Da) that are allowed to lie within the top k peaks. Default is 50 Da.
clone – Optionally clone the Spectrum.
- Returns:
Spectrum with removed peaks, or None if not present.
- Return type:
Spectrum or None
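The top-k window rule can be sketched as follows. The helper keep_near_top_k is hypothetical and assumes that "within mz_window" means an absolute m/z difference of at most mz_window.

```python
def keep_near_top_k(peaks, k=6, mz_window=50.0):
    """Hypothetical helper: peaks is a list of (mz, intensity) tuples."""
    # m/z positions of the k most intense peaks.
    ranked = sorted(peaks, key=lambda peak: peak[1], reverse=True)
    top_mzs = [mz for mz, _ in ranked[:k]]
    # Keep a peak if it lies within mz_window of at least one top-k peak.
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if any(abs(mz - top_mz) <= mz_window for top_mz in top_mzs)
    ]


peaks = [(100.0, 10.0), (120.0, 5.0), (300.0, 1.0), (500.0, 8.0)]
print(keep_near_top_k(peaks, k=2, mz_window=50.0))
```

With k=2 the reference peaks sit at m/z 100 and 500, so the peak at m/z 300 is dropped while the one at m/z 120 is kept.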
- matchms.filtering.remove_peaks_relative_to_precursor_mz(spectrum_in: Spectrum, offset_to_precursor: float = -1.6, clone: bool | None = True) Spectrum | None
Remove all peaks with m/z values > precursor-m/z + offset_to_precursor.
If offset_to_precursor is negative, all peaks with m/z values greater than (precursor_mz - |offset_to_precursor|) are removed, which includes the precursor peak itself. If offset_to_precursor is positive, the precursor_mz peak itself will remain.
- Parameters:
spectrum_in – Input spectrum.
offset_to_precursor – All peaks with mz values > precursor_mz + offset_to_precursor will be removed. Default is -1.6 Da, based on the Flash Entropy article by Li and Fiehn, 2023, Nat. Methods (see https://www.nature.com/articles/s41592-023-02012-9).
clone – Optionally clone the Spectrum.
- Returns:
Spectrum with removed peaks, or None if not present.
- Return type:
Spectrum or None
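The effect of the sign of offset_to_precursor is easiest to see on a small example. The helper below is a hypothetical sketch of the stated rule (remove peaks with m/z > precursor_mz + offset_to_precursor) on plain tuples.

```python
def drop_peaks_above_precursor(peaks, precursor_mz, offset_to_precursor=-1.6):
    """Hypothetical helper: peaks is a list of (mz, intensity) tuples."""
    cutoff = precursor_mz + offset_to_precursor
    # Peaks strictly above the cutoff are removed.
    return [(mz, intensity) for mz, intensity in peaks if mz <= cutoff]


peaks = [(100.0, 1.0), (198.0, 2.0), (198.5, 2.0), (200.0, 5.0)]
# Negative offset: the precursor peak at m/z 200 is removed as well.
print(drop_peaks_above_precursor(peaks, precursor_mz=200.0))
# Positive offset: the precursor peak is kept.
print(drop_peaks_above_precursor(peaks, precursor_mz=200.0, offset_to_precursor=0.5))
```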
- matchms.filtering.remove_profiled_spectra(spectrum_in: Spectrum, mz_window=0.5, clone: bool | None = True) Spectrum | None
Remove profiled spectra
A spectrum is removed if, within mz_window (default 0.5) of the highest peak, at least 2 peaks next to the main peak have an intensity > max_intensity/2.
Reproduced from MZmine. https://github.com/mzmine/mzmine3/blob/master/src/main/java/io/github/mzmine/util/scans/ScanUtils.java#L609
- Parameters:
spectrum_in – Input spectrum.
mz_window – Window of m/z values (in Da) around the highest peak in which neighboring peaks are checked. Default is 0.5 Da.
clone – Optionally clone the Spectrum.
- Returns:
None if the spectrum is likely profile data, else the input spectrum.
- Return type:
Spectrum or None
- matchms.filtering.repair_adduct_and_parent_mass_based_on_smiles(spectrum_in, mass_tolerance: float, *, clone: bool | None = True) dict
Correct adduct and parent mass based on smiles and precursor_mz.
- matchms.filtering.repair_adduct_based_on_parent_mass(spectrum_in, mass_tolerance: float, *, clone: bool | None = True) dict
Correct adduct based on parent_mass and precursor_mz.
- matchms.filtering.repair_inchi_inchikey_smiles(spectrum_in, *, clone: bool | None = True) dict[str, str]
Check if inchi, inchikey, and smiles entries seem correct.
Detect and correct if any of those entries clearly belongs into one of the other two fields, for example if an inchikey is found in the inchi field.
- matchms.filtering.repair_not_matching_annotation(spectrum_in: Spectrum, clone: bool | None = True) Spectrum | None
Repairs mismatches in a spectrum’s annotations related to SMILES, InChI, and InChIKey.
Given a spectrum, this function ensures that the provided SMILES, InChI, and InChIKey annotations are consistent with one another. If there are discrepancies, they are resolved as follows:
- If the SMILES and InChI do not match:
Both SMILES and InChI are checked against the parent mass.
The annotation that matches the parent mass is retained, and the other is regenerated.
- If the InChIKey does not match the InChI:
A new InChIKey is generated from the InChI and replaces the old one.
Warnings and information logs are generated to track changes and potential issues. For correctness of InChIKey entries, only the first 14 characters are considered.
- Parameters:
spectrum_in – The input spectrum containing annotations to be checked and repaired.
clone – Optionally clone the Spectrum.
- Returns:
A cloned version of the input spectrum with corrected annotations, or None if the input spectrum is None.
- Return type:
Spectrum or None
- matchms.filtering.repair_parent_mass_from_smiles(spectrum_in, mass_tolerance: float = 0.1, *, clone: bool | None = True) dict
Set parent mass to match smiles mass if not already close.
- matchms.filtering.repair_parent_mass_is_molar_mass(spectrum_in, mass_tolerance: float, *, clone: bool | None = True) dict
Change parent mass from molar mass into monoisotopic mass where applicable.
- matchms.filtering.repair_parent_mass_match_smiles_wrapper(spectrum_in: Spectrum, mass_tolerance: float = 0.2, clone: bool | None = True) Spectrum | None
Repair a mismatch between parent mass and smiles mass.
The filter tries several increasingly involved repair steps: first salt removal from SMILES, then correction of molar mass to monoisotopic mass, then adduct/parent mass repair based on SMILES.
- matchms.filtering.repair_smiles_of_salts(spectrum_in, mass_tolerance: float, *, clone: bool | None = True) dict
Repair salt SMILES to match parent mass.
- matchms.filtering.require_compound_name(spectrum_in) bool
Ensure that the compound name is present in the spectrum metadata.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If
False, the input object may be modified in place.
- Returns:
Spectrum input is returned unchanged if it contains a compound name, otherwise None. SpectraCollection input is returned with rows lacking compound names removed.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.require_correct_ionmode(spectrum_in, ion_mode_to_keep) bool
Validate that the spectrum ionmode matches the requested ionmode.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
ion_mode_to_keep – Desired ionmode: "positive", "negative", or "both". If "both", spectra are kept when ionmode is either "positive" or "negative".
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Spectrum input is returned unchanged if its ionmode matches the requirement, otherwise None. SpectraCollection input is returned with non-matching rows removed.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.require_correct_ms_level(spectrum_in, required_ms_level: int = 2) bool
Remove spectra where the ms_level does not match the required_ms_level.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
required_ms_level – Required MS level. Default is 2.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Spectrum input is returned unchanged if the MS level matches, otherwise None. SpectraCollection input is returned with non-matching rows removed.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.require_formula(spectrum_in) bool
Ensure that the molecular formula is present and looks valid.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If
False, the input object may be modified in place.
- Returns:
Spectrum input is returned unchanged if it contains a valid molecular formula, otherwise None. SpectraCollection input is returned with rows lacking a valid formula removed.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.require_matching_adduct_and_ionmode(spectrum_in) bool
Remove spectra where the adduct and ionmode do not match.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If
False, the input object may be modified in place.
- Returns:
Spectrum input is returned unchanged if adduct and ionmode match, otherwise None. SpectraCollection input is returned with non-matching rows removed.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.require_matching_adduct_precursor_mz_parent_mass(spectrum_in, tolerance=0.1) bool
Check if adduct, precursor m/z, and parent mass match within tolerance.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
tolerance – Absolute tolerance used to compare the given parent mass to the parent mass implied by precursor_mz and adduct. Default is 0.1.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Spectrum input is returned unchanged if adduct, precursor m/z, and parent mass match, otherwise None. SpectraCollection input is returned with non-matching rows removed.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.require_maximum_number_of_peaks(spectrum_in: Spectrum, maximum_number_of_fragments: int = 1000, clone: bool | None = True) Spectrum | None
Spectrum will be removed when it has more peaks than maximum_number_of_fragments.
For single Spectrum input this will return ‘None’ when the number of peaks exceeds maximum_number_of_fragments. For SpectraCollection input, spectra with more peaks than maximum_number_of_fragments will be removed from the collection.
- Parameters:
spectrum_in – Input spectrum.
maximum_number_of_fragments – Maximum number of peaks allowed. Spectra with more peaks will be set to ‘None’. Default is 1000.
clone – Optionally clone the Spectrum.
- Returns:
Untouched Spectrum or ‘None’.
- Return type:
Spectrum or None
- matchms.filtering.require_minimum_number_of_high_peaks(spectrum_in: Spectrum, no_peaks: int = 5, intensity_percent: float = 2.0, clone: bool | None = True) Spectrum | None
Removes spectra if the number of peaks with relative intensity above or equal to intensity_percent is less than no_peaks.
For single Spectrum input this will return ‘None’ when the number of peaks with relative intensity above or equal to intensity_percent is less than no_peaks. For SpectraCollection input, spectra with fewer than no_peaks peaks with relative intensity above or equal to intensity_percent will be removed from the collection.
- Parameters:
spectrum_in – Input spectrum.
no_peaks – Minimum number of peaks required to have a relative intensity above or equal to intensity_percent. Spectra with fewer such peaks return None. Default is 5.
intensity_percent – Minimum relative intensity (as a percentage between 0-100) for peaks that are searched. Default is 2.
clone – Optionally clone the Spectrum.
- Returns:
Untouched Spectrum or ‘None’.
- Return type:
Spectrum or None
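The counting rule can be sketched on a list of raw intensities. The helper has_enough_high_peaks is hypothetical and assumes that relative intensity is measured against the most intense peak of the spectrum, expressed as a percentage.

```python
def has_enough_high_peaks(intensities, no_peaks=5, intensity_percent=2.0):
    """Hypothetical helper: True if the spectrum would pass the requirement."""
    if not intensities:
        return False
    # Convert the percentage into an absolute intensity threshold.
    threshold = max(intensities) * intensity_percent / 100.0
    high_peaks = sum(1 for intensity in intensities if intensity >= threshold)
    return high_peaks >= no_peaks


print(has_enough_high_peaks([100.0, 50.0, 10.0, 3.0, 2.0, 1.0]))
print(has_enough_high_peaks([100.0, 1.0, 1.0, 1.0, 1.0]))
```

In the first call five peaks reach the 2 % threshold of 2.0, so the spectrum passes; in the second call only the base peak does.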
- matchms.filtering.require_minimum_number_of_peaks(spectrum_in: Spectrum, n_required: int = 10, ratio_required: float | None = None, clone: bool | None = True) Spectrum | None
Spectrum will be set to None when it has fewer peaks than required.
- Parameters:
spectrum_in – Input spectrum.
n_required – Number of minimum required peaks. Spectra with fewer peaks will be set to ‘None’.
ratio_required – Set desired ratio between minimum number of peaks and parent mass. Default is None.
clone – Optionally clone the Spectrum.
- Returns:
Untouched Spectrum or ‘None’.
- Return type:
Spectrum or None
- matchms.filtering.require_parent_mass_match_smiles(spectrum_in, mass_tolerance) bool
Validate that parent mass matches the mass calculated from SMILES.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
mass_tolerance – Allowed absolute mass difference between parent_mass and the monoisotopic neutral mass calculated from smiles.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Spectrum input is returned unchanged if parent_mass matches the SMILES-derived mass, otherwise None. SpectraCollection input is returned with non-matching rows removed.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.require_precursor_mz(spectrum_in, minimum_accepted_mz: float | None = 10.0, maximum_mz: float | None = None) bool
Require precursor m/z to be present and within optional bounds.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
minimum_accepted_mz – Minimum accepted precursor m/z. Default is 10.0. Use None to disable the lower bound.
maximum_mz – Maximum accepted precursor m/z. Default is None.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Spectrum input is returned unchanged if precursor m/z passes the checks, otherwise None. SpectraCollection input is returned with failing rows removed.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.require_retention_index(spectrum_in) bool
Require retention index to be present.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If
False, the input object may be modified in place.
- Returns:
Spectrum input is returned unchanged if retention_index is present, otherwise None. SpectraCollection input is returned with rows lacking retention index removed.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.require_valid_annotation(spectrum_in) bool
Require valid and matching SMILES, InChI, and InChIKey annotations.
- Parameters:
spectrum_in – Input spectrum or spectra collection.
clone – Optionally clone the input before applying the filter. If False, the input object may be modified in place.
- Returns:
Spectrum input is returned unchanged if annotations are valid and matching, otherwise None. SpectraCollection input is returned with invalid rows removed.
- Return type:
Spectrum, SpectraCollection, or None
- matchms.filtering.select_by_intensity(spectrum_in: Spectrum, intensity_from: float = 0.01, intensity_to: float = 1.0, clone: bool | None = True) Spectrum | None
Keep only peaks within the set intensity range (keep if intensity_from <= intensity <= intensity_to). In most cases it is advised to use the select_by_relative_intensity() function instead.
- Parameters:
spectrum_in – Input spectrum.
intensity_from – Set lower threshold for peak intensity. Default is 0.01.
intensity_to – Set upper threshold for peak intensity. Default is 1.0.
clone – Optionally clone the Spectrum.
- Returns:
Spectrum with peaks within the specified intensity range, or None if not present.
- Return type:
Spectrum or None
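A minimal plain-Python sketch of the absolute intensity window may help show why the relative variant is usually preferred. It reads intensity_from as the lower and intensity_to as the upper threshold, as the parameter descriptions state. The helper keep_by_intensity is hypothetical and is not the matchms implementation.

```python
def keep_by_intensity(peaks, intensity_from=0.01, intensity_to=1.0):
    """Hypothetical helper: keep (mz, intensity) pairs with intensity in [from, to]."""
    return [(mz, i) for mz, i in peaks if intensity_from <= i <= intensity_to]


# Raw (non-normalized) intensities, as in the example spectrum at the top of this page.
raw_peaks = [(100.0, 200.0), (120.0, 300.0), (150.0, 50.0), (200.0, 1.0)]

# With the default window [0.01, 1.0], only the weakest peak survives.
kept_default = keep_by_intensity(raw_peaks)
print(kept_default)  # [(200.0, 1.0)]

# Thresholds have to be chosen on the same scale as the data.
kept_custom = keep_by_intensity(raw_peaks, intensity_from=40.0, intensity_to=250.0)
print(kept_custom)  # [(100.0, 200.0), (150.0, 50.0)]
```

Because absolute thresholds depend on the intensity scale of each spectrum, the default window is only meaningful for intensities that have already been normalized.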
- matchms.filtering.select_by_mz(spectrum_in: Spectrum, mz_from: float = 0.0, mz_to: float = 1000.0, clone: bool | None = True) Spectrum | None
Keep only peaks between mz_from and mz_to.
Peaks are kept if mz_from <= m/z <= mz_to.
- Parameters:
spectrum_in – Input spectrum.
mz_from – Set lower threshold for m/z peak positions. Default is 0.0.
mz_to – Set upper threshold for m/z peak positions. Default is 1000.0.
clone – Optionally clone the Spectrum.
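The inclusive m/z window stated above (mz_from <= m/z <= mz_to) can be illustrated with a short plain-Python sketch. The helper keep_by_mz is hypothetical and only mirrors the documented rule; it is not the library code.

```python
def keep_by_mz(peaks, mz_from=0.0, mz_to=1000.0):
    """Hypothetical helper: keep (mz, intensity) pairs with mz_from <= mz <= mz_to."""
    return [(mz, i) for mz, i in peaks if mz_from <= mz <= mz_to]


peaks = [(100.0, 0.7), (120.0, 1.0), (150.0, 0.2), (200.0, 0.1)]

# Both bounds are inclusive, so the peaks at exactly 120.0 and 200.0 are kept.
window = keep_by_mz(peaks, mz_from=120.0, mz_to=200.0)
print(window)  # [(120.0, 1.0), (150.0, 0.2), (200.0, 0.1)]
```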
- matchms.filtering.select_by_relative_intensity(spectrum_in: Spectrum, intensity_from: float = 0.0, intensity_to: float = 1.0, clone: bool | None = True) Spectrum | None
Keep only peaks within the set relative intensity range (keep if intensity_from <= relative intensity <= intensity_to).
- Parameters:
spectrum_in – Input spectrum.
intensity_from – Set lower threshold for relative peak intensity. Default is 0.0.
intensity_to – Set upper threshold for relative peak intensity. Default is 1.0.
clone – Optionally clone the Spectrum.
- Returns:
Spectrum with peaks within the relative intensity range, or None if not present.
- Return type:
Spectrum or None
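As a rough sketch of the relative variant, the example below divides each intensity by the spectrum maximum before applying the window. Two points are assumptions: that "relative" means relative to the most intense peak, and that the kept peaks retain their original intensity values. The helper keep_by_relative_intensity is hypothetical.

```python
def keep_by_relative_intensity(peaks, intensity_from=0.0, intensity_to=1.0):
    """Hypothetical helper: keep peaks whose intensity / max intensity lies in [from, to]."""
    if not peaks:
        return []
    max_intensity = max(i for _, i in peaks)
    if max_intensity <= 0:
        return []  # no positive intensity to scale against
    return [
        (mz, i)
        for mz, i in peaks
        if intensity_from <= i / max_intensity <= intensity_to
    ]


peaks = [(100.0, 200.0), (120.0, 300.0), (150.0, 50.0), (200.0, 1.0)]

# Relative intensities are about 0.667, 1.0, 0.167 and 0.003; the last falls below 0.1.
kept = keep_by_relative_intensity(peaks, intensity_from=0.1)
print(kept)  # [(100.0, 200.0), (120.0, 300.0), (150.0, 50.0)]
```

Unlike an absolute window, the same thresholds can be reused across spectra recorded on different intensity scales.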
Subpackages
- matchms.filtering.filter_utils package
- Submodules
- matchms.filtering.filter_utils.derive_precursor_mz_and_parent_mass module
- matchms.filtering.filter_utils.get_neutral_mass_from_smiles module
- matchms.filtering.filter_utils.interpret_unknown_adduct module
- matchms.filtering.filter_utils.load_known_adducts module
- matchms.filtering.filter_utils.metadata_conversions module
- matchms.filtering.filter_utils.smile_inchi_inchikey_conversions module
- matchms.filtering.metadata_processing package
- Submodules
- matchms.filtering.metadata_processing.add_compound_name module
- matchms.filtering.metadata_processing.add_parent_mass module
- matchms.filtering.metadata_processing.add_precursor_formula module
- matchms.filtering.metadata_processing.add_precursor_mz module
- matchms.filtering.metadata_processing.add_retention module
- matchms.filtering.metadata_processing.clean_adduct module
- matchms.filtering.metadata_processing.clean_compound_name module
- matchms.filtering.metadata_processing.correct_charge module
- matchms.filtering.metadata_processing.derive_adduct_from_name module
- matchms.filtering.metadata_processing.derive_annotation_from_compound_name module
- matchms.filtering.metadata_processing.derive_formula_from_name module
- matchms.filtering.metadata_processing.derive_formula_from_smiles module
- matchms.filtering.metadata_processing.derive_inchi_from_smiles module
- matchms.filtering.metadata_processing.derive_inchikey_from_inchi module
- matchms.filtering.metadata_processing.derive_ionmode module
- matchms.filtering.metadata_processing.derive_smiles_from_inchi module
- matchms.filtering.metadata_processing.harmonize_missing_entries module
- matchms.filtering.metadata_processing.harmonize_undefined_inchi module
- matchms.filtering.metadata_processing.harmonize_undefined_inchikey module
- matchms.filtering.metadata_processing.harmonize_undefined_smiles module
- matchms.filtering.metadata_processing.interpret_pepmass module
- matchms.filtering.metadata_processing.make_charge_int module
- matchms.filtering.metadata_processing.repair_adduct_and_parent_mass_based_on_smiles module
- matchms.filtering.metadata_processing.repair_adduct_based_on_parent_mass module
- matchms.filtering.metadata_processing.repair_inchi_inchikey_smiles module
- matchms.filtering.metadata_processing.repair_not_matching_annotation module
- matchms.filtering.metadata_processing.repair_parent_mass_from_smiles module
- matchms.filtering.metadata_processing.repair_parent_mass_is_molar_mass module
- matchms.filtering.metadata_processing.repair_parent_mass_match_smiles_wrapper module
- matchms.filtering.metadata_processing.repair_smiles_of_salts module
- matchms.filtering.metadata_processing.require_compound_name module
- matchms.filtering.metadata_processing.require_correct_ionmode module
- matchms.filtering.metadata_processing.require_correct_ms_level module
- matchms.filtering.metadata_processing.require_formula module
- matchms.filtering.metadata_processing.require_matching_adduct_and_ionmode module
- matchms.filtering.metadata_processing.require_matching_adduct_precursor_mz_parent_mass module
- matchms.filtering.metadata_processing.require_parent_mass_match_smiles module
- matchms.filtering.metadata_processing.require_precursor_mz module
- matchms.filtering.metadata_processing.require_retention_index module
- matchms.filtering.metadata_processing.require_retention_time module
- matchms.filtering.metadata_processing.require_valid_annotation module
- matchms.filtering.peak_processing package
- Submodules
- matchms.filtering.peak_processing.normalize_intensities module
- matchms.filtering.peak_processing.reduce_to_number_of_peaks module
- matchms.filtering.peak_processing.remove_noise_below_frequent_intensities module
- matchms.filtering.peak_processing.remove_peaks_around_precursor_mz module
- matchms.filtering.peak_processing.remove_peaks_outside_top_k module
- matchms.filtering.peak_processing.remove_peaks_relative_to_precursor_mz module
- matchms.filtering.peak_processing.remove_profiled_spectra module
- matchms.filtering.peak_processing.require_maximum_number_of_peaks module
- matchms.filtering.peak_processing.require_minimum_number_of_high_peaks module
- matchms.filtering.peak_processing.require_minimum_number_of_peaks module
- matchms.filtering.peak_processing.select_by_intensity module
- matchms.filtering.peak_processing.select_by_mz module
- matchms.filtering.peak_processing.select_by_relative_intensity module
Submodules
- matchms.filtering.SpeciesString module
- SpeciesString
- SpeciesString.dirty
- SpeciesString.target
- SpeciesString.cleaned
- SpeciesString.__init__()
- SpeciesString.clean()
- SpeciesString.clean_as_inchi()
- SpeciesString.clean_as_inchikey()
- SpeciesString.clean_as_smiles()
- SpeciesString.guess_target()
- SpeciesString.looks_like_a_smiles()
- SpeciesString.looks_like_an_inchi()
- SpeciesString.looks_like_an_inchikey()
- matchms.filtering.SpectraCollectionProcessor module
- matchms.filtering.SpectrumProcessor module
- matchms.filtering.default_filters module
- matchms.filtering.default_pipelines module
- matchms.filtering.filter_order module