matchms.filtering.SpectraCollectionProcessor module
- class matchms.filtering.SpectraCollectionProcessor.SpectraCollectionProcessor(filters: Iterable[str | Callable | tuple[Callable | str, dict[str, object]]])
Bases: object
Process a SpectraCollection using a series of filters.
This is the SpectraCollection equivalent of SpectrumProcessor, but it applies each filter to the full collection instead of processing spectra one by one.
- Parameters:
filters – A list of filter functions. Allowed formats are the same as for SpectrumProcessor:
str
(str, dict)
Callable
(Callable, dict)
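All four formats can be reduced to one common shape, a (filter, kwargs) pair, before any filter is applied. The sketch below illustrates that idea in plain Python. It is not matchms' own parsing code: `normalize_filter` and `drop_empty` are hypothetical names, and string names are left unresolved because the lookup from name to function is internal to the library.

```python
from typing import Callable, Union

# Any of: str, Callable, (str, dict), (Callable, dict)
FilterDescription = Union[str, Callable, tuple]


def normalize_filter(description: FilterDescription) -> tuple:
    """Return a (filter, kwargs) pair for any of the four accepted formats.

    Hypothetical helper for illustration only; not part of matchms.
    """
    if isinstance(description, tuple):
        if len(description) != 2:
            raise ValueError("Expected a (filter, dict) pair.")
        filter_ref, kwargs = description
        if not isinstance(kwargs, dict):
            raise TypeError("Second tuple element must be a dict of keyword arguments.")
    else:
        # Bare name or bare callable: no extra keyword arguments.
        filter_ref, kwargs = description, {}
    if not (isinstance(filter_ref, str) or callable(filter_ref)):
        raise TypeError("A filter must be a name (str) or a callable.")
    return filter_ref, dict(kwargs)


def drop_empty(collection):
    # Stand-in custom filter; the signature is an assumption of this sketch.
    return collection
```

Normalizing up front means the rest of a pipeline only has to handle one shape, whichever format the caller used.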
Examples
Create a SpectraCollection and process it with collection-compatible filters:
    import numpy as np
    from matchms import Spectrum, SpectraCollection
    from matchms.filtering import SpectraCollectionProcessor

    spectra = [
        Spectrum(
            mz=np.array([100.0, 150.0, 200.0]),
            intensities=np.array([5.0, 50.0, 500.0]),
            metadata={"smiles": "n/a", "compound_name": "example"},
        ),
        Spectrum(
            mz=np.array([110.0, 160.0, 210.0]),
            intensities=np.array([10.0, 100.0, 1000.0]),
            metadata={"smiles": "CCCO", "compound_name": "other"},
        ),
    ]
    collection = SpectraCollection(spectra)

    processor = SpectraCollectionProcessor(
        filters=[
            "harmonize_missing_entries",
            (
                "select_by_relative_intensity",
                {"intensity_from": 0.01, "intensity_to": 1.0},
            ),
        ]
    )

    processed = processor.process_collection(collection)
    assert isinstance(processed, SpectraCollection)
The same processor can also create a SpectraCollection from an iterable of Spectrum objects:
    processed = processor.process_spectra(spectra)
- parse_and_add_filter(filter_description: str | Callable | tuple[Callable | str, dict[str, object]], filter_position: int | None = None)
Add a filter by parsing the allowed filter description formats.
- process_collection(collection: SpectraCollection) -> SpectraCollection | None
Process a SpectraCollection with all filters in the pipeline.
- Parameters:
collection – SpectraCollection to process.
- Returns:
The processed collection. If a filter removes all spectra and returns None, processing stops and None is returned.
- Return type:
SpectraCollection or None
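The early-stop behaviour described above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: a list of numbers stands in for a SpectraCollection, and `run_pipeline`, `keep_large`, and `double` are hypothetical names.

```python
from typing import Callable, Optional, Sequence


def run_pipeline(collection: list, filters: Sequence[Callable]) -> Optional[list]:
    """Apply each filter to the whole collection; stop once a filter returns None."""
    for filter_function in filters:
        collection = filter_function(collection)
        if collection is None:
            # Nothing left to process, so later filters are never called.
            return None
    return collection


def keep_large(values: list) -> Optional[list]:
    # Toy collection-level filter: returns None when nothing survives.
    kept = [v for v in values if v >= 10]
    return kept or None


def double(values: list) -> list:
    # Toy collection-level filter that transforms every remaining item.
    return [2 * v for v in values]
```

Because the return value can be None, callers should check for it before using the result, for example with `if processed is not None:`.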
- process_spectra(spectra, cleaned_spectra_file=None) -> SpectraCollection | None
Process spectra as a SpectraCollection.
- Parameters:
spectra – Either a SpectraCollection or an iterable of Spectrum objects.
cleaned_spectra_file – Optional output path. The processed collection is materialized as Spectrum objects for saving.
- Returns:
Processed collection.
- Return type:
SpectraCollection or None
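Accepting either a collection or a plain iterable usually comes down to one coercion step at the start. The sketch below shows that pattern under stated assumptions: `ToyCollection` is a stand-in for SpectraCollection and `as_collection` is a hypothetical helper, so neither reflects how matchms does it internally.

```python
from typing import Iterable, Union


class ToyCollection:
    """Stand-in for SpectraCollection; simply stores its items in a list."""

    def __init__(self, items: Iterable):
        self.items = list(items)


def as_collection(spectra: Union[ToyCollection, Iterable]) -> ToyCollection:
    """Return the input unchanged if it is already a collection, else wrap it."""
    if isinstance(spectra, ToyCollection):
        return spectra
    # Lists, tuples, and generators are all materialized exactly once here.
    return ToyCollection(spectra)
```

Materializing the iterable once also makes generators safe to pass, since they cannot be iterated a second time.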