matchms.filtering.SpectraCollectionProcessor module

class matchms.filtering.SpectraCollectionProcessor.SpectraCollectionProcessor(filters: Iterable[str | Callable | tuple[Callable | str, dict[str, object]]])

Bases: object

Process a SpectraCollection using a series of filters.

This is the SpectraCollection equivalent of SpectrumProcessor, but it applies each filter to the full collection instead of processing spectra one by one.
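The difference in evaluation order can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the matchms implementation: spectra are stand-in dicts, and `make_filter`, `process_one_by_one` and `process_whole_collection` are hypothetical names. Each toy filter logs which spectrum it touched, so the two call orders can be compared directly.

```python
from typing import Callable, Iterable

# Hypothetical stand-ins: a "spectrum" is a dict, a filter maps one spectrum to another.
Spectrum = dict
Filter = Callable[[Spectrum], Spectrum]

call_log: list[tuple[str, int]] = []


def make_filter(name: str) -> Filter:
    """Build a no-op filter that records (filter name, spectrum id) each time it runs."""
    def _filter(spectrum: Spectrum) -> Spectrum:
        call_log.append((name, spectrum["id"]))
        return spectrum
    return _filter


def process_one_by_one(spectra: Iterable[Spectrum], filters: list[Filter]) -> list[Spectrum]:
    """SpectrumProcessor-style order: all filters run on one spectrum before the next one."""
    processed = []
    for spectrum in spectra:
        for apply_filter in filters:
            spectrum = apply_filter(spectrum)
        processed.append(spectrum)
    return processed


def process_whole_collection(spectra: Iterable[Spectrum], filters: list[Filter]) -> list[Spectrum]:
    """Collection-style order: each filter finishes the whole collection before the next starts.

    Here a filter is simply mapped over the list; a real collection filter could instead
    operate on all spectra at once.
    """
    collection = list(spectra)
    for apply_filter in filters:
        collection = [apply_filter(spectrum) for spectrum in collection]
    return collection


filters = [make_filter("a"), make_filter("b")]
spectra = [{"id": 1}, {"id": 2}]

process_one_by_one(spectra, filters)
print(call_log)  # [('a', 1), ('b', 1), ('a', 2), ('b', 2)]
call_log.clear()
process_whole_collection(spectra, filters)
print(call_log)  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```

For filters that act on each spectrum independently, both orders produce the same spectra; only the order of the work changes, which is what makes whole-collection processing possible.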

Parameters:

filters

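An iterable of filters. Each entry can be given in any of the formats accepted by SpectrumProcessor:

  • str: the name of a filter

  • (str, dict): a filter name plus keyword arguments for that filter

  • Callable: a filter function

  • (Callable, dict): a filter function plus keyword arguments for it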
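One way to handle these four formats is to normalize every entry into a (function, keyword arguments) pair. The sketch below assumes a simple name-to-function lookup; `FILTER_REGISTRY`, `parse_filter` and the toy filters `scale` and `clip` are hypothetical and stand in for the library's own filter names.

```python
from typing import Any, Callable, Union

FilterDescription = Union[str, Callable, tuple]


# Hypothetical toy filters: each takes a "collection" (here a list of floats) plus kwargs.
def scale(values: list[float], factor: float = 1.0) -> list[float]:
    return [v * factor for v in values]


def clip(values: list[float], upper: float = 100.0) -> list[float]:
    return [min(v, upper) for v in values]


# Hypothetical name -> function lookup standing in for built-in filter names.
FILTER_REGISTRY: dict[str, Callable] = {"scale": scale, "clip": clip}


def parse_filter(description: FilterDescription) -> tuple[Callable, dict[str, Any]]:
    """Normalize str, (str, dict), Callable or (Callable, dict) into (function, kwargs)."""
    if isinstance(description, tuple):
        if len(description) != 2 or not isinstance(description[1], dict):
            raise ValueError("Tuple descriptions must have the form (filter, dict_of_kwargs).")
        target, kwargs = description
    else:
        target, kwargs = description, {}

    if isinstance(target, str):
        if target not in FILTER_REGISTRY:
            raise ValueError(f"Unknown filter name: {target!r}")
        target = FILTER_REGISTRY[target]
    if not callable(target):
        raise TypeError(f"Cannot interpret {target!r} as a filter.")
    # Copy kwargs so later changes to the caller's dict do not affect the pipeline.
    return target, dict(kwargs)


for description in ["scale", ("scale", {"factor": 2.0}), clip, (clip, {"upper": 5.0})]:
    func, kwargs = parse_filter(description)
    print(func.__name__, kwargs, func([1.0, 10.0], **kwargs))
```

Normalizing up front means the processing loop only ever deals with one shape of filter entry, and unknown names fail when the pipeline is built rather than halfway through processing.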

Examples

Create a SpectraCollection and process it with collection-compatible filters:

import numpy as np

from matchms import Spectrum, SpectraCollection
from matchms.filtering import SpectraCollectionProcessor

spectra = [
    Spectrum(
        mz=np.array([100.0, 150.0, 200.0]),
        intensities=np.array([5.0, 50.0, 500.0]),
        metadata={"smiles": "n/a", "compound_name": "example"},
    ),
    Spectrum(
        mz=np.array([110.0, 160.0, 210.0]),
        intensities=np.array([10.0, 100.0, 1000.0]),
        metadata={"smiles": "CCCO", "compound_name": "other"},
    ),
]

collection = SpectraCollection(spectra)

processor = SpectraCollectionProcessor(
    filters=[
        "harmonize_missing_entries",
        (
            "select_by_relative_intensity",
            {"intensity_from": 0.01, "intensity_to": 1.0},
        ),
    ]
)

processed = processor.process_collection(collection)

assert isinstance(processed, SpectraCollection)
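To make the ("select_by_relative_intensity", {...}) entry above more concrete, here is a plain-Python sketch of the rule its parameter names suggest: a peak is kept when its intensity divided by the spectrum's maximum intensity lies between intensity_from and intensity_to. The helper name `relative_intensity_mask` and the inclusive bounds are assumptions for illustration, not the matchms source.

```python
def relative_intensity_mask(
    intensities: list[float], intensity_from: float = 0.0, intensity_to: float = 1.0
) -> list[bool]:
    """Hypothetical helper: flag peaks whose intensity / max intensity is within the bounds."""
    if not intensities or max(intensities) <= 0:
        # No usable reference intensity, so no peak is flagged as kept.
        return [False] * len(intensities)
    max_intensity = max(intensities)
    # Assumption: both bounds are inclusive.
    return [intensity_from <= i / max_intensity <= intensity_to for i in intensities]


# First spectrum from the example: relative intensities are 0.01, 0.1 and 1.0.
print(relative_intensity_mask([5.0, 50.0, 500.0], intensity_from=0.01))  # [True, True, True]
print(relative_intensity_mask([5.0, 50.0, 500.0], intensity_from=0.05))  # [False, True, True]
```

Under this reading, the example's threshold of 0.01 sits exactly at the weakest peak of each spectrum, so all peaks survive; raising intensity_from would start removing them.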

The same processor can also create a SpectraCollection from an iterable of Spectrum objects:

processed = processor.process_spectra(spectra)

__init__(filters: Iterable[str | Callable | tuple[Callable | str, dict[str, object]]])

parse_and_add_filter(filter_description: str | Callable | tuple[Callable | str, dict[str, object]], filter_position: int | None = None)

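Parse a filter description given in any of the allowed formats and add the resulting filter to the pipeline. filter_position optionally sets where in the pipeline the filter is placed.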
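One plausible reading of filter_position is a list insertion index. The sketch below assumes that None appends the filter after all existing ones and that an integer inserts it at that index; the real method may behave differently (for example by falling back to a default filter order). `ToyPipeline`, `add_filter` and the toy filters are hypothetical.

```python
from typing import Any, Callable, Optional


class ToyPipeline:
    """Hypothetical pipeline that stores filters as ordered (function, kwargs) pairs."""

    def __init__(self) -> None:
        self.filters: list[tuple[Callable, dict[str, Any]]] = []

    def add_filter(
        self,
        func: Callable,
        kwargs: Optional[dict[str, Any]] = None,
        filter_position: Optional[int] = None,
    ) -> None:
        entry = (func, dict(kwargs or {}))
        if filter_position is None:
            # Assumption: no position means "run after all existing filters".
            self.filters.append(entry)
        else:
            # list.insert moves the filter currently at that index (and later ones) back by one.
            self.filters.insert(filter_position, entry)

    def names(self) -> list[str]:
        return [func.__name__ for func, _ in self.filters]


# Hypothetical no-op filters, used only for their names.
def normalize(values):
    return values


def deduplicate(values):
    return values


def strip_noise(values):
    return values


pipeline = ToyPipeline()
pipeline.add_filter(normalize)
pipeline.add_filter(deduplicate)
pipeline.add_filter(strip_noise, filter_position=0)  # force this one to run first
print(pipeline.names())  # ['strip_noise', 'normalize', 'deduplicate']
```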

process_collection(collection: SpectraCollection) → SpectraCollection | None

Process a SpectraCollection with all filters in the pipeline.

Parameters:

collection – SpectraCollection to process.

Returns:

The processed collection. If a filter removes all spectra and returns None, processing stops and None is returned.

Return type:

SpectraCollection or None
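The stop-on-None rule can be sketched as a simple loop. The sketch assumes each filter takes a whole collection (a plain list here) and returns either a new collection or None; `run_filters`, `keep_above` and `record_size` are hypothetical, not matchms functions.

```python
from typing import Callable, Optional

Collection = list[float]
CollectionFilter = Callable[[Collection], Optional[Collection]]

calls: list[int] = []


def run_filters(collection: Collection, filters: list[CollectionFilter]) -> Optional[Collection]:
    """Apply filters in order; stop and return None as soon as one of them returns None."""
    for apply_filter in filters:
        collection = apply_filter(collection)
        if collection is None:
            # Nothing is left to process, so the remaining filters are skipped.
            return None
    return collection


def keep_above(threshold: float) -> CollectionFilter:
    """Toy filter factory: keep values above threshold, or return None if none remain."""
    def _filter(collection: Collection) -> Optional[Collection]:
        kept = [value for value in collection if value > threshold]
        return kept or None
    return _filter


def record_size(collection: Collection) -> Collection:
    """Toy filter that only records how many items reached it."""
    calls.append(len(collection))
    return collection


print(run_filters([1.0, 5.0, 10.0], [keep_above(2.0), record_size]))  # [5.0, 10.0]
print(run_filters([1.0, 5.0], [keep_above(50.0), record_size]))  # None
print(calls)  # [2]: record_size never ran in the second call
```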

process_spectra(spectra, cleaned_spectra_file=None) → SpectraCollection | None

Process spectra as a SpectraCollection.

Parameters:

  • spectra – Either a SpectraCollection or an iterable of Spectrum objects.

  • cleaned_spectra_file – Optional output path. If given, the processed collection is materialized as Spectrum objects and saved to this file.

Returns:

The processed collection, or None if a filter removed all spectra.

Return type:

SpectraCollection or None
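The input handling described for process_spectra (accept a collection or any iterable of spectra, optionally save the result) can be sketched as follows. `ToyCollection`, `process_spectra_sketch`, `drop_unnamed` and the JSON output format are hypothetical stand-ins; the real method's file formats and conversion steps may differ.

```python
import json
from typing import Callable, Iterable, Optional, Union


class ToyCollection:
    """Hypothetical stand-in for SpectraCollection: a thin wrapper around a list of spectra."""

    def __init__(self, spectra: Iterable[dict]) -> None:
        self.spectra = list(spectra)

    def to_list(self) -> list[dict]:
        # "Materialize" the collection as individual spectrum objects.
        return list(self.spectra)


def process_spectra_sketch(
    spectra: Union[ToyCollection, Iterable[dict]],
    filters: list[Callable[[ToyCollection], Optional[ToyCollection]]],
    cleaned_spectra_file: Optional[str] = None,
) -> Optional[ToyCollection]:
    # Accept an existing collection as-is, or wrap any iterable (list, generator, ...).
    collection = spectra if isinstance(spectra, ToyCollection) else ToyCollection(spectra)
    for apply_filter in filters:
        collection = apply_filter(collection)
        if collection is None:
            return None
    if cleaned_spectra_file is not None:
        # Assumption: plain JSON output, chosen only to keep the sketch self-contained.
        with open(cleaned_spectra_file, "w", encoding="utf-8") as handle:
            json.dump(collection.to_list(), handle)
    return collection


def drop_unnamed(collection: ToyCollection) -> Optional[ToyCollection]:
    """Toy filter: keep spectra with a non-empty compound_name, or return None."""
    kept = [s for s in collection.spectra if s.get("compound_name")]
    return ToyCollection(kept) if kept else None


spectra_gen = ({"compound_name": name} for name in ["example", "", "other"])
result = process_spectra_sketch(spectra_gen, [drop_unnamed])
print([s["compound_name"] for s in result.to_list()])  # ['example', 'other']
```

Wrapping the input once at the start lets the rest of the pipeline assume a single collection type, whichever form the caller passed in.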