matchms.FragmentCollection module

class matchms.FragmentCollection.CSRFragmentCollection(spectra: list[Spectrum] | Generator[Spectrum, None, None] | None = None, *, array: csr_array | None = None, mz_precision: float = 1e-06, mz_rounding: str = 'round', index_dtype: dtype = numpy.int64)

Bases: FragmentCollection

CSR-backed, m/z-grid fragment storage for a spectra dataset.

Stores all fragments of a dataset in a sparse matrix using CSR format:

  • rows correspond to spectra

  • columns correspond to discrete m/z grid positions

  • values correspond to peak intensities

The m/z values of input peaks are converted to integer grid/bin indices using mz_to_bin(). The grid width is controlled by mz_precision.
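To make the layout concrete, here is a minimal, dependency-free sketch of how per-spectrum peak lists could be laid out as CSR arrays. It is not the matchms implementation: the helper name build_csr, the plain round-based binning, and the example peaks are assumptions made for illustration, and the sketch does not merge peaks that share a bin.

```python
def build_csr(peak_lists, mz_precision=0.01):
    """Lay out per-spectrum (mz, intensity) peaks as CSR arrays.

    Hypothetical helper for illustration; not part of matchms.
    """
    indptr = [0]   # indptr[i]:indptr[i + 1] delimits the entries of row i
    indices = []   # column (m/z grid) index of each stored value
    data = []      # peak intensity of each stored value
    for peaks in peak_lists:
        # Sort by grid index so each row follows the usual CSR column order.
        binned = sorted(
            (round(mz / mz_precision), intensity) for mz, intensity in peaks
        )
        for bin_idx, intensity in binned:
            indices.append(bin_idx)
            data.append(intensity)
        indptr.append(len(indices))
    return indptr, indices, data


spectra_peaks = [
    [(100.0, 0.5), (250.25, 1.0)],  # spectrum 0: two peaks
    [],                             # spectrum 1: no peaks
    [(100.0, 0.2)],                 # spectrum 2: one peak
]
indptr, indices, data = build_csr(spectra_peaks)
print(indptr)   # [0, 2, 2, 3]
print(indices)  # [10000, 25025, 10000]
print(data)     # [0.5, 1.0, 0.2]
```

An empty spectrum shows up as two equal consecutive entries in indptr, so it still occupies a row without storing any values.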

Parameters:
  • spectra – Spectra used to construct the sparse fragment collection.

  • array – Existing CSR sparse array. Pass either spectra or array, not both.

  • mz_precision – Width of one m/z grid step. For example, 0.01 stores m/z values at two decimal places, while 1e-6 stores values at six decimal places.

  • mz_rounding – Strategy used to assign m/z values to grid positions. Supported values:

    • "floor": 123.456 with mz_precision=0.01 becomes 123.45.

    • "round": 123.456 with mz_precision=0.01 becomes 123.46.
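The two rounding strategies can be sketched in plain Python as follows. The function mz_to_bin_sketch is a hypothetical stand-in written for this example; the actual mz_to_bin() may differ in how it handles arrays and floating-point edge cases.

```python
import math


def mz_to_bin_sketch(mz, mz_precision=0.01, mz_rounding="round"):
    """Map one m/z value to an integer grid index (illustrative only)."""
    scaled = mz / mz_precision
    if mz_rounding == "floor":
        return math.floor(scaled)
    if mz_rounding == "round":
        return round(scaled)
    raise ValueError(f"Unsupported mz_rounding: {mz_rounding!r}")


floor_bin = mz_to_bin_sketch(123.456, 0.01, "floor")
round_bin = mz_to_bin_sketch(123.456, 0.01, "round")
print(floor_bin)  # 12345, i.e. grid position 123.45
print(round_bin)  # 12346, i.e. grid position 123.46
```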

Notes

This is a discretized representation and is therefore not necessarily lossless. If multiple peaks from the same spectrum map to the same m/z grid position, they are stored at the same sparse matrix coordinate. During sparse matrix construction, duplicate coordinates are combined by summing their intensities.
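The merging behaviour described above can be illustrated with a small sketch. The helper merge_peaks_on_grid and the example peaks are assumptions for this example; only the summing of intensities at a shared coordinate is taken from the text.

```python
from collections import defaultdict


def merge_peaks_on_grid(peaks, mz_precision=0.01):
    """Sum intensities of peaks that land on the same grid index.

    Hypothetical helper mirroring the duplicate handling described above.
    """
    summed = defaultdict(float)
    for mz, intensity in peaks:
        summed[round(mz / mz_precision)] += intensity
    # Return a plain dict ordered by grid index.
    return dict(sorted(summed.items()))


# 123.457 and 123.461 both map to grid index 12346 at mz_precision=0.01.
peaks = [(123.457, 0.5), (123.461, 0.25), (200.0, 1.0)]
merged = merge_peaks_on_grid(peaks)
print(merged)  # {12346: 0.75, 20000: 1.0}
```

Three input peaks become two stored values, and the original m/z difference of 0.004 between the first two peaks can no longer be recovered.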

Smaller mz_precision values preserve m/z differences more closely but can create much larger sparse matrices. Larger values reduce the number of grid positions but increase the chance that neighboring peaks are merged.
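As a rough illustration of this trade-off, the sketch below counts how many grid positions are needed to cover m/z values up to 3000 at different precisions. The helper name and the upper m/z limit are assumptions. The int32 check is an added illustration only; the source does not state why index_dtype defaults to numpy.int64.

```python
def n_grid_positions(mz_max, mz_precision):
    """Number of grid positions needed to cover 0..mz_max (illustrative)."""
    # round() rather than floor() avoids off-by-one errors from float division.
    return round(mz_max / mz_precision) + 1


INT32_MAX = 2**31 - 1

for precision in (1.0, 0.01, 1e-6):
    n = n_grid_positions(3000.0, precision)
    fits = (n - 1) <= INT32_MAX  # n - 1 is the largest column index
    print(f"mz_precision={precision:g}: {n:,} columns, fits int32: {fits}")
```

Because CSR stores only nonzero values, the column count mainly affects the index range and how likely neighboring peaks are to share a grid position, not the number of stored intensities.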

__init__(spectra: list[Spectrum] | Generator[Spectrum, None, None] | None = None, *, array: csr_array | None = None, mz_precision: float = 1e-06, mz_rounding: str = 'round', index_dtype: dtype = numpy.int64)

bin_to_mz(bin_idx: ndarray | int) -> ndarray

Convert integer grid/bin indices back to m/z values.

No bin-center offset is added. Returned m/z values correspond directly to the discretized grid positions.
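A minimal sketch of the round trip, assuming the grid index is simply multiplied by mz_precision. Both helper functions are hypothetical stand-ins, not the library code.

```python
def mz_to_bin_sketch(mz, mz_precision=0.01):
    """Illustrative forward mapping using the 'round' strategy."""
    return round(mz / mz_precision)


def bin_to_mz_sketch(bin_idx, mz_precision=0.01):
    """Illustrative inverse mapping."""
    # Direct grid position: no "+ 0.5 * mz_precision" bin-center offset.
    return bin_idx * mz_precision


original_mz = 123.456
bin_idx = mz_to_bin_sketch(original_mz)    # 12346
recovered_mz = bin_to_mz_sketch(bin_idx)   # approximately 123.46
error = abs(recovered_mz - original_mz)    # approximately 0.004
print(bin_idx, recovered_mz, error)
```

The recovered value is the grid position, not the original m/z, so the round trip is exact only for values that already lie on the grid.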

count(axis: int = 1)

Count nonzero peaks per row or per bin.

count_peaks_above_relative_intensity(intensity_from: float) -> ndarray

Return number of peaks per row with relative intensity >= intensity_from.
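The counting rule can be sketched as below. This assumes that "relative intensity" means intensity divided by the highest intensity in the same row, which the text does not spell out; the helper name and data are illustrative.

```python
def count_above_relative(rows, intensity_from):
    """Count peaks per row whose relative intensity is >= intensity_from.

    Assumption: relative intensity = intensity / row maximum.
    """
    counts = []
    for intensities in rows:
        if not intensities:
            counts.append(0)  # empty rows contribute zero peaks
            continue
        row_max = max(intensities)
        counts.append(
            sum(1 for value in intensities if value / row_max >= intensity_from)
        )
    return counts


rows = [[10.0, 50.0, 100.0], [], [2.0, 4.0]]
counts = count_above_relative(rows, 0.5)
print(counts)  # [2, 0, 2]
```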

drop(indices: Iterable[int]) -> FragmentCollection

Return a new collection with selected rows removed.

drop_empty() -> FragmentCollection

Return a new collection without rows that have no peaks.

filter(mask: ndarray | list[bool]) -> FragmentCollection

Return a new collection keeping rows where mask is True.
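The three row-removal operations (drop, drop_empty, filter) differ only in how the rows to keep are specified. The sketch below models a collection as a plain list of rows. The function names with a _sketch suffix are hypothetical, and the length check on the mask is an assumption rather than documented behaviour.

```python
def drop_sketch(rows, indices):
    """Return a new list without the rows at the given indices."""
    removed = set(indices)
    return [row for i, row in enumerate(rows) if i not in removed]


def drop_empty_sketch(rows):
    """Return a new list without rows that have no peaks."""
    return [row for row in rows if row]


def filter_sketch(rows, mask):
    """Return a new list keeping rows where mask is True."""
    if len(mask) != len(rows):
        raise ValueError("mask must have one entry per row")
    return [row for row, keep in zip(rows, mask) if keep]


rows = [[0.5, 1.0], [], [0.2], [0.9, 0.1]]
dropped = drop_sketch(rows, [0, 3])
non_empty = drop_empty_sketch(rows)
filtered = filter_sketch(rows, [True, False, False, True])
print(dropped)    # [[], [0.2]]
print(non_empty)  # [[0.5, 1.0], [0.2], [0.9, 0.1]]
print(filtered)   # [[0.5, 1.0], [0.9, 0.1]]
```

Each helper builds a new list and leaves rows untouched, matching the "return a new collection" wording of the methods.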

get_row(idx: int) -> tuple[ndarray, ndarray]

Return one spectrum row as (mz, intensities).

iter_peak_arrays()

Yield rows as (mz, intensities) tuples.

keep_top_k_per_row_variable(k_per_row: ndarray, progress_bar: bool = False) -> FragmentCollection

Keep the top-k highest-intensity peaks per row.

Parameters:
  • k_per_row – One integer value per spectrum row. For each row, only the k highest-intensity peaks are retained. The retained peaks stay sorted by m/z (bin) position, preserving the normal sparse row order.

  • progress_bar – Whether to display a progress bar when processing large collections.
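A dependency-free sketch of the per-row top-k selection, assuming each row is a list of (bin_idx, intensity) pairs. The helper name and data are illustrative, and tie-breaking between equal intensities is not specified by the source.

```python
def keep_top_k_sketch(rows, k_per_row):
    """Keep the k highest-intensity peaks of each row (illustrative)."""
    result = []
    for peaks, k in zip(rows, k_per_row):
        # Select the k most intense peaks ...
        top = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:k]
        # ... then restore ascending grid-index order within the row.
        result.append(sorted(top, key=lambda peak: peak[0]))
    return result


rows = [
    [(10, 0.1), (20, 0.9), (30, 0.5)],
    [(5, 1.0), (7, 0.3)],
]
top_peaks = keep_top_k_sketch(rows, [2, 1])
print(top_peaks)  # [[(20, 0.9), (30, 0.5)], [(5, 1.0)]]
```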

mz_to_bin(mz: ndarray | float) -> ndarray

Convert m/z values to integer grid/bin indices.

The m/z values are first rounded or floored (according to mz_rounding) to the decimal precision specified by mz_precision and then converted to integer indices.

reorder(indices: Iterable[int]) -> FragmentCollection

Alias for take().

select_by_intensity(intensity_from: float = 0.0, intensity_to: float = 1.0) -> FragmentCollection

Return a new collection keeping peaks within an intensity range.

select_by_relative_intensity(intensity_from: float = 0.0, intensity_to: float = 1.0) -> FragmentCollection

Return a new collection keeping peaks within a row-wise relative intensity range.
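The difference between the absolute and the row-wise relative selection can be sketched as follows. Two assumptions are made here that the source does not confirm: both bounds are treated as inclusive, and relative intensity is taken as intensity divided by the row maximum. The helper names are hypothetical.

```python
def select_by_intensity_sketch(rows, intensity_from=0.0, intensity_to=1.0):
    """Keep peaks whose absolute intensity lies in the given range."""
    return [
        [(b, v) for b, v in peaks if intensity_from <= v <= intensity_to]
        for peaks in rows
    ]


def select_by_relative_intensity_sketch(rows, intensity_from=0.0, intensity_to=1.0):
    """Keep peaks whose intensity / row maximum lies in the given range."""
    selected = []
    for peaks in rows:
        row_max = max((v for _, v in peaks), default=0.0)
        if row_max <= 0.0:
            selected.append([])  # nothing to scale against
            continue
        selected.append(
            [(b, v) for b, v in peaks
             if intensity_from <= v / row_max <= intensity_to]
        )
    return selected


rows = [
    [(10, 20.0), (20, 200.0)],  # unnormalized intensities
    [(10, 0.5), (30, 0.05)],    # intensities already within 0..1
]
absolute = select_by_intensity_sketch(rows, 0.2, 1.0)
relative = select_by_relative_intensity_sketch(rows, 0.2, 1.0)
print(absolute)  # [[], [(10, 0.5)]]
print(relative)  # [[(20, 200.0)], [(10, 0.5)]]
```

With the default range of 0.0 to 1.0, the absolute variant removes every peak of the unnormalized first row, while the relative variant is unaffected by the intensity scale.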

property shape: tuple[int, int]

Return (n_spectra, n_bins).

slice_mz(mz_min: float | None = None, mz_max: float | None = None)

Return a new collection restricted to an m/z window.

Notes

This keeps the global bin coordinate system unchanged. Bins outside the requested m/z range are removed from the data, but the matrix shape and column numbering remain unchanged.
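The shape-preserving behaviour can be sketched with rows of (bin_idx, intensity) pairs. The helper name, the inclusive window bounds, and the round-based conversion of the m/z limits are assumptions for this example.

```python
def slice_mz_sketch(rows, n_bins, mz_precision, mz_min=None, mz_max=None):
    """Drop peaks outside an m/z window without renumbering columns."""
    lo = 0 if mz_min is None else round(mz_min / mz_precision)
    hi = n_bins - 1 if mz_max is None else round(mz_max / mz_precision)
    kept = [
        [(b, v) for b, v in peaks if lo <= b <= hi]
        for peaks in rows
    ]
    # The shape is reported unchanged: only stored values were removed.
    return kept, (len(rows), n_bins)


rows = [
    [(10000, 0.5), (25025, 1.0)],  # peaks at m/z 100.00 and 250.25
    [(30000, 0.2)],                # peak at m/z 300.00
]
sliced, shape = slice_mz_sketch(rows, n_bins=30001, mz_precision=0.01,
                                mz_min=200.0, mz_max=300.0)
print(sliced)  # [[(25025, 1.0)], [(30000, 0.2)]]
print(shape)   # (2, 30001)
```

Because the grid indices are unchanged, a peak at index 25025 still refers to the same m/z position before and after slicing.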

slice_rows(rows) -> FragmentCollection

Return a row-sliced collection.

take(indices: Iterable[int]) -> FragmentCollection

Return a new collection with selected rows in the given order.
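As a small illustration of order-preserving row selection, the sketch below models the collection as a list of rows. The helper take_sketch is hypothetical; how the real method treats repeated or out-of-range indices is not covered by the source.

```python
def take_sketch(rows, indices):
    """Return the selected rows in the order given by indices."""
    return [rows[i] for i in indices]


rows = [[0.5, 1.0], [0.2], [0.9, 0.1]]
reordered = take_sketch(rows, [2, 0])
print(reordered)  # [[0.9, 0.1], [0.5, 1.0]]
```

Passing a full permutation of the row indices reorders the collection, which matches reorder() being described as an alias for take().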

to_peak_arrays() -> list[tuple[ndarray, ndarray]]

Return all rows as a list of (mz, intensities) tuples.

class matchms.FragmentCollection.FragmentCollection

Bases: ABC

Abstract base class for a collection of spectra fragments.

abstractmethod count_peaks_above_relative_intensity(intensity_from: float) -> ndarray

Return number of peaks per row with relative intensity >= intensity_from.

abstractmethod get_row(idx: int) -> tuple[ndarray, ndarray]

Return (mz, intensities) for a single row.

abstractmethod keep_top_k_per_row_variable(k_per_row: ndarray) -> FragmentCollection

Return new collection with only the top-k intensity peaks per row.

abstractmethod select_by_intensity(intensity_from: float = 0.0, intensity_to: float = 1.0) -> FragmentCollection

Return new collection with peaks restricted to an intensity range.

abstractmethod select_by_relative_intensity(intensity_from: float = 0.0, intensity_to: float = 1.0) -> FragmentCollection

Return new collection with peaks restricted to a row-wise relative intensity range.

abstract property shape: tuple[int, int]

Return (n_spectra, n_bins).

abstractmethod slice_mz(mz_min: float | None = None, mz_max: float | None = None) -> FragmentCollection

Return new collection with restricted m/z range.

abstractmethod take(indices: Iterable[int]) -> FragmentCollection

Return new collection with selected rows.