matchms.FragmentCollection module
- class matchms.FragmentCollection.CSRFragmentCollection(spectra: list[~matchms.Spectrum.Spectrum] | ~collections.abc.Generator[~matchms.Spectrum.Spectrum, None, None] | None=None, *, array: csr_array | None = None, mz_precision: float = 1e-06, mz_rounding: str = 'round', index_dtype: dtype = <class 'numpy.int64'>)[source]
Bases: FragmentCollection
CSR-backed, m/z-grid fragment storage for a spectra dataset.
Stores all fragments of a dataset in a sparse matrix using CSR format:
rows correspond to spectra
columns correspond to discrete m/z grid positions
values correspond to peak intensities
The m/z values of input peaks are converted to integer grid/bin indices using mz_to_bin(). The grid width is controlled by mz_precision.
- Parameters:
spectra – Spectra used to construct the sparse fragment collection.
array – Existing CSR sparse array. Pass either spectra or array, not both.
mz_precision – Width of one m/z grid step. For example, 0.01 stores m/z values at two decimal places, while 1e-6 stores values at six decimal places.
mz_rounding – Strategy used to assign m/z values to grid positions. Supported values:
"floor": 123.456 with mz_precision=0.01 becomes 123.45.
"round": 123.456 with mz_precision=0.01 becomes 123.46.
Notes
This is a discretized representation and is therefore not necessarily lossless. If multiple peaks from the same spectrum map to the same m/z grid position, they are stored at the same sparse matrix coordinate. During sparse matrix construction, duplicate coordinates are combined by summing their intensities.
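The merging behaviour described in this note can be reproduced with SciPy alone. The following is a minimal sketch, not the library's constructor: the helper name `build_csr_sketch` and the sample peaks are hypothetical, and the binning uses a plain divide-and-round step as an assumption about how grid indices are derived.

```python
import numpy as np
from scipy.sparse import csr_array


def build_csr_sketch(mz_per_spectrum, intensity_per_spectrum, mz_precision):
    """Hypothetical helper: bin peaks and build a CSR array from coordinates."""
    rows, cols, vals = [], [], []
    for row, (mz, inten) in enumerate(zip(mz_per_spectrum, intensity_per_spectrum)):
        # Assumed binning: divide by the grid width and round to the nearest index.
        bins = np.rint(np.asarray(mz, dtype=np.float64) / mz_precision).astype(np.int64)
        rows.extend([row] * len(bins))
        cols.extend(bins.tolist())
        vals.extend(inten)
    shape = (len(mz_per_spectrum), max(cols) + 1)
    # SciPy sums entries that share the same (row, column) coordinate.
    return csr_array(
        (np.array(vals, dtype=np.float64), (np.array(rows), np.array(cols))),
        shape=shape,
    )


# One spectrum with two peaks that are 0.003 m/z apart.
mz = [[100.001, 100.004]]
intensities = [[0.3, 0.5]]

coarse = build_csr_sketch(mz, intensities, mz_precision=0.01)   # peaks share a bin
fine = build_csr_sketch(mz, intensities, mz_precision=0.001)    # peaks stay separate
```

With the coarse grid the two peaks collapse into a single stored value holding their summed intensity, while the finer grid keeps both peaks at the cost of roughly ten times as many columns.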
Smaller mz_precision values preserve m/z differences more closely but can create much larger sparse matrices. Larger values reduce the number of grid positions but increase the chance that neighboring peaks are merged.
- __init__(spectra: list[~matchms.Spectrum.Spectrum] | ~collections.abc.Generator[~matchms.Spectrum.Spectrum, None, None] | None=None, *, array: csr_array | None = None, mz_precision: float = 1e-06, mz_rounding: str = 'round', index_dtype: dtype = <class 'numpy.int64'>)[source]
- bin_to_mz(bin_idx: ndarray | int) ndarray[source]
Convert integer grid/bin indices back to m/z values.
No bin-center offset is added. Returned m/z values correspond directly to the discretized grid positions.
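To illustrate what "no bin-center offset" means, here is a small sketch under the assumption that a grid position is simply the index multiplied by the grid width. The function name `bin_to_mz_sketch` is hypothetical and is not the library's implementation.

```python
import numpy as np


def bin_to_mz_sketch(bin_idx, mz_precision=0.01):
    """Hypothetical stand-in: map grid indices to the m/z of the grid position."""
    return np.asarray(bin_idx, dtype=np.float64) * mz_precision


bins = np.array([12345, 12346])
grid_mz = bin_to_mz_sketch(bins, mz_precision=0.01)

# For contrast only: a bin-center convention would add half a grid step.
# The documentation above states that this offset is NOT applied.
center_mz = (bins + 0.5) * 0.01
```

The returned values sit on the grid positions themselves (about 123.45 and 123.46 here), not halfway between neighboring positions.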
- count_peaks_above_relative_intensity(intensity_from: float) ndarray[source]
Return number of peaks per row with relative intensity >= intensity_from.
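The counting can be sketched directly on raw CSR arrays. This example assumes that relative intensity means intensity divided by the row maximum, which the documentation does not state explicitly; the function name and the sample arrays are hypothetical.

```python
import numpy as np


def count_above_relative_sketch(data, indptr, intensity_from):
    """Hypothetical helper: per-row count of peaks with relative intensity >= threshold."""
    n_rows = len(indptr) - 1
    counts = np.zeros(n_rows, dtype=np.int64)
    for row in range(n_rows):
        row_vals = data[indptr[row]:indptr[row + 1]]
        if row_vals.size == 0:
            continue  # empty spectrum: nothing to count
        # Assumption: relative intensity is normalized by the row maximum.
        relative = row_vals / row_vals.max()
        counts[row] = np.count_nonzero(relative >= intensity_from)
    return counts


# Three rows: three peaks, no peaks, two peaks.
data = np.array([10.0, 100.0, 50.0, 5.0, 5.0])
indptr = np.array([0, 3, 3, 5])
counts = count_above_relative_sketch(data, indptr, intensity_from=0.5)
```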
- drop(indices: Iterable[int]) FragmentCollection[source]
Return a new collection with selected rows removed.
- filter(mask: ndarray | list[bool]) FragmentCollection[source]
Return a new collection keeping rows where mask is True.
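The relationship between `filter` and `drop` can be shown with plain NumPy indexing. This is only a sketch of the assumed semantics: the array `rows` is a hypothetical stand-in for the spectra rows of a collection, and no matchms call is made.

```python
import numpy as np

# Hypothetical stand-in for four spectra rows.
rows = np.array(["spec0", "spec1", "spec2", "spec3"])

# filter: keep rows where the mask is True.
mask = np.array([True, False, True, False])
kept_by_filter = rows[mask]

# drop: remove the listed row indices, i.e. filter with the inverted selection.
drop_indices = [1, 3]
drop_mask = np.ones(len(rows), dtype=bool)
drop_mask[drop_indices] = False
kept_by_drop = rows[drop_mask]
```

Under this reading, dropping rows 1 and 3 gives the same result as filtering with a mask that is `False` at those positions.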
- keep_top_k_per_row_variable(k_per_row: ndarray, progress_bar: bool = False) FragmentCollection[source]
Keep the top-k highest-intensity peaks per row.
- Parameters:
k_per_row – One integer value per spectrum row. For each row, only the k highest-intensity peaks are retained. The retained peaks stay sorted by m/z (bin) position, preserving the normal sparse row order.
progress_bar – Whether to display a progress bar when processing large collections.
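A per-row top-k selection on raw CSR arrays might look like the sketch below. The function name and sample data are hypothetical, ties between equal intensities are broken arbitrarily here, and the actual method may be implemented differently.

```python
import numpy as np


def keep_top_k_sketch(data, indices, indptr, k_per_row):
    """Hypothetical helper: keep the k most intense entries of each CSR row."""
    out_data, out_indices, out_indptr = [], [], [0]
    for row, k in enumerate(k_per_row):
        start, stop = indptr[row], indptr[row + 1]
        vals, cols = data[start:stop], indices[start:stop]
        k = min(max(int(k), 0), len(vals))      # clamp k to the row length
        order = np.argsort(vals, kind="stable")  # ascending by intensity
        keep = np.sort(order[len(vals) - k:])    # top-k, back in column order
        out_data.append(vals[keep])
        out_indices.append(cols[keep])
        out_indptr.append(out_indptr[-1] + k)
    return np.concatenate(out_data), np.concatenate(out_indices), np.array(out_indptr)


# Two rows with three and two peaks; keep two peaks in row 0 and one in row 1.
data = np.array([1.0, 5.0, 3.0, 2.0, 4.0])
indices = np.array([10, 20, 30, 5, 15])
indptr = np.array([0, 3, 5])
top_data, top_indices, top_indptr = keep_top_k_sketch(data, indices, indptr, [2, 1])
```

Sorting the selected positions before slicing is what keeps the surviving peaks ordered by bin index within each row.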
- mz_to_bin(mz: ndarray | float) ndarray[source]
Convert m/z values to integer grid/bin indices.
The m/z values are first rounded or floored to the decimal precision specified by mz_precision and then converted to integer indices.
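The two rounding strategies can be compared with a short sketch. It simplifies the description above into a single divide-then-round step, and `mz_to_bin_sketch` is a hypothetical name, so treat it as an illustration of the idea, not the library's code.

```python
import numpy as np


def mz_to_bin_sketch(mz, mz_precision=1e-6, mz_rounding="round"):
    """Hypothetical stand-in: map m/z values to integer grid indices."""
    scaled = np.asarray(mz, dtype=np.float64) / mz_precision
    if mz_rounding == "floor":
        idx = np.floor(scaled)
    elif mz_rounding == "round":
        idx = np.rint(scaled)
    else:
        raise ValueError(f"unsupported mz_rounding: {mz_rounding!r}")
    return idx.astype(np.int64)


floor_bin = mz_to_bin_sketch(123.456, mz_precision=0.01, mz_rounding="floor")
round_bin = mz_to_bin_sketch(123.456, mz_precision=0.01, mz_rounding="round")
```

For 123.456 on a 0.01 grid, the floor strategy lands on index 12345 (m/z 123.45) and the round strategy on index 12346 (m/z 123.46), matching the example in the class parameters.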
- select_by_intensity(intensity_from: float = 0.0, intensity_to: float = 1.0) FragmentCollection[source]
Return a new collection keeping peaks within an intensity range.
- select_by_relative_intensity(intensity_from: float = 0.0, intensity_to: float = 1.0) FragmentCollection[source]
Return a new collection keeping peaks within a row-wise relative intensity range.
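The difference between the absolute and the row-wise relative selection is easiest to see on a single row. The sketch below assumes inclusive bounds and assumes that relative intensity is the intensity divided by the row maximum; neither is confirmed by the text above, and both function names are hypothetical.

```python
import numpy as np


def select_absolute_sketch(intensities, intensity_from=0.0, intensity_to=1.0):
    """Hypothetical helper: keep peaks whose raw intensity lies in the range."""
    keep = (intensities >= intensity_from) & (intensities <= intensity_to)
    return intensities[keep]


def select_relative_sketch(intensities, intensity_from=0.0, intensity_to=1.0):
    """Hypothetical helper: keep peaks whose intensity / row maximum lies in the range."""
    if intensities.size == 0:
        return intensities
    relative = intensities / intensities.max()
    keep = (relative >= intensity_from) & (relative <= intensity_to)
    return intensities[keep]


row = np.array([20.0, 200.0, 1000.0])
absolute_kept = select_absolute_sketch(row, 0.0, 500.0)  # thresholds in raw units
relative_kept = select_relative_sketch(row, 0.1, 1.0)    # thresholds as fractions
```

Note that with the default range of 0.0 to 1.0, an absolute selection on unnormalized intensities would discard every peak above 1.0, whereas the relative selection would keep all peaks.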
- slice_mz(mz_min: float | None = None, mz_max: float | None = None)[source]
Return a new collection restricted to an m/z window.
Notes
This keeps the global bin coordinate system unchanged. Bins outside the requested m/z range are removed from the data, but the matrix shape and column numbering remain unchanged.
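The note above can be sketched on raw CSR arrays: entries outside the window are removed, but the surviving column indices are not re-based. The function name is hypothetical, the window is expressed in bin indices for brevity, and inclusive bounds are an assumption.

```python
import numpy as np


def slice_bins_sketch(data, indices, indptr, bin_min, bin_max):
    """Hypothetical helper: drop CSR entries outside [bin_min, bin_max]."""
    n_rows = len(indptr) - 1
    # Row id of every stored entry, derived from the row pointer array.
    row_ids = np.repeat(np.arange(n_rows), np.diff(indptr))
    keep = (indices >= bin_min) & (indices <= bin_max)
    counts = np.bincount(row_ids[keep], minlength=n_rows)
    new_indptr = np.concatenate(([0], np.cumsum(counts)))
    # Column indices are passed through unchanged, so the grid stays global.
    return data[keep], indices[keep], new_indptr


data = np.array([1.0, 2.0, 3.0, 4.0])
indices = np.array([100, 200, 300, 250])
indptr = np.array([0, 3, 4])
sliced_data, sliced_indices, sliced_indptr = slice_bins_sketch(
    data, indices, indptr, bin_min=150, bin_max=260
)
```

Because the column numbering is untouched, a sliced collection can still be compared column by column with an unsliced one that was built on the same grid.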
- class matchms.FragmentCollection.FragmentCollection[source]
Bases: ABC
Abstract base class for a collection of spectra fragments.
- abstractmethod count_peaks_above_relative_intensity(intensity_from: float) ndarray[source]
Return number of peaks per row with relative intensity >= intensity_from.
- abstractmethod get_row(idx: int) tuple[ndarray, ndarray][source]
Return (mz, intensities) for a single row.
- abstractmethod keep_top_k_per_row_variable(k_per_row: ndarray) FragmentCollection[source]
Return new collection with only the top-k intensity peaks per row.
- abstractmethod select_by_intensity(intensity_from: float = 0.0, intensity_to: float = 1.0) FragmentCollection[source]
Return new collection with peaks restricted to an intensity range.
- abstractmethod select_by_relative_intensity(intensity_from: float = 0.0, intensity_to: float = 1.0) FragmentCollection[source]
Return new collection with peaks restricted to a row-wise relative intensity range.
- abstractmethod slice_mz(mz_min: float | None = None, mz_max: float | None = None) FragmentCollection[source]
Return new collection with restricted m/z range.
- abstractmethod take(indices: Iterable[int]) FragmentCollection[source]
Return new collection with selected rows.