PBF Format Description
The PBF (PNNL Binary Format) file format was created to optimize data access for feature-finding and database searching.
Many mass spectrometry data formats (particularly those used for proteomics) are organized around scans: each scan has its own metadata and peak data. This has worked well in the past, and tools are written to handle the data that way. The main limitation we encountered was the speed at which we could read peak data for an eXtracted Ion Chromatogram (XIC), which is heavily used in the feature-finding and database search libraries within Informed-Proteomics. With these scan-oriented formats, we have to check every scan in the desired range for peaks within the tolerance, which costs extra computation time and requires many small reads from the hard drive to assemble a single XIC. The PNNL Binary Format was designed to provide the needed scan data, as well as fast access to XICs and fast searching for scans by precursor m/z, MSLevel, and elution time.
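To make the difference concrete, here is a minimal sketch comparing the two access patterns on in-memory data. It is not PBF code: the function names (`xic_scan_by_scan`, `build_sorted_xic`, `xic_from_sorted`) and the tuple layouts are hypothetical. The scan-oriented version must visit every scan and test every peak, while the m/z-sorted version finds the contiguous tolerance window with two binary searches and touches only the matching peaks.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# Hypothetical in-memory shapes: a peak is (m/z, intensity); a scan is (scan_number, peaks).
Peak = Tuple[float, float]
Scan = Tuple[int, List[Peak]]
Record = Tuple[float, float, int]  # (m/z, intensity, scan_number)


def xic_scan_by_scan(scans: List[Scan], target_mz: float, tol: float) -> List[Tuple[int, float]]:
    """Scan-oriented layout: every scan is visited and every peak is tested."""
    points = []
    for scan_num, peaks in scans:
        for mz, intensity in peaks:
            if abs(mz - target_mz) <= tol:
                points.append((scan_num, intensity))
    return points


def build_sorted_xic(scans: List[Scan]) -> List[Record]:
    """m/z-oriented layout: all peaks from all scans in one list, sorted by m/z ascending."""
    records = [(mz, intensity, scan_num) for scan_num, peaks in scans for mz, intensity in peaks]
    records.sort()
    return records


def xic_from_sorted(records: List[Record], mz_values: List[float],
                    target_mz: float, tol: float) -> List[Tuple[int, float]]:
    """Binary-search the [target - tol, target + tol] window; only matching peaks are read."""
    lo = bisect_left(mz_values, target_mz - tol)   # first m/z >= lower bound
    hi = bisect_right(mz_values, target_mz + tol)  # first m/z > upper bound
    # Return the chromatogram ordered by scan number.
    return sorted((scan_num, intensity) for _, intensity, scan_num in records[lo:hi])
```

For example, with `scans = [(1, [(400.10, 50.0), (500.20, 10.0)]), (2, [(400.11, 70.0)]), (3, [(399.50, 5.0), (400.09, 20.0)])]`, both approaches return `[(1, 50.0), (2, 70.0), (3, 20.0)]` for a target of 400.1 with a tolerance of 0.02, but only the scan-oriented one has to examine all five peaks.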
The PBF file format has the following basic structure:

- Scan Information
  - Scan metadata
  - Scan peaks
    - m/z and intensity pairs
- MS1 Full XIC
  - All peaks from all MS1 scans, sorted by m/z ascending
  - Data written: m/z, intensity, and scan number
- MSn Full XIC
  - All peaks from all MSn scans, sorted by m/z ascending
  - Data written: m/z, intensity, and scan number
- File metadata, for fast loading of a PBF file
  - Minimum and maximum scan numbers
  - Scan numbers, MSLevels, elution times, and binary offsets for all scans, with isolation window information for MS2 scans
  - MS1 Full XIC metadata: binary offsets for binned m/z values, used for quickly finding (in the MS1 Full XIC) peaks within a set tolerance of an m/z value
  - Checksum of the original spectrum file
  - Some format information about the original spectrum file (for outputting correct information to mzid files)
- Binary offsets for the MS1 and MSn full XICs, and for the file metadata section
- PBF file format version string
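The "binary offsets for binned m/z values" in the MS1 Full XIC metadata can be illustrated with a small sketch. This description does not give PBF's actual record encoding, bin width, or offset table format, so everything below is an assumption: the `RECORD` layout (float64 m/z, float32 intensity, int32 scan number), the `BIN_WIDTH` of 1 m/z unit, and the helpers `write_full_xic` and `read_xic` are all hypothetical. The idea shown is that a reader can seek straight to the first record of the relevant m/z bin and read forward only until it passes the upper tolerance bound.

```python
import io
import math
import struct
from bisect import bisect_left
from typing import BinaryIO, Dict, List, Tuple

# Hypothetical record layout (not the real PBF encoding):
# little-endian float64 m/z, float32 intensity, int32 scan number -> 16 bytes.
RECORD = struct.Struct("<dfi")
BIN_WIDTH = 1.0  # hypothetical m/z bin width


def mz_bin(mz: float) -> int:
    """Integer bin index for an m/z value."""
    return int(math.floor(mz / BIN_WIDTH))


def write_full_xic(peaks: List[Tuple[float, float, int]], out: BinaryIO) -> Tuple[Dict[int, int], int]:
    """Write (m/z, intensity, scan) records sorted by m/z.

    Returns the byte offset of the first record in each populated bin,
    plus the offset just past the last record (end of the XIC section).
    """
    bin_offsets: Dict[int, int] = {}
    for mz, intensity, scan in sorted(peaks):
        b = mz_bin(mz)
        if b not in bin_offsets:
            bin_offsets[b] = out.tell()
        out.write(RECORD.pack(mz, intensity, scan))
    return bin_offsets, out.tell()


def read_xic(f: BinaryIO, bin_offsets: Dict[int, int], end_offset: int,
             target_mz: float, tol: float) -> List[Tuple[int, float]]:
    """Seek to the first populated bin that can hold target_mz - tol, then read forward."""
    lo_mz, hi_mz = target_mz - tol, target_mz + tol
    bins = sorted(bin_offsets)
    idx = bisect_left(bins, mz_bin(lo_mz))  # earlier bins only hold m/z < lo_mz
    if idx == len(bins):
        return []
    f.seek(bin_offsets[bins[idx]])
    points = []
    while f.tell() < end_offset:
        mz, intensity, scan = RECORD.unpack(f.read(RECORD.size))
        if mz > hi_mz:
            break  # records are sorted by m/z, so nothing later can match
        if mz >= lo_mz:
            points.append((scan, intensity))
    return sorted(points)  # chromatogram ordered by scan number
```

Usage: after `offsets, end = write_full_xic(peaks, buf)` on an `io.BytesIO` buffer, `read_xic(buf, offsets, end, 400.1, 0.02)` reads only the records from the 400 bin up to the first m/z above 400.12. A real file would persist `offsets` in the file metadata section and locate that section through the offsets stored near the end of the file.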