Quantification Methods
mokume supports multiple protein quantification methods, each suited to different experimental designs and goals.
Overview
| Method | Description | Requires FASTA | Class / Function | Optional Dependency |
|---|---|---|---|---|
| iBAQ | Intensity-Based Absolute Quantification | Yes | peptides_to_protein() | No |
| TopN | Average of N most intense peptides | No | TopNQuantification | No |
| MaxLFQ | Delayed normalization with parallelization | No | MaxLFQQuantification | No* |
| DirectLFQ | Intensity traces with hierarchical alignment | No | DirectLFQQuantification | Yes** |
| Sum | Sum of all peptide intensities | No | AllPeptidesQuantification | No |
| Ratio | Log2 sample/reference per plex (PS protocol) | No | RatioQuantification | No |
| Median | Median of peptide intensities | No | Built-in | No |
* MaxLFQ automatically uses DirectLFQ when it is installed, for best accuracy, and falls back to the built-in implementation otherwise.
** DirectLFQ requires: pip install mokume[directlfq]
Choosing a Method
graph TD
    A[What type of experiment?] --> B{Label-free?}
    A --> C{TMT/iTRAQ?}
    B --> D{Need absolute<br/>quantification?}
    D -->|Yes| E[iBAQ<br/>requires FASTA]
    D -->|No| F{Best accuracy?}
    F -->|Yes| G[MaxLFQ or<br/>DirectLFQ]
    F -->|Simple| H[TopN or Sum]
    C --> I{Multi-plex with<br/>reference channels?}
    I -->|Yes| J[Ratio + IRS]
    I -->|No| K[Median or<br/>Sum + IRS]
iBAQ
Intensity-Based Absolute Quantification divides summed peptide intensities by the number of theoretically observable peptides, enabling comparison of absolute protein amounts across proteins within a sample.
$$\text{iBAQ} = \frac{\sum \text{peptide intensities}}{\text{theoretical peptide count}}$$
iBAQ requires a FASTA file to compute theoretical peptide counts via in-silico digestion.
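from mokume.quantification import peptides_to_protein

# Compute iBAQ values and write them to a TSV file
peptides_to_protein(
    fasta="proteome.fasta",   # used for the in-silico digestion
    peptides="peptides.csv",  # peptide-level input
    enzyme="Trypsin",
    normalize=True,
    output="proteins-ibaq.tsv",
)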
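
The formula above can be sketched in plain Python. This is a minimal illustration, not mokume's implementation: the helper names `tryptic_peptides` and `ibaq`, the cleavage rule (after K or R, not before P, no missed cleavages), and the 6 to 30 residue length window are all assumptions chosen for the example.

```python
import re


def tryptic_peptides(sequence, min_len=6, max_len=30):
    """Hypothetical helper: in-silico tryptic digest without missed cleavages.

    Cleaves after K or R unless the next residue is P, then keeps only
    peptides inside the assumed length window.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def ibaq(peptide_intensities, sequence):
    """Summed peptide intensity divided by the theoretical peptide count."""
    n_theoretical = len(tryptic_peptides(sequence))
    if n_theoretical == 0:
        return float("nan")  # no observable peptides, iBAQ is undefined
    return sum(peptide_intensities) / n_theoretical


# Toy protein with three theoretical peptides; the R is followed by P,
# so no cleavage happens there.
toy_sequence = "AAAAAAKCCCCCCRPDDDDDDKEEEEEE"
toy_ibaq = ibaq([300.0, 600.0], toy_sequence)
```

Dividing by the theoretical count rather than the observed count is what makes values comparable across proteins of different lengths.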
Additional iBAQ-derived values:
| Value | Formula | Use Case |
|---|---|---|
| IbaqNorm | iBAQ / sum(iBAQ) per sample | Relative comparison |
| IbaqLog | 10 + log10(IbaqNorm) | Visualization |
| TPA | NormIntensity / MW | Total Protein Approach |
| CopyNumber | From ProteomicRuler | Absolute copy numbers |
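
The first two rows of the table can be written out directly. The sketch below is illustrative only: `ibaq_derived` is a hypothetical helper, and it assumes the input holds the iBAQ values of a single sample.

```python
import math


def ibaq_derived(ibaq_by_protein):
    """Hypothetical helper: return {protein: (IbaqNorm, IbaqLog)} for one sample."""
    total = sum(ibaq_by_protein.values())
    if total <= 0:
        raise ValueError("sum of iBAQ values must be positive")
    derived = {}
    for protein, value in ibaq_by_protein.items():
        norm = value / total  # IbaqNorm = iBAQ / sum(iBAQ)
        # IbaqLog = 10 + log10(IbaqNorm); undefined for zero intensity
        log_value = 10 + math.log10(norm) if norm > 0 else float("nan")
        derived[protein] = (norm, log_value)
    return derived


example = ibaq_derived({"P1": 1.0, "P2": 9.0})
```

Because IbaqNorm is at most 1, its log10 is never positive; the offset of 10 shifts typical values into a positive range, which suits plotting.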
TopN
Averages the N most intense peptides per protein per sample. Top3 is the most common choice (based on the Top3 method by Silva et al.), but any N is supported.
from mokume.quantification import TopNQuantification

# `peptides` is the peptide-level input, loaded beforehand
top3 = TopNQuantification(n=3)
result = top3.quantify(peptides)

# Also: top5, top10, etc.
top10 = TopNQuantification(n=10)
Tip
Top3 is a good default for label-free experiments when you don't need absolute quantification.
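
To make the TopN rule concrete, here is a small stdlib-only sketch. It is not how `TopNQuantification` is implemented: `top_n_mean` is a hypothetical helper, and averaging whatever is available when a protein has fewer than N peptides is an assumption.

```python
from collections import defaultdict


def top_n_mean(rows, n=3):
    """Hypothetical helper: mean of the n most intense peptides.

    rows: iterable of (protein, sample, intensity) tuples.
    Returns {(protein, sample): mean of the top-n intensities}.
    """
    groups = defaultdict(list)
    for protein, sample, intensity in rows:
        groups[(protein, sample)].append(intensity)
    result = {}
    for key, values in groups.items():
        top = sorted(values, reverse=True)[:n]  # n largest intensities
        result[key] = sum(top) / len(top)       # assumed: average what exists
    return result


rows = [
    ("P1", "S1", 10.0), ("P1", "S1", 40.0), ("P1", "S1", 20.0), ("P1", "S1", 30.0),
    ("P1", "S2", 5.0), ("P1", "S2", 15.0),
]
top3_values = top_n_mean(rows, n=3)
```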
MaxLFQ
The MaxLFQ algorithm (Cox et al., 2014) uses delayed normalization with pairwise peptide ratios to estimate protein intensities. It's particularly robust to missing values.
mokume provides two implementations:
- DirectLFQ backend (default when installed) — variance-guided pairwise alignment from the DirectLFQ package
- Built-in fallback — parallelized peptide trace alignment achieving ~0.95 Spearman correlation with DIA-NN's MaxLFQ
from mokume.quantification import MaxLFQQuantification

maxlfq = MaxLFQQuantification(
    min_peptides=2,  # Min peptides for MaxLFQ (uses median for fewer)
    threads=4,       # Parallel cores (-1 for all)
)
result = maxlfq.quantify(peptides)

# Check which backend is active
print(maxlfq.using_directlfq)  # True/False
print(maxlfq.name)             # "MaxLFQ (DirectLFQ)" or "MaxLFQ (built-in)"
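
The pairwise-ratio idea behind MaxLFQ can be illustrated for a single protein. The sketch below covers only the first step, the median log2 ratio of shared peptides for each pair of samples. The published algorithm then solves a least-squares problem over these ratios to obtain one intensity per sample, which is omitted here. `pairwise_median_log_ratios` is a hypothetical helper, not a mokume or DirectLFQ function.

```python
import math
from itertools import combinations
from statistics import median


def pairwise_median_log_ratios(peptide_table):
    """Hypothetical helper for one protein.

    peptide_table: {peptide: {sample: intensity}}; missing samples are absent.
    Returns {(sample_a, sample_b): median log2(a / b)} over shared peptides.
    """
    samples = sorted({s for row in peptide_table.values() for s in row})
    ratios = {}
    for a, b in combinations(samples, 2):
        shared = [
            math.log2(row[a] / row[b])
            for row in peptide_table.values()
            if row.get(a, 0) > 0 and row.get(b, 0) > 0  # both must be observed
        ]
        if shared:  # pairs without any shared peptide get no ratio
            ratios[(a, b)] = median(shared)
    return ratios


table = {
    "pep1": {"S1": 200.0, "S2": 100.0},
    "pep2": {"S1": 800.0, "S2": 200.0, "S3": 100.0},
    "pep3": {"S1": 400.0, "S2": 100.0},
}
ratios = pairwise_median_log_ratios(table)
```

A peptide missing in one sample simply drops out of that pair's ratio, which is one reason the approach tolerates missing values.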
DirectLFQ
DirectLFQ (Ammar et al., 2023) uses hierarchical normalization with variance-guided pairwise alignment. When used as the quantification method, it handles both normalization and quantification.
Note
When --quant-method directlfq is selected, mokume delegates all processing to the DirectLFQ package. Run and sample normalization settings are ignored.
pip install mokume[directlfq]

mokume features2proteins \
    -p features.parquet -o proteins.csv \
    --quant-method directlfq
Sum (All Peptides)
Sums all peptide intensities per protein per sample. This is the simplest approach and is useful as a baseline.
Ratio (PS Protocol)
For multi-plex TMT experiments with reference channels, the ratio method computes log2(sample/reference) per PSM per plex, then aggregates to protein level via median.
PSM intensities → average fractions → divide by reference → log2
→ median by peptide → median by protein → wide matrix
This method requires an SDRF file to detect reference samples and plexes.
mokume features2proteins \
    -p features.parquet -o proteins.csv -s experiment.sdrf.tsv \
    --quant-method ratio \
    --coverage-threshold 0.65
Info
Ratio quantification handles cross-plex normalization inherently via per-plex reference division. The --irs flag is ignored for ratio mode.
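
The core of the ratio pipeline (divide by reference, log2, median by peptide, median by protein) can be sketched as follows. This is an illustration under stated assumptions, not mokume's code: `ratio_quant` and the record keys (`plex`, `sample`, `protein`, `peptide`, `psm`, `intensity`) are hypothetical, intensities are assumed to be already averaged across fractions, and the reference sample of each plex is passed in directly instead of being read from an SDRF file.

```python
import math
from collections import defaultdict
from statistics import median


def ratio_quant(psms, reference_of_plex):
    """Hypothetical helper: {(protein, sample): median log2(sample / reference)}.

    psms: iterable of dicts with keys plex, sample, protein, peptide, psm, intensity.
    reference_of_plex: {plex: name of the reference sample in that plex}.
    """
    psms = list(psms)
    # Reference intensity of each PSM, looked up within its own plex.
    reference = {
        (r["plex"], r["psm"]): r["intensity"]
        for r in psms
        if r["sample"] == reference_of_plex[r["plex"]]
    }
    # log2(sample / reference) per PSM, grouped by peptide.
    per_peptide = defaultdict(list)
    for r in psms:
        if r["sample"] == reference_of_plex[r["plex"]]:
            continue  # the reference channel itself is not reported
        ref = reference.get((r["plex"], r["psm"]), 0)
        if ref > 0 and r["intensity"] > 0:
            key = (r["protein"], r["sample"], r["peptide"])
            per_peptide[key].append(math.log2(r["intensity"] / ref))
    # Median by peptide, then median by protein.
    per_protein = defaultdict(list)
    for (protein, sample, _peptide), values in per_peptide.items():
        per_protein[(protein, sample)].append(median(values))
    return {key: median(values) for key, values in per_protein.items()}
```

Because every PSM is divided by the reference of its own plex, values from different plexes end up on a common scale, consistent with the note above that a separate IRS step is not applied in ratio mode.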
Standard Output Format
All quantification methods produce a standard Intensity column in long format, which the pipeline converts to wide format (proteins x samples) for the final output.
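
The long-to-wide conversion can be pictured with a small stdlib-only sketch. `long_to_wide` is a hypothetical helper and the `Protein` header name is an assumption; the pipeline's own conversion may differ in column names and in how missing values are represented.

```python
def long_to_wide(records):
    """Hypothetical helper: pivot (protein, sample, intensity) records.

    Returns (header, table), where each table row is
    [protein, intensity in sample 1, intensity in sample 2, ...]
    and missing protein/sample combinations are None.
    """
    records = list(records)
    samples = sorted({sample for _, sample, _ in records})
    wide = {}
    for protein, sample, intensity in records:
        # One dict per protein, pre-filled with None for every sample
        wide.setdefault(protein, dict.fromkeys(samples))[sample] = intensity
    header = ["Protein"] + samples
    table = [
        [protein] + [values[s] for s in samples]
        for protein, values in sorted(wide.items())
    ]
    return header, table


header, table = long_to_wide([
    ("P1", "S1", 10.0),
    ("P1", "S2", 20.0),
    ("P2", "S1", 5.0),
])
```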