IRS Normalization

Internal Reference Scaling (IRS) is a normalization strategy for multi-plex TMT experiments that uses reference channels shared across plexes to make protein intensities comparable.

The Problem

In a multi-plex TMT experiment, each plex is a separate MS run. Protein intensities within a plex are directly comparable, but intensities across plexes are not: they differ because of loading variation, MS sensitivity drift, and other technical factors.

The Solution

IRS uses reference channels (e.g., pooled samples) that are present in every plex as anchors. Scaling each plex so that its reference channel intensities match a common global reference makes all samples comparable.

graph LR
    subgraph Plex 1
        P1S1[Sample 1] --- P1R[Reference]
        P1S2[Sample 2] --- P1R
    end
    subgraph Plex 2
        P2S3[Sample 3] --- P2R[Reference]
        P2S4[Sample 4] --- P2R
    end
    P1R -->|Scale to match| G[Global Reference]
    P2R -->|Scale to match| G

Scaling Formula

For each protein in each plex:

$$\text{adjusted} = \text{raw} \times \frac{\text{global_ref}}{\text{plex_ref}}$$

Where:

`plex_ref` = median (or mean, see `--irs-stat`) of the protein's reference channel intensities within the plex
`global_ref` = geometric mean of the protein's `plex_ref` values across all plexes
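The formula can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not mokume's implementation: the function name `irs_scale`, the wide table layout (proteins as rows, samples as columns), and the toy intensities are all hypothetical.

```python
import numpy as np
import pandas as pd


def irs_scale(protein_df, sample_to_plex, reference_samples, stat="median"):
    """Scale each plex so its per-protein reference level equals global_ref.

    Hypothetical helper for illustration; assumes every plex has at least one
    reference channel and that reference intensities are positive.
    """
    reference_samples = set(reference_samples)

    # plex_ref: per protein, the median (or mean) of the plex's reference channels
    plex_ref = {}
    for plex in sorted(set(sample_to_plex.values())):
        ref_cols = [
            s for s in protein_df.columns
            if sample_to_plex[s] == plex and s in reference_samples
        ]
        refs = protein_df[ref_cols]
        plex_ref[plex] = refs.median(axis=1) if stat == "median" else refs.mean(axis=1)
    plex_ref = pd.DataFrame(plex_ref)

    # global_ref: per protein, the geometric mean of plex_ref across plexes
    global_ref = np.exp(np.log(plex_ref).mean(axis=1))

    # Scaling factor per protein and plex: global_ref / plex_ref
    factors = (1.0 / plex_ref).mul(global_ref, axis=0)

    adjusted = protein_df.copy()
    for sample in protein_df.columns:
        adjusted[sample] = protein_df[sample] * factors[sample_to_plex[sample]]
    return adjusted


# Toy data: two plexes, one reference channel each (p1_2 and p2_2)
protein_df = pd.DataFrame(
    {"p1_1": [100.0, 10.0], "p1_2": [200.0, 50.0],
     "p2_1": [400.0, 30.0], "p2_2": [800.0, 50.0]},
    index=["PROT_A", "PROT_B"],
)
sample_to_plex = {"p1_1": "p1", "p1_2": "p1", "p2_1": "p2", "p2_2": "p2"}
adjusted = irs_scale(protein_df, sample_to_plex, reference_samples=["p1_2", "p2_2"])
print(adjusted.round(1))
```

For `PROT_A` the plex references are 200 and 800, so `global_ref` is their geometric mean, 400. Plex `p1` is scaled by 2 and plex `p2` by 0.5, which brings both reference channels to 400. `PROT_B` has identical references in both plexes and is left unchanged.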

Reference Sample Detection

mokume automatically detects reference samples from SDRF metadata using a priority chain:

| Priority | Method | Source |
|---|---|---|
| 1 | `characteristics[pooled sample]` column | Values: `"pooled"` or `"SN=sample1;SN=sample2"` |
| 2 | Explicit sample names | `--irs-reference-samples` CLI option |
| 3 | Column + values | `--irs-sdrf-column` + `--irs-sdrf-values` |
| 4 | Regex scan | Searches all factor/characteristic columns for the pattern |

Default regex: `pool|powder|ref|reference|bridge`
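The regex scan (priority 4) can be pictured with the standard `re` module. The sketch below is only an approximation of the idea: the helper `find_reference_samples`, the dictionary input, and the choice of case-insensitive substring matching are assumptions, not confirmed mokume behaviour.

```python
import re

DEFAULT_REFERENCE_REGEX = "pool|powder|ref|reference|bridge"


def find_reference_samples(sample_values, pattern=DEFAULT_REFERENCE_REGEX):
    """Return sample names with at least one metadata value matching the pattern.

    Hypothetical helper: `sample_values` maps a sample name to the values found
    in its factor/characteristic columns. Matching is case-insensitive and
    unanchored (assumption).
    """
    regex = re.compile(pattern, flags=re.IGNORECASE)
    return [
        name
        for name, values in sample_values.items()
        if any(regex.search(str(value)) for value in values)
    ]


demo = {
    "p1_1": ["tumor"],
    "p1_11": ["Pooled reference"],
    "p2_3": ["normal"],
    "p2_11": ["bridge"],
}
references = find_reference_samples(demo)
print(references)
```

Under these assumptions a short alternative such as `ref` matches any value that contains it, including unrelated terms like `prefrontal cortex`. If the data contain such values, a stricter pattern passed through `--irs-reference-regex`, or an explicit sample list, is the safer choice.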

Plex Detection

Plexes are detected from quantms-style source names in the SDRF:

| Source Name | Detected Plex |
|---|---|
| `p1_1` | `p1` |
| `p1_10` | `p1` |
| `p2_3` | `p2` |

The naming convention is `{plex}_{channel}`: the plex ID is everything before the final underscore and the digits that follow it (the channel number).
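The naming rule maps naturally onto a regular expression. The following is a sketch of that rule rather than mokume's own parser; the function name `plex_from_source_name` and the decision to return `None` for names that do not fit the convention are assumptions.

```python
import re

# Greedy `.+` ensures only the LAST `_<digits>` suffix is treated as the channel.
_PLEX_PATTERN = re.compile(r"(?P<plex>.+)_(?P<channel>\d+)")


def plex_from_source_name(source_name):
    """Return the plex ID of a `{plex}_{channel}` source name, or None."""
    match = _PLEX_PATTERN.fullmatch(source_name)
    return match.group("plex") if match else None


for name in ["p1_1", "p1_10", "p2_3", "batch_2_7", "sample"]:
    print(name, "->", plex_from_source_name(name))
```

Because only the final `_<digits>` suffix is stripped, a plex ID may itself contain underscores and digits: `batch_2_7` resolves to plex `batch_2`.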

Usage

CLI

# Auto-detect references from SDRF
mokume features2proteins \
    -p data.parquet -o proteins.csv -s experiment.sdrf.tsv \
    --quant-method median \
    --irs --irs-remove-reference

# Explicit reference samples
mokume features2proteins \
    -p data.parquet -o proteins.csv -s experiment.sdrf.tsv \
    --irs --irs-reference-samples "p1_11,p2_11"

# Custom regex for reference detection
mokume features2proteins \
    -p data.parquet -o proteins.csv -s experiment.sdrf.tsv \
    --irs --irs-reference-regex "pool|bridge|control"

Python

from mokume.normalization.irs import (
    IRSNormalizer,
    detect_pooled_from_sdrf,
    detect_plexes_from_sdrf,
)

# Detect references and plexes
ref_samples = detect_pooled_from_sdrf("experiment.sdrf.tsv")
sample_to_plex = detect_plexes_from_sdrf("experiment.sdrf.tsv")

# Apply IRS
normalizer = IRSNormalizer(reference_samples=ref_samples, stat="median")
protein_df = normalizer.fit_transform(protein_df, sample_to_plex)

Options

| Option | Default | Description |
|---|---|---|
| `--irs-stat` | `median` | Statistic for plex reference: `median` or `mean` |
| `--irs-remove-reference` | `False` | Remove reference samples from final output |
| `--irs-reference-regex` | `pool\|powder\|ref\|reference\|bridge` | Regex for auto-detection |

IRS with Ratio quantification

When using `--quant-method ratio`, IRS is not applied: ratio quantification already normalizes across plexes by dividing each sample by the reference of its own plex.
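A simple multiplicative model shows why the per-plex division is enough. The model is an assumption made here for illustration, not a description of mokume's internals. Let the observed intensity of a protein in sample s of plex p be its true abundance x multiplied by a plex-specific technical factor f:

$$I_{s,p} = f_p \, x_s \quad\Longrightarrow\quad \frac{I_{s,p}}{I_{\text{ref},p}} = \frac{f_p \, x_s}{f_p \, x_{\text{ref}}} = \frac{x_s}{x_{\text{ref}}}$$

The plex factor cancels in the ratio, so ratios from different plexes already share a common scale, provided every plex carries the same reference material.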