IRS Normalization

Internal Reference Scaling (IRS) is a normalization strategy for multi-plex TMT experiments that uses reference channels shared across plexes to make protein intensities comparable.

The Problem

In a multi-plex TMT experiment, each plex is a separate MS run. Protein intensities within a plex are directly comparable, but intensities across plexes are not: they differ because of loading variation, MS sensitivity drift, and other technical factors.

The Solution

IRS uses reference channels (e.g., pooled samples) that are present in every plex as anchors. By scaling each plex so that reference channel intensities match, all samples become comparable.

graph LR
    subgraph Plex 1
        P1S1[Sample 1] --- P1R[Reference]
        P1S2[Sample 2] --- P1R
    end
    subgraph Plex 2
        P2S3[Sample 3] --- P2R[Reference]
        P2S4[Sample 4] --- P2R
    end
    P1R -->|Scale to match| G[Global Reference]
    P2R -->|Scale to match| G

Scaling Formula

For each protein in each plex:

$$\text{adjusted} = \text{raw} \times \frac{\text{global_ref}}{\text{plex_ref}}$$

Where:

  • plex_ref = median (or mean) of reference channel intensities within the plex
  • global_ref = geometric mean of plex_ref values across all plexes
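
The formula can be traced by hand with a small pure-Python sketch. This is not mokume's implementation: the function name `irs_scale`, the dictionary layout, and the choice to handle one protein at a time are assumptions made for illustration.

```python
from statistics import geometric_mean, median


def irs_scale(intensities, sample_to_plex, reference_samples):
    """Scale one protein's intensities so every plex shares the same reference level.

    intensities:       {sample: raw intensity} for a single protein
    sample_to_plex:    {sample: plex id}
    reference_samples: set of sample names that are reference channels
    """
    # plex_ref: median of the reference-channel intensities within each plex
    refs_by_plex = {}
    for sample in reference_samples:
        refs_by_plex.setdefault(sample_to_plex[sample], []).append(intensities[sample])
    plex_ref = {plex: median(values) for plex, values in refs_by_plex.items()}

    # global_ref: geometric mean of the plex_ref values across all plexes
    global_ref = geometric_mean(list(plex_ref.values()))

    # adjusted = raw * global_ref / plex_ref
    return {
        sample: raw * global_ref / plex_ref[sample_to_plex[sample]]
        for sample, raw in intensities.items()
    }


# Toy data: plex 2 is measured four times brighter than plex 1.
sample_to_plex = {"p1_1": "p1", "p1_2": "p1", "p1_11": "p1",
                  "p2_3": "p2", "p2_4": "p2", "p2_11": "p2"}
raw = {"p1_1": 50.0, "p1_2": 200.0, "p1_11": 100.0,
       "p2_3": 200.0, "p2_4": 800.0, "p2_11": 400.0}

adjusted = irs_scale(raw, sample_to_plex, {"p1_11", "p2_11"})
```

In this toy example the plex references are 100 and 400, so the global reference is their geometric mean, 200. Plex 1 is scaled up by a factor of 2 and plex 2 down by a factor of 2, after which both reference channels sit at the same level.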

Reference Sample Detection

mokume automatically detects reference samples from SDRF metadata using a priority chain:

| Priority | Method | SDRF Source |
|----------|--------|-------------|
| 1 | characteristics[pooled sample] column | Values: "pooled" or "SN=sample1;SN=sample2" |
| 2 | Explicit sample names | --irs-reference-samples CLI option |
| 3 | Column + values | --irs-sdrf-column + --irs-sdrf-values |
| 4 | Regex scan | Searches all factor/characteristic columns for pattern |

Default regex: pool|powder|ref|reference|bridge
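
To get a feel for what the regex scan will pick up, the sketch below applies the default pattern to a few in-memory SDRF rows. It is an illustration only, not mokume's code: the function `scan_for_references`, the list-of-dicts row layout, and the use of case-insensitive substring matching are all assumptions.

```python
import re

DEFAULT_REFERENCE_REGEX = "pool|powder|ref|reference|bridge"


def scan_for_references(sdrf_rows, pattern=DEFAULT_REFERENCE_REGEX):
    """Return source names whose factor/characteristic values match the pattern.

    sdrf_rows: list of dicts, one per SDRF row (hypothetical in-memory layout).
    Case-insensitive substring matching is an assumption of this sketch.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    for row in sdrf_rows:
        for column, value in row.items():
            # Only factor value[...] and characteristics[...] columns are scanned.
            is_scanned = column.lower().startswith(("characteristics[", "factor value["))
            if is_scanned and regex.search(str(value)):
                matches.append(row["source name"])
                break  # one hit per row is enough
    return matches


rows = [
    {"source name": "p1_1", "characteristics[disease]": "control",
     "factor value[group]": "A"},
    {"source name": "p1_11", "characteristics[disease]": "not applicable",
     "factor value[group]": "Pooled reference"},
    {"source name": "p2_11", "characteristics[disease]": "not applicable",
     "factor value[group]": "bridge"},
]
references = scan_for_references(rows)
```

Under these assumptions the scan returns `p1_11` and `p2_11`. Because `ref` is a short alternative, a substring search would also match values such as "prefrontal cortex", so it is worth checking the detected reference list before relying on it.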

Plex Detection

Plexes are detected from quantms-style source names in the SDRF:

| Source Name | Detected Plex |
|-------------|---------------|
| p1_1 | p1 |
| p1_10 | p1 |
| p2_3 | p2 |

The naming convention is {plex}_{channel}, where the plex ID is everything before the final underscore and the digits that follow it (the channel number).
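
The convention can be expressed as a single regular expression. The sketch below is an assumption about how such parsing could look, not the code behind `detect_plexes_from_sdrf`; the helper `plex_from_source_name` is hypothetical.

```python
import re

# Greedy prefix, then a final underscore followed by one or more digits.
SOURCE_NAME_PATTERN = re.compile(r"^(?P<plex>.+)_(?P<channel>\d+)$")


def plex_from_source_name(source_name):
    """Return the plex ID for a {plex}_{channel} source name, or None if it does not fit."""
    match = SOURCE_NAME_PATTERN.match(source_name)
    return match.group("plex") if match else None


sample_to_plex = {
    name: plex_from_source_name(name) for name in ["p1_1", "p1_10", "p2_3"]
}
```

Because the prefix is greedy, a plex ID that itself contains underscores and digits keeps them: in this sketch `batch_2_7` maps to `batch_2`. Names without a trailing `_<digits>` return `None` instead of a guessed plex.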

Usage

CLI

# Auto-detect references from SDRF
mokume features2proteins \
    -p data.parquet -o proteins.csv -s experiment.sdrf.tsv \
    --quant-method median \
    --irs --irs-remove-reference

# Explicit reference samples
mokume features2proteins \
    -p data.parquet -o proteins.csv -s experiment.sdrf.tsv \
    --irs --irs-reference-samples "p1_11,p2_11"

# Custom regex for reference detection
mokume features2proteins \
    -p data.parquet -o proteins.csv -s experiment.sdrf.tsv \
    --irs --irs-reference-regex "pool|bridge|control"

Python

from mokume.normalization.irs import (
    IRSNormalizer,
    detect_pooled_from_sdrf,
    detect_plexes_from_sdrf,
)

# Detect references and plexes
ref_samples = detect_pooled_from_sdrf("experiment.sdrf.tsv")
sample_to_plex = detect_plexes_from_sdrf("experiment.sdrf.tsv")

# Apply IRS (protein_df is an existing protein-level intensity table loaded beforehand)
normalizer = IRSNormalizer(reference_samples=ref_samples, stat="median")
protein_df = normalizer.fit_transform(protein_df, sample_to_plex)

Options

| Option | Default | Description |
|--------|---------|-------------|
| --irs-stat | median | Statistic for plex reference: median or mean |
| --irs-remove-reference | False | Remove reference samples from final output |
| --irs-reference-regex | pool\|powder\|ref\|reference\|bridge | Regex for auto-detection |

IRS with Ratio Quantification

When using --quant-method ratio, IRS is not applied because ratio quantification already handles cross-plex normalization via per-plex reference division.
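
A small numeric sketch shows why dividing by the per-plex reference already removes plex-level differences. It illustrates the arithmetic only and makes no claim about how mokume's ratio method is implemented; the helper `ratios_to_reference` and the sample values are hypothetical.

```python
def ratios_to_reference(intensities, reference_sample):
    """Divide every channel in one plex by that plex's reference channel."""
    reference = intensities[reference_sample]
    return {
        sample: value / reference
        for sample, value in intensities.items()
        if sample != reference_sample
    }


# One protein measured in two plexes; plex 2 is four times brighter overall.
plex1 = {"p1_1": 50.0, "p1_2": 200.0, "p1_11": 100.0}
plex2 = {"p2_3": 200.0, "p2_4": 800.0, "p2_11": 400.0}

ratios1 = ratios_to_reference(plex1, "p1_11")
ratios2 = ratios_to_reference(plex2, "p2_11")
```

The fourfold brightness difference between the plexes cancels in the division: both plexes yield ratios of 0.5 and 2.0, so the values are already on a common scale and a further IRS step would have nothing left to correct.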