tissuemap: Tissue Proteome Atlas
The tissuemap command builds a per-dataset tissue proteome atlas from QPX parquet outputs. It is intended for atlas-style tissue exploration and tissue-specificity scoring, not for standard protein quantification from a single experiment.
When to Use TissueMap
Use tissuemap when you want to:
- build a tissue atlas from one dataset or a directory of datasets
- compute AdaTiSS-based tissue-specificity scores
- generate PCA/t-SNE embeddings and tissue-level marker plots
- save AnnData outputs for downstream exploration
If your goal is standard LFQ or TMT protein quantification, start with features2proteins instead.
Expected Input Layout
--scan-dir can point to either:
- a single dataset directory containing qpx_output/
- a parent directory containing multiple dataset directories, each with its own qpx_output/
If a dataset is TMT and auto-detection does not identify it, mark it explicitly by passing its dataset ID to --tmt-dataset. Repeat the flag once per TMT dataset.
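The layout rule above can be sketched in Python as follows. This is a minimal illustration, not mokume's discovery code: the helper name discover_datasets is hypothetical, and it assumes that a dataset's ID is its directory name (as with PXD016999 in the examples) and that a parent directory is scanned only one level deep.

```python
from pathlib import Path


def discover_datasets(scan_dir: Path) -> list[Path]:
    """Return dataset directories under scan_dir following the documented layout.

    Hypothetical helper: it mirrors the layout rule described in the docs,
    not mokume's own implementation.
    """
    scan_dir = Path(scan_dir)
    # Case 1: scan_dir is itself a dataset directory with a qpx_output/ folder.
    if (scan_dir / "qpx_output").is_dir():
        return [scan_dir]
    # Case 2: scan_dir is a parent; keep only children that contain qpx_output/.
    return sorted(
        child
        for child in scan_dir.iterdir()
        if child.is_dir() and (child / "qpx_output").is_dir()
    )


# Example: discover_datasets(Path("QPX_data/tissues-mq"))
# Under the directory-name assumption, each dataset ID is path.name (e.g. "PXD016999").
```

Directories without a qpx_output/ folder are silently skipped in this sketch; a real tool might prefer to warn about them instead.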
Quick Start
```bash
# Generate a default YAML template
mokume tissuemap --generate-config tissuemap.yaml

# Run a single dataset directory
mokume tissuemap \
  --scan-dir QPX_data/tissues-mq/PXD016999 \
  --output-dir ./results

# Run a parent directory containing multiple datasets
mokume tissuemap \
  --scan-dir QPX_data/tissues-mq \
  --tmt-dataset PXD016999 \
  --output-dir ./results \
  --n-jobs 8

# Run with a custom YAML configuration
mokume tissuemap \
  --scan-dir QPX_data/tissues-mq \
  --config tissuemap.yaml \
  --output-dir ./results
```
What the Pipeline Does
The current TissueMap workflow is organized around per-dataset processing:
- Discover dataset directories from --scan-dir
- Load QPX-derived protein matrices and metadata
- Apply log2 + median normalization
- Harmonize tissue labels
- Filter proteins with excessive missingness or contaminants
- Apply batch correction
- Compute AdaTiSS tissue-specificity scores
- Build PCA + t-SNE embeddings
- Generate atlas, marker, and tissue-specificity plots
- Save AnnData and CSV outputs
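As a rough illustration of the normalization and missingness-filtering steps, the sketch below works on a plain NumPy proteins-by-samples matrix. It assumes that "median normalization" means subtracting each sample's median log2 intensity and that the NaN threshold is a maximum fraction of missing samples per protein. Both helper names and the max_nan_fraction parameter are hypothetical, and mokume's actual implementation may differ (for example, by re-adding a global median). Contaminant filtering and batch correction are not shown.

```python
import numpy as np


def log2_median_normalize(intensities: np.ndarray) -> np.ndarray:
    """Log2-transform a proteins x samples matrix and center each sample on its median.

    Assumption: median normalization = subtract each sample's median log2 value.
    """
    # Treat zeros and negative values as missing before taking the log.
    values = np.where(intensities > 0, intensities, np.nan).astype(float)
    logged = np.log2(values)
    # Per-sample (column) median, ignoring missing values.
    sample_medians = np.nanmedian(logged, axis=0)
    return logged - sample_medians


def filter_missing(matrix: np.ndarray, max_nan_fraction: float = 0.5) -> np.ndarray:
    """Keep proteins (rows) whose fraction of missing samples is at most max_nan_fraction.

    max_nan_fraction is a hypothetical stand-in for the configured NaN threshold.
    """
    nan_fraction = np.isnan(matrix).mean(axis=1)
    return matrix[nan_fraction <= max_nan_fraction]


# Example: three proteins measured in two samples (0.0 marks a missing value).
raw = np.array([[2.0, 4.0], [8.0, 16.0], [4.0, 0.0]])
normalized = log2_median_normalize(raw)
kept = filter_missing(normalized, max_nan_fraction=0.4)  # drops the protein with a missing value
```

Centering on the median rather than the mean keeps the normalization robust to a handful of very abundant proteins in a sample.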
Main CLI Options
| Option | Default | Description |
|---|---|---|
| `--scan-dir` | required unless `--generate-config` | Dataset directory or parent directory containing datasets |
| `--output-dir` | `tissuemap_output` | Output directory for all results |
| `--config` | none | YAML configuration file |
| `--generate-config` | none | Write a default YAML template and exit |
| `--tmt-dataset` | auto | Mark a dataset ID as TMT; repeat the flag for multiple datasets |
| `--n-jobs` | 8 | Threads for dataset processing and embedding |
| `--dpi` | 250 | Plot resolution override |
Configuration Workflow
Use --generate-config to write a default YAML template when you want to tune the pipeline before running it repeatedly.
The generated YAML exposes the main configuration groups:
- input: dataset discovery, TMT overrides, minimum tissue sample settings
- filtering: NaN threshold and contaminant filtering
- tissue_specificity: AdaTiSS thresholds and scoring controls
- embedding: PCA/t-SNE parameters
- plotting: DPI, PDF export, marker plot controls
- output: output directory
CLI values such as --scan-dir, --output-dir, --n-jobs, and --dpi override the YAML file.
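The precedence rule can be pictured as a merge of dotted keys, the same form that load_config accepts in its overrides argument in the Python API example. The sketch below is a hypothetical helper, not load_config itself. It assumes that an option left unset on the command line arrives as None and therefore keeps the YAML value, and the plotting.dpi key name is an assumption.

```python
from copy import deepcopy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with dotted-key overrides (e.g. "input.scan_dir") applied.

    Hypothetical helper illustrating "CLI values override YAML"; not mokume's load_config.
    """
    merged = deepcopy(config)  # never mutate the parsed YAML in place
    for dotted_key, value in overrides.items():
        if value is None:
            # Option not given on the command line: keep whatever the YAML says.
            continue
        *parents, leaf = dotted_key.split(".")
        node = merged
        for key in parents:
            # Walk (or create) nested groups such as "input" or "output".
            node = node.setdefault(key, {})
        node[leaf] = value
    return merged


# Example: YAML values first, then CLI values layered on top.
yaml_values = {"n_jobs": 4, "input": {"scan_dir": "old"}, "plotting": {"dpi": 150}}
cli_values = {"input.scan_dir": "QPX_data/tissues-mq", "n_jobs": 8, "plotting.dpi": None}
effective = apply_overrides(yaml_values, cli_values)
```

Here effective keeps the YAML DPI of 150 because no --dpi value was supplied, while n_jobs and input.scan_dir take the CLI values.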
Output Files
Each processed dataset gets its own output directory.
| File | Description |
|---|---|
| `<ds_id>.corrected.h5ad` | Batch-corrected sample-level AnnData with embeddings and metadata |
| `<ds_id>.ts_scores.h5ad` | Tissue-specificity score matrix as AnnData |
| `protein_ts_scores.csv` | Per-protein tissue-specificity scores and enrichment categories |
| `plots/` | PCA scree plot, atlas/dendrogram, marker plots, and TS distribution plots |
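To sanity-check a run, a small helper can report which of the documented outputs exist for one dataset. This is a hypothetical sketch: it assumes all four entries sit directly inside that dataset's output directory, whose exact name is passed in rather than guessed. The .h5ad files can then be opened with anndata.read_h5ad.

```python
from pathlib import Path


def check_dataset_outputs(dataset_out: Path, ds_id: str) -> dict[str, bool]:
    """Map each documented output name to whether it exists in dataset_out.

    Hypothetical helper; assumes the outputs live directly in dataset_out.
    """
    dataset_out = Path(dataset_out)
    expected = [
        f"{ds_id}.corrected.h5ad",  # batch-corrected sample-level AnnData
        f"{ds_id}.ts_scores.h5ad",  # tissue-specificity score matrix
        "protein_ts_scores.csv",    # per-protein scores and enrichment categories
        "plots",                    # plot directory
    ]
    return {name: (dataset_out / name).exists() for name in expected}


# Example: list anything missing for one dataset's output directory.
# status = check_dataset_outputs(Path("./results/PXD016999"), "PXD016999")
# missing = [name for name, present in status.items() if not present]
```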
Python API
```python
from pathlib import Path

from mokume.tissuemap.config import InputConfig, OutputConfig, TissueMapConfig, load_config
from mokume.tissuemap.pipeline import TissueMapPipeline

# Programmatic configuration
config = TissueMapConfig(
    n_jobs=8,
    input=InputConfig(scan_dir=Path("QPX_data/tissues-mq/PXD016999")),
    output=OutputConfig(output_dir=Path("./results")),
)
TissueMapPipeline(config).run()

# Load from YAML and override selected fields
config = load_config(
    Path("tissuemap.yaml"),
    overrides={
        "input.scan_dir": "QPX_data/tissues-mq",
        "output.output_dir": "./results",
        "n_jobs": 8,
    },
)
TissueMapPipeline(config).run()
```
Practical Tips
- Use features2proteins for standard quantification workflows; use tissuemap for atlas-style tissue analysis.
- Start with the generated YAML template if you expect to rerun multiple datasets.
- Pass --tmt-dataset, once per dataset, when a dataset must be treated as TMT and auto-detection is not sufficient.
- Keep the default plotting enabled for the first run so you can inspect atlas quality and marker behavior.