mhctools

Python interface to MHC binding, presentation, immunogenicity, and antigen processing predictors.

Installation

pip install mhctools

For MHCflurry support, also install mhcflurry and fetch its trained models:

pip install mhcflurry
mhcflurry-downloads fetch

Quick start

from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])

# predict() returns a list of PeptideResult — one per peptide
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])

for r in results:
    if r.affinity:
        print(f"{r.peptide} -> {r.affinity.allele} IC50={r.affinity.value:.1f}nM")

Data model

predict() returns a list of PeptideResult — one per peptide. Each result carries the peptide string and provides accessors for each prediction kind (affinity, presentation, stability, etc.). Accessors return None when a predictor doesn't produce that kind.

results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
r = results[0]

r.peptide                    # "SIINFEKL"
r.affinity.value             # IC50 in nM
r.affinity.percentile_rank   # 0-100, lower = better
r.affinity.allele            # best allele for this kind
r.presentation               # None if predictor doesn't produce it

Under the hood, each PeptideResult wraps a tuple of Prediction objects — frozen dataclasses, one per allele-kind combination. Everything converts to DataFrames with consistent column names.
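To make the wrapper idea concrete, here is a minimal, self-contained sketch of how a result object could pick the best prediction per kind from a tuple of frozen dataclasses. MiniPrediction and MiniPeptideResult are hypothetical stand-ins written for this illustration, and the numbers are made up; they are not the classes mhctools ships, and the real selection logic may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MiniPrediction:
    """Hypothetical stand-in for one allele-kind prediction."""
    kind: str
    allele: str
    score: float                          # higher = better
    value: Optional[float] = None         # e.g. IC50 in nM
    percentile_rank: Optional[float] = None


@dataclass(frozen=True)
class MiniPeptideResult:
    """Hypothetical stand-in wrapping all predictions for one peptide."""
    peptide: str
    preds: Tuple[MiniPrediction, ...]

    def best(self, kind: str) -> Optional[MiniPrediction]:
        # Highest-scoring prediction of this kind, or None if the kind is absent.
        matching = [p for p in self.preds if p.kind == kind]
        return max(matching, key=lambda p: p.score) if matching else None

    @property
    def affinity(self) -> Optional[MiniPrediction]:
        return self.best("pMHC_affinity")

    @property
    def stability(self) -> Optional[MiniPrediction]:
        return self.best("pMHC_stability")


# Made-up values, one prediction per allele for a single kind.
result = MiniPeptideResult(
    peptide="SIINFEKL",
    preds=(
        MiniPrediction("pMHC_affinity", "HLA-A*02:01", score=0.31, value=1750.0),
        MiniPrediction("pMHC_affinity", "HLA-B*07:02", score=0.12, value=13600.0),
    ),
)

if result.affinity:
    print(result.peptide, result.affinity.allele, result.affinity.value)
print(result.stability)  # None: no stability predictions were supplied
```

Returning None for a missing kind lets callers guard with a plain `if r.affinity:` check, which is the pattern used throughout this README.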

Python API

Predicting peptides

from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])

r = results[0]
r.peptide                      # "SIINFEKL"
r.offset                       # position in source protein (if scanned)
r.kinds                        # {"pMHC_affinity", "pMHC_presentation"}
r.alleles                      # {"HLA-A*02:01", "HLA-B*07:02"}

# best prediction by kind — None when the kind is absent
r.affinity                     # Prediction or None
r.presentation                 # Prediction or None
r.stability                    # None (predictor doesn't produce it)

if r.affinity:
    r.affinity.value            # IC50 in nM
    r.affinity.percentile_rank  # 0-100, lower = better
    r.affinity.score            # ~0-1, higher = better
    r.affinity.allele           # best allele for this kind

# by rank instead of score
r.best_affinity_by_rank        # Prediction with lowest percentile rank, or None

# all predictions
r.preds                        # tuple of all Prediction objects
r.filter(kind="pMHC_affinity")
r.filter(allele="HLA-A*02:01")

NetMHCpan 4.1 automatically emits both pMHC_affinity and pMHC_presentation predictions per peptide-allele pair.

Scanning proteins

predict_proteins() takes a dictionary of protein sequences and returns {sequence_name: list[PeptideResult]}:

proteins = predictor.predict_proteins(
    {"TP53": "MEEPQSDPSVEPPLSQETFS...", "KRAS": "MTEYKLVVVGAGGVGKS..."},
    peptide_lengths=[9, 10],
)

for r in proteins["TP53"]:
    if r.affinity and r.affinity.value < 500:
        print(f"  offset={r.offset} {r.peptide} IC50={r.affinity.value:.0f}")
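Conceptually, scanning a protein means sliding a window of each requested length along the sequence and recording where each peptide starts. The sketch below shows that idea in plain Python. iter_subpeptides is a hypothetical helper, not part of mhctools, and the 0-based offsets are an assumption of this sketch.

```python
from typing import Iterator, Sequence, Tuple


def iter_subpeptides(
    sequence: str, peptide_lengths: Sequence[int]
) -> Iterator[Tuple[int, str]]:
    """Yield (offset, peptide) for every window of the requested lengths.

    Offsets are 0-based here; that is an assumption of this sketch.
    """
    for length in peptide_lengths:
        # range() is empty when the sequence is shorter than the window.
        for offset in range(len(sequence) - length + 1):
            yield offset, sequence[offset:offset + length]


windows = list(iter_subpeptides("NLYIQWLKDGGPSSGRPPPS", [9, 10]))
for offset, peptide in windows[:3]:
    print(offset, peptide)
```

A sequence of length n yields n - k + 1 peptides of length k, so a 20-residue sequence gives 12 nine-mers and 11 ten-mers.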

DataFrames

Each prediction method (predict, predict_proteins) has a _dataframe variant that flattens its results into a pandas DataFrame with consistent columns:

df = predictor.predict_dataframe(["SIINFEKL"], sample_name="pat001")
df = predictor.predict_proteins_dataframe({"TP53": "MEEPQ..."}, sample_name="pat001")

Columns: sample_name, peptide, n_flank, c_flank, source_sequence_name, offset, predictor_name, predictor_version, allele, kind, score, value, percentile_rank.
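Because the output is in long format (one row per peptide, allele, and kind), picking the best allele per peptide is a small reduction over rows. The sketch below uses plain dictionaries with a subset of the columns listed above, so it runs without pandas or an installed predictor; the rows are made up, and best_affinity_rows is a hypothetical helper. Rows of this shape can be obtained from a real DataFrame with `df.to_dict("records")`.

```python
from typing import Dict, List

# Made-up rows using a subset of the documented columns.
rows: List[dict] = [
    {"peptide": "SIINFEKL", "allele": "HLA-A*02:01",
     "kind": "pMHC_affinity", "percentile_rank": 4.0},
    {"peptide": "SIINFEKL", "allele": "HLA-B*07:02",
     "kind": "pMHC_affinity", "percentile_rank": 12.5},
    {"peptide": "SIINFEKL", "allele": "HLA-A*02:01",
     "kind": "pMHC_presentation", "percentile_rank": 3.0},
    {"peptide": "GILGFVFTL", "allele": "HLA-A*02:01",
     "kind": "pMHC_affinity", "percentile_rank": 0.02},
    {"peptide": "GILGFVFTL", "allele": "HLA-B*07:02",
     "kind": "pMHC_affinity", "percentile_rank": 30.0},
]


def best_affinity_rows(rows: List[dict]) -> Dict[str, dict]:
    """Return, per peptide, the affinity row with the lowest percentile rank."""
    best: Dict[str, dict] = {}
    for row in rows:
        if row["kind"] != "pMHC_affinity":
            continue  # ignore other kinds such as presentation
        current = best.get(row["peptide"])
        if current is None or row["percentile_rank"] < current["percentile_rank"]:
            best[row["peptide"]] = row
    return best


best = best_affinity_rows(rows)
for peptide, row in best.items():
    print(peptide, row["allele"], row["percentile_rank"])
```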

Multi-sample predictions

MultiSample runs a predictor across multiple samples, each with its own HLA genotype:

from mhctools import MultiSample, NetMHCpan41

ms = MultiSample(
    samples={
        "pat001": ["HLA-A*02:01", "HLA-B*07:02"],
        "pat002": ["HLA-A*01:01", "HLA-B*08:01"],
    },
    predictor_class=NetMHCpan41,
)

# {sample_name: list[PeptideResult]}
results = ms.predict(["SIINFEKL", "GILGFVFTL"])

# {sample_name: {seq_name: list[PeptideResult]}}
protein_results = ms.predict_proteins({"TP53": "MEEPQ..."})

# flat DataFrames with sample_name column
df = ms.predict_dataframe(["SIINFEKL"])
df = ms.predict_proteins_dataframe({"TP53": "MEEPQ..."})

Measurement kinds and MHC context

Each Prediction has a kind string describing what it measures. The canonical kind strings are defined in mhctools.pred.Kind:

Kind	Meaning
`pMHC_affinity`	Peptide-MHC binding affinity
`pMHC_presentation`	Likelihood of surface presentation (EL/processing)
`pMHC_stability`	Peptide-MHC complex stability
`immunogenicity`	T-cell immunogenicity
`antigen_processing`	Combined processing score
`proteasome_cleavage`	Proteasomal cleavage score
`tap_transport`	TAP transport score (reserved, not yet used)
`erap_trimming`	ERAP trimming score (reserved, not yet used)

Predictors also expose kind_support() so downstream code can tell what MHC context is meaningful for each emitted kind:

support = predictor.kind_support()
support["pMHC_affinity"]
# {"mhc_dependence": "single_allele", "mhc_class": "I"}

mhc_dependence is one of:

Value	Meaning
`none`	The prediction is MHC-independent; `Prediction.allele` is empty.
`single_allele`	The prediction is for one peptide/MHC allele pair; `Prediction.allele` is part of the key.
`haplotype`	The prediction uses the requested MHC repertoire jointly; `Prediction.allele` may carry best-allele attribution but is not the prediction key.

mhc_class is one of none, I, II, or both.

The allowed metadata values are defined in mhctools.pred as MHC_DEPENDENCE_VALUES and MHC_CLASS_VALUES.

Examples:

Predictor	Kind	`mhc_dependence`	`mhc_class`
`NetMHCpan41`	`pMHC_affinity`	`single_allele`	`I`
`NetMHCpan41`	`pMHC_presentation`	`single_allele`	`I`
`NetMHCIIpan4_EL`	`pMHC_presentation`	`single_allele`	`II`
`NetMHCstabpan`	`pMHC_stability`	`single_allele`	`I`
`MHCflurry`	`pMHC_affinity`	`single_allele`	`I`
`MHCflurry` haplotype mode	`pMHC_presentation`	`haplotype`	`I`
`MHCflurry` per-allele panel mode	`pMHC_presentation`	`single_allele`	`I`
`Pepsickle`	`proteasome_cleavage`	`none`	`none`
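One practical use of this metadata is deciding how to key or deduplicate predictions downstream. The sketch below builds a key that includes the allele only when mhc_dependence is single_allele, following the table of values above. prediction_key is a hypothetical helper, and the support dictionary is written by hand in the shape shown for kind_support(); nothing here calls mhctools.

```python
from typing import Tuple

# Values copied from the mhc_dependence table above.
MHC_DEPENDENCE = ("none", "single_allele", "haplotype")


def prediction_key(
    peptide: str, kind: str, allele: str, mhc_dependence: str
) -> Tuple[str, ...]:
    """Build a grouping key; the allele is included only when it identifies the record."""
    if mhc_dependence not in MHC_DEPENDENCE:
        raise ValueError(f"unknown mhc_dependence: {mhc_dependence!r}")
    if mhc_dependence == "single_allele":
        return (peptide, kind, allele)
    # "none" and "haplotype": any allele value is attribution at most, not identity.
    return (peptide, kind)


# Hand-written metadata in the shape that kind_support() is documented to return.
support = {
    "pMHC_affinity": {"mhc_dependence": "single_allele", "mhc_class": "I"},
    "proteasome_cleavage": {"mhc_dependence": "none", "mhc_class": "none"},
}

affinity_key = prediction_key(
    "SIINFEKL", "pMHC_affinity", "HLA-A*02:01",
    support["pMHC_affinity"]["mhc_dependence"],
)
cleavage_key = prediction_key(
    "SIINFEKL", "proteasome_cleavage", "",
    support["proteasome_cleavage"]["mhc_dependence"],
)
print(affinity_key)
print(cleavage_key)
```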

For MHCflurry presentation, presentation_allele_mode="haplotype" treats the requested alleles as one sample genotype and emits one pMHC_presentation record per peptide. The allele field carries MHCflurry's best_allele attribution when available. presentation_allele_mode="per_allele" treats each allele as a separate one-allele synthetic sample and emits one presentation record per peptide/allele pair. The default "auto" mode uses haplotype mode for up to six alleles and per-allele mode for larger allele panels.
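The "auto" rule described above can be written as a small decision function. This is only a sketch of the documented behavior (haplotype mode for up to six alleles, per-allele mode beyond that); resolve_presentation_mode is a hypothetical name and is not taken from the mhctools source.

```python
from typing import Sequence


def resolve_presentation_mode(mode: str, alleles: Sequence[str]) -> str:
    """Map "auto" to a concrete mode based on the number of requested alleles."""
    if mode == "auto":
        # Up to six alleles is treated as one sample genotype.
        return "haplotype" if len(alleles) <= 6 else "per_allele"
    if mode in ("haplotype", "per_allele"):
        return mode
    raise ValueError(f"unknown presentation_allele_mode: {mode!r}")


genotype = ["HLA-A*02:01", "HLA-B*07:02"]
panel = [f"allele-{i}" for i in range(7)]  # placeholder names for a 7-allele panel

print(resolve_presentation_mode("auto", genotype))
print(resolve_presentation_mode("auto", panel))
```

Six alleles is the size of a full class I genotype (two each of HLA-A, HLA-B, and HLA-C), which is a plausible reason for the cutoff, though the README does not state one.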

The Prediction object

Every prediction is a frozen, self-contained Prediction dataclass:

from mhctools import Prediction

pred = Prediction(
    kind="pMHC_affinity",
    score=0.85,           # ~0-1, higher = better
    peptide="SIINFEKL",
    allele="HLA-A*02:01",
    value=120.5,          # IC50 in nM
    percentile_rank=0.8,
    source_sequence_name="TP53",
    offset=42,
    predictor_name="netMHCpan",
    predictor_version="4.1",
)

score is always higher-is-better. value is in the predictor's native units (nM for affinity, hours for stability). percentile_rank is optional; when present it ranges from 0 to 100, and lower means stronger.
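In practice, "frozen" means instances cannot be modified after construction. The sketch below demonstrates the standard-library behavior with a hypothetical stand-in class, FrozenPred, that carries only a few of the fields shown above. It assumes Prediction is an ordinary `dataclasses` frozen dataclass, as the text says; it does not import mhctools.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenPred:
    """Hypothetical stand-in with a few of the fields shown above."""
    kind: str
    score: float
    peptide: str
    allele: str = ""


pred = FrozenPred(
    kind="pMHC_affinity", score=0.85, peptide="SIINFEKL", allele="HLA-A*02:01"
)

# Attribute assignment on a frozen dataclass raises FrozenInstanceError.
try:
    pred.score = 0.9
    mutated = True
except FrozenInstanceError:
    mutated = False

# dataclasses.replace() builds a modified copy and leaves the original untouched.
updated = replace(pred, allele="HLA-B*07:02")

# Frozen dataclasses with the default eq=True are hashable, so equal
# instances collapse when placed in a set.
unique = {pred, updated, replace(pred)}
print(mutated, updated.allele, len(unique))
```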

Supported predictors

MHC binding & presentation

Predictor	Kinds produced	Requires
`NetMHCpan` / `NetMHCpan41` / `NetMHCpan42`	affinity + presentation	NetMHCpan
`NetMHCpan4`	affinity or presentation	NetMHCpan 4.0
`NetMHCpan3` / `NetMHCpan28`	affinity	older NetMHCpan
`NetMHC` / `NetMHC3` / `NetMHC4`	affinity	NetMHC
`NetMHCIIpan` / `NetMHCIIpan43`	affinity or presentation	NetMHCIIpan
`NetMHCcons`	affinity	NetMHCcons
`NetMHCstabpan`	stability	NetMHCstabpan
`MHCflurry`	affinity + presentation	`pip install mhcflurry` + `mhcflurry-downloads fetch`
`MHCflurry_Affinity`	affinity	`pip install mhcflurry` + `mhcflurry-downloads fetch`
`BigMHC`	presentation or immunogenicity	BigMHC clone (set `BIGMHC_DIR`)
`MixMHCpred`	presentation	MixMHCpred
`IedbNetMHCpan` / `IedbSMM` / `IedbNetMHCIIpan`	affinity	IEDB web API
`RandomBindingPredictor`	affinity	(built-in)

Antigen processing

Predictor	Kinds produced	Requires
`Pepsickle`	proteasome cleavage	`pip install pepsickle`
`NetChop`	proteasome cleavage	NetChop

Processing predictors use configurable scoring to aggregate per-position cleavage probabilities into peptide-level scores. See ProcessingPredictor and ProteasomePredictor for details.

Command-line examples

Prediction for user-supplied peptide sequences

mhctools --sequence SIINFEKL SIINFEKLQ --mhc-predictor netmhc --mhc-alleles A0201

Automatically extract peptides as subsequences of specified length

mhctools --sequence AAAQQQSIINFEKL --extract-subsequences --mhc-peptide-lengths 8-10 --mhc-predictor mhcflurry --mhc-alleles A0201

Legacy API

The old predict_peptides() and predict_subsequences() methods still work and return BindingPredictionCollection objects:

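from mhctools import NetMHCpan

predictor = NetMHCpan(alleles=["A*02:01"])
# scan the protein for all 9-mers
collection = predictor.predict_subsequences(
    {"1L2Y": "NLYIQWLKDGGPSSGRPPPS"},
    peptide_lengths=[9],
)
df = collection.to_dataframe()

# bp.affinity is the predicted IC50 in nM; lower means stronger binding
for bp in collection:
    if bp.affinity < 100:
        print("Strong binder: %s" % bp)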

To convert legacy results to the new types:

preds = collection.to_preds()           # list of Prediction
pp_list = collection.to_peptide_preds() # list of PeptideResult
