Python interface to MHC binding, presentation, immunogenicity, and antigen processing predictors.

```sh
pip install mhctools
```

For MHCflurry support, also run:

```sh
mhcflurry-downloads fetch
```

```python
from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])

# predict() returns a list of PeptideResult, one per peptide
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
for r in results:
    if r.affinity:
        print(f"{r.peptide} -> {r.affinity.allele} IC50={r.affinity.value:.1f}nM")
```

`predict()` returns a list of `PeptideResult`, one per peptide. Each
result carries the peptide string and provides accessors for each
prediction kind (affinity, presentation, stability, etc.). Accessors
return `None` when a predictor doesn't produce that kind.

```python
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
r = results[0]
r.peptide                    # "SIINFEKL"
r.affinity.value             # IC50 in nM
r.affinity.percentile_rank   # 0-100, lower = better
r.affinity.allele            # best allele for this kind
r.presentation               # None if predictor doesn't produce it
```

Under the hood, each `PeptideResult` wraps a tuple of `Prediction` objects:
frozen dataclasses, one per allele-kind combination. Everything converts
to DataFrames with consistent column names.

```python
from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
r = results[0]

r.peptide    # "SIINFEKL"
r.offset     # position in source protein (if scanned)
r.kinds      # {"pMHC_affinity", "pMHC_presentation"}
r.alleles    # {"HLA-A*02:01", "HLA-B*07:02"}

# best prediction by kind; None when the kind is absent
r.affinity       # Prediction or None
r.presentation   # Prediction or None
r.stability      # None (predictor doesn't produce it)

if r.affinity:
    r.affinity.value             # IC50 in nM
    r.affinity.percentile_rank   # 0-100, lower = better
    r.affinity.score             # ~0-1, higher = better
    r.affinity.allele            # best allele for this kind

# by rank instead of score
r.best_affinity_by_rank   # Prediction with lowest percentile rank, or None

# all predictions
r.preds   # tuple of all Prediction objects
r.filter(kind="pMHC_affinity")
r.filter(allele="HLA-A*02:01")
```

NetMHCpan 4.1 automatically emits both `pMHC_affinity` and `pMHC_presentation`
predictions per peptide-allele pair.
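
To make the wrapper idea concrete, here is a minimal sketch of how a result object could pick a best prediction per kind from a tuple of frozen records. `SketchPrediction`, `SketchPeptideResult`, `best_by_score`, and `best_by_rank` are hypothetical stand-ins written for this illustration, not mhctools classes or methods, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchPrediction:
    """Hypothetical stand-in for a Prediction (only a few fields)."""
    kind: str
    allele: str
    score: float                      # higher = better
    percentile_rank: Optional[float]  # lower = better, may be missing


@dataclass(frozen=True)
class SketchPeptideResult:
    """Hypothetical stand-in for PeptideResult: one peptide, many predictions."""
    peptide: str
    preds: Tuple[SketchPrediction, ...]

    def filter(self, kind=None, allele=None):
        # Keep predictions matching every criterion that was given.
        return tuple(
            p for p in self.preds
            if (kind is None or p.kind == kind)
            and (allele is None or p.allele == allele)
        )

    def best_by_score(self, kind):
        # Highest score wins; None when the kind is absent.
        candidates = self.filter(kind=kind)
        return max(candidates, key=lambda p: p.score) if candidates else None

    def best_by_rank(self, kind):
        # Lowest percentile rank wins; predictions without a rank are skipped.
        ranked = [p for p in self.filter(kind=kind) if p.percentile_rank is not None]
        return min(ranked, key=lambda p: p.percentile_rank) if ranked else None


# Made-up values, chosen so the two selection rules disagree.
result = SketchPeptideResult(
    peptide="SIINFEKL",
    preds=(
        SketchPrediction("pMHC_affinity", "HLA-A*02:01", 0.40, 2.5),
        SketchPrediction("pMHC_affinity", "HLA-B*07:02", 0.35, 1.0),
        SketchPrediction("pMHC_presentation", "HLA-A*02:01", 0.10, 8.0),
    ),
)

by_score = result.best_by_score("pMHC_affinity")
by_rank = result.best_by_rank("pMHC_affinity")
missing = result.best_by_score("pMHC_stability")
```

In this toy data the best-by-score and best-by-rank alleles differ, which is why a separate by-rank accessor can be useful alongside the default one.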

`predict_proteins()` takes a dictionary of protein sequences and returns
`{sequence_name: list[PeptideResult]}`:

```python
proteins = predictor.predict_proteins(
    {"TP53": "MEEPQSDPSVEPPLSQETFS...", "KRAS": "MTEYKLVVVGAGGVGKS..."},
    peptide_lengths=[9, 10],
)
for r in proteins["TP53"]:
    if r.affinity and r.affinity.value < 500:
        print(f"  offset={r.offset} {r.peptide} IC50={r.affinity.value:.0f}")
```

Every level has a `_dataframe` variant that flattens to a pandas DataFrame
with consistent columns:

```python
df = predictor.predict_dataframe(["SIINFEKL"], sample_name="pat001")
df = predictor.predict_proteins_dataframe({"TP53": "MEEPQ..."}, sample_name="pat001")
```

Columns: `sample_name`, `peptide`, `n_flank`, `c_flank`,
`source_sequence_name`, `offset`, `predictor_name`, `predictor_version`,
`allele`, `kind`, `score`, `value`, `percentile_rank`.
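
The `offset` column ties each peptide back to its position in the source sequence. As a rough illustration of what scanning a protein for peptides of several lengths involves, the sketch below enumerates every window together with its offset. `extract_subsequences` is a hypothetical helper, not part of mhctools, and it assumes 0-based offsets, which may not match the library's convention.

```python
def extract_subsequences(sequence, peptide_lengths):
    """Yield (offset, peptide) for every window of each requested length.

    Offsets are 0-based positions in `sequence` (an assumption of this sketch).
    """
    for length in peptide_lengths:
        # range() is empty when the sequence is shorter than the window.
        for offset in range(len(sequence) - length + 1):
            yield offset, sequence[offset:offset + length]


windows = list(extract_subsequences("AAAQQQSIINFEKL", [8, 9, 10]))
offsets_of_siinfekl = [off for off, pep in windows if pep == "SIINFEKL"]
```

A 14-residue sequence yields 7, 6, and 5 windows for lengths 8, 9, and 10, so the number of predictions grows quickly with sequence length and the number of requested lengths.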

`MultiSample` runs a predictor across multiple samples, each with its own
HLA genotype:

```python
from mhctools import MultiSample, NetMHCpan41

ms = MultiSample(
    samples={
        "pat001": ["HLA-A*02:01", "HLA-B*07:02"],
        "pat002": ["HLA-A*01:01", "HLA-B*08:01"],
    },
    predictor_class=NetMHCpan41,
)

# {sample_name: list[PeptideResult]}
results = ms.predict(["SIINFEKL", "GILGFVFTL"])

# {sample_name: {seq_name: list[PeptideResult]}}
protein_results = ms.predict_proteins({"TP53": "MEEPQ..."})

# flat DataFrames with sample_name column
df = ms.predict_dataframe(["SIINFEKL"])
df = ms.predict_proteins_dataframe({"TP53": "MEEPQ..."})
```

Each `Prediction` has a `kind` string describing what it measures.
The canonical kind strings are defined in `mhctools.pred.Kind`:

| Kind | Meaning |
|---|---|
| `pMHC_affinity` | Peptide-MHC binding affinity |
| `pMHC_presentation` | Likelihood of surface presentation (EL/processing) |
| `pMHC_stability` | Peptide-MHC complex stability |
| `immunogenicity` | T-cell immunogenicity |
| `antigen_processing` | Combined processing score |
| `proteasome_cleavage` | Proteasomal cleavage score |
| `tap_transport` | TAP transport score (reserved, not yet used) |
| `erap_trimming` | ERAP trimming score (reserved, not yet used) |

Predictors also expose `kind_support()` so downstream code can tell what MHC
context is meaningful for each emitted kind:

```python
support = predictor.kind_support()
support["pMHC_affinity"]
# {"mhc_dependence": "single_allele", "mhc_class": "I"}
```

`mhc_dependence` is one of:

| Value | Meaning |
|---|---|
| `none` | The prediction is MHC-independent; `Prediction.allele` is empty. |
| `single_allele` | The prediction is for one peptide/MHC allele pair; `Prediction.allele` is part of the key. |
| `haplotype` | The prediction uses the requested MHC repertoire jointly; `Prediction.allele` may carry best-allele attribution but is not the prediction key. |

`mhc_class` is one of `none`, `I`, `II`, or `both`.
The allowed metadata values are defined in `mhctools.pred` as
`MHC_DEPENDENCE_VALUES` and `MHC_CLASS_VALUES`.
Examples:

| Predictor | Kind | `mhc_dependence` | `mhc_class` |
|---|---|---|---|
| `NetMHCpan41` | `pMHC_affinity` | `single_allele` | `I` |
| `NetMHCpan41` | `pMHC_presentation` | `single_allele` | `I` |
| `NetMHCIIpan4_EL` | `pMHC_presentation` | `single_allele` | `II` |
| `NetMHCstabpan` | `pMHC_stability` | `single_allele` | `I` |
| `MHCflurry` | `pMHC_affinity` | `single_allele` | `I` |
| `MHCflurry` haplotype mode | `pMHC_presentation` | `haplotype` | `I` |
| `MHCflurry` per-allele panel mode | `pMHC_presentation` | `single_allele` | `I` |
| `Pepsickle` | `proteasome_cleavage` | `none` | `none` |

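
One way downstream code might use this metadata is to decide whether the allele belongs in a deduplication or join key. The sketch below works on a hand-written dictionary in the same shape as the `kind_support()` output shown earlier. `prediction_key` is a hypothetical helper, not an mhctools function.

```python
# Hand-written metadata in the shape returned by kind_support();
# the values mirror rows of the examples table.
support = {
    "pMHC_affinity": {"mhc_dependence": "single_allele", "mhc_class": "I"},
    "pMHC_presentation": {"mhc_dependence": "haplotype", "mhc_class": "I"},
    "proteasome_cleavage": {"mhc_dependence": "none", "mhc_class": "none"},
}


def prediction_key(peptide, kind, allele, support):
    """Build a key for one prediction, including the allele only when
    the kind is predicted per peptide/allele pair."""
    dependence = support[kind]["mhc_dependence"]
    if dependence == "single_allele":
        return (peptide, kind, allele)
    # "haplotype": allele is attribution only; "none": allele is empty.
    return (peptide, kind)


affinity_key = prediction_key("SIINFEKL", "pMHC_affinity", "HLA-A*02:01", support)
haplotype_key = prediction_key("SIINFEKL", "pMHC_presentation", "HLA-A*02:01", support)
cleavage_key = prediction_key("SIINFEKL", "proteasome_cleavage", "", support)
```

Keying haplotype-level predictions without the allele avoids treating two records as distinct merely because the best-allele attribution differs.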
For MHCflurry presentation, `presentation_allele_mode="haplotype"` treats the
requested alleles as one sample genotype and emits one `pMHC_presentation`
record per peptide. The `allele` field carries MHCflurry's `best_allele`
attribution when available. `presentation_allele_mode="per_allele"` treats each
allele as a separate one-allele synthetic sample and emits one presentation
record per peptide/allele pair. The default `"auto"` mode uses haplotype mode
for up to six alleles and per-allele mode for larger allele panels.
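
The `"auto"` rule described above can be sketched as a small function. `resolve_presentation_mode` and its `haplotype_limit` parameter are hypothetical names used only for illustration; the real implementation may differ, for example in how it counts duplicate alleles.

```python
def resolve_presentation_mode(mode, alleles, haplotype_limit=6):
    """Return the concrete mode ("haplotype" or "per_allele") for a request.

    Sketch of the documented rule: "auto" picks haplotype mode for up to
    `haplotype_limit` alleles and per-allele mode for larger panels.
    """
    if mode not in ("auto", "haplotype", "per_allele"):
        raise ValueError(f"unknown presentation_allele_mode: {mode!r}")
    if mode != "auto":
        return mode  # explicit choices pass through unchanged
    return "haplotype" if len(alleles) <= haplotype_limit else "per_allele"


# Placeholder allele labels, just to get lists of a given size.
six_alleles = [f"allele_{i}" for i in range(6)]
seven_alleles = [f"allele_{i}" for i in range(7)]

mode_for_six = resolve_presentation_mode("auto", six_alleles)
mode_for_seven = resolve_presentation_mode("auto", seven_alleles)
mode_forced = resolve_presentation_mode("per_allele", six_alleles)
```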
Every prediction is a frozen, self-contained `Prediction` dataclass:

```python
from mhctools import Prediction

pred = Prediction(
    kind="pMHC_affinity",
    score=0.85,              # ~0-1, higher = better
    peptide="SIINFEKL",
    allele="HLA-A*02:01",
    value=120.5,             # IC50 in nM
    percentile_rank=0.8,
    source_sequence_name="TP53",
    offset=42,
    predictor_name="netMHCpan",
    predictor_version="4.1",
)
```

`score` is always higher-is-better. `value` is in native units (nM for
affinity, hours for stability). `percentile_rank` is optional; when present
it ranges from 0 to 100, and lower means stronger.
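
For readers less familiar with frozen dataclasses, the sketch below shows what "frozen" means in practice, using a small stand-in class from the standard library only. `FrozenPrediction` is hypothetical and carries just a few fields; the behavior shown is that of any `@dataclass(frozen=True)`.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenPrediction:
    """Hypothetical stand-in with a few Prediction-like fields."""
    kind: str
    score: float
    peptide: str
    allele: str


pred = FrozenPrediction("pMHC_affinity", 0.85, "SIINFEKL", "HLA-A*02:01")

# Attribute assignment on a frozen dataclass raises FrozenInstanceError.
try:
    pred.score = 0.9
    mutated = True
except FrozenInstanceError:
    mutated = False

# To "change" a field, build a modified copy; the original is untouched.
relabeled = replace(pred, allele="HLA-B*07:02")

# Frozen dataclasses with the default eq=True are hashable,
# so they can be used in sets and as dict keys.
unique = {pred, relabeled, FrozenPrediction("pMHC_affinity", 0.85, "SIINFEKL", "HLA-A*02:01")}
```

Immutability makes it safe to share the same record between a result object, a filtered view, and a DataFrame export without defensive copying.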

| Predictor | Kinds produced | Requires |
|---|---|---|
| `NetMHCpan` / `NetMHCpan41` / `NetMHCpan42` | affinity + presentation | NetMHCpan |
| `NetMHCpan4` | affinity or presentation | NetMHCpan 4.0 |
| `NetMHCpan3` / `NetMHCpan28` | affinity | older NetMHCpan |
| `NetMHC` / `NetMHC3` / `NetMHC4` | affinity | NetMHC |
| `NetMHCIIpan` / `NetMHCIIpan43` | affinity or presentation | NetMHCIIpan |
| `NetMHCcons` | affinity | NetMHCcons |
| `NetMHCstabpan` | stability | NetMHCstabpan |
| `MHCflurry` | affinity + presentation | `pip install mhcflurry` + `mhcflurry-downloads fetch` |
| `MHCflurry_Affinity` | affinity | `pip install mhcflurry` + `mhcflurry-downloads fetch` |
| `BigMHC` | presentation or immunogenicity | BigMHC clone (set `BIGMHC_DIR`) |
| `MixMHCpred` | presentation | MixMHCpred |
| `IedbNetMHCpan` / `IedbSMM` / `IedbNetMHCIIpan` | affinity | IEDB web API |
| `RandomBindingPredictor` | affinity | (built-in) |

| Predictor | Kinds produced | Requires |
|---|---|---|
| `Pepsickle` | proteasome cleavage | `pip install pepsickle` |
| `NetChop` | proteasome cleavage | NetChop |

Processing predictors use configurable scoring to aggregate per-position
cleavage probabilities into peptide-level scores. See ProcessingPredictor
and ProteasomePredictor for details.
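
To illustrate what aggregating per-position cleavage probabilities into a peptide-level score can look like, here is a minimal sketch with two scoring modes. The function name, the mode names, and both formulas are assumptions made for this example; they are not taken from `ProcessingPredictor` or `ProteasomePredictor`, whose actual scoring options may be different.

```python
import math


def peptide_processing_score(cleavage_probs, start, length, mode="c_term_and_internal"):
    """Aggregate per-position cleavage probabilities for one peptide.

    cleavage_probs[i] is taken to be the probability of a cut immediately
    after residue i of the source sequence (an assumption of this sketch).
    """
    end = start + length - 1            # index of the peptide's last residue
    c_term = cleavage_probs[end]        # a cut here liberates the C-terminus
    internal = cleavage_probs[start:end]  # cuts here would destroy the peptide

    if mode == "c_term_only":
        return c_term
    if mode == "c_term_and_internal":
        # Reward C-terminal cleavage, penalize every internal cut site.
        return c_term * math.prod(1.0 - p for p in internal)
    raise ValueError(f"unknown scoring mode: {mode!r}")


# Made-up probabilities for a 5-residue source sequence.
probs = [0.1, 0.2, 0.5, 0.9, 0.3]
score_simple = peptide_processing_score(probs, start=1, length=3, mode="c_term_only")
score_combined = peptide_processing_score(probs, start=1, length=3)
```

Making the aggregation rule a parameter keeps the per-position predictor unchanged while letting callers choose how strictly internal cleavage sites are penalized.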

```sh
mhctools --sequence SIINFEKL SIINFEKLQ --mhc-predictor netmhc --mhc-alleles A0201
```

```sh
mhctools --sequence AAAQQQSIINFEKL --extract-subsequences --mhc-peptide-lengths 8-10 --mhc-predictor mhcflurry --mhc-alleles A0201
```

The old `predict_peptides()` and `predict_subsequences()` methods still work
and return `BindingPredictionCollection` objects:

```python
from mhctools import NetMHCpan

predictor = NetMHCpan(alleles=["A*02:01"])
collection = predictor.predict_subsequences(
    {"1L2Y": "NLYIQWLKDGGPSSGRPPPS"},
    peptide_lengths=[9],
)
df = collection.to_dataframe()

for bp in collection:
    if bp.affinity < 100:
        print("Strong binder: %s" % bp)
```

To convert legacy results to the new types:

```python
preds = collection.to_preds()            # list of Prediction
pp_list = collection.to_peptide_preds()  # list of PeptideResult
```