Update for plexDIA #81

Merged
ypriverol merged 6 commits into main from dev on May 4, 2026

@ypriverol (Member) commented May 4, 2026

Summary by CodeRabbit

Release Notes

  • Documentation & Project Metadata

    • Enhanced project discoverability with expanded keywords
    • Updated project URLs (Homepage, Documentation, GitHub, Bug Tracker, PyPI)
  • Bug Fixes

    • DIA-NN converter (diann2msstats) now treats the Label column as optional
    • Improved label mapping with case-insensitive support for SILAC and MTRAQ formats

@coderabbitai (Contributor, bot) commented May 4, 2026

📝 Walkthrough

The PR updates project metadata with additional keywords and revised URLs, makes Label and LabelType optional columns when parsing the unified design format, and adds label normalization for SILAC and MTRAQ labels, which are detected by case-insensitive substring matching. The f_table now includes the Label column only when it is present and holds more than one distinct value.
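A minimal sketch of the Label gating described above, assuming the unified design is loaded as a pandas DataFrame. The helper names `has_multiplex_labels` and `build_f_table`, and the `Filename`/`Fraction`/`Sample` column subset used for the fraction table, are hypothetical; the actual f_table layout in diann2msstats.py is not shown on this page.

```python
import pandas as pd


def has_multiplex_labels(design: pd.DataFrame) -> bool:
    # Label is optional: treat the design as multiplexed only when the
    # column exists and holds more than one distinct (non-null) value.
    return "Label" in design.columns and design["Label"].nunique() > 1


def build_f_table(design: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical column subset; the real f_table has its own layout.
    cols = ["Filename", "Fraction", "Sample"]
    if has_multiplex_labels(design):
        cols.append("Label")
    return design[cols].copy()
```

A label-free design (single Label value, or no Label column at all) yields a table without Label, so downstream code never has to handle a constant label column.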

Changes

Project Metadata Update

  • Metadata Expansion (pyproject.toml): Keywords expanded to include "big data", "sdrf", "sample-metadata", and "proteomics-pipeline"; [project.urls] rewritten with Homepage, Documentation, GitHub, "Bug Tracker", and PyPI keys, replacing the prior custom URL keys.

Unified Design Format Parsing

  • Column Requirements (quantmsutils/diann/diann2msstats.py): Unified design validation now requires only Filename, Fraction, Sample, Condition, and BioReplicate; Label and LabelType are no longer mandatory.
  • Label Normalization Logic (quantmsutils/diann/diann2msstats.py): Multiplexing label normalization now runs only when the Label column is present with multiple unique values. SILAC and MTRAQ labels are detected by case-insensitive substring matching on Label values and mapped as SILAC light/medium/heavy → L/M/H and MTRAQ0/4/8 → 0/4/8.
  • Fraction Table Construction (quantmsutils/diann/diann2msstats.py): f_table includes the Label column only when Label exists and has multiple values; otherwise the column is omitted.
  • Test Update (tests/test_commands.py): test_unified_format_validates_sample_consistency now writes Label and LabelType columns to the test unified-format file to match the updated schema expectations.
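The relaxed column requirement can be sketched as a simple presence check. `REQUIRED_COLUMNS` and `validate_unified_design` are illustrative names, not the functions in diann2msstats.py, and the error message is an assumption.

```python
import pandas as pd

# Columns the unified design must provide after this PR. Label and
# LabelType are optional, so they are deliberately absent from this tuple.
REQUIRED_COLUMNS = ("Filename", "Fraction", "Sample", "Condition", "BioReplicate")


def validate_unified_design(design: pd.DataFrame) -> None:
    # Report every missing required column at once rather than failing
    # on the first one.
    missing = [c for c in REQUIRED_COLUMNS if c not in design.columns]
    if missing:
        raise ValueError(f"Unified design is missing required columns: {missing}")
```

A design with only the five required columns passes; adding Label and LabelType does not change the outcome.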

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • daichengxin
  • jpfeuffer

Poem

🐰 Whiskers twitch with metadata delight,
Labels dance conditional in the light,
SILAC hops and MTRAQ sings true,
Optional now, what freedom to pursue!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive. The title 'Update for plexDIA' is vague and does not clearly convey the actual changes made in the pull request, which involve updating project metadata, modifying unified design parsing logic to make Label optional, and updating tests. Resolution: revise the title to be more specific about the main changes, such as 'Make Label column optional in DIANN unified design parsing' or 'Update project metadata and support optional label columns in DIANN conversion'.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

@codacy-production

Up to standards ✅

🟢 Issues: 0 new issues
🟢 Metrics: complexity 0

@coderabbitai (Contributor, bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@quantmsutils/diann/diann2msstats.py`:
- Around lines 185-199: The replacement is case-sensitive because
df["Label"].replace(...) uses original casing while detection uses labels_lower;
update the logic for both silac_dict and mtraq_dict to perform lookups against
labels_lower (which is already lowercased) using lowercase keys and then write
back mapped values while preserving non-matches (e.g., use
labels_lower.map(lowercased_dict).fillna(df["Label"]) or equivalent) so that
variables labels_lower, silac_dict/mtraq_dict, and df["Label"] are used for
case-insensitive replacement and original values are kept when there is no
match.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: da498a12-8aac-46b4-ab96-cc891245f736

📥 Commits

Reviewed files that changed from the base of the PR, between commits deff52d and 9fb4d1e.

📒 Files selected for processing (3)
  • pyproject.toml
  • quantmsutils/diann/diann2msstats.py
  • tests/test_commands.py

Comment on lines +185 to +199
if labels_lower.str.contains("silac").any():
    silac_dict = {
        "SILAC light": "L",
        "SILAC medium": "M",
        "SILAC heavy": "H",
    }
    df["Label"] = df["Label"].replace(silac_dict)

if labels_lower.str.contains("mtraq").any():
    mtraq_dict = {
        "MTRAQ0": "0",
        "MTRAQ4": "4",
        "MTRAQ8": "8",
    }
    df["Label"] = df["Label"].replace(mtraq_dict)
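The mismatch between detection and replacement can be reproduced in isolation with pandas. This is a hypothetical example: the design labels are made up, and only the `mtraq_dict` mapping is copied from the snippet above.

```python
import pandas as pd

# Mapping copied from the snippet above: all-caps "MTRAQ*" keys.
mtraq_dict = {"MTRAQ0": "0", "MTRAQ4": "4", "MTRAQ8": "8"}

# Hypothetical design labels in the common "mTRAQ" casing.
df = pd.DataFrame({"Label": ["mTRAQ0", "mTRAQ4", "mTRAQ8"]})
labels_lower = df["Label"].astype(str).str.lower()

# Detection runs on lowercased text, so it fires...
detected = bool(labels_lower.str.contains("mtraq").any())
# ...but Series.replace matches dict keys exactly, so nothing is replaced.
replaced = df["Label"].replace(mtraq_dict)

print(detected)           # True
print(replaced.tolist())  # ['mTRAQ0', 'mTRAQ4', 'mTRAQ8']
```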

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for Label column values in test design files to confirm expected casing
rg -n --type tsv "mTRAQ\|MTRAQ\|SILAC\|silac" --glob "*.tsv"
# Also check any usage in Python source that constructs or references these label strings
rg -n "mTRAQ\|MTRAQ\|SILAC light\|silac" --type py

Repository: bigbio/quantms-utils

Length of output: 92


🏁 Script executed:

#!/bin/bash
# Search for actual label values in TSV files (design files)
find . -name "*.tsv" -type f | head -20

Length of output: 312


🏁 Script executed:

#!/bin/bash
# Search for Label column values in any data files
rg -i "mTRAQ|MTRAQ|SILAC" --glob "*.tsv"

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Check Python source for label string construction/references
rg "mTRAQ|MTRAQ|SILAC|Label" -A 2 -B 2 --type py quantmsutils/diann/

Length of output: 4163


🏁 Script executed:

#!/bin/bash
# Check the actual code context around lines 185-199
cat -n quantmsutils/diann/diann2msstats.py | sed -n '175,210p'

Length of output: 1731


🏁 Script executed:

#!/bin/bash
# Check test design files for Label column content
echo "=== PXD026600_diann_design.tsv ===" 
head -5 ./tests/test_data/diann2msstats/PXD026600_diann_design.tsv | cat -v

echo -e "\n=== PXD026600.sdrf_openms_design.tsv ===" 
head -5 ./tests/test_data/diann2msstats/PXD026600.sdrf_openms_design.tsv | cat -v

echo -e "\n=== Check for any mTRAQ or MTRAQ in test files ===" 
rg -i "label" ./tests/test_data/diann2msstats/*.tsv | head -20

Length of output: 4533


🏁 Script executed:

#!/bin/bash
# Look at full test design file structure
wc -l ./tests/test_data/diann2msstats/*.tsv

Length of output: 279


🏁 Script executed:

#!/bin/bash
# Check convert-diann module to see what label values it produces
find . -name "*convert*diann*" -o -name "*diann*convert*" | grep -i py
rg "convert.*diann\|diann.*convert" -l --type py

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Search for "convert" and "diann" in module names and imports
find . -type f -name "*.py" | xargs grep -l "convert.*diann\|quantmsdiann" | head -20

Length of output: 168


🏁 Script executed:

#!/bin/bash
# Search broadly for SILAC and mTRAQ in entire repo
rg -i "silac.*light|mtraq0|mtraq4|mtraq8" -B 2 -A 2

Length of output: 933


🏁 Script executed:

#!/bin/bash
# Check what modules exist under quantmsutils/
ls -la quantmsutils/

Length of output: 633


🏁 Script executed:

#!/bin/bash
# Look at the import statements and references in diann2msstats to understand data flow
head -50 quantmsutils/diann/diann2msstats.py

Length of output: 1577


🏁 Script executed:

#!/bin/bash
# Search for where Label column gets populated in convert-diann or related modules
rg "Label.*=.*mTRAQ|Label.*=.*SILAC|\"mTRAQ|\"MTRAQ|\"SILAC" --type py -B 3 -A 3

Length of output: 1640


🏁 Script executed:

#!/bin/bash
# Check if there's any test or documentation about multiplexed designs
find . -path ./node_modules -prune -o -name "*.md" -type f -exec grep -l "mTRAQ\|SILAC\|multiplex" {} \;

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Search for actual usage or test code that calls diann2msstats
grep -r "diann2msstats\|exp_design" tests/ --include="*.py" | head -20

Length of output: 1636


🏁 Script executed:

#!/bin/bash
# Check dianncfg.py to see how it generates or handles Label values for multiplexed experiments
cat -n quantmsutils/diann/dianncfg.py | grep -A 10 -B 10 "mTRAQ"

Length of output: 871


🏁 Script executed:

#!/bin/bash
# Search for any conversion or generation of Label column values with mTRAQ/SILAC
rg "Channel.*mTRAQ|mTRAQ.*Channel|Label.*:" quantmsutils/diann/ --type py -B 2 -A 2

Length of output: 1521


🏁 Script executed:

#!/bin/bash
# Check if there's documentation or comments about expected label format
rg -i "label.*format|mtraq.*label|silac.*label" quantmsutils/ --type py -B 1 -A 1

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Get more context around the dianncfg.py TODO to understand current multiplex support status
cat -n quantmsutils/diann/dianncfg.py | sed -n '95,125p'

Length of output: 1338


🏁 Script executed:

#!/bin/bash
# Check if there are any other scripts or workflows that might generate Label columns with mTRAQ/SILAC
find . -name "*.py" -type f -exec grep -l "mTRAQ0\|MTRAQ0\|SILAC light" {} \;

Length of output: 102


🏁 Script executed:

#!/bin/bash
# Examine the entire get_exp_design_dfs function to understand data flow
rg -A 80 "def get_exp_design_dfs" quantmsutils/diann/diann2msstats.py

Length of output: 3222


Case-sensitive replacement will silently no-op for mTRAQ labels once multiplex support is enabled.

The multiplexing code (lines 185-199) currently appears unreachable: dianncfg.py (line 106) marks multiplex support as a TODO and explicitly rejects multiplexed experiments (mTRAQ, TMT, iTRAQ, Dimethyl modifications) with an error message. However, the bug is real and will manifest once this feature is implemented.

When enabled, labels_lower holds the lowercased values (for detection), but both df["Label"].replace(silac_dict) and df["Label"].replace(mtraq_dict) match against the original casing of df["Label"].

  • SILAC: the keys "SILAC light"/"SILAC medium"/"SILAC heavy" match the SDRF convention exactly, so the replacement works only when the upstream file uses that exact casing.
  • mTRAQ: the keys "MTRAQ0"/"MTRAQ4"/"MTRAQ8" are all-caps, but the standard notation used throughout the codebase (e.g., dianncfg.py) is "mTRAQ" (lowercase m). Once multiplex support is added and design files contain "mTRAQ0", detection triggers but the replacement fails to match, so df["Label"] retains "mTRAQ0". The downstream merge against DIA-NN's Channel values ("0"/"4"/"8") then produces all-NaN rows that are dropped, yielding an empty MSstats output.
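A toy illustration of that downstream effect, assuming the report is joined to the design on Channel == Label with a left merge followed by dropping rows that lack design metadata. The frames, the column subset, and the join strategy are hypothetical stand-ins, not the actual merge in diann2msstats.py.

```python
import pandas as pd

# Hypothetical stand-ins: a DIA-NN-style report keyed by Channel, plus an
# unnormalized design and a normalized one.
report = pd.DataFrame({"Channel": ["0", "4", "8"], "Intensity": [1.0, 2.0, 3.0]})
raw_design = pd.DataFrame({"Label": ["mTRAQ0", "mTRAQ4", "mTRAQ8"], "Condition": ["A", "B", "C"]})
fixed_design = raw_design.assign(Label=["0", "4", "8"])


def join_rows(design: pd.DataFrame) -> pd.DataFrame:
    # Assumed join: left merge on Channel == Label, then drop rows where the
    # design contributed nothing (Condition is NaN).
    merged = report.merge(design, left_on="Channel", right_on="Label", how="left")
    return merged.dropna(subset=["Condition"])


print(len(join_rows(raw_design)))    # 0: no Channel value equals any "mTRAQ*" label
print(len(join_rows(fixed_design)))  # 3
```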

Fix: run the replacement on labels_lower (already lowercase) using lowercase dict keys to ensure case-insensitive matching.

🐛 Proposed fix
     labels_lower = df["Label"].astype(str).str.lower()

     if labels_lower.str.contains("silac").any():
         silac_dict = {
-            "SILAC light": "L",
-            "SILAC medium": "M",
-            "SILAC heavy": "H",
+            "silac light": "L",
+            "silac medium": "M",
+            "silac heavy": "H",
         }
-        df["Label"] = df["Label"].replace(silac_dict)
+        df["Label"] = labels_lower.map(silac_dict).fillna(df["Label"])

     if labels_lower.str.contains("mtraq").any():
         mtraq_dict = {
-            "MTRAQ0": "0",
-            "MTRAQ4": "4",
-            "MTRAQ8": "8",
+            "mtraq0": "0",
+            "mtraq4": "4",
+            "mtraq8": "8",
         }
-        df["Label"] = df["Label"].replace(mtraq_dict)
+        df["Label"] = labels_lower.map(mtraq_dict).fillna(df["Label"])

Using labels_lower.map(dict).fillna(df["Label"]) performs case-insensitive lookup while preserving the original value for labels that do not match any key.
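A self-contained sketch of the proposed fix, wrapped in a hypothetical `normalize_labels` helper so it can be exercised on its own; the lowercase mapping keys are the ones from the diff above, and the sample labels are made up.

```python
import pandas as pd

# Lowercase keys, as in the proposed fix.
SILAC_MAP = {"silac light": "L", "silac medium": "M", "silac heavy": "H"}
MTRAQ_MAP = {"mtraq0": "0", "mtraq4": "4", "mtraq8": "8"}


def normalize_labels(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    labels_lower = out["Label"].astype(str).str.lower()
    if labels_lower.str.contains("silac").any():
        # map() yields NaN for unmatched labels; fillna() restores the original.
        out["Label"] = labels_lower.map(SILAC_MAP).fillna(out["Label"])
    if labels_lower.str.contains("mtraq").any():
        out["Label"] = labels_lower.map(MTRAQ_MAP).fillna(out["Label"])
    return out


design = pd.DataFrame({"Label": ["mTRAQ0", "MTRAQ4", "mtraq8", "unknown"]})
print(normalize_labels(design)["Label"].tolist())  # ['0', '4', '8', 'unknown']
```

Because `fillna` aligns on the index, any label that matches neither mapping passes through unchanged.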


@ypriverol merged commit 478d5f1 into main on May 4, 2026
5 checks passed