
mlcommons/BioCroissant


🥐 BioCroissant — Metadata Sharing for (Bio)Medical AI

BioCroissant is a sub‑working group of the Croissant initiative with strong ties to the MLCommons Medical working group, which develops benchmarks and best practices. BioCroissant explores and extends the Croissant standard for the biomedical domain, strengthening the technological link between the AI and biomedical communities.

Its goal is to make medical and life‑science data standards work seamlessly with the AI ecosystem—across ML frameworks, platforms, and Responsible AI tooling.

Built with real‑world clinical, research, and regulatory needs in mind, BioCroissant bridges the gap between ML engineers, healthcare professionals, and policy makers working to build trustworthy health data ecosystems (e.g. academic health data consortia or international initiatives such as the European Health Data Space, EHDS).


🚀 Why BioCroissant?

Modern biomedical AI depends on high‑quality, interoperable data—yet many datasets today remain siloed, inconsistently documented, or difficult to reuse.

BioCroissant provides:

  • A simple, structured metadata schema for biomedical datasets
  • Built‑in FAIR alignment (Findable, Accessible, Interoperable, Reusable)
  • Compatibility with national and cross‑border health data initiatives
  • Seamless integration with ML workflows through clean, machine‑readable metadata
  • Governance‑friendly transparency, supporting reproducibility, auditability, and clinical trust

Whether you are building models, conducting clinical research, or shaping data policy, BioCroissant helps ensure your data is ready for science, ready for AI, and ready for regulation.


Who Is This For?

ML Developers & Data Scientists

  • Standardized, machine‑readable metadata
  • Faster onboarding of new datasets
  • Clear documentation of modalities, provenance, consent, and known limitations
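As a rough illustration of what such machine‑readable documentation could look like, the sketch below builds a Croissant‑style JSON‑LD record in Python and reports which documentation items are still missing. It is not a complete or validated Croissant file. The `bio:` keys, their `example.org` namespace, the helper names, and the example dataset are all hypothetical placeholders chosen to mirror the bullet points above; they are not part of any published BioCroissant schema.

```python
import json

# Hypothetical keys: they only mirror the items listed above (modalities,
# provenance, consent, known limitations) and are NOT a published schema.
HYPOTHETICAL_BIO_KEYS = (
    "bio:modalities",
    "bio:provenance",
    "bio:consent",
    "bio:knownLimitations",
)


def build_metadata(name, description, license_url, bio_fields):
    """Return a Croissant-style JSON-LD dict with illustrative biomedical fields."""
    record = {
        "@context": {
            "@vocab": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
            "bio": "https://example.org/biocroissant/",  # placeholder namespace
        },
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
    }
    record.update(bio_fields)
    return record


def missing_bio_fields(record):
    """List the hypothetical biomedical keys that are absent or empty."""
    return [key for key in HYPOTHETICAL_BIO_KEYS if not record.get(key)]


# Made-up dataset, used only to show the structure.
example = build_metadata(
    name="example-chest-xray",
    description="Made-up dataset used only to illustrate the structure.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    bio_fields={
        "bio:modalities": ["X-ray"],
        "bio:provenance": "Single hypothetical hospital, 2020-2022",
        "bio:consent": "Broad research consent",
    },
)

print(json.dumps(example, indent=2))
print(missing_bio_fields(example))  # the limitations entry was left out on purpose
```

A check like `missing_bio_fields` is one simple way a linter or validator could flag undocumented limitations before a dataset is shared; the actual required fields would be defined by the working group.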

Healthcare & Life Science Professionals

  • Improved dataset documentation across clinical and research settings
  • Reduced ambiguity when collaborating across institutions
  • A shared language for describing multimodal biomedical data

Policy Makers & Regulators

  • A practical tool aligned with FAIR principles and health data space policies
  • Increased trustworthiness and auditability of AI training data
  • A foundation for responsible, transparent biomedical AI ecosystems

🎯 Our Mission

Create a global, open, and interoperable foundation for biomedical AI—one that accelerates scientific discovery while upholding privacy, ethics, and clinical rigor.

BioCroissant is community‑driven, open‑source, and designed to evolve with the ecosystem.
You can help shape both the standard and its tooling.


Contributing

We welcome contributions of all kinds, including:

  • Schema extensions and refinements
  • Validator implementations
  • Dataset examples
  • Tooling (parsers, generators, linters)
  • Documentation improvements
  • Policy and governance feedback

Get Involved

  • Open an issue or discussion around your use case
  • Submit a pull request

Get Involved even more

Let’s make biomedical data FAIR, transparent, and AI‑ready—for everyone.


Generated from mlcommons/template