BioCroissant is a sub‑working group of the Croissant initiative with strong ties to the MLCommons Medical working group, which develops benchmarks and best practices. BioCroissant explores and extends the Croissant standard for the biomedical domain, strengthening the technical link between the AI and biomedical communities.
Its goal is to make medical and life‑science data standards work seamlessly with the AI ecosystem—across ML frameworks, platforms, and Responsible AI tooling.
Built with real‑world clinical, research, and regulatory needs in mind, BioCroissant bridges the gap between ML engineers, healthcare professionals, and policy makers working to build trustworthy health data ecosystems (e.g. academic health data consortia or international initiatives such as the European Health Data Space, EHDS).
Modern biomedical AI depends on high‑quality, interoperable data—yet many datasets today remain siloed, inconsistently documented, or difficult to reuse.
BioCroissant provides:
- A simple, structured metadata schema for biomedical datasets
- Built‑in FAIR alignment (Findable, Accessible, Interoperable, Reusable)
- Compatibility with national and cross‑border health data initiatives
- Seamless integration with ML workflows through clean, machine‑readable metadata
- Governance‑friendly transparency, supporting reproducibility, auditability, and clinical trust
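As a rough illustration of what "clean, machine‑readable metadata" in the list above could look like, here is a minimal Python sketch that assembles a Croissant‑style JSON‑LD record as a plain dictionary. The overall layout is loosely modelled on Croissant's JSON‑LD format and has not been validated against the specification. The `ex:` namespace, the `ex:modality` and `ex:consent` keys, the `build_metadata` helper, and the example dataset are all hypothetical and are not part of BioCroissant or Croissant.

```python
import json


def build_metadata(name, description, license_url, modalities, consent):
    """Return a Croissant-style JSON-LD record as a plain dict (illustrative only)."""
    return {
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
            "dct": "http://purl.org/dc/terms/",
            # Hypothetical namespace standing in for biomedical extensions.
            "ex": "https://example.org/bio-sketch/",
        },
        "@type": "sc:Dataset",
        "dct:conformsTo": "http://mlcommons.org/croissant/1.0",
        "name": name,
        "description": description,
        "license": license_url,
        # Hypothetical extension fields, not defined by any published schema.
        "ex:modality": sorted(modalities),
        "ex:consent": consent,
    }


# Example record for a made-up dataset.
record = build_metadata(
    name="example-chest-imaging",
    description="Illustrative record for a fictional imaging dataset.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    modalities={"x-ray", "ehr"},
    consent="research-use-only",
)

# Plain dicts serialize directly, so any JSON-aware tool can read the result.
serialized = json.dumps(record, indent=2)
```

Keeping the record as ordinary JSON means it can be versioned alongside the data and parsed by ML tooling without a custom reader.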
Whether you are building models, conducting clinical research, or shaping data policy, BioCroissant helps ensure your data is ready for science, ready for AI, and ready for regulation. In practice, that means:
- Standardized, machine‑readable metadata
- Faster onboarding of new datasets
- Clear documentation of modalities, provenance, consent, and known limitations
- Improved dataset documentation across clinical and research settings
- Reduced ambiguity when collaborating across institutions
- A shared language for describing multimodal biomedical data
- A practical tool aligned with FAIR principles and health data space policies
- Increased trustworthiness and auditability of AI training data
- A foundation for responsible, transparent biomedical AI ecosystems
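To make the documentation and auditability points above more concrete, the following sketch shows how a small linter might flag records that omit modalities, provenance, consent, or known limitations. It is an assumption‑laden illustration: the field names in `REQUIRED_FIELDS` and the `lint_metadata` function are hypothetical and do not reflect an existing BioCroissant validator.

```python
# Hypothetical field names chosen for this sketch only.
REQUIRED_FIELDS = ("modalities", "provenance", "consent", "known_limitations")


def _is_blank(value):
    """Treat None, whitespace-only strings, and empty containers as blank."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def lint_metadata(record):
    """Return a list of problems found in a metadata dict; empty means no findings."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing: {field}")
        elif _is_blank(record[field]):
            problems.append(f"empty: {field}")
    return problems


# A record that documents everything except its known limitations.
incomplete = {
    "modalities": ["x-ray"],
    "provenance": "Collected at a fictional hospital, 2020 to 2022.",
    "consent": "research-use-only",
    "known_limitations": "",
}
findings = lint_metadata(incomplete)
```

A check like this could run in continuous integration so that gaps in documentation are caught before a dataset is shared across institutions.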
Our vision is to create a global, open, and interoperable foundation for biomedical AI, one that accelerates scientific discovery while upholding privacy, ethics, and clinical rigor.
BioCroissant is community‑driven, open‑source, and designed to evolve with the ecosystem.
You can help shape both the standard and its tooling.
We welcome contributions of all kinds, including:
- Schema extensions and refinements
- Validator implementations
- Dataset examples
- Tooling (parsers, generators, linters)
- Documentation improvements
- Policy and governance feedback
To get involved:
- Open an issue or discussion around your use case
- Submit a pull request
- Join MLCommons: https://mlcommons.org/user-profile-settings-entrypoint/
- Register for the BioCroissant group: https://groups.google.com/a/mlcommons.org/g/croissant-bio
- Join community calls (announced via the BioCroissant group)
Let’s make biomedical data FAIR, transparent, and AI‑ready—for everyone.