Add audio property to VideoDecoder #1442

Open
Samoed wants to merge 1 commit into meta-pytorch:main from Samoed:decodeer

@Samoed
Contributor

@Samoed Samoed commented May 21, 2026

Previously, accessing audio from a video file required creating a separate AudioDecoder alongside the VideoDecoder (see #1158). This split made it difficult to integrate with external libraries, such as the Hugging Face Datasets utilities, that expect audio to be accessible directly from the video decoder. Because VideoDecoder has no audio attribute, those integrations either needed workarounds or had no audio support at all.

This PR adds a lazy audio property to VideoDecoder that returns an AudioDecoder for the audio stream in the same source, or None if the source contains no audio stream.
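To illustrate the intended behavior, here is a minimal, self-contained sketch of the lazy-property pattern. It is not the diff in this PR: `VideoDecoderSketch`, `AudioDecoderStub`, and the `has_audio_stream` flag are hypothetical stand-ins so the example runs without torchcodec, and using `functools.cached_property` is an assumption about how the laziness could be implemented.

```python
from functools import cached_property
from typing import Optional


class AudioDecoderStub:
    """Hypothetical stand-in for torchcodec's AudioDecoder."""

    def __init__(self, source: str) -> None:
        self.source = source


class VideoDecoderSketch:
    """Hypothetical stand-in for VideoDecoder; only the `audio` property is shown."""

    def __init__(self, source: str, has_audio_stream: bool) -> None:
        self.source = source
        # Assumption: a real decoder would learn this from the container
        # metadata instead of taking it as a constructor argument.
        self._has_audio_stream = has_audio_stream

    @cached_property
    def audio(self) -> Optional[AudioDecoderStub]:
        # Evaluated on first access only; the result (including None)
        # is then cached on the instance.
        if not self._has_audio_stream:
            return None
        return AudioDecoderStub(self.source)


with_audio = VideoDecoderSketch("clip.mp4", has_audio_stream=True)
silent = VideoDecoderSketch("silent.mp4", has_audio_stream=False)

assert with_audio.audio is with_audio.audio  # repeated access returns the cached object
assert silent.audio is None  # no audio stream in the source
```

With this shape, callers that only need video never pay for opening an audio decoder, and callers that need both can work from a single object.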

@pytorch-bot

pytorch-bot Bot commented May 21, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/meta-pytorch/torchcodec/1442

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 21, 2026
@NicolasHug
Contributor

Hi @Samoed, can you share more about this:

This split made it difficult to integrate with external libraries such as huggingface/datasets#8007 utilities that expect audio to be accessible directly from the video decoder - the absence of an audio attribute on VideoDecoder either required workarounds or left audio support missing entirely in those integrations

Specifically, I'd like to understand why having two separate VideoDecoder and AudioDecoder objects leads to friction, and how the proposed design solves it.

@Samoed
Contributor Author

Samoed commented May 21, 2026

The Datasets library returns video inputs as VideoDecoder objects. To make audio information easy to extract, inputs are typically split into separate video and audio columns before uploading; extracting audio directly from such videos requires some hacks. I also think it would be good to work with a single object that handles both audio and video during data processing.
