fix(search): parse tika xmpDM:duration as a float #2638

Merged: rhafer merged 2 commits into main from fix/search-tika-duration-parse on Apr 21, 2026

@dschmidt (Contributor) commented Apr 20, 2026

Description

Parse the xmpDM:duration metadata value Tika returns with strconv.ParseFloat instead of strconv.ParseInt, and convert the result to milliseconds when calling audio.SetDuration.

Related Issue

No tracking issue yet — happy to open one if preferred.

Motivation and Context

Tika emits xmpDM:duration as seconds in floating-point form (e.g. "154.57379150390625"). strconv.ParseInt rejects the decimal separator, returns an error, and the whole block is skipped — so every audio item indexed through the Tika extractor ended up without a duration, silently.

Symptom in the Graph API: audio facets on driveItem search hits never included duration, even for MP3 files Tika parses happily.

How Has This Been Tested?

  • test environment: local dev stack (OpenCloud + Tika 3.3.0 on localhost:9998, bleve search engine), opencloud search index --all-spaces --force-rescan after the patch (separate fix PR upcoming)
  • go test ./services/search/pkg/content/...: the existing extractor spec now exercises the fractional case ("225.5" → 225500 ms); all 12 specs pass
  • OGG/FLAC files remain without duration (Tika doesn't emit xmpDM:duration for them — that's a separate upstream limitation, not this PR's concern)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Technical debt
  • Tests only (no source changes)

Checklist:

  • Code changes
  • Unit tests added
  • Acceptance tests added
  • Documentation added

Tika emits xmpDM:duration as seconds in floating-point form (for
example "154.57379150390625"), so strconv.ParseInt rejected every
value and the field was silently dropped — every indexed audio item
ended up without a duration.

Parse the value with strconv.ParseFloat and convert to milliseconds
ourselves. Adjust the existing extractor test to cover the fractional
case.

Copilot AI left a comment

Pull request overview

This PR fixes audio duration extraction for items indexed via the Tika content extractor by correctly parsing xmpDM:duration as a floating-point seconds value and converting it into milliseconds for the LibreGraph audio facet.

Changes:

  • Parse xmpDM:duration with strconv.ParseFloat instead of strconv.ParseInt.
  • Convert duration from seconds (possibly fractional) to milliseconds when setting audio.Duration.
  • Update the existing Tika extractor spec to cover fractional duration values ("225.5" → 225500 ms).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| services/search/pkg/content/tika.go | Switches duration parsing to float seconds and converts to ms before calling audio.SetDuration. |
| services/search/pkg/content/tika_test.go | Updates the audio metadata test fixture and assertion to validate fractional-second duration parsing. |

Comment thread services/search/pkg/content/tika.go Outdated
Address review feedback: a straight int64 cast truncates toward zero,
so Tika values that produce results like 1234.999... milliseconds would
land at 1234 ms instead of 1235 ms. Round before casting so durations
are as accurate as float64 allows.
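
The difference between truncating and rounding can be sketched as follows. This is an illustration only: `truncMillis` and `roundMillis` are hypothetical helper names, not functions from the PR, and `1.2349999` is a made-up input chosen so that the millisecond value falls just below a whole number.

```go
package main

import (
	"fmt"
	"math"
)

// truncMillis converts seconds to milliseconds with a plain cast,
// which truncates toward zero. Hypothetical helper for illustration.
func truncMillis(secs float64) int64 {
	return int64(secs * 1000)
}

// roundMillis rounds to the nearest millisecond before casting.
// Hypothetical helper for illustration.
func roundMillis(secs float64) int64 {
	return int64(math.Round(secs * 1000))
}

func main() {
	// 225.5 is the value used in the updated spec; 1.2349999 is illustrative.
	for _, secs := range []float64{225.5, 1.2349999} {
		fmt.Printf("%v s: truncated=%d ms, rounded=%d ms\n",
			secs, truncMillis(secs), roundMillis(secs))
	}
}
```

For inputs whose millisecond value is already a whole number, such as 225.5 s, both helpers agree; they differ only when the product has a fractional part of 0.5 ms or more.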
@sonarqubecloud commented

Quality Gate failed

Failed conditions
B Maintainability Rating on New Code (required ≥ A)

@dschmidt (Contributor, Author) commented

Thanks for the review!

I'm not allowed to merge, feel free to merge whenever you like :)

@rhafer merged commit 2fc33d6 into main on Apr 21, 2026 (59 of 60 checks passed)
@rhafer deleted the fix/search-tika-duration-parse branch on April 21, 2026, 13:17
@openclouders mentioned this pull request on Apr 21, 2026 (1 task)