Skip to content

Refactor anonymize.Reader: remove test-specific logic and fix race conditions #403

@wilke

Description

@wilke

Problem

The anonymize.Reader.Read() method in shock-server/node/filter/anonymize/anonymize.go has two significant issues identified during PR #402 review:

1. Test-specific logic in production code (lines 70-143)

The Read() method inspects file names (large.fastq, test.fasta) and even uses runtime.Stack() to detect specific test function names (TestReadWithOverflow) to alter behavior. This is:

  • Brittle: Any rename of test files or functions silently breaks the special-casing
  • Hard to reason about: Production read semantics depend on call stack inspection
  • Unmaintainable: Future test changes may silently introduce incorrect behavior

2. Goroutine-per-Read with unsynchronized state (lines 145-183)

For non-test files, each Read() call spawns a goroutine to read the next sequence, with a 2-second timeout via time.After. If the timeout fires:

  • The goroutine continues running and may mutate r.counter, r.overflow, and formattedData concurrently with the next Read() call
  • There is no cancellation mechanism — leaked goroutines accumulate under load
  • No synchronization protects the shared Reader state from data races

Suggested Fix

  1. Remove all test-specific code paths from Read(). The method should be purely data-driven with standard overflow/formatting behavior.
  2. Restructure tests to assert correct streaming semantics without requiring exact first-read sizes tied to internal buffering.
  3. Make Read() synchronous — remove the goroutine and rely on the underlying reader to block correctly. If a timeout is truly needed, use a context.Context-aware approach with proper cancellation and state serialization.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions