fuzzymatch

Fuzzy name matching for Go services: string similarity, suppression, and no runtime dependencies beyond the standard library and golang.org/x/text.

⚠ Status

This library is pre-release. The API is not yet stable and may change without notice until the v1.0.0 tag. Do not use in production until the first stable release ships.

See docs/requirements.md for the authoritative specification of what this library will do.

What this is

A pure-Go library for detecting pairs of similar names in a collection. It provides fuzzy matching for the "these two probably mean the same thing" cases that humans miss when authoring schemas, taxonomies, configuration vocabularies, API field sets, database column lists, environment variable names, CLI flag sets, and any other structured naming domain.

The library is domain-agnostic. It knows about strings, weights, and thresholds — not about YAML, taxonomies, or any specific format. Consumers translate their own data into the library's generic types and process the warnings in whatever way fits their domain.

Module path: github.com/axonops/fuzzymatch
License: Apache-2.0
Go version: 1.26.3 minimum
Runtime dependencies: stdlib + a single curated dep (golang.org/x/text for Unicode normalisation). No other runtime deps. No cgo.

Key features

Twenty-three string-similarity algorithms across five categories: character-based, q-gram, token-based, phonetic, gestalt.
Fresh implementations from primary academic sources. Every algorithm cites its originating paper inline; no GPL/LGPL-derived code; no patent-encumbered algorithms (Metaphone 3 is explicitly excluded).
Weighted composite Scorer for mixing algorithms with caller-controlled weights and threshold (Phase 8).
Collection-scan sub-package for one-shot deduplication passes (Phase 9).
Cross-platform deterministic output — verified byte-identical across linux/amd64, linux/arm64, darwin/amd64, darwin/arm64, windows/amd64 via golden-file tests.
Pure-function library. No goroutines, no channels, no I/O, no config files, no background work.
Property-tested and fuzz-tested. Mathematical invariants (symmetry, identity, range bounds, triangle inequality where applicable) verified via testing/quick; every public function has a native Go fuzzer.
Apache-2.0 throughout. Compatible with the BSD-3-Clause licence of RE2 and the various MIT-licensed prior-art Go implementations consulted for reference vectors only.
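
As a rough illustration of the property-testing bullet above, the sketch below checks symmetry, identity, and range bounds with testing/quick. The `hammingScore` function is a hypothetical stand-in written for this example, not the library's implementation, and the library's own test layout may differ.

```go
package main

import (
	"fmt"
	"testing/quick"
)

// hammingScore is a stand-in similarity function for this sketch only.
// It returns the fraction of byte positions at which two equal-length
// strings agree, and 0 when the lengths differ.
func hammingScore(a, b string) float64 {
	if len(a) != len(b) {
		return 0
	}
	if len(a) == 0 {
		return 1
	}
	same := 0
	for i := 0; i < len(a); i++ {
		if a[i] == b[i] {
			same++
		}
	}
	return float64(same) / float64(len(a))
}

// checkInvariants runs each property against randomly generated inputs
// and returns the first counterexample reported by testing/quick, if any.
func checkInvariants() error {
	symmetry := func(a, b string) bool { return hammingScore(a, b) == hammingScore(b, a) }
	identity := func(a string) bool { return hammingScore(a, a) == 1 }
	bounds := func(a, b string) bool {
		s := hammingScore(a, b)
		return s >= 0 && s <= 1
	}
	for _, prop := range []any{symmetry, identity, bounds} {
		if err := quick.Check(prop, nil); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := checkInvariants(); err != nil {
		fmt.Println("invariant violated:", err)
		return
	}
	fmt.Println("all invariants hold")
}
```

Passing nil as the config uses the package defaults for iteration count and random source.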

Why this library exists

Existing Go fuzzy-matching libraries fall into three buckets:

Single-algorithm packages (adrg/strutil, hbollon/go-edlib, xrash/smetrics) — useful but require the consumer to assemble a multi-algorithm strategy by hand.
Python ports (fuzzywuzzy / rapidfuzz ports) — token-style only, no phonetic algorithms, often MIT-derived which the consumer must vet.
Heavyweight ML matchers — depend on embedding models, slow, hard to verify, hard to audit.

fuzzymatch is the missing fourth bucket: a curated, audit-friendly catalogue of pure-Go algorithms tied together by a weighted Scorer and a turnkey scan layer. Everything is determinism-first, allocation-budgeted, and licence-clean.

Three layers

Layer 1: Algorithm functions      LevenshteinScore(a, b)            ─┐
Layer 2: Scorer                   NewScorer().Score(a, b)            │  Same library,
Layer 3: Scan sub-package         fuzzymatch/scan.Check(items, cfg) ─┘  three depths.
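
To make the three depths in the diagram concrete, here is a minimal self-contained sketch. Every name in it (`scoreFunc`, `weighted`, `composite`, `scan`, and the two toy score functions) is hypothetical and chosen for illustration; none of it is the library's API, and the real Scorer and scan.Check may be shaped differently.

```go
package main

import "fmt"

// scoreFunc is a hypothetical Layer-1 shape: two strings in, similarity in [0, 1] out.
type scoreFunc func(a, b string) float64

// exactScore and prefixScore are toy stand-ins, not catalogue algorithms.
func exactScore(a, b string) float64 {
	if a == b {
		return 1
	}
	return 0
}

// prefixScore is the shared byte-prefix length divided by the longer length.
func prefixScore(a, b string) float64 {
	longer := len(a)
	if len(b) > longer {
		longer = len(b)
	}
	if longer == 0 {
		return 1
	}
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return float64(n) / float64(longer)
}

// weighted pairs a score function with a caller-chosen weight (Layer 2 input).
type weighted struct {
	fn     scoreFunc
	weight float64
}

// composite returns the weight-normalised average of all component scores.
func composite(parts []weighted, a, b string) float64 {
	var sum, total float64
	for _, p := range parts {
		sum += p.weight * p.fn(a, b)
		total += p.weight
	}
	if total == 0 {
		return 0
	}
	return sum / total
}

// pair is one reported match from the collection scan (Layer 3 output).
type pair struct {
	A, B  string
	Score float64
}

// scan compares every unordered pair once and keeps those at or above threshold.
func scan(items []string, parts []weighted, threshold float64) []pair {
	var out []pair
	for i := 0; i < len(items); i++ {
		for j := i + 1; j < len(items); j++ {
			if s := composite(parts, items[i], items[j]); s >= threshold {
				out = append(out, pair{items[i], items[j], s})
			}
		}
	}
	return out
}

func main() {
	parts := []weighted{{exactScore, 1}, {prefixScore, 3}}
	items := []string{"user_id", "user_ids", "order_total"}
	for _, p := range scan(items, parts, 0.6) {
		fmt.Printf("%q ~ %q (%.3f)\n", p.A, p.B, p.Score)
	}
}
```

The pairwise loop is quadratic in the collection size, which is usually acceptable for the schema-sized inputs described earlier.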

Consumers pick the layer that matches their question:

"How similar are these two strings?" → Layer 1 (one algorithm function).
"How similar are these two strings overall?" → Layer 2 (weighted composite via Scorer).
"Which pairs in this collection are similar?" → Layer 3 (scan.Check).

Quick start

Note: Phase 1 (foundation) ships Normalise and Tokenise primitives plus the AlgoID enum and sentinel errors. Algorithm functions (e.g. LevenshteinScore) land in Phase 2. The example below uses the Phase-1 primitives; the full algorithm-driven quick start is added with Phase 2.

package main

import (
    "fmt"

    "github.com/axonops/fuzzymatch"
)

func main() {
    opts := fuzzymatch.DefaultNormalisationOptions()
    opts.StripDiacritics = true

    fmt.Println(fuzzymatch.Normalise("UserCreate-Event", opts))
    // Output: user create event

    fmt.Println(fuzzymatch.Tokenise("XMLHttpRequest", fuzzymatch.DefaultTokeniseOptions()))
    // Output: [xmlhttp request]
}

Algorithm catalogue

Twenty-three algorithms in five categories. Every entry has its primary academic source cited in docs/algorithms.md and in the implementation file once it lands (Phase 2+).

Character-based (9)

Algorithm	`AlgoID`	Primary source	Detail
Levenshtein	`AlgoLevenshtein`	Levenshtein 1965	docs/algorithms.md#levenshtein
Damerau-Levenshtein (OSA)	`AlgoDamerauLevenshteinOSA`	Boytsov 2011; Damerau 1964	docs/algorithms.md#damerau-levenshtein-osa
Damerau-Levenshtein (Full)	`AlgoDamerauLevenshteinFull`	Lowrance & Wagner 1975	docs/algorithms.md#damerau-levenshtein-full
Hamming	`AlgoHamming`	Hamming 1950	docs/algorithms.md#hamming
Jaro	`AlgoJaro`	Jaro 1989	docs/algorithms.md#jaro
Jaro-Winkler	`AlgoJaroWinkler`	Winkler 1990	docs/algorithms.md#jaro-winkler
Strcmp95	`AlgoStrcmp95`	Winkler 1994; U.S. Census 1995	docs/algorithms.md#strcmp95
Smith-Waterman-Gotoh	`AlgoSmithWatermanGotoh`	Smith & Waterman 1981; Gotoh 1982	docs/algorithms.md#smith-waterman-gotoh
LCSStr	`AlgoLCSStr`	Wagner & Fischer 1974	docs/algorithms.md#lcsstr
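
For readers new to the character-based family, the following is a textbook two-row Levenshtein distance with one common way of turning the distance into a score in [0, 1]. It is a sketch for orientation only: the normalisation (one minus distance over the longer length) and the function names are assumptions, and the library's own implementation and scoring may differ.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the minimum number of single-rune insertions,
// deletions, and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// normalisedScore maps the distance to [0, 1]; 1 means identical.
// This normalisation is an assumption made for the sketch.
func normalisedScore(a, b string) float64 {
	la, lb := len([]rune(a)), len([]rune(b))
	longer := la
	if lb > longer {
		longer = lb
	}
	if longer == 0 {
		return 1
	}
	return 1 - float64(levenshtein(a, b))/float64(longer)
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))
	fmt.Printf("%.3f\n", normalisedScore("kitten", "sitting"))
}
```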

Q-gram / n-gram (4)

Algorithm	`AlgoID`	Primary source	Detail
Q-Gram Jaccard	`AlgoQGramJaccard`	Ukkonen 1992; Jaccard 1912	docs/algorithms.md#qgramjaccard
Sørensen-Dice	`AlgoSorensenDice`	Sørensen 1948; Dice 1945	docs/algorithms.md#sorensendice
Cosine (n-gram)	`AlgoCosine`	Salton & McGill 1983	docs/algorithms.md#cosine
Tversky	`AlgoTversky`	Tversky 1977	docs/algorithms.md#tversky
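
The q-gram measures all compare the sets of short substrings two names share. The sketch below computes Jaccard and Sørensen-Dice over sets of q-grams; treating q-grams as a set rather than a multiset, using no padding, and the function names are all assumptions made for this example rather than statements about the library.

```go
package main

import "fmt"

// qgrams returns the set of distinct q-rune substrings of s.
func qgrams(s string, q int) map[string]struct{} {
	r := []rune(s)
	set := make(map[string]struct{})
	for i := 0; i+q <= len(r); i++ {
		set[string(r[i:i+q])] = struct{}{}
	}
	return set
}

// overlap counts the q-grams present in both sets.
func overlap(x, y map[string]struct{}) int {
	n := 0
	for g := range x {
		if _, ok := y[g]; ok {
			n++
		}
	}
	return n
}

// emptyCase handles inputs too short to produce any q-gram.
func emptyCase(a, b string) float64 {
	if a == b {
		return 1
	}
	return 0
}

// jaccard is |X ∩ Y| / |X ∪ Y| over q-gram sets.
func jaccard(a, b string, q int) float64 {
	x, y := qgrams(a, q), qgrams(b, q)
	if len(x) == 0 && len(y) == 0 {
		return emptyCase(a, b)
	}
	n := overlap(x, y)
	return float64(n) / float64(len(x)+len(y)-n)
}

// dice is 2|X ∩ Y| / (|X| + |Y|) over q-gram sets.
func dice(a, b string, q int) float64 {
	x, y := qgrams(a, q), qgrams(b, q)
	if len(x) == 0 && len(y) == 0 {
		return emptyCase(a, b)
	}
	return 2 * float64(overlap(x, y)) / float64(len(x)+len(y))
}

func main() {
	fmt.Printf("jaccard=%.3f dice=%.3f\n", jaccard("night", "nacht", 2), dice("night", "nacht", 2))
}
```

For "night" and "nacht" with q = 2 the only shared bigram is "ht", so Jaccard is 1/7 and Dice is 2/8.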

Token-based (5)

Algorithm	`AlgoID`	Primary source	Detail
Monge-Elkan	`AlgoMongeElkan`	Monge & Elkan 1996	docs/algorithms.md#mongeelkan
Token Sort Ratio	`AlgoTokenSortRatio`	SeatGeek fuzzywuzzy / RapidFuzz	docs/algorithms.md#tokensortratio
Token Set Ratio	`AlgoTokenSetRatio`	SeatGeek fuzzywuzzy / RapidFuzz	docs/algorithms.md#tokensetratio
Partial Ratio	`AlgoPartialRatio`	SeatGeek fuzzywuzzy / RapidFuzz	docs/algorithms.md#partialratio
Token Jaccard	`AlgoTokenJaccard`	Jaccard 1912	docs/algorithms.md#tokenjaccard

Phonetic (4)

Algorithm	`AlgoID`	Primary source	Detail
Soundex	`AlgoSoundex`	Russell 1918; Knuth 1973	docs/algorithms.md#soundex
Double Metaphone	`AlgoDoubleMetaphone`	Philips 2000	docs/algorithms.md#doublemetaphone
NYSIIS	`AlgoNYSIIS`	Taft 1970	docs/algorithms.md#nysiis
MRA	`AlgoMRA`	Moore et al. 1977 (NBS TN 943)	docs/algorithms.md#mra
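
Phonetic algorithms reduce each name to a short code and treat names with equal codes as sounding alike. As an orientation aid, here is an ASCII-only sketch of the commonly described American Soundex rules (same-coded letters separated by H or W are coded once; a vowel between them lets the code repeat). Whether the library follows exactly this variant is not stated in this README, so treat the function below as an assumption.

```go
package main

import "fmt"

// soundexCode maps an upper-case ASCII letter to its Soundex digit,
// or 0 for vowels, H, W, Y, and anything else.
func soundexCode(c byte) byte {
	switch c {
	case 'B', 'F', 'P', 'V':
		return '1'
	case 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z':
		return '2'
	case 'D', 'T':
		return '3'
	case 'L':
		return '4'
	case 'M', 'N':
		return '5'
	case 'R':
		return '6'
	}
	return 0
}

// soundex returns the four-character code for s, or "" if s has no ASCII letters.
func soundex(s string) string {
	out := make([]byte, 0, 4)
	var last byte // digit of the most recent coded letter; 0 after a vowel
	for i := 0; i < len(s) && len(out) < 4; i++ {
		c := s[i]
		if c >= 'a' && c <= 'z' {
			c -= 'a' - 'A'
		}
		if c < 'A' || c > 'Z' {
			continue // ignore non-letters
		}
		d := soundexCode(c)
		if len(out) == 0 {
			out = append(out, c) // keep the first letter verbatim
			last = d
			continue
		}
		if c == 'H' || c == 'W' {
			continue // H and W do not separate equal codes
		}
		if d == 0 {
			last = 0 // a vowel separates equal codes
			continue
		}
		if d != last {
			out = append(out, d)
		}
		last = d
	}
	for len(out) > 0 && len(out) < 4 {
		out = append(out, '0') // pad short codes with zeros
	}
	return string(out)
}

func main() {
	for _, name := range []string{"Robert", "Rupert", "Ashcraft", "Tymczak"} {
		fmt.Println(name, soundex(name))
	}
}
```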

Gestalt (1)

Algorithm	`AlgoID`	Primary source	Detail
Ratcliff-Obershelp	`AlgoRatcliffObershelp`	Ratcliff & Metzener 1988	docs/algorithms.md#ratcliffobershelp
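
Gestalt matching, as usually described, finds the longest common substring, recurses on the pieces to its left and right, and scores twice the matched length over the combined length. The sketch below is a direct, unoptimised rendering of that description with hypothetical function names; tie-breaking between equally long substrings can vary between implementations, and the library's choice is not documented here.

```go
package main

import "fmt"

// matched returns the total number of runes matched by repeatedly taking
// the longest common substring and recursing on both sides of it.
func matched(a, b []rune) int {
	bestLen, ai, bi := 0, 0, 0
	for i := range a {
		for j := range b {
			k := 0
			for i+k < len(a) && j+k < len(b) && a[i+k] == b[j+k] {
				k++
			}
			if k > bestLen { // first longest run wins ties
				bestLen, ai, bi = k, i, j
			}
		}
	}
	if bestLen == 0 {
		return 0
	}
	return bestLen +
		matched(a[:ai], b[:bi]) +
		matched(a[ai+bestLen:], b[bi+bestLen:])
}

// ratcliffObershelp returns 2*matched / (len(a)+len(b)), or 1 for two empty strings.
func ratcliffObershelp(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	if len(ra)+len(rb) == 0 {
		return 1
	}
	return 2 * float64(matched(ra, rb)) / float64(len(ra)+len(rb))
}

func main() {
	fmt.Printf("%.4f\n", ratcliffObershelp("WIKIMEDIA", "WIKIMANIA"))
}
```

In the example, "WIKIM" (5 runes) matches first and "IA" (2 runes) matches in the right-hand remainder, giving 2 × 7 / 18.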

Metaphone 3 is explicitly NOT included due to U.S. Patent 7,440,941. See docs/faq.md for the full patent screen rationale.

Configuration

The Phase-1 primitives expose two option structs. Both are passed by value, both have a Default…Options() constructor, and both are immutable inputs (callers building variant configurations construct fresh values).

// Normalisation with strict ASCII casing + diacritic stripping for "café → cafe".
opts := fuzzymatch.DefaultNormalisationOptions()
opts.StripDiacritics = true
n := fuzzymatch.Normalise("Café Müller", opts)
// n == "cafe muller"

// Tokenisation with default split rules (camelCase, snake_case, kebab-case, dot-case).
tokens := fuzzymatch.Tokenise("User-CreateEvent.v2", fuzzymatch.DefaultTokeniseOptions())
// tokens == []string{"user", "create", "event", "v2"}

NormalisationOptions fields: Lowercase, StripSeparators, SeparatorChars, SplitCamelCase, NFC, StripDiacritics.
TokeniseOptions fields: Lowercase, SplitCamelCase, SplitConsecutiveUpper, SeparatorChars.

The Scorer (Phase 8) accepts a NormalisationOptions value at construction time and applies it before every algorithm invocation.

See docs/tuning.md for guidance on calibrating algorithm weights and thresholds against a domain corpus, and docs/scorer.md for the Scorer API once Phase 8 lands.

Thread safety

Every public function in the root package is pure: no shared mutable state, no goroutines, no channels, no mutexes. Concurrent callers may invoke Normalise, Tokenise, and (from Phase 2) every algorithm score function from any number of goroutines without coordination.

The Scorer (Phase 8) is immutable after construction. A constructed Scorer is safe for concurrent use; callers wanting a different configuration construct a fresh Scorer.

The scan sub-package (Phase 9) follows the same discipline: a constructed scan.Config is immutable; scan.Check is safe for concurrent invocation on disjoint inputs.
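
Because pure functions share no state, fanning work out across goroutines needs no locking as long as each goroutine writes to its own result slot. The sketch below shows that pattern with a hypothetical stand-in score function (`foldScore`); any function with the same shape could be substituted, and nothing here is taken from the library's API.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// foldScore is a stand-in pure function: 1 if the strings are equal
// ignoring case, 0 otherwise.
func foldScore(a, b string) float64 {
	if strings.EqualFold(a, b) {
		return 1
	}
	return 0
}

// scoreAll scores every pair concurrently. Each goroutine writes only to
// its own index of out, so no mutex is required.
func scoreAll(pairs [][2]string, score func(a, b string) float64) []float64 {
	out := make([]float64, len(pairs))
	var wg sync.WaitGroup
	for i, p := range pairs {
		wg.Add(1)
		go func(i int, p [2]string) {
			defer wg.Done()
			out[i] = score(p[0], p[1])
		}(i, p)
	}
	wg.Wait()
	return out
}

func main() {
	pairs := [][2]string{{"UserID", "userid"}, {"user", "order"}}
	fmt.Println(scoreAll(pairs, foldScore))
}
```

For large inputs a bounded worker pool would usually be preferable to one goroutine per pair.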

API reference

The canonical API reference lives on pkg.go.dev: pkg.go.dev/github.com/axonops/fuzzymatch.

Every exported type, function, method, and constant carries a godoc comment that begins with the symbol name. Algorithm implementation files cite their primary academic source inline at the top of the file.

Documentation

docs/requirements.md — the authoritative spec for what this library does.
docs/algorithms.md — algorithm-by-algorithm reference (per-algorithm detail fills in as each phase lands).
docs/scorer.md — Scorer configuration and tuning (Phase 8).
docs/scan.md — scan sub-package consumer guide (Phase 9).
docs/tuning.md — threshold tuning and calibration.
docs/extending.md — adding a custom algorithm.
docs/performance.md — benchmark numbers and optimisation notes.
docs/faq.md — common questions, exclusions, and rationale.

🤖 For AI Assistants

This repository ships llms.txt (concise index) and llms-full.txt (full API reference + algorithm citations) at the repo root. AI assistants and code generators should consult these first.

The contents are verified in sync with the public Go API by ai_friendly_test.go, which parses every exported root-package symbol via go/ast and asserts each appears in llms.txt. Drift fails CI.

This project is built with GSD for spec-driven development. Domain-specific review agents in .claude/agents/ gate every change. See .claude/skills/fuzzymatch-review-protocol/SKILL.md for the review protocol.

Contributing

Pre-release. External contributions welcome once v1.0.0 ships. Until then, please file issues for discussion rather than PRs. See CONTRIBUTING.md for the local development setup, conventional-commit rules, the algorithm deprecation policy, and the release-via-CI-only discipline.

The repo's review gates are documented in .claude/skills/fuzzymatch-review-protocol/SKILL.md: every change passes through algorithm-licensing, algorithm-correctness, algorithm-performance, determinism, api-ergonomics, code-reviewer, and security-reviewer agents as applicable.

Security

Vulnerabilities go to the private channel documented in SECURITY.md. Do NOT open public issues for security reports. Release signatures are verifiable via cosign — see SECURITY.md for the verification command.

License

Apache-2.0. See LICENSE and NOTICE.

