PhonemeConverter

PhonemeConverter is a C++ library for converting pronunciation strings into phoneme sequences and marking phoneme onsets. It provides simple built-in converters, file-driven converters, JSON rule-based onset marking, and LuaJIT-backed scripting hooks for custom behavior.

Features

Split space-separated pronunciation strings into phoneme sequences.
Convert phonemes through tab-separated mapping files.
Convert whole pronunciation keys through tab-separated dictionary files.
Mark phoneme onsets with JSON pattern rules.
Implement custom conversion and onset marking logic with Lua scripts.
Build as a shared library by default, or as a static library with a CMake option.

Requirements

CMake 3.17 or newer
A C++ compiler with C++20 support for the library build
PkgConfig
LuaJIT
nlohmann/json
stdcorelib
GoogleTest, only when building tests

Syllable-to-Phoneme Conversion

Direct Conversion

DirectS2P splits a pronunciation string by spaces.

#include <PhonemeConverter/DirectS2P.h>

#include <string>
#include <vector>

std::vector<std::string> phonemes = PhonemeConverter::DirectS2P::convert("AA BB CC");
// {"AA", "BB", "CC"}

Mapping Conversion

MappingS2P splits a pronunciation string by spaces and applies a tab-separated phoneme mapping. Known phonemes are replaced, while unknown phonemes are kept unchanged.

#include <PhonemeConverter/MappingS2P.h>

#include <sstream>
#include <string>
#include <vector>

std::istringstream mappingFile("AA\ta\nBB\tb\n");
PhonemeConverter::MappingS2P converter(mappingFile);

std::vector<std::string> phonemes = converter.convert("AA CC BB");
// {"a", "CC", "b"}

Mapping file format:

AA	a
BB	b

Each non-empty line must contain exactly one tab, with a non-empty source phoneme and a non-empty target phoneme. Duplicate source phonemes are rejected.
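The format rules above can be illustrated with a minimal standalone sketch. It is not the library's implementation: `PhonemeMap`, `parseMapping`, and `applyMapping` are hypothetical names, `std::runtime_error` stands in for `MappingS2PParseError`, and treating any run of whitespace as a separator when splitting the pronunciation is an assumption.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in types and functions; not part of PhonemeConverter.
using PhonemeMap = std::unordered_map<std::string, std::string>;

// Parses "source<TAB>target" lines and enforces the format rules.
PhonemeMap parseMapping(std::istream &in) {
    PhonemeMap mapping;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) {
            continue; // empty lines are skipped
        }
        const auto tab = line.find('\t');
        // Exactly one tab: one must exist, and no second one may follow it.
        if (tab == std::string::npos || line.find('\t', tab + 1) != std::string::npos) {
            throw std::runtime_error("expected exactly one tab: " + line);
        }
        std::string source = line.substr(0, tab);
        std::string target = line.substr(tab + 1);
        if (source.empty() || target.empty()) {
            throw std::runtime_error("empty source or target phoneme: " + line);
        }
        // emplace reports failure when the source phoneme is already present.
        if (!mapping.emplace(std::move(source), std::move(target)).second) {
            throw std::runtime_error("duplicate source phoneme: " + line);
        }
    }
    return mapping;
}

// Replaces known phonemes and keeps unknown ones unchanged.
std::vector<std::string> applyMapping(const PhonemeMap &mapping, const std::string &pronunciation) {
    std::vector<std::string> result;
    std::istringstream tokens(pronunciation);
    std::string phoneme;
    while (tokens >> phoneme) { // assumption: any whitespace run separates phonemes
        const auto it = mapping.find(phoneme);
        result.push_back(it != mapping.end() ? it->second : phoneme);
    }
    return result;
}
```

With the two-line mapping shown above, `applyMapping(mapping, "AA CC BB")` in this sketch yields `{"a", "CC", "b"}`, which agrees with the converter example.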

Dictionary Conversion

DictionaryS2P maps a whole pronunciation key to a configured phoneme sequence. Unknown keys convert to an empty sequence.

#include <PhonemeConverter/DictionaryS2P.h>

#include <sstream>
#include <string>
#include <vector>

std::istringstream dictionaryFile("hello\tHH AH L OW\nnihao\tn i h ao\n");
PhonemeConverter::DictionaryS2P converter(dictionaryFile);

std::vector<std::string> phonemes = converter.convert("hello");
// {"HH", "AH", "L", "OW"}

Dictionary file format:

hello	HH AH L OW
nihao	n i h ao

Each non-empty line must contain exactly one tab, with a non-empty key and a space-separated phoneme sequence. Duplicate keys and empty phonemes inside a sequence are rejected.
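As a rough illustration of these rules, the following standalone sketch parses the dictionary format and performs a whole-key lookup. It is not the library's code: `PhonemeDictionary`, `splitSequence`, `parseDictionary`, and `lookup` are hypothetical names, `std::runtime_error` stands in for `DictionaryS2PParseError`, and rejecting an entirely empty sequence is an assumption the text above does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in types and functions; not part of PhonemeConverter.
using PhonemeDictionary = std::unordered_map<std::string, std::vector<std::string>>;

// Splits on single spaces and rejects empty phonemes such as in "a  b" or "a ".
std::vector<std::string> splitSequence(const std::string &sequence) {
    std::vector<std::string> phonemes;
    std::size_t start = 0;
    while (true) {
        const std::size_t space = sequence.find(' ', start);
        const std::size_t end = (space == std::string::npos) ? sequence.size() : space;
        if (end == start) {
            throw std::runtime_error("empty phoneme in sequence: " + sequence);
        }
        phonemes.push_back(sequence.substr(start, end - start));
        if (space == std::string::npos) {
            break;
        }
        start = space + 1;
    }
    return phonemes;
}

// Parses "key<TAB>phoneme phoneme ..." lines.
PhonemeDictionary parseDictionary(std::istream &in) {
    PhonemeDictionary dictionary;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) {
            continue; // empty lines are skipped
        }
        const auto tab = line.find('\t');
        if (tab == std::string::npos || line.find('\t', tab + 1) != std::string::npos) {
            throw std::runtime_error("expected exactly one tab: " + line);
        }
        std::string key = line.substr(0, tab);
        if (key.empty()) {
            throw std::runtime_error("empty key: " + line);
        }
        std::vector<std::string> phonemes = splitSequence(line.substr(tab + 1));
        if (!dictionary.emplace(std::move(key), std::move(phonemes)).second) {
            throw std::runtime_error("duplicate key: " + line);
        }
    }
    return dictionary;
}

// Unknown keys produce an empty sequence instead of an error.
std::vector<std::string> lookup(const PhonemeDictionary &dictionary, const std::string &key) {
    const auto it = dictionary.find(key);
    return it != dictionary.end() ? it->second : std::vector<std::string>{};
}
```

Unlike the mapping converter, the lookup never splits its input: the pronunciation string is used as a single key, so a key that is absent from the dictionary simply yields an empty vector.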

Lua Conversion

LuaS2P executes a Lua function named s2p. The function receives the pronunciation string and must return a table of strings.

function s2p(pronunciation)
    local result = {}
    for phoneme in string.gmatch(pronunciation, "[^ ]+") do
        result[#result + 1] = string.lower(phoneme)
    end
    return result
end

#include <PhonemeConverter/LuaS2P.h>
#include <PhonemeConverter/LuaScript.h>

#include <string>
#include <vector>

PhonemeConverter::LuaScript script(/* Lua script */, "s2p-script");

PhonemeConverter::LuaS2P converter(script);
std::vector<std::string> phonemes = converter.convert("AA BB CC");
// {"aa", "bb", "cc"}

Onset Marking

Rule-Based Onset Marking

RuleOnsetMarker loads JSON rule definitions. A definition contains phoneme type assignments and ordered pattern rules.

{
    "phonemeTypes": {
        "ae": "vowel",
        "ah": "vowel",
        "ey": "vowel",
        "ow": "vowel",
        "b": "consonant",
        "f": "consonant",
        "k": "consonant",
        "l": "liquid",
        "r": "liquid",
        "y": "liquid"
    },
    "rules": [
        { "pattern": ["vowel"], "onsets": [0] },
        { "pattern": ["consonant", "liquid", "vowel"], "onsets": [1] },
        { "pattern": ["liquid", "liquid", "vowel"], "onsets": [1] }
    ]
}

#include <PhonemeConverter/RuleOnsetMarker.h>

#include <sstream>
#include <string>
#include <vector>

std::istringstream rules(/* JSON rules */);

PhonemeConverter::RuleOnsetMarker marker(rules);
std::vector<bool> onsets = marker.mark({"b", "r", "ih", "l", "y", "ax", "n", "t"}); // "brilliant"
// {false, true, false, false, true, false, false, false}
// (this result assumes the loaded rules also classify "ih" and "ax" as vowels)

Each item in a rule's pattern is a phoneme type name or the wildcard "*", which matches any phoneme. When multiple rules can match, the marker chooses the longest matching rule and, among rules of the same length, prefers typed rules over wildcard rules. The following definition uses only wildcard rules:

{
    "phonemeTypes": {},
    "rules": [
        { "pattern": ["*"], "onsets": [0] },
        { "pattern": ["*", "*"], "onsets": [1] }
    ]
}
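The selection policy described above (longest match first, typed rules before wildcard rules of equal length) might be sketched as follows. This is a guess at one workable scheme, not the library's algorithm: `Rule`, `TypeMap`, `matchesAt`, `wildcardCount`, and `markOnsets` are hypothetical names, and several behaviors are assumptions, namely that `onsets` holds indices relative to the start of the matched pattern, that matching scans left to right and resumes after each matched span, that a position with no matching rule is skipped, and that "prefers typed rules" means fewer wildcard items.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in types; not part of PhonemeConverter.
struct Rule {
    std::vector<std::string> pattern; // phoneme type names or "*"
    std::vector<std::size_t> onsets;  // assumed: indices into the pattern
};
using TypeMap = std::unordered_map<std::string, std::string>;

// Returns true when `rule` matches the phonemes starting at `pos`.
bool matchesAt(const Rule &rule, const TypeMap &types,
               const std::vector<std::string> &phonemes, std::size_t pos) {
    if (rule.pattern.empty() || pos + rule.pattern.size() > phonemes.size()) {
        return false;
    }
    for (std::size_t i = 0; i < rule.pattern.size(); ++i) {
        if (rule.pattern[i] == "*") {
            continue; // wildcard matches any phoneme
        }
        const auto it = types.find(phonemes[pos + i]);
        if (it == types.end() || it->second != rule.pattern[i]) {
            return false; // untyped phoneme or type mismatch
        }
    }
    return true;
}

std::size_t wildcardCount(const Rule &rule) {
    return static_cast<std::size_t>(
        std::count(rule.pattern.begin(), rule.pattern.end(), std::string("*")));
}

std::vector<bool> markOnsets(const std::vector<Rule> &rules, const TypeMap &types,
                             const std::vector<std::string> &phonemes) {
    std::vector<bool> onsets(phonemes.size(), false);
    std::size_t pos = 0;
    while (pos < phonemes.size()) {
        const Rule *best = nullptr;
        for (const Rule &rule : rules) {
            if (!matchesAt(rule, types, phonemes, pos)) {
                continue;
            }
            const bool longer = !best || rule.pattern.size() > best->pattern.size();
            const bool moreTyped = best && rule.pattern.size() == best->pattern.size() &&
                                   wildcardCount(rule) < wildcardCount(*best);
            if (longer || moreTyped) {
                best = &rule;
            }
        }
        if (!best) {
            ++pos; // assumed: unmatched phonemes are left unmarked
            continue;
        }
        for (std::size_t index : best->onsets) {
            assert(index < best->pattern.size()); // a parser would validate this
            onsets[pos + index] = true;
        }
        pos += best->pattern.size(); // assumed: resume after the matched span
    }
    return onsets;
}
```

Under these assumptions, and with "ih" and "ax" additionally typed as vowels, the three typed rules from the first definition reproduce the "brilliant" result shown earlier: the consonant-liquid-vowel rule marks "r" and the liquid-liquid-vowel rule marks "y".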

Lua Onset Marking

LuaOnsetMarker executes a Lua function named markonset. The function receives a table of phoneme strings and must return a table of booleans with the same length.

function markonset(phonemes)
    local result = {}
    for i = 1, #phonemes do
        result[i] = i == 1 or phonemes[i] == "T"
    end
    return result
end

#include <PhonemeConverter/LuaOnsetMarker.h>
#include <PhonemeConverter/LuaScript.h>

#include <string>
#include <vector>

PhonemeConverter::LuaScript script(/* Lua script */, "onset-script");

PhonemeConverter::LuaOnsetMarker marker(script);
std::vector<bool> onsets = marker.mark({"S", "AA", "T"});
// {true, false, true}

Error Handling

The library reports invalid input formats and Lua failures with typed exceptions:

MappingS2PParseError
DictionaryS2PParseError
RuleOnsetMarkerParseError
LuaScriptError
LuaS2PError
LuaOnsetMarkerError

License

PhonemeConverter is licensed under the Apache License 2.0. See LICENSE for details.
