fix(consensus,framework,actuator): use Locale.ROOT for case-insensitive #6698
halibobo1205 wants to merge 3 commits into tronprotocol:develop
The three-layer fix is well-shaped:
One angle worth recording in the PR body for future readers: the "only one site reaches persistent storage" claim is independently verifiable by inspection. Every other touched call site is one of (a) runtime engine selection (
LGTM once the remaining NITs resolve.
@yanghang8612 Thanks for the detailed audit and the clean (a)–(d) categorization. Updated the PR description accordingly.
String.toLowerCase()/toUpperCase() without an explicit Locale uses Locale.getDefault(), which on Turkish (tr) or Azerbaijani (az) systems folds 'I' to dotless-ı (U+0131) instead of 'i' (U+0069).
Changes:
- Fix all toLowerCase()/toUpperCase() calls to use Locale.ROOT
- Enable the ErrorProne StringCaseLocaleUsage checker at ERROR level to prevent future regressions at compile time
- Add one-time data migration (MigrateTurkishKeyHelper) to normalize all Turkish legacy keys (ı → i) at startup.
Summary
Fix a locale-dependent code hygiene issue: no-arg String.toLowerCase()/toUpperCase() calls use the JVM default locale, and under a minority of locales (tr/az) the folding of 'I' diverges from every other locale. Switching uniformly to Locale.ROOT decouples the behavior from the deployment environment, backed by a compile-time guard and a one-time data normalization.
This PR does three things:
- Replace every no-arg toLowerCase()/toUpperCase() call with its Locale.ROOT counterpart so that string folding is independent of the JVM's startup locale.
- Enable the ErrorProne StringCaseLocaleUsage checker at ERROR level, and add a custom StringCaseLocaleUsageMethodRef checker to cover the String::toLowerCase method-reference form that upstream misses. Any future no-arg call fails compilation.
- MigrateTurkishKeyHelper runs once at Manager.init() to normalize any non-ROOT legacy keys that may exist in AccountIdIndexStore.
What this fix really does
Make the case-insensitive index behavior a pure function of the input bytes, no longer implicitly dependent on the JVM's startup locale. Before this fix, AccountIdIndexStore was, in principle, "case-insensitive relative to whichever locale this JVM happens to use" rather than truly locale-independent. After the fix it is the latter.
This is a data-structure-level correctness convergence: it removes an implicit environment dependency and aligns the implementation with the intended semantics.
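To make the divergence concrete, the following is a minimal standalone sketch, not code from this PR. The class name LocaleFoldingDemo and the sample id "ALICE_ID" are hypothetical; only the standard java.util.Locale and String APIs are used. It lowercases a sample id under Locale.ROOT and under tr, then scans the printable ASCII range (0x21–0x7E) for characters whose lowercase form differs between the two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class LocaleFoldingDemo {
    // Turkish locale; the same 'I' folding rule applies to "az".
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    /** Printable ASCII characters whose lowercase form differs between ROOT and the given locale. */
    static List<Character> divergentLowercase(Locale locale) {
        List<Character> out = new ArrayList<>();
        for (char c = 0x21; c <= 0x7E; c++) {
            String s = String.valueOf(c);
            if (!s.toLowerCase(Locale.ROOT).equals(s.toLowerCase(locale))) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String id = "ALICE_ID"; // hypothetical sample id
        System.out.println(id.toLowerCase(Locale.ROOT)); // alice_id
        System.out.println(id.toLowerCase(TURKISH));     // alıce_ıd (dotless ı, U+0131)
        System.out.println(divergentLowercase(TURKISH)); // [I]
    }
}
```

With an explicit Locale.ROOT argument the result no longer depends on how the JVM was started, which is the property the index key needs.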
Scope
Only one site reaches persistent storage: AccountIdIndexStore.getLowerCaseAccountId, whose lowercased result becomes a DB key. All other touched call sites (DB engine selection, disabled-API list, log topic classification, hex display, OS detection, etc.) are in-process runtime comparisons: they neither write to disk nor cross nodes.
The input domain itself is tightly constrained: input to setAccountId is gated by validReadableBytes to printable ASCII (0x21–0x7E). Within this domain, the locale divergence in lowercase only manifests on a single character ('I'); every other character (letters, digits, symbols) folds identically under any locale.
Auditability of the migration scope
The "only one site reaches persistent storage" claim above can be independently verified by inspection. Every other touched call site falls into one of:
(a) dbEngine.toUpperCase() in TronDatabase, TronStoreWithRevoking, TxCacheDB. Locale only affects which DB engine the running process selects; it is never written to disk.
(b) disabledApiList, HttpMethod string comparisons in Util / access filters, Account.accountType switch. Comparison results are consumed in-process and never persisted.
(c) Hex.toHexString().toUpperCase() in DataWord, help-text formatting in Args. Output to logs/console, never read back.
(d) os.name / os.arch / java.vendor reads in Arch, WalletUtils.getDefaultKeyDirectory, KeystoreFactory. Local environment branching never crosses nodes.
None of these reach disk or cross-node boundaries, so the migration scope (AccountIdIndexStore only) is exact, not approximate.
Why MigrateTurkishKeyHelper is bundled in
A code-only fix does not fully cover "nodes that were ever started under a non-default locale": such nodes may carry legacy-format keys in their DB. The cleanest way to keep the code fix and the data state in sync is a one-shot normalization at startup.
Design notes:
- Follows the MoveAbiHelper pattern, gated by a TURKISH_KEY_MIGRATION_DONE flag in DynamicPropertiesStore so it runs at most once.
- AccountIdIndexStore is a sparse index with very low cardinality (only 14 entries on mainnet today), so scan cost is negligible.
For nodes that were never affected, the scan yields zero candidates and the helper is a no-op.
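The shape of such a flag-gated, one-shot normalization can be sketched as follows. This is an illustrative in-memory model, not the PR's MigrateTurkishKeyHelper: the class OneShotKeyMigrationSketch, the String-keyed map standing in for AccountIdIndexStore, and the boolean standing in for the TURKISH_KEY_MIGRATION_DONE flag are all hypothetical. The conflict rule (leave a legacy key untouched when its canonical form already exists) is an assumption made for the sketch, not something confirmed by the PR text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OneShotKeyMigrationSketch {
    static final char DOTLESS_I = '\u0131';

    // Hypothetical in-memory stand-ins for the index store and the "done" flag.
    final Map<String, String> index = new TreeMap<>();
    boolean migrationDone = false;

    /** Rewrites legacy keys once; returns how many keys were rewritten. */
    int migrateOnce() {
        if (migrationDone) {
            return 0; // already ran on this node: no-op
        }
        // Collect candidates first so the map is not modified while iterating.
        List<String> candidates = new ArrayList<>();
        for (String key : index.keySet()) {
            if (key.indexOf(DOTLESS_I) >= 0) {
                candidates.add(key);
            }
        }
        int rewritten = 0;
        for (String legacy : candidates) {
            String canonical = legacy.replace(DOTLESS_I, 'i');
            if (index.containsKey(canonical)) {
                continue; // assumption: conflicts are left untouched in this sketch
            }
            index.put(canonical, index.remove(legacy));
            rewritten++;
        }
        migrationDone = true; // set even when nothing was found, so the scan never repeats
        return rewritten;
    }

    public static void main(String[] args) {
        OneShotKeyMigrationSketch node = new OneShotKeyMigrationSketch();
        node.index.put("al\u0131ce", "address-1"); // legacy key written under tr folding
        node.index.put("bob", "address-2");        // unaffected key
        System.out.println(node.migrateOnce());    // 1
        System.out.println(node.index.keySet());   // [alice, bob]
        System.out.println(node.migrateOnce());    // 0
    }
}
```

On a node that never ran under tr/az, the candidate list is empty and the only effect is setting the flag.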
Why the simplified normalization strategy
In principle one could enumerate every possible legacy variant of each ROOT-canonical key and merge them, but that enters a combinatorial space (per-key variant count can reach 2^k, where k is the occurrence count of the divergent character). Even with full enumeration, picking the correct value across conflicts requires knowing the historical insertion order — information the DB layer does not retain.
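The 2^k growth can be illustrated with a small sketch. It is not part of the PR; the class LegacyVariantEnumeration and its variants method are hypothetical names. For a ROOT-canonical key, each occurrence of 'i' could have been stored either as 'i' or as dotless 'ı' (U+0131), so the candidate set doubles with every occurrence.

```java
import java.util.ArrayList;
import java.util.List;

final class LegacyVariantEnumeration {
    /** All spellings of a canonical key in which each 'i' is either 'i' or dotless 'ı'. */
    static List<String> variants(String canonical) {
        List<String> out = new ArrayList<>();
        out.add("");
        for (char c : canonical.toCharArray()) {
            List<String> next = new ArrayList<>();
            for (String prefix : out) {
                next.add(prefix + c);
                if (c == 'i') {
                    next.add(prefix + '\u0131'); // the legacy spelling of this position
                }
            }
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(variants("bob").size());    // 1  (k = 0)
        System.out.println(variants("alici").size());  // 4  (k = 2)
        System.out.println(variants("iiiiii").size()); // 64 (k = 6)
    }
}
```

Enumeration alone does not settle which value wins when several variants exist, which is the insertion-order problem described above.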
The conservative strategy chosen here:
Impact
Note for downstream fork chains
A downstream fork chain needs to treat this PR as a non-transparent upgrade only if both of the following hold:
SetAccountId transaction whose accountId contains a character that triggers the locale divergence (e.g. I).
On such a chain, the AccountIdIndexStore historical state was produced under non-ROOT folding; after this PR, has() lookups for the same accountId may yield different verdicts, which requires a coordinated hard-fork activation height and an operator-defined historical data migration strategy.
Mainnet and Nile are not in this category: condition (1) has never held historically.
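A toy model of the lookup divergence, for operators assessing their own chain, might look like the sketch below. The class LookupDivergenceDemo and its put/has methods are hypothetical and only mimic a lowercased-key index in memory; they are not the real AccountIdIndexStore API. The locale is passed explicitly to simulate a node that wrote keys under tr folding and now reads them under Locale.ROOT.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class LookupDivergenceDemo {
    private final Set<String> keys = new HashSet<>();

    /** Stores the id under the lowercase form produced by the given locale. */
    void put(String accountId, Locale foldingLocale) {
        keys.add(accountId.toLowerCase(foldingLocale));
    }

    /** Looks the id up using the lowercase form produced by the given locale. */
    boolean has(String accountId, Locale foldingLocale) {
        return keys.contains(accountId.toLowerCase(foldingLocale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        LookupDivergenceDemo index = new LookupDivergenceDemo();
        index.put("MyId", turkish);      // stored as "myıd"
        index.put("wallet01", turkish);  // no 'I': stored as "wallet01" under any locale

        System.out.println(index.has("MyId", turkish));         // true
        System.out.println(index.has("MyId", Locale.ROOT));     // false: looks up "myid"
        System.out.println(index.has("wallet01", Locale.ROOT)); // true
    }
}
```

Only ids containing the divergent character change verdict, which is why both conditions must hold before a chain is affected.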
Release scope