93 changes: 93 additions & 0 deletions proposals/4421-en-us.md
Member

Implementation requirements:

  • Foundation (broader than SCT) review.

Member

For interest/reference, I created a PR bringing the spec into line with the current documentation style (i.e. en_GB), as far as the word "authori[zs]ation" goes: matrix-org/matrix-spec#2351

Member

There was a discussion in the internal Spec Core Team room about this MSC.

@richvdh was initially concerned about referencing terms from other specs with slightly different spelling (OAuth spec defines "authorization server", "authorization grant", etc.). But that concern abated after reading https://auth0.com/fr/intro-to-iam/what-is-oauth-2, which just translates the terms to French (authorization code grant -> attribution de code d'autorisation). This appears to be fine in practice.

@anoadragon453 initially said that they were indifferent on British English vs. US English being used for prose, but then conceded that US English would be better from a technical standpoint, as most other internet-defining specs are written in, or default to, US English (IETF RFCs, WHATWG, W3C, Khronos, etc.). So to avoid time-wasting footguns in the future, that was likely the easiest to work with for the Matrix spec as well.

@@ -0,0 +1,93 @@
# MSC4421: Standardize the spec on US English 🇺🇸

Member

I'm weakly against this change.

From what I can see, the rationale is:

  • searchability
  • how we relate Matrix to other technical standards
  • consistency

Consistency could go either way so isn't a convincing argument. Searchability is a weird one: I personally do not CTRL+F "authorisation server" then get sad that it's actually "authorization server". How we relate Matrix to other technical standards would be a compelling argument were it not for the later comparison that ISO is British English and RFCs are either. ANSI and IEEE would obviously be American English because they were founded in the States. W3C is a weird one given Berners-Lee and CERN, but it was established... in the States.

Matrix was notably not established in the States, so it's not unreasonable for it not to follow American English. The spec's recommendation of British English pretty much settles it for me, if anything we should be more strongly enforcing it. Enforcement itself isn't an argument since after all, even if we did follow American English we would still need to strongly enforce it for consistency.

All considered, there isn't enough here to overcome inertia imo.

Contributor Author

> ANSI and IEEE would obviously be American English because they were founded in the States. W3C is a weird one given Berners-Lee and CERN, but it was established... in the States.
>
> Matrix was notably not established in the States, so it's not unreasonable for it not to follow American English.

I hadn't looked at it this way before but if you think of the choice as an extension of heritage, that actually makes for a decent argument. This thing was invented in the UK. Therefore, it uses British English. Period.

> The spec's recommendation of British English pretty much settles it for me, if anything we should be more strongly enforcing it.

The haziness is that the spec recommends it but we're actually doing the opposite in practice. There was a longer discussion in the Matrix Spec & Docs Authoring room when the OAuth APIs were introduced, which ended with momentum towards en-US that, at least in my experience, has since been applied semi-officially in spec PRs.

Member

> The haziness is that the spec recommends it but we're actually doing the opposite in practice.

...which is why we need to enforce it. :)

A slight tangent: enforcement is unfortunately bureaucratic. I would strongly oppose having MSC authors or reviewers manually check for en-GB compliance because quite frankly, it's not a good use of human time imo. I'd much rather we enforced other things which can have a material impact on the protocol. When MSCs get converted into Spec prose, that seems like a good time to get out the en-GB spell checker and have a checklist item for conformity: this is still bureaucratic but it affects the spec writer rather than everyone.

Contributor Author

> ...which is why we need to enforce it. :)

I'd be totally fine with that option, too. My goal here is to force a decision to settle the current confusion, and Americanization just seemed like the most likely outcome based on the previous chats.

Assuming we FCP-close this proposal and (actually) stick to British English, we should reiterate the house rules to explicitly exclude non-localizable terms and identifiers inherited from other standards. I think that would qualify as a clarification and shouldn't require an MSC itself.

> A slight tangent: enforcement is unfortunately bureaucratic. I would strongly oppose having MSC authors or reviewers manually check for en-GB compliance because quite frankly, it's not a good use of human time imo. I'd much rather we enforced other things which can have a material impact on the protocol. When MSCs get converted into Spec prose, that seems like a good time to get out the en-GB spell checker and have a checklist item for conformity: this is still bureaucratic but it affects the spec writer rather than everyone.

Yes, agreed. I'm only concerned with the spec text here. Not proposals.
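
For illustration, the kind of automated check discussed in this thread (flag the non-house spelling in spec prose, but leave identifiers and terms inherited from other standards alone) could look roughly like the sketch below. Everything in it is an assumption made for the example: the word pairs, the `INHERITED_TERMS` allowlist, and the `find_violations` helper are hypothetical, and only exact word matches are caught (inflected forms such as "authorised" would need a real dictionary).

```python
import re
from typing import List

# Hypothetical (British, American) word pairs; a real check would use a dictionary.
PAIRS = [
    ("authorise", "authorize"),
    ("authorisation", "authorization"),
    ("organisation", "organization"),
]

# Hypothetical allowlist of terms taken verbatim from other standards.
INHERITED_TERMS = ["authorization server", "authorization code grant"]

# Inline code spans hold identifiers, which cannot be respelled.
INLINE_CODE = re.compile(r"`[^`]*`")


def find_violations(line: str, house_style: str = "en_GB") -> List[str]:
    """Return the words in `line` that use the non-house spelling."""
    # Drop inline code so identifiers like `authorization_endpoint` are ignored.
    prose = INLINE_CODE.sub(" ", line).lower()
    # Drop inherited multi-word terms before checking individual words.
    for term in INHERITED_TERMS:
        prose = prose.replace(term, " ")
    # Under en_GB the American column is "wrong", and vice versa.
    wrong_index = 1 if house_style == "en_GB" else 0
    wrong_words = {pair[wrong_index] for pair in PAIRS}
    return [word for word in re.findall(r"[a-z]+", prose) if word in wrong_words]
```

The `house_style` parameter is there because the thread has not settled on a direction; the same check works for either outcome.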

Member

I'm aligned with kegan's position here. The use of UK English reflects the language of those who wrote the spec in the first place, and I don't see enough of a reason to change that; instead we should improve the consistency - at least in the spec itself.

Member

> All considered, there isn't enough here to overcome inertia imo.

The main argument for US English is that the spec currently has 54 instances of authorise (including authorisation, and similar), vs 165 instances of authorize, so I think the inertia is in favour of this MSC.
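
A tally like the one above can be reproduced with a few lines of Python. This is only a sketch under assumptions: the regular expression and the `count_variants` helper are hypothetical, the sample string is invented, and the exact set of word forms behind the 54/165 figures is not stated in this thread.

```python
import re
from collections import Counter

# Matches "authorise"/"authorize" and derived forms such as "authorisation",
# "unauthorized" or "authorising". Case-insensitive, so identifiers like
# M_UNAUTHORIZED are counted as well.
VARIANT = re.compile(r"authori([sz])(?:e[ds]?|ing|ation)", re.IGNORECASE)


def count_variants(text: str) -> Counter:
    """Count British (-is-) and American (-iz-) spellings in `text`."""
    counts = Counter({"en_GB": 0, "en_US": 0})
    for match in VARIANT.finditer(text):
        key = "en_GB" if match.group(1).lower() == "s" else "en_US"
        counts[key] += 1
    return counts


# Invented sample text; in practice this would be the concatenated spec sources.
sample = "The authorization server authorises clients; see M_UNAUTHORIZED."
print(count_variants(sample))
```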

Much of the problem here is that we are naturally constrained by other specifications, notably OAuth2. We have to talk about concepts like an "authorization server", which is a defined concept in OAuth2. If we were writing in, say, German, then (I gather from native German speakers) we'd probably still call it an "authorization server" rather than ein "Autorisierungsserver" or something, so by extension we should probably do the same even if the body of the doc is en_GB. And of course the authorization_endpoint identifier is cast in stone because it's defined by RFC8414.

So we get into this whole question of where exactly we draw the line, which makes authoring and reviewing tricky, and it just goes away if we settle on en_US across the board.

Contributor Author

> [...] and it just goes away if we settle on en_US across the board.

Maybe the question is whether we expect this to stay true in future. If we switch to en_US but then integrate with another standard that uses en_GB, we'll be back to the same problem. I cannot say how likely that is. It looks like the most probable cause would be an RFC that uses en_GB. There seem to be fairly few such RFCs around though. From a quick search I've only found RFC1484, RFC1781 and RFC2076 – all of which use "organisation". So maybe this is not very likely to happen after all.

Member

Yeah, for better or worse the majority of specs seem to be in en_US.

Member

I've now come across https://auth0.com/fr/intro-to-iam/what-is-oauth-2. If they can talk about "Serveur d'autorisation" (for authorization server) and "attribution de code d'autorisation" (for authorization code grant), then I guess there's no reason we can't spell those terms with an s.


The spec's house style currently recommends British over American English[^1]. This has historically
not been strongly enforced, however. For instance, there are multiple instances of both "authorize"
(🇺🇸) and "authorise" (🇬🇧) in the spec text as well as in identifiers such as `M_UNAUTHORIZED` and
`m.unauthorised`. While this inconsistency usually doesn't hinder readability, it negatively impacts
searchability and general consistency of the spec.

Standardizing on British English is difficult though because many other technical standards use the
American spelling. For instance, RFC6749 defines the term "authorization server"[^2] as well as the
`authorization_code` grant type[^3] as an identifier. Using the British spelling when covering these
in the Matrix spec would be confusing for terms and impossible for identifiers.

For comparison, the following noteworthy standards use American English:

- W3C[^4]
- IEEE[^5]
- ANSI[^6]

In contrast, the following standards enforce British English:

- ISO[^7]

Lastly, these standards allow either of the two as long as they are used consistently within a
document:

- RFCs[^8]

Given the dominant use of US English in other standards and the unsolvable problem of localizing
identifiers, this proposal seeks to standardize Matrix on US English.

## Proposal

The spec's house rules are updated to RECOMMEND the American over the British spelling. Existing
spec text and identifiers are not updated but MAY be migrated in future. Any new spec text or
identifiers SHOULD use the American spelling.

## Potential issues

This proposal doesn't directly resolve the current inconsistency of both spellings being used in the
spec simultaneously. It paves the way for an eventually consistent spelling without the need for
busywork, however.

Many of the spec contributors and especially members of the core team have a British background.
Departing from their native spelling might feel odd for some.

Matrix has a huge center of mass in Europe. In a time of transatlantic tension, committing to the
American spelling might feel uncomfortable to some. Language and politics should not be conflated,
however.

Member

"should not" and yet it is conflated; you can't avoid that. There have been plenty of high-profile cases in the tech world:

  • master/slave => leader/follower
  • blacklist => blocklist
  • master branch => main branch

All of these changes had to overcome inertia in order to happen. I wouldn't dismiss the impact of politics on choice of language, especially when there isn't a compelling reason to fall into one or the other.


## Alternatives

We could enforce the British spelling in spec text and identifiers that are not inherited from other
standards. To aid searchability, a legend of common words that differ in spelling could be included
at the bottom of each page.

Member

What do RFCs do, as it seems like they would hit this the most due to allowing both?

Contributor Author

I think nothing. Searchability might not be as big of a problem for them given that RFCs get their own pages and search engines appear to be smart enough to even out the spelling differences.


authorise -> authorize
authorisation -> authorization
...

A reader searching for "authorization" would at least land on the legend and receive a cue to search
for "authorisation" instead. This feels more complicated and less practical than standardizing on US
English, however.
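
To make the comparison concrete, the legend idea amounts to a lookup table that a search feature (or a reader) consults to find the counterpart spelling. A minimal sketch follows, assuming a hypothetical `LEGEND` mapping that holds only the two entries listed above and a hypothetical `search_variants` helper:

```python
from typing import Set

# Hypothetical legend (British -> American), limited to the entries shown above.
LEGEND = {
    "authorise": "authorize",
    "authorisation": "authorization",
}


def search_variants(term: str) -> Set[str]:
    """Return the search term plus its counterpart spelling, if the legend has one."""
    term = term.lower()
    # Build the American -> British direction from the same table.
    reverse = {american: british for british, american in LEGEND.items()}
    variants = {term}
    if term in LEGEND:
        variants.add(LEGEND[term])
    if term in reverse:
        variants.add(reverse[term])
    return variants
```

Under these assumptions, a search for "authorization" would also cover "authorisation". The table still has to be maintained by hand for every affected word, which is the extra complexity noted above.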

## Security considerations

None.

## Unstable prefix

None.

## Dependencies

None.

[^1]: <https://github.com/matrix-org/matrix-spec/blob/bb3daafe96cce7ec7e139223429d2ea93e087c08/meta/documentation_style.rst?plain=1#L50>

[^2]: <https://datatracker.ietf.org/doc/html/rfc6749#section-1.1>

[^3]: <https://datatracker.ietf.org/doc/html/rfc6749#section-4.1.3>

[^4]: <https://www.w3.org/guide/manual-of-style/#Spelling>

[^5]: <https://sagroups.ieee.org/1588/wp-content/uploads/sites/144/2020/05/2014-ieee-sa-standards-style-manual.pdf>
"under 19.2 c"

[^6]: <https://www.ansi.org/american-national-standards/ans-introduction/essential-requirements>
"4.0 only mentions "English" but it is the **American** National Standards Institute"

[^7]: <https://www.iso.org/ISO-house-style.html#spelling>

[^8]: <https://www.rfc-editor.org/rfc/rfc7322.html#section-3.1>