Adding multilingual translation

As SourceCred grows, users will arrive from all around the world, and some of them may not speak English.

Vision

The idea is to provide accurate translations in different languages, which can help SourceCred grow by increasing the number of users we can reach and the number of clients using SourceCred. I wanted to get this idea out ASAP so that the SourceCred code can be adapted to support translation in the future without taking too much time to implement.

I don’t really see any downside to adding this to the project.

Thank you so much for reading this. This is my first post, and I would love to hear some feedback.

3 Likes

Translations are a great idea, both for opening us up to growth in different linguistic markets and for generally making SourceCred accessible to non-English speakers. I have done this sort of work and have witnessed firsthand the cognitive load it can take off of people, and I value it.

However, I don’t think incentivizing translation work ASAP is a good use of our resources for the following reasons:

- Inviting more engagement from speakers of other languages is likely to leave us in a swampy position with regard to answering questions in those languages.

- It will also be difficult to fully integrate monolingual non-English speakers into our present community. (I can imagine needing to spin up different language versions of our Community Call. This sounds great, but it is also not something that should be an ASAP priority.)

- The Creditor seems like a potentially substantial change that may require large reworkings of our Docs. Doing a lot of translation work before that, especially in light of my previous points, doesn’t seem to make much sense.

OTOH, I think translations, and the Community Cultivation opportunities attendant with them, are a key thing to think about as we move out of beta and into intentional growth.

2 Likes

I may have been misleading in how I presented my idea: I didn’t mean that we should incentivize translation work ASAP. I wanted to share the idea early on so that it would be easier to change the SourceCred code to look up its text in a dictionary instead of using hardcoded values.

1 Like

I agree with both the idea of being proactively conscious of and oriented towards a multilingual future for SourceCred, and @panchomiguel’s concerns about resources. However, my understanding of @Felix’s suggestion wasn’t for us to translate the docs, but to use no hardcoded strings in the codebase. In other words, instead of doing something like this:

<button>Change weight configuration</button>

…the devs would implement it more like this:

<button>$label['change_weight_configuration']</button>

Wherein $label is a dictionary that gets populated in the language selected by the user, if available, and falls back to English (en) if not.

Then the entire interface would be contained in e.g. a JSON file like this:

languages.en.json

{
  "change_weight_configuration": "Change Weight Configuration",
  "hide_weight_configuration": "Hide Weight Configuration",
  "grain_accounts": "${Grain} Accounts",
  "transfer_grain": "Transfer ${Grain}"
}

Since the term Grain is a variable that any SourceCred instance can customize, we would use a variable in the string. In the actual code under the surface, the label key would still be $label['grain_accounts'] etc.
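The lookup-with-fallback behavior described above could be sketched roughly as follows. This is only an illustration, not SourceCred code: the `label` and `own` helpers, the `tables` registry, and the Spanish string are all hypothetical, and only the keys come from the languages.en.json example.

```typescript
// Hypothetical string tables; the keys mirror the languages.en.json example.
type Labels = Record<string, string>;

const en: Labels = {
  change_weight_configuration: "Change Weight Configuration",
  hide_weight_configuration: "Hide Weight Configuration",
};

// A partial Spanish table (illustrative): only one key is translated so far.
const es: Labels = {
  change_weight_configuration: "Cambiar configuración de pesos",
};

// Assumed registry of available languages.
const tables: Record<string, Labels> = { en, es };

// Read an own property only, so inherited names like "toString" never match.
function own<T>(obj: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
}

// Look up a key in the selected language, fall back to English, and finally
// to the key itself so that a missing label is visible rather than blank.
function label(key: string, lang: string): string {
  const table = own(tables, lang);
  const translated = table === undefined ? undefined : own(table, key);
  return translated ?? own(en, key) ?? key;
}
```

With these tables, `label("hide_weight_configuration", "es")` falls back to the English text because the Spanish table has no entry for that key.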

I want to thank @Felix for this, because creating the Creditor in this way from the ground up will be that much better than retrofitting it after the fact, and it involves minimal extra effort to develop it this way. Additionally, it allows non-devs (like designers and content writers) to adjust the UI labels themselves and explore better texts without having to involve devs in the process.

2 Likes

Yes, yes, and yes! This is exactly what I was referring to. I’m still having a hard time explaining my ideas in words. Thanks for making it clearer for others to understand!

Hmm; is it really so clear? I’ve developed localizable applications, localized apps myself (translating English into Spanish), and worked on adding translation systems to a large existing product. My experience is that (a) it takes notable effort to develop this way and (b) it is easy to add later, and easier to add when you are actually ready for it than when you are just speculating. Let me explain.

(This is just from the technical side, in addition to what @panchomiguel has already written.)

Placeholders

Your examples hint at one problem that makes this not trivial:

  "grain_accounts": "${Grain} Accounts",

Here, you’ve included a placeholder in the string. That means that our UI code has to perform interpolation. Previously, we would write:

<span>{currencyName} Accounts</span>

…but now we must write:

<span>{uiStrings["grain_accounts"].replace("${Grain}", currencyName)}</span>

Of course, you don’t do the interpolation manually every time. You have a framework to do it (pofiles, etc.)… so now we need to set that up, and everyone has to learn to use it.
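To make the "framework" point concrete, here is a minimal sketch of the kind of helper such a framework provides. The `interpolate` function and its `${name}` placeholder syntax are assumptions for illustration, modeled on the string in the example above; real localization libraries have their own APIs and formats.

```typescript
// Replace each ${name} placeholder with the matching entry in `params`.
// Unknown placeholders are left untouched so that mistakes stay visible.
function interpolate(template: string, params: Record<string, string>): string {
  return template.replace(/\$\{(\w+)\}/g, (match: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(params, name)
      ? params[name]
      : undefined;
    return value === undefined ? match : value;
  });
}

// Usage with the string from the example: the currency name is supplied
// at render time instead of being written into the UI code.
const rendered = interpolate("${Grain} Accounts", { Grain: "Seeds" });
```

Unlike a bare `.replace("${Grain}", currencyName)`, this handles repeated placeholders, and because the replacement comes from a callback, a currency name containing sequences such as `$&` is inserted literally rather than being treated as a replacement pattern.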

This goes beyond simple substitutions. Our code now needs to be generic over different languages’ grammatical structures. In English, we often write:

<span>{userCount} {userCount === 1 ? "user" : "users"}</span>

But other languages have different singular/plural forms, so this doesn’t suffice anymore. Likewise for dates, currencies, decimal places.
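As a rough sketch of what "generic over grammatical structures" means for the plural case, the standard `Intl.PluralRules` API maps a number to a plural category (`one`, `few`, `many`, `other`, and so on) for a given locale. The `PluralForms` shape, the `countLabel` helper, and the Russian word forms below are illustrative assumptions, not part of any existing codebase.

```typescript
// One string per plural category; only "other" is required.
interface PluralForms {
  zero?: string;
  one?: string;
  two?: string;
  few?: string;
  many?: string;
  other: string;
}

// Pick the noun form that the locale's plural rules select for `count`.
function countLabel(count: number, locale: string, forms: PluralForms): string {
  const category = new Intl.PluralRules(locale).select(count);
  const noun = forms[category] ?? forms.other;
  return `${count} ${noun}`;
}

// English needs two forms.
const enUser: PluralForms = { one: "user", other: "users" };

// Russian (illustrative) needs more, which a ternary on `count === 1`
// cannot express.
const ruUser: PluralForms = {
  one: "пользователь",
  few: "пользователя",
  many: "пользователей",
  other: "пользователя",
};

const englishLabel = countLabel(2, "en", enUser);
const russianLabel = countLabel(5, "ru", ruUser);
```

The same family of APIs (`Intl.NumberFormat`, `Intl.DateTimeFormat`) covers numbers and dates, but each one is another thing the UI code has to route its values through.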

Readability, navigability

Translated strings also make it harder to read and navigate the code. When looking at the code, you can’t directly visualize the UI. And when looking at the UI, you can’t directly find what part of the code that maps to. You can usually do it through one or two layers of indirection, which is fine if you’re only looking up one thing. But when you’re trying to get your bearings in a GUI module that is littered with Messages.getStrings or whatever, it’s notably harder to get an idea of what’s going on. Some approaches use gettext directives around natural language, like _("My account balance"), but this makes updates harder…

Updates

Once you have a translated app, changing it becomes a whole process. If you change the structure of the strings that need to be localized, then you lose existing translations. Without mitigation, this results in localized UIs that are always half in the target language and half in English, as translations keep regressing. Users often report this as more frustrating than if the UI were just in English, since they have to keep context-switching and they can’t just learn the interface. There are different ways to handle this, but they usually involve some form of (a) minimum time delay before new code makes it into production and (b) an expected turnaround time for translators to submit updated strings.
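One small piece of such a process can be automated: reporting which source strings have no translation yet. The sketch below assumes flat key-to-string tables like the JSON example earlier in the thread; `missingKeys` is a hypothetical helper, and it does not detect translations that have gone stale because the English text changed under an unchanged key.

```typescript
// Return the keys present in the source (English) table but absent from
// the translated table, sorted for stable output in a report or CI log.
function missingKeys(
  source: Record<string, string>,
  translated: Record<string, string>
): string[] {
  return Object.keys(source)
    .filter((key) => !Object.prototype.hasOwnProperty.call(translated, key))
    .sort();
}

// Usage: a translation that has fallen behind by one string.
const untranslated = missingKeys(
  { transfer_grain: "Transfer ${Grain}", grain_accounts: "${Grain} Accounts" },
  { grain_accounts: "Cuentas de ${Grain}" }
);
```

A check like this only tells you that a regression exists; deciding whether to block the release or ship with English fallbacks is the process question described above.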

When?

The thing is, when we decide that the time is right and we do want to invest in this infrastructure, retrofitting it onto existing strings is pretty straightforward. There are linter plugins to help catch untranslated strings in UIs. You can define mechanically generated locales with extra-long text to suss out fixed-width UI elements, or with extra-tall diacritic stacks to suss out bad Unicode or font handling. You can actually try out uncommon substitution patterns on real data.
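A mechanically generated locale of the kind mentioned above might look like this sketch. The `pseudoLocalize` function, the choice of combining marks, and the padding scheme are all assumptions for illustration; the `${name}` placeholder syntax follows the earlier example in this thread.

```typescript
// Combining marks stacked on each letter: acute, circumflex, tilde.
const STACK = "\u0301\u0302\u0303";

// Turn an English string into a pseudo-translation that is longer and
// taller than the original, while leaving ${...} placeholders intact.
function pseudoLocalize(text: string): string {
  // Splitting on a capturing group keeps the placeholders in the result,
  // at the odd indices of `parts`.
  const parts = text.split(/(\$\{\w+\})/);
  const decorated = parts
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(/[A-Za-z]/g, (ch) => ch + STACK)
    )
    .join("");
  // Pad with one "~" per original character; the brackets make it easy
  // to spot strings that get truncated by a fixed-width element.
  const padding = "~".repeat(text.length);
  return `[${decorated}${padding}]`;
}

const pseudo = pseudoLocalize("Transfer ${Grain}");
```

Running every English string through a function like this yields a fake locale that exercises layout and font handling without needing a single real translation.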

What we can do now

All that said, I do agree that keeping localization in mind from the beginning is helpful, and there are things that we can do now. Use Unicode and UTF-8 extensively. Distinguish byte strings and text strings correctly. Avoid baking critical text into images (we should be trying to do this anyway for accessibility). Don’t assume that people have “first names” or “last names”. Don’t rely on strings fitting into really small areas (like a 20px-wide button labeled “Go!”). These are good practices now, and they’ll also pay off later.

tl;dr: Localization is important; localization reduces development velocity; it doesn’t make sense to pay the real costs of localization before reaping its benefits. Ship a half-baked product to English speakers and then a polished product to everyone.

5 Likes

I was going to address your initial concerns with a point along the lines of “some of this yes, other parts we don’t have to do right away” but then you mentioned this:

Fair enough. When I built these systems myself (in a comprehensive content management system à la WordPress, and at the Apple Online Store for 6,000+ instances and languages requiring UTF-16), no such tools existed yet, and doing it up front was significantly less effort than retrofitting. Since I don’t know the SourceCred UI codebase anywhere near as well as you do, I will of course defer 🙂

Thanks for the detailed report @wchargin!

Data point: I recently learned that the game Slay the Spire will not have any more content updates because of cost-of-localization issues. They can change numeric values, but there are functional changes that they would like to make, but can’t. Conversation with a co-lead dev:

Discord conversation with SneakySly (Anthony) #5972, a core developer of Slay the Spire, on 2020-11-10. SneakySly: “We are limited to value changes / Because at this point localization is very arduous / Being on different consoles and things”. Another user: “Ooh gotcha, so stuff like changing effects is no go?”. SneakySly: “Correct / … / Otherwise Clash and Setup would have been changed”.

(Source: public messages in their Discord server. Later on, SneakySly says that there would be “total reworks” but they have “no specific designs because of the [localization] limitation”.)

In other messages, the other co-lead dev says that “it’s true” that some languages’ volunteer maintainers just disappear, and they have to nix support for the language. Changes incur “headaches with our currently very slow porting process”. At this point, “10+ languages” are missing translator teams, and “Thai and [Polish] are over a year behind”.

Now, Slay the Spire is available in 17 languages, so clearly parts of their translation process have worked well. But these kinds of costs are consistent with my experience, so I wanted to offer this example of how the effects can block development in the real world.

We should partner with / hire Guerrilla Translation when the time comes. https://www.guerrillatranslation.org/about-2/

They are the same folks who developed the DisCO framework, which I would love SourceCred to develop a relationship with.

2 Likes

While not trying to argue with this, there is a huge difference between SourceCred, which has minimal UI and text, and a game like Slay the Spire, which has enormous volumes of text, all of which requires very high precision in what it communicates and verification of its accuracy. Misrepresenting what a game card does in another language is a breaking failure in STS; a UI label that says “move grain” in a literal translation instead of “transfer grain” is not.

1 Like