Control shorting of language codes#346
Conversation
|
The PHP parts seem fine. |
|
Doesn't this conflict with your changes in #345? |
|
Sure, in the sense that both touch the same neighborhood of code, so I will need to fix some conflicts should one be merged, otherwise they complement each other. Not sure what you were trying to say. |
5607f4b to
a768e8a
Compare
| if (!$localeObj) { | ||
| $localeObj = $locale; // fallback | ||
| } | ||
| $localeName = locale_get_display_language($locale, $locale) ?? $locale; |
There was a problem hiding this comment.
locale_get_display_language returns false on failure, not null, and ?? only coalesces null.
There was a problem hiding this comment.
Thanks. Changed to "?:". Though it is something that should never happen (as locales are not that user-friendly, though better than nothing), and in practice it appears locale_get_display_language will return the first argument if it couldn't find a translation.
| else: | ||
| lflag = lncode | ||
| flagurl = "http://www.unilang.org/images/langicons/" + lflag + ".png" | ||
| flagurl = "https://flagcdn.com/w40/" + lang.locale.split("_")[-1].lower() + ".png" |
There was a problem hiding this comment.
What about for locales with no region?
There was a problem hiding this comment.
I haven't seen a locale returned from crowdin without a territory (the code do use the locale instead of the id). In the case where it picks out something that is not a valid country code, you get the "Please do it manually" message (note that since unilang has added bot-protection, you always get this message before this change). In the case where it picks out a language code and tries to use it as a country and that country exists, you might get the wrong flag. But is does not seem to be much worse than unilang that responded with flags for both languages and countries (e..g. en and us) and sometimes something odd (e.g. ir returns irish flag -- ga-IE is locale for Irish in Ireland, while fa-IR is Persian in Iran).
| langCode = lang.locale.split("_")[0] | ||
| if langCodeCount[langCode] > 1 and langCode not in alias: | ||
| print(f'Multiple locales with language-code "{langCode}" manually add alias to disambiguate') | ||
| exit() |
There was a problem hiding this comment.
Since this will kill the whole process, we might need to expand the error message to make it clear that now your working dir is "dirty" -- localeMap.json is out-of-sync with the on-disk directories and you basically have to fix it. Is there a way to restructure this so we can fail out of this condition before the local dirs are changed?
There was a problem hiding this comment.
I moved stuff around to the check comes first. But if the error is seen, the user can just update the script and rerun it. And it is in git so you can always go back.
Considered that it would perhaps be better if the data was not in git and the logic was in a php script than could be run as a webhook on change from crowdin. Kinda like the posthog-event-fetcher.php is fetching that data to a file.
…ter use, use argparse
Currently localeMap.json on the website does not match the one on main. Not sure when that happened. This issue is based on the code on github.
Currently it seems quite unstructured what you can put in the lang-parameter and how it is interpreted. E.g. you can put "ab_cd" and the site will happily add "ab_cd" to all links and "pt" is interpreted as "pt_BR". The shorting of codes are just done of a "first come first served basis" in the order of completion. So "zh" could refer to CN or TW depending on who has translated the most. This issue tries to make it a bit more structured.
As I couldn't figure out how to download translations from crowdin, I also implemented download via their api.
Regarding Serbian, this issue fixes the same problem as #300. That PR however unfortunately edited the translation files directly which are overwritten by
updatefromcrowdin.py. This PR updates the script instead. Crowdin has two Serbian translations, one using latin (id/locale: sr-CS) and one using cyrillic script (id:sr, locale: sr-SP). But "CS" referes to "Serbia and Montenegro" which was dissolved in 2006 and "SP" appear to have been proposed at some point, so both locales are wrong. So since "In general, the alphabets are used interchangeably; media and publishers typically select one alphabet or the other" ref this update hides the Cyrillic as also done in #300. Bot Serbian translations have a completion of 54% of homepage.po, so not much over the 50% cut-off, so it is not worth handling the Latn/Cyrl.