Template talk:Lang

Foreign-language article titles

31 August 2021 (UTC)

Forced prefixing of *

I've just noticed that use of codes for protolanguages, as in {{lang|cel-x-proto|...}}, forces a prepended * (indicating a construction unattested in surviving materials). This is undesirable, since in the vast majority of cases what we're going to be doing is replacing existing in-article strings with bare italics and no lang markup, like *''kal-'', with templated replacements, e.g. *{{lang|cel-x-proto|kal-}}, but this produces a double ** which has to be manually fixed. And there are apt to be tabular-data cases (interlinear glosses, etc.) in which an entire row of cells is prefixed with * and specific words or morphemes in particular cells follow this and should not each individually have * but should still have language markup. At bare minimum we need a way to suppress this "auto-*" behavior, but ideally it would be off by default and turned on only by a parameter switch, since it is unexpected, inconsistent, completely undocumented, and almost always editorially unhelpful. PS: If this does get changed, please ping me, since I will need to go fix Caledonians#Etymology and some other things to have non-templated * again.  — SMcCandlish ¢ 😼  07:35, 2 March 2024 (UTC)[reply]

Two thoughts: there is some value to the asterisk symbol as a marker of unattested forms (especially if we tooltip the first occurrence à la {{c.}}), so could we use {{asterisk}}, or perhaps (new) {{unattested}} and have that resolve to {{asterisk}}? Alternatively, what about just using one of the many star-shaped thingies that look like an asterisk, but aren't, e.g., ❋ (U+274B HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK) (my favorite, but several more hidden in the wikicode).
Thanks, Mathglot (talk) 11:17, 2 March 2024 (UTC)[reply]
Using an alternative character would be WP making up a "fake style" out of nowhere. The standard across all linguistics writing since at least the Victorian era is the * symbol (asterisk).  — SMcCandlish ¢ 😼  05:07, 10 June 2024 (UTC)[reply]
Already exists but, alas, not documented:
{{lang|cel-x-proto|kal-}} → *kal-
{{lang|cel-x-proto|kal-|proto=no}} → kal-
{{lang-cel-x-proto|kal-}} → Proto-Celtic: *kal-
{{lang-cel-x-proto|kal-|proto=no}} → Proto-Celtic: kal-
Trappist the monk (talk) 15:02, 2 March 2024 (UTC)[reply]

Testing bullet-asterisk interaction with proto asterisk:

  • one asterisk, to make a bullet item
  • *kal- one asterisk, followed immediately by {{lang|cel-x-proto|kal-}}
  • one asterisk to make another bullet item

Looks good. We should document the Module code starting at line 791 in a new level-4 subsection, 'Proto', at Template:Lang, probably under § Formatting. Mathglot (talk) 20:05, 2 March 2024 (UTC)

But wait: you said in sentence 2, "but this produces a double ** which has to be manually fixed", so what was your example that produced a double asterisk? It seems to be identical to the code that works just above. Can you reproduce your error case below? Mathglot (talk) 20:14, 2 March 2024 (UTC)
To repeat from the OP: in the vast majority of cases what we're going to be doing is replacing existing in-article strings with bare italics and no lang markup, like *''kal-'', with templated replacements, e.g. *{{lang|cel-x-proto|kal-}}, but this produces a double ** which has to be manually fixed. If the solution is changing *''kal-'' to *{{lang|cel-x-proto|kal-|proto=no}} or to {{lang|cel-x-proto|kal-}}, I guess I can live with that, but I still think it would be preferable for the template to not force the * by default.  — SMcCandlish ¢ 😼  05:07, 10 June 2024 (UTC)[reply]
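The replacement described above could be sketched as a small script. This is only an illustration, not an existing tool: the function name `template_proto` and the regular expression are hypothetical, and it assumes the italic string contains no apostrophes. It drops the hand-written * because {{lang|cel-x-proto|...}} adds its own, and it leaves a line-initial * alone because that is most likely a list bullet.

```python
import re

# Matches an asterisk followed by a bare-italic wikitext string, e.g. *''kal-''
# Assumption: the italic text contains no apostrophes or newlines.
BARE_PROTO = re.compile(r"\*''([^'\n]+)''")


def template_proto(wikitext, tag="cel-x-proto"):
    """Replace *''form'' with {{lang|tag|form}}, dropping the manual asterisk."""

    def repl(match):
        start = match.start()
        # An asterisk at the start of a line is probably a list bullet; skip it.
        if start == 0 or wikitext[start - 1] == "\n":
            return match.group(0)
        return "{{lang|%s|%s}}" % (tag, match.group(1))

    return BARE_PROTO.sub(repl, wikitext)


print(template_proto("from *''kal-'' 'hard'"))
```

If the template stopped adding * by default, the same sketch would only need to keep the asterisk in the replacement string.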

Question about Category:Articles containing Dogrib-language text

For some reason, Category:Articles containing Dogrib-language text has started showing an error message (Error: Dogrib is not a valid ISO 639 or IETF language name. Please see Template talk:Lang for assistance.), is empty, and has been tagged for speedy deletion. We still have an article at Dogrib language, and the ISO 639 code is still dgr. It appears that articles that should be placed into that category are now being placed into the new (as of 22 May 2024) Category:Articles containing Tlicho-language text. I don't know what happened behind the scenes (maybe this change?), but we have an inconsistency between our article name and our category naming, which is undesirable. – Jonesey95 (talk) 23:41, 27 May 2024 (UTC)[reply]

The 2024-04-15 update to the ISO 639-3 dgr name list is the result of this change request. That update is reflected in this update to Module:Language/data/ISO 639-3. Subsequently, IANA incorporated that change (in reverse name order) in the 2024-05-16 update to their language-subtag-registry file, which is reflected in this update to Module:Language/data/iana languages.
When multiple names are provided by IANA, Module:Lang takes the first name in the list – in this case 'Tlicho'. This may be overridden in Module:Lang/data when there is consensus to do so.
Trappist the monk (talk) 18:01, 28 May 2024 (UTC)[reply]
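A rough sketch of the selection rule described above (first IANA name wins unless an override exists), in Python rather than the module's Lua. The table contents are illustrative only: the two-name list for dgr and the `OVERRIDES` entry are assumptions for the example, not copies of the real data modules.

```python
# Illustrative data only; the real tables live in the data modules named above.
IANA_NAMES = {"dgr": ["Tlicho", "Dogrib"]}   # assumed order after the 2024-05-16 update
OVERRIDES = {"dgr": "Dogrib"}                # hypothetical consensus override


def language_name(tag, names=IANA_NAMES, overrides=OVERRIDES):
    """Return the override if one exists, else the first name in the IANA list."""
    if tag in overrides:
        return overrides[tag]
    return names[tag][0]


def category_name(tag, names=IANA_NAMES, overrides=OVERRIDES):
    """Build the tracking category name from the selected language name."""
    return "Category:Articles containing %s-language text" % language_name(
        tag, names, overrides
    )


print(category_name("dgr", overrides={}))  # no override: first IANA name is used
print(category_name("dgr"))                # with the override in place
```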
I believe that the stable title (for the last eight years or so) of our Dogrib language article constitutes sufficient consensus to override. I poked around the sources, and they seem to be split somewhat evenly among "Dogrib", "Tlicho", and "Tłı̨chǫ Yatıì", the latter of which would be a challenging article name for the English Wikipedia. If the article is moved, the override can easily be removed. – Jonesey95 (talk) 00:27, 29 May 2024 (UTC)[reply]

Limitations possibly requiring template modification

Given that I edit ancient history articles, I have to use this template extensively for a wide range of languages, and I'm finding some gaps in it that limit my ability to edit:

  1. the {{lang- form of the template should also display a label for the name of the language used similar to when using the {{lang| form;
  2. the {{lang| form needs a |translit= option that works just like it does with the {{lang-... form;
  3. the |translit= needs an option where there is a comma instead of the romanized: label usually preceding the transcription in addition to the already existing format with the romanized: label;
  4. there needs to be an option for adding multiple spellings and multiple transliterations; for example:
    1. the name Tuwaddis was recorded as 𔕬𔗬𔑣𔓯𔗔 and 𔕬𔓬𔑣𔕣, and presenting them in an article currently requires me to write
      the code {{lang-hlu|𔕬𔗬𔑣𔓯𔗔}} <small>and</small> {{lang|hlu|𔕬𔓬𔑣𔕣}}, <small>romanized</small> {{transl|akk-x-neobabyl|Tuwaddis}}
      to obtain Hieroglyphic Luwian: 𔕬𔗬𔑣𔓯𔗔 and 𔕬𔓬𔑣𔕣, romanized Tuwaddis;
    2. similarly, if I want to make a list of the various forms of the Hieroglyphic Luwian name Ḫartapus in an article, I would need to write it
      as {{lang-hlu|𔓟‎𔖱𔐞𔕯𔗔‎}}, {{lang|hlu|𔓟‎𔖱𔐞𔗣𔗔‎}} and {{lang|hlu|𔗖‎𔐞𔕯𔗔}}, <small>romanized:</small> {{transl|hlu|Ḫartapus}}
      to obtain Hieroglyphic Luwian: 𔓟‎𔖱𔐞𔕯𔗔‎, 𔓟‎𔖱𔐞𔗣𔗔‎ and 𔗖‎𔐞𔕯𔗔, romanized: Ḫartapus;
    3. meanwhile, the name 𒁹𒌇𒁮𒈨𒄿 is interpreted as either Tugdammî or Dugdammî, and presenting them in an article currently requires me to write
      the code {{lang-akk-x-neoassyr|𒁹𒌇𒁮𒈨𒄿|translit=Tugdammî}} <small>or</small> {{transl|akk-x-neoassyr|Dugdammî}}
      to obtain Neo-Assyrian Akkadian: 𒁹𒌇𒁮𒈨𒄿, romanized: Tugdammî or Dugdammî;
    4. and the name 𒁹𒄖𒊌𒄖 is interpreted as both Gugu and Guggu, but presenting it in an article would require that I write
      the code {{lang-akk-x-neoassyr|𒁹𒄖𒊌𒄖|translit=Gugu}} <small>and</small> {{transl|akk-x-neoassyr|Guggu}}
      to obtain Neo-Assyrian Akkadian: 𒁹𒄖𒊌𒄖, romanized: Gugu and Guggu
    5. if I need to make a list of the various spellings of the name māt Tabali, for example, I need to write
      the code {{transl|akk-x-neoassyr|māt Tabali}} ({{lang|akk-x-neoassyr|𒆳𒋫𒁀𒇷}}, {{lang|akk-x-neoassyr|𒆳𒋫𒁀𒀀𒇷}}, {{lang|akk-x-neoassyr|𒆳𒋫𒁄𒇷}}, {{lang|akk-x-neoassyr|𒆳𒋰𒀀𒇷}})
      to obtain māt Tabali (𒆳𒋫𒁀𒇷, 𒆳𒋫𒁀𒀀𒇷, 𒆳𒋫𒁄𒇷, 𒆳𒋰𒀀𒇷);
    6. and if I want to make a list of the various forms of the name Qedar, I need to write
      the code Neo-Assyrian Akkadian: 𒆳𒆤𒊑, Qidri; 𒆳𒆤𒊏𒀀𒀀, Qidrāya; 𒆳𒆥𒁕𒀀𒊑, Qidāri; 𒆳𒋡𒁕𒊑, Qadari; 𒆳𒋡𒀜𒊑, Qadri; 𒇽𒆤𒊏𒀀𒀀, Qidrāya; 𒇽𒆥𒁯𒊏𒀀𒀀, Qidarāya; 𒌷𒆥𒁕𒊑, Qidari; and 𒇽𒄣𒁕𒊑, Qudari
  5. I would also require a transcription parameter because some scripts are not transcribed in exactly the same way as their reconstructed pronunciation, and this sometimes needs to be shown in the text.
    1. For example, Mycenaean Greek kʰalkós was written as 𐀏𐀒 in Linear B script, which is transcribed as ka-ko, but the template as it now exists only allows me to add the Linear B text and the word as it was pronounced, but not the transcription of the text;
  6. Integrating the functions of {{script}} into the {{lang}} template would also be useful because sometimes the coding takes too much space in the article or using it makes the article unnecessarily big so that it would be preferable to shift this onto the templates instead.
    1. For example, {{lang-ae|}} and {{lang|ae|}} should have a parameter that functions in the same way as if {{lang-ae|{{script|Avst|}}}} and {{lang|ae|{{script|Avst|}}}} were used.
      1. Some scripts, like cuneiform, however use multiple variants due to how widespread and long-lived their use was, and, if creating such a parameter is possible, it would need to be able to render the various fonts used in Template:Script/Cuneiform.
      2. This parameter should be optional, however, because some of the script templates, like {{script|Grek|}} and {{script|Latn|}} render the text in a font that is difficult to read and are therefore already discouraged.
  7. There also needs to be an option where inserting - in the text parameter, followed by the transliteration, makes the template display the language name followed by the transliteration.
    1. For example, {{lang-sa|-|bharu}} should give something that displays like Sanskrit: bharu.

Would it be feasible to modify the template so as to remove any or some or even all of these current limitations? Antiquistik (talk) 15:01, 28 May 2024 (UTC)[reply]

@Trappist the monk: can any of these issues be resolved? Antiquistik (talk) 08:13, 3 June 2024 (UTC)[reply]
I numbered your list to make it easier to answer.
1. because {{lang-??}} already has a wikilinked language label, using the label= html attribute is considered redundant or superfluous
2. might be done but probably not necessary because {{transliteration}} exists to serve that purpose
3. see 4
4. if you do a lot of these custom lists, you might be better served to create one or more templates to do the grunt work
5. a new template, {{transcript}} might be created; you will need to work out details of its implementation
6. nesting templates in {{lang}} may take more space, but space in a Wikipedia article is not an issue; this is not a dead tree encyclopedia
7. {{transliteration-??}} templates might be created; you will need to work out the implementation details
In general, Module:Lang works well for the vast majority of its uses; mucking about with that for a small number of articles seems to me to be counterproductive.
Trappist the monk (talk) 14:22, 3 June 2024 (UTC)[reply]
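As a rough idea of the "grunt work" such a helper would do for the multi-spelling lists in item 4, here is a minimal sketch in Python that only generates the wikitext. The function names `join_items` and `lang_list` are hypothetical, and it uses just the {{lang}} and {{transliteration}} calls already shown in this thread; an actual template or module would need its own design.

```python
def join_items(items, conj="and"):
    """Join items as 'A', 'A and B', or 'A, B and C' (no serial comma)."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " %s %s" % (conj, items[-1])


def lang_list(tag, spellings, translits=(), conj="and"):
    """Wrap each spelling in {{lang}} and each romanization in {{transliteration}}."""
    out = join_items(["{{lang|%s|%s}}" % (tag, s) for s in spellings])
    if translits:
        romanized = join_items(
            ["{{transliteration|%s|%s}}" % (tag, t) for t in translits], conj
        )
        out += ", <small>romanized:</small> " + romanized
    return out


# One spelling with two competing readings, joined with "or".
print(lang_list("akk-x-neoassyr", ["SPELLING"], ["Tugdammî", "Dugdammî"], conj="or"))
```

Here "SPELLING" is a placeholder for the cuneiform text; the language-name label would still have to be written by hand or supplied by a {{lang-??}} call for the first item.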
@Trappist the monk: For #6, I've faced problems with pages I rewrote becoming too big per WP:TOOBIG before, so space is unfortunately an issue.
For #2, editing using both {{transliteration}} and {{lang|}} for this purpose is too cumbersome and unwieldy. Adding a |translit= parameter to {{lang|}} would be the best option.
For #3, I am not sure I understand how #4 relates to this issue. You might need to spell it out for me.
For #4, yes, a separate template for lists would be ideal. It should work like Wiktionary's {{desc| template, minus creating a link for the term. However, the options for the first four sub-issues should be integrated into {{lang|}}/{{lang-}}.
For #5, an additional transcript template would definitely be a very useful addition. However, I also need the transcription option to be part of {{lang}} itself. Some articles really do need a name in script, followed by a transliteration, followed by a transcription. This is especially important because the |translit= parameter is presently used for transcription rather than transliteration, while articles, including ones outside the topics that I cover, still require both transcription and transliteration.
For #7, what would be ideal would be both a {{transliteration}} template and the option to insert a nil parameter in the present template as well.
As for #1, well, I can do without it for now. But I would still support adding a label if the question arises again in the future. In my (personal) opinion, the label in fact makes the pages easier to read.
Additionally, could you modify the private-use tag akk-x-latbabyl to render as "Late Babylonian Akkadian" rather than simply as "Late Babylonian" as it now does?
Antiquistik (talk) 19:31, 7 June 2024 (UTC)[reply]
@Trappist the monk: Can you have a look at my response and see what can be done please? Antiquistik (talk) 06:26, 25 June 2024 (UTC)[reply]

Mild performance improvement

I have been looking at Kashmiri language and wondering why it takes a long time to build the page.

In the timing stats on the page, the lua function gcodepoint_init is high on the list. Looking at the wikitext of the page, templates of the form {{#invoke:lang|lang|ks|{{uninastaliq... (of which there are over 900) are responsible for this timing. For each of these calls, gcodepoint_init is called three times.

The call to gcodepoint_init actually comes from is_Latin in Module:Unicode_data.

Now to the point, Module:Lang calls is_Latin from line 988 and is_rtl (which calls is_Latin) from lines 549 and 551.

The call to is_rtl could be made once and its result reused in the two succeeding if statements, saving a call. Desb42 (talk) 08:16, 7 June 2024 (UTC)
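The idea can be shown with a small Python stand-in. Nothing here is the module's real code: `is_rtl` below is a simplified substitute based on bidirectional classes, the two `if` branches are placeholders for whatever lines 549 and 551 do, and the `CALLS` counter exists only to make the saving visible.

```python
import unicodedata

CALLS = {"is_rtl": 0}  # counter used only to show how often the check runs


def is_rtl(text):
    """Simplified stand-in: true if any character has a right-to-left bidi class."""
    CALLS["is_rtl"] += 1
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def render_twice(text):
    """Shape of the current code: the same check is evaluated in both if statements."""
    parts = []
    if is_rtl(text):
        parts.append("first rtl branch")
    if is_rtl(text):
        parts.append("second rtl branch")
    return parts


def render_once(text):
    """Proposed shape: evaluate once, reuse the stored result."""
    rtl = is_rtl(text)
    parts = []
    if rtl:
        parts.append("first rtl branch")
    if rtl:
        parts.append("second rtl branch")
    return parts
```

With over 900 template calls on a page, one saved call per template would add up, though the actual gain would have to be measured.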

Renaming a template per ISO 639 changes?

At Template talk:Lang-sh#HBS?, we've had a question whether to use "sh" or "hbs" as the template name/code. Is this a doable change, is there something to think about? The first thing that occurs to me is that the TFD notice would clutter a lot of lead sections... --Joy (talk) 15:05, 8 June 2024 (UTC)[reply]

Anything is possible. What is the end goal? To change instances that use {{Lang-sh}} to {{Lang-hbs}}, or just to swap the current template and the "hbs" redirect? If the former, and there is consensus on the page, a bot operator could help with the 800-odd transclusions; if it is just a rename, anyone who can move pages can do it, or it can be requested at WP:RM/T. The template would also need a small edit to change the code parameter. Gonnym (talk) 15:16, 8 June 2024 (UTC)
So if I get this right, you wouldn't use a TFD tagging procedure? --Joy (talk) 19:15, 8 June 2024 (UTC)[reply]
sh is the IETF BCP 47 language subtag used for Serbo-Croatian, and these subtags are the basis for Wikipedia's lang templates and, by extension, subdomains (which is the reason sh.wikipedia.org does not need to move to hbs.wikipedia.org). –Vipz (talk) 16:37, 8 June 2024 (UTC)
@Vipz well, that's actually making things a tad more confusing because there it says for "sh":
Registry comment: sr, hr, bs are preferred for most modern uses
sh is a macrolanguage that encompasses the following more specific primary language subtags: bs hr sr cnr. If it doesn't break legacy usage for your application, you should use one of these more specific language subtags instead. On the other hand, sh is often preferred by legacy applications rather than sr (Serbian).
We're not just using it for legacy applications. --Joy (talk) 19:14, 8 June 2024 (UTC)[reply]
Interestingly, that quoted material doesn't mention "hbs".  — SMcCandlish ¢ 😼  04:38, 10 June 2024 (UTC)[reply]

Side question on Proto-[Foo]

Why are we at en.wp using, e.g., gem-x-proto while en.wikt is using gem-pro? Is one site or other making a grave mistake?  — SMcCandlish ¢ 😼  04:39, 10 June 2024 (UTC)[reply]

Early Modern English?

Not sure of the process here, but is it possible to make the template accept Early Modern English as a language? There are some long quotations on Elinor Fettiplace which, while definitely not in Middle English, must be extremely confusing to a screen reader. ISO 639-6 proposed emen as a code for it but, as I gather, it was not accepted. UndercoverClassicist T·C 08:40, 22 June 2024 (UTC)

{{lang|en-emodeng|text}} → text
{{lang-en-emodeng|text}} → Early Modern English: text
Trappist the monk (talk) 11:39, 22 June 2024 (UTC)[reply]
Wonderful: thank you! UndercoverClassicist T·C 15:11, 22 June 2024 (UTC)[reply]

lang-xx missing tooltip?

Didn't {{lang-xx}} use to add a tooltip, the same as {{lang}}, or am I mistaken? For example:

  • {{lang|fr|bonjour}} → bonjour (tooltip: French-language text)
  • {{lang-fr|bonjour}} → French: bonjour (no tooltip)

W.andrea (talk) 00:01, 3 July 2024 (UTC)[reply]

Perhaps for a short time. In the olden days before Module:Lang, {{lang-fr}} called {{language with name}} which called {{lang}}. This wikitext form did not have a tooltip. During the transition to Module:Lang, {{lang}} was the first to be converted. The new {{lang}} has a tooltip. Before {{lang-fr}} was converted to directly call Module:Lang, it continued to call the new {{lang}} so did have a tooltip. When {{lang-fr}} was converted to directly call Module:Lang, the tooltip went away because it is redundant to the language label link that precedes the French language text.
If you must have the redundant tooltip, you can use {{language with name}}:
{{language with name|fr|French|bonjour}} → French: bonjour (has a tooltip because this template calls {{lang}}).
Trappist the monk (talk) 01:03, 3 July 2024 (UTC)[reply]

the tooltip went away because it is redundant to the language label

Oh, of course! I don't know how I didn't realize that. Thanks! — W.andrea (talk) 01:10, 3 July 2024 (UTC)[reply]

Schrödinger's language template

The latest run of Special:WantedCategories featured a redlink for Category:Articles containing no linguistic content-language text, autogenerated by an invocation of {{Lang-zxx}} in emoji.

Now, I grok the context of what it would be for — the template was used in the emoji article to reify a short series of non-linguistic colour-code boxes into "language", because of a technical glitch that was bleeding into the rest of the paragraph when the colour codes were just sitting as raw "text" not wrapped in a lang template, so basically it's a wrapper for non-linguistic content (symbols, colour codes, etc.) that has to be treated as para-linguistic for some technical reason or other. But its name is weird and illogical on its face — "no linguistic content language"? — and it's a category that has existed in the past but was deleted. I was able to make it go away by wrapping the lang-zxx template in the {{suppress categories}} wrapper, but since it's template-generated it may recur again in the future.

So is this a category that we want, at either that seemingly oxymoronic name or another more logical alternative? Obviously it can be created if it's desired and its name is considered fine — but if an alternative name would be more desirable, then the lang-zxx template needs to be modified to generate that alternative name instead, and if it's undesirable at any name then the lang-zxx template needs to be prevented from generating it at all. But those are both things that would require a higher level of template-coding expertise than I've got, so I'm bringing it to the project's attention so that I don't break stuff. Thanks. Bearcat (talk) 15:26, 31 July 2024 (UTC)[reply]

The category was nominated for CSD G8 deletion 22 September 2020 by Editor Gonnym without explanation and deleted the same day by Editor Maile66 using an automated process; also without explanation. Seems to me that the category should not have been deleted because the category was marked with {{Possibly empty category}}. Perhaps this was an oversight because at the time we were shifting the category documentation templates from {{Category articles containing non-English-language text}} which required parameters to {{Non-English-language text category}} which does not require parameters.
I am wholly indifferent to the category name. If it is really important, it can be changed but I see no pressing need.
A benefit of template documentation is that it lists available parameters. For {{lang}} (and its {{lang-xx}} counterparts) the documentation lists both |nocat= (accepting a variety of positive values) and |cat= (accepting a variety of negative values). Both parameters accomplish the same thing: when set appropriately, the template will not emit categories.
Trappist the monk (talk) 16:41, 31 July 2024 (UTC)[reply]
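The relationship between the two parameters can be sketched as follows. The accepted value sets are assumptions made for illustration (the template documentation is the authority on what is actually accepted), and `categories_suppressed` is a hypothetical name.

```python
# Assumed value sets; consult the template documentation for the real lists.
POSITIVE = {"yes", "y", "true", "t", "on", "1"}
NEGATIVE = {"no", "n", "false", "f", "off", "0"}


def categories_suppressed(params):
    """True if |nocat= is set to a positive value or |cat= to a negative one."""
    nocat = params.get("nocat", "").strip().lower()
    cat = params.get("cat", "").strip().lower()
    return nocat in POSITIVE or cat in NEGATIVE


print(categories_suppressed({"nocat": "yes"}))  # suppressed
print(categories_suppressed({"cat": "no"}))     # suppressed, same effect
print(categories_suppressed({}))                # categories emitted as usual
```

Either parameter would avoid the need for an outer {{suppress categories}} wrapper.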
I don't remember why I nominated it. If it is only created by usages of Template:Lang-zxx and that template did not exist at the time, then that probably was a likely reason, as those categories shouldn't be manually populated and at the time there was no automatic template handling this. Gonnym (talk) 18:27, 31 July 2024 (UTC)[reply]
The OP erred; there is no {{lang-zxx}} in Emoji and that template did not exist at the time of the category's deletion. But, {{lang|zxx|...}} was/is a legitimate use (Emoji has {{lang|zxx-Zsye|🏻 🏼 🏽 🏾 🏿}}). Use of {{lang|zxx|...}} would have emitted Category:Articles containing no linguistic content-language text then as it does now; see line 548 et seq. (19 September 2020 permalink) in Module:Lang.
Trappist the monk (talk) 18:56, 31 July 2024 (UTC)[reply]

Rut

Hello!
Please change in the (1) Module:Lang/data/iana languages: ["rut"] = {"Rutul"} to ["rut"] = {"Rutulian"}.   (and also in these modules: (2) Module:ISO_639_name/ISO_639-3, (3) Module:ISO_639_name/ISO_639_name_to_code)
Thank you. Digitalberry (talk) 08:36, 5 August 2024 (UTC)[reply]

Not what it's called in the ISO 639 specification. Remsense 08:38, 5 August 2024 (UTC)[reply]
Thanks for your reply. Could you give me the source (link) you are referring to? Digitalberry (talk) 08:46, 5 August 2024 (UTC)[reply]
I didn't find the source. Can you provide me with the source? Digitalberry (talk) 09:34, 5 August 2024 (UTC)[reply]
Well, the source is ISO 639. You can see a corresponding table we have at ISO 639:r Remsense 10:15, 5 August 2024 (UTC)[reply]
Also, you could've followed the ISO 639 link on the Rutul language page itself. Remsense 10:16, 5 August 2024 (UTC)[reply]
Thanks for the answer. Still, the data indicated there is erroneous and needs to be clarified. Digitalberry (talk) 10:22, 5 August 2024 (UTC)[reply]
That's unfortunate; this tool and many other second-order tools use the ISO-assigned name, so there's not much to do here I'm afraid. Remsense 10:50, 5 August 2024 (UTC)[reply]
We can override some language names used by {{lang}} which are taken from the IANA language subtag registry which draws tags/names from all of the ISO 639 standards. The override is accomplished in Module:Lang/data when there is evidence of sufficient consensus to do so. That consensus often takes the form of an en.wiki article under the desired name. That is not the case here.
Trappist the monk (talk) 11:10, 5 August 2024 (UTC)[reply]
I think the right way is to change the information via a request to ISO-639. Digitalberry (talk) 11:48, 5 August 2024 (UTC)[reply]

Spelling of "Romanization"

Any way to allow the BrE spelling of "Romanisation" when using e.g. Template:lang-grc? An optional parameter like |-ise=y (similar to how date templates have |df=y) would seem like a possible solution. UndercoverClassicist T·C 21:48, 28 August 2024 (UTC)[reply]

Perhaps; I have not looked into the details. There has to be a better parameter name; |engvar=gb? Module:lang currently supports eight regional variants of English:
["en-au"] = "Australian English",
["en-ca"] = "Canadian English",
["en-gb"] = "British English",
["en-ie"] = "Irish English",
["en-in"] = "Indian English",
["en-nz"] = "New Zealand English",
["en-us"] = "American English",
["en-za"] = "South African English"
If we do this, the default will remain as it is: |engvar=us.
Your task is to research these variants and group them by suffix: ~ise or ~ize (or other?). Report back with the results.
Trappist the monk (talk) 22:26, 28 August 2024 (UTC)[reply]
I am honoured -- let's see if I can do this with a nice table. Data source for the moment is our respective articles on the dialects, except for South African English, which luckily has plenty of results on Google to say that it follows the British system.
EngVar   Suffix
en-au    -ise
en-ca    -ize
en-gb    -ise
en-ie    -ise
en-in    -ise
en-nz    -ise
en-us    -ize
en-za    -ise

It might be worth clarifying in the documentation that if people want to use e.g. Oxford English (which uses -ize but otherwise follows regular BrE), they can just set the parameter to en-us and it won't affect anything except that single word? UndercoverClassicist T·C 22:39, 28 August 2024 (UTC)[reply]
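The grouping in the table amounts to a simple lookup, sketched here in Python. The function name `romanized_label` and the fallback for unrecognized values are assumptions; the default of us follows what was said above about keeping current behavior.

```python
# Suffix per variant, taken from the table above.
SUFFIX_BY_ENGVAR = {
    "au": "ise", "ca": "ize", "gb": "ise", "ie": "ise",
    "in": "ise", "nz": "ise", "us": "ize", "za": "ise",
}


def romanized_label(engvar=""):
    """Return 'romanized' or 'romanised' for an |engvar= value such as 'gb' or 'en-gb'."""
    key = engvar.strip().lower()
    if key.startswith("en-"):
        key = key[3:]
    # Empty or unrecognized values fall back to the assumed default, 'us'.
    suffix = SUFFIX_BY_ENGVAR.get(key, SUFFIX_BY_ENGVAR["us"])
    return "roman" + suffix + "d"


for value in ("ca", "za", ""):
    print(repr(value), romanized_label(value))
```

Since only the one word changes, Oxford spelling users could pick any -ize variant without other side effects.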

Done:
{{lang-ja|東京タワー |translit=Tōkyō tawā |engvar=ca}}
Japanese: 東京タワー, romanized: Tōkyō tawā
{{lang-ja|東京タワー |translit=Tōkyō tawā |engvar=za}}
Japanese: 東京タワー, romanised: Tōkyō tawā
{{lang-ja|東京タワー |translit=Tōkyō tawā |engvar=}}
Japanese: 東京タワー, romanized: Tōkyō tawā
also works in {{transliteration}} (in the tool tips)
{{transliteration|ja|Tōkyō tawā |engvar=ca}}
Tōkyō tawā
{{transliteration|ja|Tōkyō tawā |engvar=nz}}
Tōkyō tawā
{{transliteration|ja|Tōkyō tawā}}
Tōkyō tawā
and for the three transliteration standards names that use the term 'Romani(sz)ation'; Revised Romanization of Korean:
{{transliteration|ko|rr|test |engvar=ca}}
test
{{transliteration|ko|rr|test |engvar=nz}}
test
{{transliteration|ko|rr|test}}
test
Ukrainian National system of romanization
{{transliteration|ko|ukrainian |test |engvar=ca}}
test
{{transliteration|ko|ukrainian |test |engvar=nz}}
test
{{transliteration|ko|ukrainian |test}}
test
Yale romanization of Korean:
{{transliteration|ko|yaleko|test |engvar=ca}}
test
{{transliteration|ko|yaleko|test |engvar=au}}
test
{{transliteration|ko|yaleko|test}}
test
Trappist the monk (talk) 22:40, 31 August 2024 (UTC)[reply]
Thanks -- really nice work, and kudos for catching the tooltip case as well. Just implemented on Fear and trembling and seems to work well. UndercoverClassicist T·C 08:48, 1 September 2024 (UTC)[reply]

lang-my outputs tofu on my browser (FF)

I've been removing lang-my where I come across it because it turns burmese script into tofu. Not sure what the problem is, but assume it's forcing a script to display that I don't have installed. I have a number of burmese scripts, though, including generic ones like Noto, so display shouldn't be a problem. — kwami (talk) 09:26, 31 August 2024 (UTC)[reply]

Don't do that without evidence that {{lang-my}} is at fault. Here are examples of differently written Burmese text:
1. မြန်မာအက္ခရာ ← plain text; no markup
2. မြန်မာအက္ခရာ ← <span>မြန်မာအက္ခရာ</span>
3. မြန်မာအက္ခရာ ← <span lang="my">မြန်မာအက္ခရာ</span>
4. Burmese: မြန်မာအက္ခရာ ← {{lang-my|မြန်မာအက္ခရာ}}
5. Burmese: မြန်မာအက္ခရာ ← [[Burmese language|Burmese]]: <span lang="my">မြန်မာအက္ခရာ</span>
For me, all of the above render correctly (win 10, chrome). Do any of the above render correctly for you?
We have had discussions with you about fonts in the past:
Template talk:Lang/Archive 2 § Screwing up formatting
Template talk:Lang/Archive 2 § Which fonts are used?
Template talk:Lang/Archive 11 § bug in Rapa Nui language
Template talk:Lang/Archive 11 § bug with Chinese
None of those discussions revealed a problem with {{lang}}, the various {{lang-xx}}, or Module:Lang.
Trappist the monk (talk) 13:54, 31 August 2024 (UTC)[reply]
The results of 1 and 2 display correctly (as well as the code of all 5). lang="my" appears to be the problem, and lang-my appears to inherit that problem. — kwami (talk) 14:16, 31 August 2024 (UTC)[reply]
It is your browser that interprets the lang="my" attribute. If it does not interpret the attribute correctly, you will get rubbish for a rendering. Here I have switched the language tags (don't do this in mainspace):
မြန်မာအက္ခရာ ← <span lang="ja">မြန်မာအက္ခရာ</span> (lang="my" switched to lang="ja")
မြန်မာအက္ခရာ ← <span lang="ru">မြန်မာအက္ခရာ</span> (lang="my" switched to lang="ru")
For me, they both render correctly.
Trappist the monk (talk) 14:40, 31 August 2024 (UTC)[reply]
Okay, it's my browser then. Bizarre that FF doesn't render 'my' by default. I'll look for overrides. Thanks. — kwami (talk) 15:13, 31 August 2024 (UTC)[reply]

Block level

Is there a version of this template for use on block-level content? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:37, 2 September 2024 (UTC)[reply]

This template. It will correctly wrap <poem>...</poem> tags, ordered, unordered, and definition lists, and content wrapped in <div>...</div> tags.
Trappist the monk (talk) 17:22, 2 September 2024 (UTC)[reply]
Odd then that the opening sentence of the documentation refers to a "span of text". I'll change that. But what about simple paragraphs, singly or in multiple? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:43, 2 September 2024 (UTC)[reply]
A span of text does not necessarily mean the html <span>...</span> tags. The term span has been used as a descriptor since the first version (permalink) of the documentation (then held at Template talk:Lang). I would suppose that had the original author (Editor Monedula) meant the html <span>...</span> tags, they would have written something to the effect:
The purpose of this template is to indicate that text in HTML <span>...</span> tags belongs to a particular language.
Of course, at the time, {{lang}} only supported inline text.
Paragraphs written as normal wikipedia paragraphs are supported.
Trappist the monk (talk) 18:53, 2 September 2024 (UTC)[reply]
Yes; I was saying it was odd that it had never been updated to say that it covered block level content. I have now done so. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:20, 2 September 2024 (UTC)[reply]
I have seen Linter errors caused by the use of this template with block content. This version of my sandbox lists one missing end tag (for <p>) and at least one misnested pair of <i>...</i> tags. – Jonesey95 (talk) 22:07, 3 September 2024 (UTC)[reply]

Template-protected edit request on 5 September 2024

Can someone please remove the following comments from Module:lang/data?:

To do as you have asked would not have been the optimal solution.
["lij"] = "Ligurian (Romance language)" can be deleted because the language name for lij in Module:Lang/data/iana languages is 'Ligurian'
["lij-mc"] = "Monégasque language" because there is a duplicate in another table that would have caused lij-mc to link to 'Monégasque language'
['qwm'] = "Kuman (Russia)" can be deleted but the resulting link would be to Kuman (Russia) language from the language name for qwm in ~/iana languages: 'Kuman (Russia)'
["xlg"] = "Ligurian (ancient language)" can be deleted but the resulting link would be Ligurian (Ancient) language from the language name for xlg in ~/iana languages: 'Ligurian (Ancient)'
So, I have:
deleted ["lij"] = "Ligurian (Romance language)"
modified ["lij-mc"] = "Monégasque language" so that it points to 'Monégasque dialect'
{{lang|lij-mc|fn=name_from_tag|link=yes}} → Monégasque
deleted ['qwm'] = "Kuman (Russia)"
added ['qwm'] = "Cuman" to the override table
modified ["xlg"] = "Ligurian (ancient language)" so that it points to 'Ligurian language (ancient)'
{{lang|xlg|fn=name_from_tag|link=yes}} → Ligurian
Trappist the monk (talk) 14:35, 5 September 2024 (UTC)[reply]

Hanja

For {{lang|ko-Hani}} (supposed to be for Hanja), it renders the "traditional" characters used for Hanja as simplified characters on iOS. This seems to be undesirable; Hanja doesn't use most of the simplified characters.

For example, on iOS {{lang|ko-Hani|龜}} renders incorrectly using the simplified char (⻱). However, on Mac desktop this issue doesn't occur.

I feel like we should recommend against people using ko-Hani or ko-Hant, and just ask them to stick to ko, which doesn't have this issue. seefooddiet (talk) 04:13, 8 September 2024 (UTC)[reply]

This is not an issue for {{lang}}. The character, no matter how it is rendered, is the same unicode character U+9F9C from the CJK Unified Ideographs unicode block. From your browser's point of view, the character is just a series of digits. Your browser and the operating system under which it is running decide which (of many) font faces is used to convert that series of digits to the character displayed on the screen. You can control that to some extent by providing the appropriate script subtag when you write a {{lang}} template but ultimately, the font face is chosen by the browser and its OS.
I suspect that iOS has physical limitations (available memory?) that determine how many font faces are available. If I understand the tables in CJK Unified Ideographs (search for 9F9C) there are seven ways to write the character that is 9F9C – 3 Chinese, 2 Korean, 1 Japanese, and 1 other (Vietnamese?). There are 20,735 characters identified in CJK Unified Ideographs; many (most?) of those have multiple ways to write a CJK character so it would not surprise me to learn that the iOS/browser designers elected to fall back to one or two of those ways when rendering a CJK character.
Regardless, when appropriate, we should always identify the correct script and not presume that all browsers have the same design as your iOS/browser. And who knows, perhaps at iOS v30 or whatever, the problem as you see it will have been resolved.
Trappist the monk (talk) 13:19, 8 September 2024 (UTC)[reply]
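The point that the character is the same code point however it is drawn can be checked with a short Python sketch. The helper name describe is made up for this illustration; only the standard unicodedata module is used.

```python
import unicodedata


def describe(ch):
    """Return the code point label and Unicode name of one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


# The character from the example above, written as an escape so that
# no font or input method can silently substitute a different glyph.
turtle = "\u9f9c"
label, name = describe(turtle)
print(label, name)
```

Whatever glyph a browser picks for it, the string handed to the browser contains this one code point; the lang attribute only hints at which font face to prefer.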

I'd argue you don't need script (writing system) tagging. Machines can easily identify the script by checking the code point of each character in a string.

Language tagging is needed for distinguishing different languages using the same script (e.g. English, Spanish; Russian, Bulgarian; etc.) or for distinguishing different orthographies using the same script in a language (e.g. Norwegian Bokmål/Nynorsk, Chinese simplified/traditional, etc.); it is not needed for distinguishing different scripts (Latin, Cyrillic, etc.).

Also, Hani is for text consisting of Chinese characters (hanzi, kanji, hanja) only. Hanja forms of Korean terms can also contain hangul (e.g. 서울特別市 – 서울 does not have hanja), so ko-Hani is not really appropriate anyway. I think ko is good enough. 172.56.232.227 (talk) 23:36, 8 September 2024 (UTC)[reply]

Apparently I wasn't as clear as I ought to have been. I do not support writing es-Latn or ru-Cyrl, etc. But, for Spanish transliterated into Greek, for example, es-Grek is appropriate.
Hanja forms of Korean terms can also contain hangul. If I understand our article on Hanja, it is Chinese characters used to write Korean text. When that occurs, it would seem that the correct thing to do is to mark the text with ko-Hani. IANA seems to support this with this definition for Hani (see the IANA language-subtag-registry file):
%%
Type: script
Subtag: Hani
Description: Han
Description: Hanzi
Description: Kanji
Description: Hanja
Added: 2005-10-16
%%
Trappist the monk (talk) 14:05, 9 September 2024 (UTC)[reply]
In fact, there is a code specifically for hangul+hanja Korean text: ko-Kore. But for some reason no one uses this on Wikipedia.
Anyway, ko is good enough. 172.56.232.227 (talk) 04:09, 10 September 2024 (UTC)[reply]
Oh neat, I didn't know that! Now I do, thank you. Remsense ‥  06:19, 10 September 2024 (UTC)[reply]
ko-Kore is not supported by IANA and so is not supported by this template:
%%
Type: language
Subtag: ko
Description: Korean
Added: 2005-10-16
Suppress-Script: Kore
%%
{{lang|ko-Kore|龜}} → [龜] Error: {{Lang}}: script: kore not supported for code: ko (help)
Trappist the monk (talk) 06:26, 10 September 2024 (UTC)[reply]
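A rough Python sketch of the check implied by the registry record above: a script subtag named in the Suppress-Script field of a language record is treated as unsupported for that language. The function names are hypothetical, the parser handles only simple one-line fields (the real registry file also has folded continuation lines), and this is not the template's actual code.

```python
# The record quoted above, without the %% separators.
RECORD = """\
Type: language
Subtag: ko
Description: Korean
Added: 2005-10-16
Suppress-Script: Kore
"""


def parse_record(text):
    """Parse 'Key: value' lines into a dict of lists (keys may repeat)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields.setdefault(key, []).append(value)
    return fields


def script_is_suppressed(record, script):
    """True if the script subtag is listed as Suppress-Script."""
    suppressed = (s.lower() for s in record.get("Suppress-Script", []))
    return script.lower() in suppressed


ko = parse_record(RECORD)
print(script_is_suppressed(ko, "Kore"))
print(script_is_suppressed(ko, "Hani"))
```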
Oh, that's a shame. In any case, Japanese is an analogous case as it also uses a mixed script, so simply ko would seem to suffice, with ko-Hani also usable for hanja-only text. Remsense ‥  06:29, 10 September 2024 (UTC)[reply]
Correct me if I'm wrong, but I think we're in agreement that ko-Hani is fine if it's exclusively Hanja, but if there is Korean mixed script then the more general ko is more accurate. seefooddiet (talk) 06:16, 11 September 2024 (UTC)[reply]
Bingo! Remsense ‥  07:56, 11 September 2024 (UTC)[reply]
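The rule agreed on in this thread (ko-Hani for hanja-only text, plain ko for mixed script) could be sketched as below. It also illustrates the earlier remark that a script can be identified from code points. The function names are hypothetical, and classifying by Unicode character name is a simplification chosen for this sketch, not something the template does.

```python
import unicodedata


def scripts_in(text):
    """Return the set of scripts (Han, Hangul) found in the text."""
    found = set()
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("HANGUL"):
            found.add("Hangul")
        elif name.startswith(("CJK UNIFIED IDEOGRAPH",
                              "CJK COMPATIBILITY IDEOGRAPH")):
            found.add("Han")
    return found


def suggest_tag(text):
    """Suggest ko-Hani only when every detected script is Han."""
    return "ko-Hani" if scripts_in(text) == {"Han"} else "ko"


print(suggest_tag("\u9f9c"))        # hanja only
print(suggest_tag("서울特別市"))      # hangul and hanja mixed
```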

Possible bug


At the bottom of the page, List of transgender public officeholders in the United States is in the category "Category:Articles containing Neapolitan-language text", despite not having any Neapolitan text. I'm not seeing anything labeled {{lang|nap}} or anything like that, either. Snowman304|talk 13:47, 15 September 2024 (UTC)[reply]

That page transcludes Template:Transgender sidebar which does use that. Gonnym (talk) 14:38, 15 September 2024 (UTC)[reply]
Gotcha! Thanks Snowman304|talk 14:47, 15 September 2024 (UTC)[reply]

Template-protected edit request on 18 September 2024


Can someone please categorise the template {{lang-ku}} under Category:Iranian multilingual support templates instead of Category:Indo-Iranian multilingual support templates, and the templates {{lang-bn}}, {{lang-hi}}, {{lang-ne}}, {{lang-pa}}, {{lang-sa}} and {{lang-ur}} under Category:Indo-Aryan multilingual support templates instead of Category:Indo-Iranian multilingual support templates, because the categories 'Indo-Aryan multilingual support templates' and 'Iranian multilingual support templates' are more specific than the category 'Indo-Iranian multilingual support templates'? PK2 (talk; contributions) 03:44, 18 September 2024 (UTC)[reply]

Done. Another reason why this system of creating hundreds of templates like this is horrible maintenance-wise, when one template with a language code works. Gonnym (talk) 08:25, 18 September 2024 (UTC)[reply]

merge language-specific templates


For years I've wanted to create a {{lang-??}} template that would replace all of those hundreds of templates. Alas, {{lang-xx}}, the most obvious choice for a template name, is used as a redirect to Template:Lang § Language-specific templates. One might argue that the language-specific templates need not be mentioned in Template:Lang/doc if {{lang-??}} was a template that accepted the same parameters as the language-specific templates. {{lang-x}} is used for documentation for the language-specific templates and would become superfluous if we created a single {{lang-??}} template.

We might:

  1. create a redirect {{language-specific templates}}
  2. replace all instances of {{lang-xx}} with {{language-specific templates}} so we could recover the {{lang-xx}} name
  3. modify Module:lang to have a lang-xx() entry point
  4. create {{lang-xx}} as a template that invokes the new lang-xx() entry point to Module:Lang
  5. create Template:lang-xx/doc from {{lang-x}}
  6. create language-tagged index of categories (as a new submodule?)
  7. replace appropriate instances of the {{lang-XX|...}} templates with {{lang-xx|XX|...}} where -xx is literal and XX is the language tag and subtags if any (not all are appropriate, {{lang-zh}} for example; there are also {{lang-XX}} templates that have been 'augmented')
  8. delete appropriate {{lang-XX}} templates that are supported by Module:lang (not all are appropriate)
  9. replace instances of {{language-specific templates}} with links to {{lang-xx}}
  10. delete {{language-specific templates}}
  11. clean up the mess

No doubt I've missed something here, not the least of which is community approval to make this change.

Trappist the monk (talk) 14:34, 18 September 2024 (UTC) 16:11, 18 September 2024 (UTC) +category list 17:16, 18 September 2024 (UTC) strike category list[reply]
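Step 7 of the list above is essentially a text substitution, which might look something like this Python sketch. The function name, the regular expression, and the skip list are assumptions made for illustration; a real bot run would also have to deal with the 'augmented' templates, redirects, and parameters that the list warns about.

```python
import re

# Tags left alone: 'xx' is the new template itself, and {{lang-zh}} is
# named above as not appropriate for replacement. Hypothetical list.
SKIP = {"xx", "zh"}

# Matches the opening of {{lang-XX|...}}; the first letter of a
# template name is case-insensitive.
PATTERN = re.compile(r"\{\{\s*[Ll]ang-([A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)\s*\|")


def to_single_template(wikitext):
    """Rewrite {{lang-XX|...}} as {{lang-xx|XX|...}} where allowed."""
    def repl(match):
        tag = match.group(1)
        if tag.lower() in SKIP:
            return match.group(0)  # leave unchanged
        return "{{lang-xx|" + tag + "|"
    return PATTERN.sub(repl, wikitext)


print(to_single_template("A house ({{lang-es|casa}}) and {{lang-zh|c=龜}}."))
```

Because 'xx' is in the skip list, running the function twice over the same text leaves already-converted calls untouched.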

Yeah, that all sounds great and I support it. The past few years saw us move from instances of templates with multiple language or country versions, to one single template ({{ISO 639 name}}: TfD; {{In lang}}: TfD (part 1) and TfD (part 2); {{Globalize}}: TfD; {{Contains special characters}}: TfD; {{Wikt-lang}}: TfD), this isn't different. Another option for the name can be {{lang2}} (which currently is an unused unrelated redirect) since "lang-xx" doesn't have any semantic meaning either. Gonnym (talk) 15:24, 18 September 2024 (UTC)[reply]
It also seems that most usages of {{lang-xx}} are from transclusions of Module:Road data/strings/doc. Gonnym (talk) 16:05, 18 September 2024 (UTC)[reply]
I have removed all but 16 usages of {{lang-xx}}. The remaining usages appear to be generated by an error, possibly in this module. If someone wants to dig in to the remaining 16, we can free up the template name for a better use. – Jonesey95 (talk) 16:18, 19 September 2024 (UTC)[reply]
Thanks for doing that. I am beginning to favor {{langx}}; easier to write and no pre-existing conflicts to clean up. I am working to implement {{langx}} in Module:Lang/sandbox:
{{#invoke:lang/sandbox|langx|es|text}} → Spanish: text
{{#invoke:lang/sandbox|langx|he|text}} → Hebrew: text
Trappist the monk (talk) 16:59, 19 September 2024 (UTC)[reply]
If semantic meaning is a requirement, perhaps the solution is a change to {{lang}} where we add a parameter |<something>= that causes Module:Lang to select lang(), lang_xx_inherit(), or lang_xx_italic() depending on the language tag supplied in the template call. The replacement in article space then becomes {{lang-XX|...}} → {{lang|XX|<something>=yes|...}}. I can imagine that editors won't like that so much and would want a more-or-less familiar shortcut which brings us back to {{lang-xx}} or {{lang2}} or {{langx}} or {{lang+}} or ...
Trappist the monk (talk) 16:11, 18 September 2024 (UTC)[reply]
I agree. If we add too much character count it will fail. Gonnym (talk) 16:19, 18 September 2024 (UTC)[reply]
Made this topic its own section.
I'm going to commandeer {{langx}} for use as a testbed/demonstrator with a module in my sandbox.
Trappist the monk (talk) 16:44, 18 September 2024 (UTC)[reply]
Category list. The categories listed in these various {{lang-??}} templates (like those listed at Template talk:Lang § Template-protected edit request on 18 September 2024) seem to be mostly collections of related templates (see Category:Iranian multilingual support templates as an example). Because a single {{langx}} template can't be categorized in this way there is no need to support those collection categories. I have struck it from the list.
Trappist the monk (talk) 17:16, 18 September 2024 (UTC)[reply]
Yes, that's another thing that gets simplified. The category system at Category:Wikipedia multilingual support templates will get trimmed by quite a lot. Gonnym (talk) 17:39, 18 September 2024 (UTC)[reply]
The sandbox module is pretty simple: it doesn't do error checking (it leaves that for _lang_xx() in Module:Lang) and it chooses upright font if the language tag is listed in a table of upright tags, italic otherwise:
{{langx|es|casa}} → Spanish: casa
{{langx|he|לעז}} → Hebrew: לעז
{{langx|aaa|ɔ-kàkà}} → Ghotuo: ɔ-kàkà – there is no {{lang-aaa}}
I suppose that the next thing to do is to hack on Module:Lang/sandbox so that it can support both {{langx}} and {{lang-??}}. That will be necessary if or when we transition from the one to the other. I think that we ought to leave support for {{lang-??}} in the module so that the ~155 wikis that use it can adapt to the change in their own time.
Trappist the monk (talk) 18:45, 18 September 2024 (UTC)[reply]
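The upright-or-italic choice described above can be pictured with a small Python sketch. Everything here is a stand-in: the names table holds only the three languages from the examples, the contents of the upright-tags table are a guess (only he is implied by the examples), and the real sandbox code is Lua and hands the result to _lang_xx() for error checking.

```python
# Language names taken from the rendered examples above.
NAMES = {"es": "Spanish", "he": "Hebrew", "aaa": "Ghotuo"}

# Hypothetical table of tags whose text is rendered upright.
UPRIGHT_TAGS = {"he"}


def langx(tag, text):
    """Return '<Language name>: <text>' with wiki italic markup as needed."""
    styled = text if tag in UPRIGHT_TAGS else f"''{text}''"
    return f"{NAMES[tag]}: {styled}"


print(langx("es", "casa"))
print(langx("he", "לעז"))
```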
Module:Episode list, Module:Nihongo, and Module:Lang/utilities will need to be adjusted if we transition to {{langx}}.
Trappist the monk (talk) 19:11, 18 September 2024 (UTC)[reply]
Yes, I agree with leaving in the lang-?? support. I think maybe a note should be added to its documentation that this usage is the deprecated method. Gonnym (talk) 19:12, 18 September 2024 (UTC)[reply]
I had thought to enforce the deprecation by testing the value returned in lines 27 & 28; if en and calling _lang_xx, return an error message.
Trappist the monk (talk) 19:33, 18 September 2024 (UTC)[reply]
That's also a good idea. Gonnym (talk) 21:17, 18 September 2024 (UTC)[reply]

a way to mark something as being in multiple languages


Maybe this is pie-in-the-sky, or a different matter entirely, but it would be nice if there were a way to mark something as being in multiple languages, e.g., Czech and Slovak from Chort: A chort (Russian: чёрт, Belarusian and Ukrainian: чорт, Serbo-Croatian čort or črt, Polish: czart and czort, Czech and Slovak: čert, Slovene: črt) Snowman304|talk 19:12, 18 September 2024 (UTC)[reply]

Not in these templates. The primary purpose of these templates is to provide correct html markup for non-English text. html allows only one lang= attribute per tag. Which one of these multiple languages would apply? Browsers use this attribute to choose a proper font; screen readers use the attribute to control pronunciation. Do Belarusians and Ukrainians pronounce 'чорт' the same way? If not then that suggests that a different way of writing that lead sentence should be preferred.
Trappist the monk (talk) 19:43, 18 September 2024 (UTC)[reply]
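Since an element carries a single lang attribute, a word shared by two languages would have to be emitted once per language. A simplified Python sketch of that idea follows; lang_span is a hypothetical helper, and the output is deliberately reduced to a bare span, whereas the real template emits more attributes than this.

```python
from html import escape


def lang_span(tag, text):
    """Wrap text in a span carrying exactly one lang attribute."""
    return f'<span lang="{escape(tag, quote=True)}">{escape(text)}</span>'


# One span per language, even though the spelling is identical.
forms = [("be", "чорт"), ("uk", "чорт")]
for tag, word in forms:
    print(lang_span(tag, word))
```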
Gotcha, I wasn't thinking about those things at all. Snowman304|talk 21:08, 18 September 2024 (UTC)[reply]

Italics in foreign-language text


I'm struggling with what to do with foreign-language text containing italic text while following default rules on foreign-language italicization. Specifically, I'm working on Template:Translated blockquote. The default rules are described at Template:Lang#Automatic italics and defined at Module:Lang#L-996.

Option: {{lang|fr|Je suis un clown nommé ''Maurice''|italic=unset}}
  Source: Category:Lang and lang-xx template errors
  Issue: Doesn't use the default italicization
Option: {{lang|fr|Je suis {{noitalic|English}}.}}
  Source: Template:Lang#Automatic italics
  Issue: Uses Template:Noitalic, when the content should invert italics relative to the surrounding text.
Option: tûndra
  Source: Template:Lang#italic parameter
  Issue: Doesn't use the default italicization

I have edited Template:Lang/with italics (permalink) as a proof-of-concept that can accept the following kinds of markup:

Markup → Renders as
{{Lang/with italics|en|Some text}} → Some text
{{Lang/with italics|en|Some <i>italic</i> text}} → Some italic text
{{Lang/with italics|fr|Je suis française.}} → Je suis française.
{{Lang/with italics|fr|Je ''suis'' française.}} → Je suis française.
{{Lang/with italics|he|לעז}} → לעז
{{Lang/with italics|he|''לעז''}} → לעז

My implementation is really klunky, so this isn't an edit request. It just seemed easier for me to implement in the template rather than the Lua module.

Questions:

  1. Why doesn't Template:Lang accept italics in its text, as Template:Lang/with italics does?
  2. What do you recommend I do with Template:Translated blockquote? At the moment, it uses |italic=invert. It could use Template:Lang/with italics by a more permanent name, eg Template:Lang/with italics.

Daask (talk) 20:08, 18 September 2024 (UTC)[reply]

{{lang}} emits errors because in the beginning of this module's life, there were a bunch of {{lang|es|''casa''}}, holdovers from the time that Latn-script text had to be manually italicized. This doesn't happen so much anymore now that editors have learned the 'new' way. But, this italics prohibition brought with it the problem of what to do with mixed italic/upright text. The solution to that was |italic=unset and |italic=invert. So far as I know, there has been no call for any other options.
What is wrong with using |italic=invert? Does it not do what you need doing?
Trappist the monk (talk) 21:47, 18 September 2024 (UTC)[reply]
@Trappist the monk: The |italic=default only italicizes Roman-script text, whereas |italic=invert always italicizes the text, regardless of script.
E.g. {{Lang|italic=invert|he|לעז}} → לעז vs. {{Lang/with italics|he|לעז}} → לעז
Daask (talk) 13:32, 19 September 2024 (UTC)[reply]
Maybe we could add an option |allow-italics=yes to omit error messages about italics within the text? Daask (talk) 13:39, 19 September 2024 (UTC)[reply]
On second thought, Category:Lang and lang-xx template errors is empty except for a citation template issue, so I suggest the Template:Lang/with italics behavior be made the default. These error messages are no longer necessary. Daask (talk) 13:41, 19 September 2024 (UTC)[reply]
I disagree. These italics errors do still appear. The template is responsible for styling the rendered non-English text so it considers italic markup an error unless the editor has explicitly directed the template to allow the markup.
Trappist the monk (talk) 14:44, 19 September 2024 (UTC)[reply]
Yes: |italic=default only italicizes Roman-script text – this determination happens at lines 996–1003; see also lines 94–135
The purpose of invert is to flip italicized text within upright text so that you get upright text within italicized text. This is a completely bogus example because the English text should never be marked up as Hebrew:
{{Lang|italic=invert|he|some italic text followed by inverted Hebrew text ''לעז'' and then some more italic text}}
some italic text followed by inverted Hebrew text לעז and then some more italic text
So, the module inverts everything to the opposite markup:
some italic text followed by inverted Hebrew text ''לעז'' and then some more italic text
some italic text followed by inverted Hebrew text לעז and then some more italic text
becomes:
''some italic text followed by inverted Hebrew text ''לעז'' and then some more italic text''
some italic text followed by inverted Hebrew text לעז and then some more italic text
If there is no italic markup, |italic=invert is the same as |italic=yes as you demonstrated in your example. Conversely, when there is only italicized text:
{{Lang|he|''לעז''|italic=invert}}
לעז
Your example:
{{Lang/with italics|he|''לעז''}} → לעז
can be achieved with any of these:
{{Lang|he|לעז|italic=yes}} → לעז
{{Lang|he|לעז|italic=invert}} → לעז
{{Lang|he|''לעז''|italic=unset}} → לעז
These |italic= parameter values are working as they are intended to work.
Trappist the monk (talk) 14:44, 19 September 2024 (UTC)[reply]
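The inversion walked through in this reply can be modelled in a few lines of Python. This is a sketch of the behaviour as described, not the module's implementation: invert_italics is a made-up name, and it assumes the text contains only paired '' italic markers and no bold (''') markup.

```python
def invert_italics(text):
    """Flip wiki italic markup: upright runs become italic and vice versa."""
    # Splitting on '' yields alternating runs: even indexes are upright,
    # odd indexes are italic (assuming well-formed, paired markers).
    runs = text.split("''")
    out = []
    for index, run in enumerate(runs):
        if not run:
            continue  # empty run at the start or end of the text
        out.append(f"''{run}''" if index % 2 == 0 else run)
    return "".join(out)


print(invert_italics("some text ''לעז'' and more"))
print(invert_italics("''לעז''"))
print(invert_italics("לעז"))
```

With no italic markup in the input the whole string comes back italicized, which matches the remark that |italic=invert then behaves like |italic=yes; with only italicized text the markup is removed.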
@Trappist the monk: I have currently set Template:Translated blockquote to use Template:Lang/with italics, because I see no way to use Template:lang. I need the default behavior (which Template:Lang/with italics detects via Template:lang/italicize), but I also need to omit error messages. I apologize for being overly bold in suggesting that the error messages are no longer useful, but I need a means to omit them. Daask (talk) 14:55, 19 September 2024 (UTC)[reply]