[Standards] XEP-0301 0.5 comments -xml:lang

Gunnar Hellström gunnar.hellstrom at omnitor.se
Fri Jul 27 17:05:48 UTC 2012


On 2012-07-27 16:04, Mark Rejhon wrote:
> On Fri, Jul 27, 2012 at 2:11 AM, Gunnar Hellström
> <gunnar.hellstrom at omnitor.se> wrote:
>> I see a need to deal with the 'xml:lang' attribute in XEP-0301.
>>
>> This attribute can introduce alternative language variants of the text in
>> messages and other elements.
>> The use is described in RFC 6121. For us it is of interest to study its use
>> for the <body/> element:
>>
>> ----copy from RFC 6121 section 5.2.3 Body element--------------------------
>> There are no attributes defined for the <body/> element, with the exception
>> of the 'xml:lang' attribute. Multiple instances of the <body/> element MAY
>> be included in a message stanza for the purpose of providing alternate
>> versions of the same body, but only if each instance possesses an 'xml:lang'
>> attribute with a distinct language value (either explicitly or by
>> inheritance from the 'xml:lang' value of an element farther up in the XML
>> hierarchy, which from the sender's perspective can include the XML stream
>> header as described in [XMPP-CORE]).
>>
>> <message from='juliet@example.com/balcony'
>>   id='z94nb37h' to='romeo@example.net' type='chat' xml:lang='en'>
>>    <body>Wherefore art thou, Romeo?</body>
>>    <body xml:lang='cs'> Pro&#x010D;e&#x017D; jsi ty, Romeo? </body>
>>   </message>
>>
>> -----------end of copy---------------------------------
>>
>> For XEP-0301 it would be natural to either offer the same opportunity to
>> provide the alternative languages in the same message, or explicitly say
>> that alternative languages are not supported.
>>
>> This would at least go into section 4.2 RTT attributes and 4.5.3.1 <t/>
>> element
>>
>> Each language will have its own editing elements and values, so the xml:lang
>> attribute should be on the <rtt/> level.
>>
>> I propose inserting a new subsection in 4.2
>> -----------------------------------------------------------------------------------------------------------------------------------------------
>> 4.2.4 Language
>> Multiple instances of the <rtt/> element MAY be included in a message stanza
>> for the purpose of providing alternate versions of the same real-time text,
>> but only if each instance possesses an 'xml:lang' attribute with a distinct
>> language value (either explicitly or by inheritance from the 'xml:lang'
>> value of an element farther up in the XML hierarchy, which from the sender's
>> perspective can include the XML stream header as described in RFC 6120 [
>> ]). The support for language variants SHALL follow the principles of support
>> for language variants in message bodies specified in RFC 6121[   ].
>>
>> This example provides a small part of real-time text in the default language
>> English and the alternative language Czech.
>>
>> <message from='juliet@example.com/balcony'
>>   id='z94nb37h' to='romeo@example.net' type='chat' xml:lang='en'>
>>    <rtt xmlns='urn:xmpp:rtt:0' seq='89002'><t>tho</t></rtt>
>>    <rtt xmlns='urn:xmpp:rtt:0' seq='32304' xml:lang='cs'> <t>ty</t></rtt>
>>   </message>
>>
>> --------------------------------------------------------------------------------------------------------------------------------------------------
>> The second line from the bottom of 4.1 should be changed from
>> "There MUST NOT be more than one <rtt/> element per <message/> stanza."
>> to
>> "There MUST NOT be more than one <rtt/> element per language variant in each
>> <message/> stanza."
>>
>> -----------------------------------------------------
>> Gunnar
> This would best be deliberated over an extended period.
> People can only type on one keyboard at a time, so this is of
> interest only in special situations such as simultaneous interpreters
> working concurrently (e.g. European Union, United Nations meetings),
> although you could also handle that case by having separate nicknames
> for each language (InterpreterEN, InterpreterFR, InterpreterCS, etc.).
>
> The XSF meeting logs show that other people want to review Version 0.6
> this weekend, so I'm going to submit 0.6 tonight.  Since more
> deliberations are needed about the language, I am going to need to
> leave this out of Version 0.6 unless there's pressure from XSF, or
> unless there's a good reason (e.g. Europeans promised quick inclusion
> of XEP-0301 during simultaneous-multiple-translation at European Union
> meetings) to say "HOLD THE PRESSES"
>
> Also, here is an alternate method that keeps one <rtt/> per message stanza.
> I suggest that this is preferable, because the interpreters will be
> typing keypresses separately of each other, and interpreters may have
> pauses independently of each other, so there's no good reason to
> combine multiple <rtt/> into the same message stanza:
>
> <message from='juliet@example.com/balcony'
>   id='z94nb37h' to='romeo@example.net' type='chat' xml:lang='en'>
>     <rtt xmlns='urn:xmpp:rtt:0' seq='89002'><t>Hello</t></rtt>
> </message>
>
> <message from='juliet@example.com/balcony'
>   id='z94nb37h' to='romeo@example.net' type='chat' xml:lang='fr'>
>     <rtt xmlns='urn:xmpp:rtt:0' seq='32304'><t>Bonjour</t></rtt>
> </message>
>
> The advantage is that the above continues to use the existing XEP-0301
> protocol, and keeps the language attribute out of the <rtt/> element.
> It is more backwards compatible, I think.  Clients that don't track
> multiple languages will simply focus on the default language (if
> they already filter XML to a specific language) or will simply
> stall ("Keeping Real-Time Text Synchronized"), while clients
> that distinguish the language attribute would know to keep separate
> real-time messages per language.  This can also easily be done as a
> private extension for a single specialized client.  On the other
> hand, at this time, I need an opinion from XSF about whether it is
> acceptable to hold this off to 0.7, since this is a very "niche" and
> specialized feature, though it can have merit in international and
> supranational organizations, where members of the public might
> download off-the-shelf software to watch captioning/translations in
> their languages.
>
> XSF: I need comments by the end of today, about whether it is OK to
> hold this off till 0.7.
> Gunnar: I need comments, is this related to the European procurement
> interests that you told me about?
>
> Thanks
> Mark Rejhon
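
For comparison, the one-<rtt/>-per-stanza alternative quoted above could
be handled by a receiver roughly as follows. This is only a minimal
Python sketch under stated assumptions, not part of XEP-0301: the class
name PerLanguageRtt is hypothetical, the default language 'en' stands in
for whatever the stream header declares, and only plain <t/> appends are
applied (the seq and event attributes, the p position attribute and the
other action elements are ignored). The JIDs are written with '@' so that
the stanzas look like real ones.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RTT_NS = "{urn:xmpp:rtt:0}"


class PerLanguageRtt:
    """Hypothetical receiver keeping one real-time buffer per language."""

    def __init__(self, default_lang="en"):
        # Assumption: the stream-level default language is known up front.
        self.default_lang = default_lang
        self.buffers = {}

    def handle(self, stanza_xml):
        message = ET.fromstring(stanza_xml)
        rtt = message.find(RTT_NS + "rtt")
        if rtt is None:
            return  # not a real-time text stanza
        # The language lives on <message/>, not on <rtt/>.
        lang = message.get(XML_LANG, self.default_lang)
        for t in rtt.findall(RTT_NS + "t"):
            # Simplification: treat every <t/> as an append at the end.
            self.buffers[lang] = self.buffers.get(lang, "") + (t.text or "")


receiver = PerLanguageRtt()
receiver.handle(
    "<message from='juliet@example.com/balcony' id='z94nb37h' "
    "to='romeo@example.net' type='chat' xml:lang='en'>"
    "<rtt xmlns='urn:xmpp:rtt:0' seq='89002'><t>Hello</t></rtt></message>"
)
receiver.handle(
    "<message from='juliet@example.com/balcony' id='z94nb37h' "
    "to='romeo@example.net' type='chat' xml:lang='fr'>"
    "<rtt xmlns='urn:xmpp:rtt:0' seq='32304'><t>Bonjour</t></rtt></message>"
)
print(receiver.buffers)
```

A real client would also key the buffers by sender and track seq per
language, which this sketch leaves out.
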
This was purely a discovery of a missing attribute, which I found while
searching RFC 6121 for information on Unicode and normalization.
I saw that <body/> and many other elements have the xml:lang attribute
specified.
So, it does not come from any urgent need.
It stems more from an ambition to make the <rtt/> element consistent
with the conventions of other XMPP elements.

The example is taken directly from the corresponding specification for
<body/> in RFC 6121. It is just as unrealistic for a person to compose
two language variants of a message before sending them off as it is
for a person to send two language variants of rtt in the same <message/>.
So, with that example I assume that it is some application-generated
text that is available in multiple language variants.

In this case we do not need to be so realistic with the example. I think
it is better that it aligns with the example for <body/> in RFC 6121.
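
To make the multi-<rtt/> case concrete from the receiving side, here is a
minimal Python sketch. It is not part of XEP-0301. It assumes the
proposed rule of at most one <rtt/> per language variant, resolves each
element's effective language by inheritance from <message/> (or from a
stream-level default supplied by the caller), and uses a hypothetical
helper name, rtt_by_language. The JIDs are written with '@' so that the
stanza looks like a real one.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RTT_NS = "{urn:xmpp:rtt:0}"


def rtt_by_language(stanza_xml, stream_lang=None):
    """Map effective language -> <rtt/> element for one <message/> stanza."""
    message = ET.fromstring(stanza_xml)
    # Inherit from <message/>, falling back to the stream header's language.
    inherited = message.get(XML_LANG, stream_lang)
    variants = {}
    for rtt in message.findall(RTT_NS + "rtt"):
        lang = rtt.get(XML_LANG, inherited)
        if lang in variants:
            # Violates "one <rtt/> per language variant" in the proposal.
            raise ValueError(f"duplicate <rtt/> for language {lang!r}")
        variants[lang] = rtt
    return variants


stanza = """
<message from='juliet@example.com/balcony'
  id='z94nb37h' to='romeo@example.net' type='chat' xml:lang='en'>
   <rtt xmlns='urn:xmpp:rtt:0' seq='89002'><t>tho</t></rtt>
   <rtt xmlns='urn:xmpp:rtt:0' seq='32304' xml:lang='cs'> <t>ty</t></rtt>
</message>
"""

variants = rtt_by_language(stanza)
for lang, rtt in sorted(variants.items()):
    text = "".join(t.text or "" for t in rtt.findall(RTT_NS + "t"))
    print(lang, rtt.get("seq"), text)
```

A client that only cares about one language would pick its entry from
the returned mapping and ignore the rest.
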

I do not think that it is very important to include this attribute. But
I supplied ready-made text, so that it would be convenient for you to
just merge it into the specification.


Gunnar








