[Standards-JIG] JEP-0171 (Translation): One successful upon-receipt model

Jay Carlson nop at mitre.org
Mon Feb 20 04:10:00 UTC 2006

In order to explain what we're concerned about with JEP-0171 (Translation),
I'm going to back up a little bit and talk about the architecture of the
TrIM system.  Translingual Instant Messaging is a prototype IM system MITRE
built to explore the viability, deployability, and utility of automatic
machine translation coupled to text chat.  TrIM's spent a couple years
touring various coalition military exercises and operations, so I believe we
have a pretty good feel for at least one model we know works.

Since we're lazy---uh, I mean thrifty---TrIM is implemented on top of the
SIMP protocol, which was a strawman standard we did way back when the IMPP
working group hadn't quite made it to specifics yet, and Jabber didn't have
a real specification.  You can read about SIMP at http://simp.mitre.org/ and
download an Open Source implementation, but it's probably not worth it at
this point.  It's a bit Neanderthal in how it uses XML, and for the purposes
of this discussion, there are only two significant differences between XMPP
and SIMP.

The first is that you get an end-to-end confirmation of message delivery.
This potentially allows faster failure when translators fall over, but is
not all that relevant in practice due to asynchrony, covered below.

The second is that presence data has arbitrary key/value pairs.  This allows
us to stuff a preferred language field into each user's profile in a
distinguished spot.  Enough about protocols; what does a user see?

Each user, as part of enrollment, is prompted to fill in their preferred
language.  The preferred language is marked as part of each outgoing
message.  Also during enrollment they pick a translation wrapper from a list
preconfigured in the client.  They can change these at any time.

Wrappers are bots that respond to specially mime-typed translation requests
containing source text, source language, and requested destination language.
They may respond with an error message; the requestor would then either
retry the request, or go down the list to another preconfigured wrapper.
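
As a rough illustration of that retry-then-fall-through behavior, here is a minimal Python sketch.  It is not TrIM code: the names TranslationRequest, WrapperError, and translate_with_fallback are made up for this example, and a "wrapper" is modeled as a plain callable rather than a bot on the other end of a connection.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TranslationRequest:
    # The three fields the post says every request carries.
    text: str
    source_lang: str
    target_lang: str


class WrapperError(Exception):
    """Hypothetical error a wrapper raises when it cannot translate."""


# A wrapper is modeled as a callable; real wrappers are remote bots.
Wrapper = Callable[[TranslationRequest], str]


def translate_with_fallback(
    request: TranslationRequest,
    wrappers: Sequence[Wrapper],
    retries_per_wrapper: int = 1,
) -> str:
    """Try each preconfigured wrapper in order, retrying before moving on."""
    last_error = None
    for wrapper in wrappers:
        for _ in range(1 + retries_per_wrapper):
            try:
                return wrapper(request)
            except WrapperError as exc:
                last_error = exc  # remember the failure, then retry or fall through
    raise WrapperError(f"all wrappers failed: {last_error}")
```

The retry count is an assumption; the description above only says the requestor retries or moves down the list.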

From the user's point of view, a TrIM person-to-person conversation between
people speaking different languages looks like any other AIM-style window,
except that the scrollback has two columns.  The left is the original
message, as transmitted or received.  The right is the result of machine
translation, and it is always provided.  Because received messages have lang
tags, we already know when there's a mismatch between an incoming message
and the user's preferred language.

TrIM shows machine translation of outgoing messages because many users have
some passive facility in reading foreign languages.  There are a lot of
people out there who can read some English.  And although I can't read more
than ~40 words in Japanese, I'll definitely notice when my misspelled word
like "plausability" shows up in Roman characters in the middle of a sea of
hiragana and kanji.

From this, you'll guess that TrIM is a translate-before-send system.  That's
not true.  TrIM is a translate-on-receipt system.  Translated messages are
never exchanged between users.  The displayed translation of outbound
messages is only a convenience to you, and is done through your own
configured translator bot.  For our deployed systems, there is almost always
homogeneity of translation engine configuration, so if I see "hello"
translated one way, you'll see it the same way when you talk to your
identically-configured bot.  Not exactly Internet-scale....

Translation is asynchronous.  That is, message transmission or
display-on-receipt never blocks on getting a translation.  That right-hand
column may take a few seconds to catch up, but you don't have to wait for
it.  In fact, translations may be displayed out of order, and frequently are
when using heavily concurrent translation engines.  To expedite the matching
of source text to translation, translation requests to the bots include a
request ID, which is matched up with the right row in the scrollback to
insert the response.  If a request takes too long, try another bot.

Because all users have a preferred language defined in their presence
information, TrIM can provide an appropriate translation even for the first
message in a conversation: there's no need to wait for someone to respond in
Korean before showing outgoing messages translated into Korean.

Chat rooms are...an interesting UI challenge.  I think we've made it to
about seven concurrent legitimately-preferred languages in use in a single
chat room.  Whenever you send an outgoing message, do you want to see seven
rows of translations?  The current TrIM client punts, and declares that
you're most likely talking to the most recent foreign language speaker, and
just shows that translation.
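
That heuristic is simple enough to sketch in Python.  The function name and the shape of the history list are assumptions for illustration; the actual TrIM client may track this differently.

```python
def pick_outgoing_target(history, my_lang):
    """Guess which language to show outgoing translations in.

    history is a list of (sender, lang) pairs for messages seen in the
    room, oldest first.  Returns the language of the most recent speaker
    whose language differs from ours, or None if there isn't one.
    """
    for _sender, lang in reversed(history):
        if lang != my_lang:
            return lang
    return None


room = [("kim", "ko"), ("bob", "en"), ("sato", "ja"), ("alice", "en")]
target = pick_outgoing_target(room, "en")   # "ja": sato spoke most recently
```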

What does this have to do with JEPs?

Since we've been screaming to our sponsors for at least seven years or so
about the necessity of standardization for collaboration tools, it will
probably not surprise you that we now have a few XMPP-based tools with
machine translation.  (We should have had these done a while back, but we're
often at the mercy of sponsor funding whims.)  

Because of both NIH and a thorough understanding of the positive and
negative security issues of the existing TrIM architecture, the tools we
have tend to replicate the exact TrIM message exchange patterns, except now
over a standard protocol.  Aside from discovery, they look a lot like what's
in JEP-0171; this should come as no surprise, as there aren't all that many
reasonable ways to phrase an RPC service for machine translation.

What 0171 is missing is a reliable way of associating translation requests
and translation replies.  In a translation-on-receipt model, it's much more
likely that multiple requests are outstanding at any given time.

We could use the original source text as a hash key; because the original
text is always provided with 0171 translation responses, it could serve as
the unique identifier I'm looking for.

The problem is that translation is not guaranteed to be idempotent.  In the
presence of human translation or fixup, this gets much worse.  Let me give
you an example (excuse the militarism).

User1: My tanks are leaking.
User2: Your [water tank] is leaking?
User1: That's the wrong word.  I'm talking about my armored vehicles.
User1: My tanks are leaking.

The optimal translation of the last line is different than the naive
translation of the first line.  In the presence of human translators for the
whole conversation, they will have a better sense of how to translate for
the remainder of the conversation, and therefore using "My tanks are
leaking." as a hash key to uniquely identify a response loses.

TrIM doesn't support human translator intervention, and I'm not aware of any
implemented system that does.  A few of the upcoming research machine
translation systems *do* support context beyond a single request, though.  

In any case, anybody with a translation-on-receipt model needs to have a
reliable way of having multiple translation requests in the air, and
returned in indeterminate order.  The simplest way we can think of to
support this is to allow requestors to specify a thread id on request, and
have the response include it.
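
To make the proposal concrete, here is a minimal Python sketch of the bot side.  Requests and responses are modeled as plain dicts rather than real stanzas, and the key names ("thread", "source_text", and so on) are placeholders for this example, not anything taken from JEP-0171.

```python
def handle_translation_request(request, translate):
    """Translate one request and echo back the requestor's thread id, if any."""
    response = {
        "source_text": request["source_text"],
        "translation": translate(
            request["source_text"],
            request["source_lang"],
            request["target_lang"],
        ),
    }
    if "thread" in request:
        # The whole proposal: copy the id through untouched.
        response["thread"] = request["thread"]
    return response


def fake_translate(text, source_lang, target_lang):
    # Stand-in for a real engine, which might answer differently each time.
    return f"[{target_lang}] {text}"


# Two requests with identical source text stay distinguishable by thread id.
r1 = handle_translation_request(
    {"source_text": "My tanks are leaking.", "source_lang": "en",
     "target_lang": "ko", "thread": "t-1"}, fake_translate)
r2 = handle_translation_request(
    {"source_text": "My tanks are leaking.", "source_lang": "en",
     "target_lang": "ko", "thread": "t-4"}, fake_translate)
```

With only the source text to go on, r1 and r2 would be indistinguishable to the requestor; with the echoed id, each one lands on the right row.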

I think the JEP should be modified to state that translation responses will
echo the thread id of the request, and that will solve the showstopper for
us.  This doesn't deal with the issue of providing continuing context to a
translation engine, but I think it's much closer.  I don't know enough XMPP
to figure out how to solve both the need for a unique ID for a
request/response pair, and an association with an ongoing translation
"I was unaware irony had military utility."
