[Standards] MIX Addressing

Kevin Smith kevin.smith at isode.com
Fri Jun 1 07:29:15 UTC 2018



> On 31 May 2018, at 21:43, Jonas Wielicki <jonas at wielicki.name> wrote:
> 
> On Donnerstag, 31. Mai 2018 13:45:06 CEST Kevin Smith wrote:
>> 1) Stick with proxy JIDs and user%channel at domain[/resource] (or similar),
>> with the resource missed off for bare-JID traffic, where
>> ‘user%channel at domain’ as the proxy JID is the user’s identifier used
>> everywhere.
>> 2) Drop proxy JIDs, use channel at domain/user[/resource] and then
>> ‘user’ is just a string identifier for the user, of whatever format (as
>> long as it doesn’t contain ‘/‘).
> 
> While discussing about both options in xsf@ with MattJ, we realized that both 
> approaches have the same issue, but in a different part of the JID: 
> Information about identities is merged in a part of the JID. 

Yes, this is the point I was originally making - four bits of information need to be encoded into three parts of a JID.
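
To make the "four into three" point concrete, here is a minimal sketch of how options 1 and 2 pack the same four values (user, channel, domain, resource) into a JID. The names MixAddress, variant1, variant2 and bare are hypothetical and purely illustrative, the code writes "@" where the archive shows " at ", and it does no escaping or validation.

```python
from typing import NamedTuple, Optional


class MixAddress(NamedTuple):
    """The four pieces of information that have to fit into a JID (illustrative)."""
    user: str                       # identifier of the participant
    channel: str                    # identifier of the MIX channel
    domain: str                     # domain of the MIX service
    resource: Optional[str] = None  # the participant's client resource, if any


def variant1(a: MixAddress) -> str:
    # Option 1: user%channel@domain[/resource]
    jid = f"{a.user}%{a.channel}@{a.domain}"
    return jid if a.resource is None else f"{jid}/{a.resource}"


def variant2(a: MixAddress) -> str:
    # Option 2: channel@domain/user[/resource]
    jid = f"{a.channel}@{a.domain}/{a.user}"
    return jid if a.resource is None else f"{jid}/{a.resource}"


def bare(jid: str) -> str:
    # Ordinary bare-JID operation: drop everything after the first "/".
    return jid.split("/", 1)[0]


addr = MixAddress("alice", "room", "mix.example", "phone")
print(bare(variant1(addr)))  # bare JID identifies the participant
print(bare(variant2(addr)))  # bare JID identifies the channel
```

In option 1 the localpart carries two values and in option 2 the resourcepart does, which is the merging Jonas describes above.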

> Different infrastructure components will have different issues with this:
> 
> - Servers might want to block a MIX channel from sending stuff to the user, so 
> they want to know the MIX identity a message is coming from. For this, Variant 
> 2 is more convenient (bare JID == Channel).

Yep.

> - Clients might want to operate on occupant identities (as I mentioned 
> elsewhere), and for this Variant 1 is more convenient (bare JID == Occupant).

Although not that much more convenient, I think.


> In both variants, one component will have to deal with special-casing MIX JIDs 
> in places which normally would not have to know about MIX. The bad thing here 
> is that a non-RFC6122 operation on JIDs is needed, which are otherwise fairly 
> well defined. This also has implications with respect to length limits in the 
> different parts of the JID. In variant 2, the server would have to Address-
> Translate the users resource to prevent the resourcepart from exceeding limits 
> in edge cases. In variant 1, the MIX channel identifier would be restricted by 
> the length of user identifiers on the service.

While both are true, when we’re talking about JIDs where each part can be up to about 1 KB, I think this is mostly a footnote rather than an operational concern (1022 characters should be enough for anyone).
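
As a rough illustration of the budget involved: RFC 6122 caps each of localpart, domainpart and resourcepart at 1023 bytes, so two identifiers sharing one part with a one-byte separator have 1022 bytes between them. The helper name combined_fits is hypothetical, and the sketch assumes the limit is measured on the UTF-8 encoding.

```python
MAX_PART_BYTES = 1023  # per-part limit from RFC 6122


def combined_fits(first: str, second: str, sep: str) -> bool:
    """Check whether two identifiers joined by sep fit into a single JID part."""
    return len(f"{first}{sep}{second}".encode("utf-8")) <= MAX_PART_BYTES


# Option 1 style: user%channel sharing the localpart.
print(combined_fits("a" * 511, "b" * 511, "%"))
# One byte over the limit.
print(combined_fits("a" * 512, "b" * 511, "%"))
```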

> So here’s a straw-man proposal, Variant 3 (because, creating many variants is 
> what we’re good at!):
> 
> An occupant is identified by an occupant-identifier. The occupant JID is 
> occupant-identifier at mix-service. The channel to which a message belongs is 
> identified with a payload item. Example message:
> 
>  <message type="groupchat"
>           from="4973d5d365f8 at mixservice.domain.example/client-resource"
>           to="user at other.example">
>    <mix channel="some-channel"/>
>    <body>...</body>
>  </message>
> 
> Advantages:
> 
> - No re-write of resources needed (good for MIXes)
> - Bare JID refers to occupant identity (good for clients)
> - Servers can simply filter on message/mix/@channel (not perfect, but better 
> than requiring a new JID processing function)
> - Opens up the possibility of re-using the same proxy JID for the same 
> occupant across different channels (may be useful in some deployments, via 
> MattJ)
> - No non-RFC6122-based operations required on JIDs.
> 
> Disadvantages:
> 
> - All (including 1:1) stanzas exchanged between occupants require the <mix 
> channel="…"/> element for MIX channels to be able to easily route them
> - Entities filtering on MIX channel identity still need to know about MIX (and 
> the <mix channel="…"/> element)
> - The namespace of MIX channel JIDs and occupant JIDs needs to be separated in 
> some way. This can be achieved with a single bit, so a forced prefix on 
> occupant JIDs (and a forbidden prefix on channel JIDs), such as "%", would 
> work for that. (I’m not sure if we would have to standardize the method by 
> which services do this split.)
> 
> 
> What do you folks think?

I don’t want to go as far as saying I hate it, but I certainly don’t like it; I think it’s the worst of the three (now four) options. It completely removes all context from the header, and you have to go snooping inside every packet to work out where it’s really from (because for almost all processing concerns it’s the channel that’s regarded as the sender, not the originator). Now your archive will be completely broken without quite excessive DPI on the MAM server, and your client’s internal routing is no longer straightforwardly hierarchical based on the from. You’re also opening the door to fun types of exploits, as now you’ve got an entity saying “My routing information says I’m coming from X, but really I’m from Y, honest”, and you need protection against this throughout the system.
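
A small sketch of the difference, assuming simplified stanzas with no XML namespaces (real stanzas are namespaced, and the payload namespace for Variant 3 was never specified). The function names are hypothetical. With option 2 the channel falls out of the from attribute alone; with Variant 3 the payload has to be parsed.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def channel_from_header(stanza_xml: str) -> str:
    # Option 2: the channel is the bare JID of the sender, header only.
    msg = ET.fromstring(stanza_xml)
    return msg.get("from").split("/", 1)[0]


def channel_from_payload(stanza_xml: str) -> Optional[str]:
    # Variant 3: the channel is only available inside the stanza body.
    msg = ET.fromstring(stanza_xml)
    mix = msg.find("mix")
    return None if mix is None else mix.get("channel")


option2 = (
    '<message type="groupchat" to="user@other.example" '
    'from="some-channel@mixservice.domain.example/4973d5d365f8/client-resource">'
    "<body>...</body></message>"
)
variant3 = (
    '<message type="groupchat" to="user@other.example" '
    'from="4973d5d365f8@mixservice.domain.example/client-resource">'
    '<mix channel="some-channel"/><body>...</body></message>'
)

print(channel_from_header(option2))
print(channel_from_payload(variant3))
```

Anything that only looks at the from attribute of the Variant 3 stanza sees the occupant, not the channel, and has to trust the payload for the rest.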

I appreciate trying to come up with better solutions than 1/2 (neither of which is ideal) and the space is complicated, but in this case I very much don’t think this is the best option.

/K

