[Standards] Message-IDs

Michal Piotrowski michal.piotrowski at erlang-solutions.com
Wed Feb 14 08:45:05 UTC 2018

Hi Simon

Thanks for refreshing the topic.
A few comments from me below (from the perspective of an XMPP server
developer, MongooseIM).

Best regards
Michal Piotrowski
michal.piotrowski at erlang-solutions.com

On 13 February 2018 at 17:57, Simon Friedberger <simon.jabber at a-oben.org>
wrote:

> Hello List!
> During the discussion on the different ID types at the summit I had an idea
> for a possible solution to the problem but not a sufficient understanding of
> the problem to even discuss it. I tried to find somebody to discuss it with in
> chat afterwards but nobody was available and I forgot about it. To get it off
> my ToDo list, here is my current understanding. I hope it can be a basis for
> further discussion.
> A) Status-Quo:
>     Currently there are
>         A1. stanza-ID: generated by server
>         A2. origin-ID: generated by client
>             from https://xmpp.org/extensions/xep-0359.html and
>         A3. message-ID: this is the ID-attribute on the stanza
>             from https://tools.ietf.org/html/rfc6120#section-8.1.3
>     There are also (4.) SM-IDs in stream management but those are per-stream
>     and unrelated.
> B) Use-cases:
>     B1. MAM https://xmpp.org/extensions/xep-0313.html uses stanza-ID.
>     B2. MUCs require IDs to detect reflections of own messages.
>         And reflection is great because it gives everybody the same view on
>         the MUC in the presence of things like autopastebin or other rewrites.
>     B3. Error responses have the same ID-attribute as the original stanza.
> C) Problems with current situation:
>     C1. People dislike having so many different IDs.
>         This is not a problem per se but it does mean implementation
>         complexity and confusion.

I'm really tempted to say that the new message routing (in next-gen XMPP, as
discussed during the summit) must require the message stanza to have an "id"
attribute. I personally think that UUID v4 would be enough here.
To my knowledge it is hard to guess, so a malicious user is probably not able
to guess the next ID.
What they can do, though, is "reuse" the same id in another message, which
may be a bad thing.
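
To make the two points above concrete, here is a minimal sketch in Python
(not MongooseIM code). The names new_message_id and SeenIds are hypothetical,
and keeping an unbounded per-sender set of ids is only an assumption made for
illustration; a real server would need some bound or expiry.

```python
import uuid


def new_message_id() -> str:
    """Return a random (version 4) UUID to use as the stanza "id" attribute."""
    return str(uuid.uuid4())


class SeenIds:
    """Hypothetical per-sender record of ids that were already used."""

    def __init__(self):
        self._seen = set()

    def check_and_record(self, message_id: str) -> bool:
        """Return True if the id is new, False if the sender reused it."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True
```

Random ids address guessing, but only a check like check_and_record would
catch a sender deliberately reusing an id.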

>     C2. According to Daniel it is not clear which ID should be used when
>         referencing things. In other words if he gets a delivery receipt for
>         an ID the client might have based that on the origin-ID or the
>         message-ID.
>         I'm not sure if this should be considered relevant. People can always
>         write broken clients which send back crap. Of course if it happens
>         unintentionally because of (C1.) fewer IDs would help.
>     C3. Using origin-ID to detect MUC reflection doesn't always work because
>         MUCs may not reflect it.
>         That's of course unfortunate but should IMHO be considered an error in
>         the MUC implementation (probably a transport) and fixed there. I
>         understand that it might be difficult in some cases
>         ( https://lab.louiz.org/louiz/biboumi/issues/3283 ) but as Daniel
>         already pointed out yesterday it is much easier to fix a transport,
>         since it knows which protocol it is talking to, instead of working
>         around it at the end.
>         In any case the current situation seems to be bad:
>         https://wiki.xmpp.org/web/XEP-Remarks/XEP-0045:_Multi-User_Chat#Matching_Your_Reflected_Message
>     C4. Clients require a bounce of their messages to learn the stanza-id
>         which is used for MAM.
>         Why do they need to know? Maybe they want to reference their own
>         message.

They may need that, for instance, to know where they can start syncing the
archive from after being offline.

>         Do they require this bounce anyway to make sure that there was no
>         rewriting?
>     C5. Some MUCs rewrite the message-id
>         Why is this allowed? It is even suggested here:
>         https://xmpp.org/extensions/xep-0045.html#message
>     C6. A global ID to reference messages might be nice.
>     C7. When referencing a message for example by "liking" it a forgeable ID
>         could get you to like things you didn't intend to like.
>         This is a difficult problem because in many cases it requires
>         malicious clients and servers and those have a lot of power anyway.
> D) Possible root cause:
>     People do not trust the message IDs assigned by others and therefore want
>     to assign their own.
> E) Suggested solutions, including partial solutions:
>     E1. message-ID and origin-ID should always be the same, as proposed by
>         Georg in
>         https://mail.jabber.org/pipermail/standards/2017-September/033415.html
>         Some concerns were voiced in that thread; the only valid one is that
>         due to bad software we need to deal with the situation that they are
>         different anyway.
>         There was a privacy concern about the "by=" attribute but origin-ID
>         does not actually have that.
>         According to Daniel and Georg things currently break down anyway if
>         this does not hold.
>     E2. Make the ID verifiable: This is what I had in mind at the summit and
>         after some discussion yesterday Jonas and Dave basically immediately
>         came up with the same thing, so it might be reasonably
>         straightforward.
>         Basically, the client calculates the ID based on some information
>         that it shares with the server like HASH(stream-id || sm-counter).
>         This would allow the server to verify that the client generated a
>         proper ID. Jonas suggested HMAC(key=stream-id, msg=sm-counter). If the
>         message is in a MUC, the MUC server can provide the user with some
>         salt and then a HASH(message-counter || salt) could be used to ensure
>         that proper unique IDs are generated.
>         This ID is based on there being a party which is in charge of
>         checking the IDs. If you connect to a malicious MUC with malicious
>         clients they can still send you whatever. I don't think that is a
>         problem, is it?

Making the id verifiable (in the most efficient way) would be perfect.
I think we need to remember here that not every client will have SM enabled,
so it may not have the sm-counter.
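
For illustration, the HMAC(key=stream-id, msg=sm-counter) variant could look
roughly like the Python sketch below. SHA-256, the decimal encoding of the
counter and the hex output are my assumptions, not something agreed on the
list, and derive_id / verify_id are hypothetical names. The counter is passed
in as a plain integer, so a client without SM would need some other counter
that it shares with the server.

```python
import hashlib
import hmac


def derive_id(stream_id: str, counter: int) -> str:
    """Compute HMAC(key=stream-id, msg=counter) as a hex string.

    Assumption: SHA-256 as the hash and the counter encoded in decimal.
    """
    mac = hmac.new(
        stream_id.encode("utf-8"),
        str(counter).encode("ascii"),
        hashlib.sha256,
    )
    return mac.hexdigest()


def verify_id(stream_id: str, counter: int, claimed_id: str) -> bool:
    """Server-side check: recompute the id and compare in constant time."""
    expected = derive_id(stream_id, counter)
    return hmac.compare_digest(
        expected.encode("ascii"), claimed_id.encode("utf-8")
    )
```

The server knows both the stream id and the expected counter, so it can
recompute the value and reject a stanza whose id does not match.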

>     E3. Simply make the ID: FROM-TIMESTAMP.
>         Here FROM needs to be the eventual FROM after possible rewriting. Can
>         that be done?
>         And TIMESTAMP has to be strictly increasing so should have sub-second
>         resolution.
>         I assume this is impossible because otherwise it would be too easy.
>         But why is it impossible? :)
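
The "strictly increasing" requirement in E3 can at least be sketched, here in
Python. TimestampIdGenerator is a hypothetical name, and the nanosecond clock
and the "-" separator are assumptions. It only covers a single generator and
does not answer whether the eventual FROM is known at the point where the ID
is created.

```python
import time


class TimestampIdGenerator:
    """Build FROM-TIMESTAMP ids with a strictly increasing timestamp."""

    def __init__(self, clock=time.time_ns):
        # clock returns an integer timestamp; nanoseconds by default
        self._clock = clock
        self._last = 0

    def next_id(self, from_jid: str) -> str:
        now = self._clock()
        # If the clock did not advance (or went backwards), bump by one
        # so that no two ids from this generator share a timestamp.
        if now <= self._last:
            now = self._last + 1
        self._last = now
        return f"{from_jid}-{now}"
```

Two calls within the same clock tick still get different timestamps, at the
cost of the value drifting slightly ahead of the real clock.
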
> F) Left-overs:
>     F1. Would it be useful to have monotonically increasing IDs?
>         It seems these might be useful if not necessary to query the MAM or
>         some other archive for certain periods? I'm not sure.
>     F2. Discussions about malicious forgery of responses when IDs are
>         predictable ended with the assumption that this is impossible because
>         the receiver needs to be properly verified anyway.
>     F3. Zash wants to use timestamps in the MAM-ID. Why? Because of (F1.)?
>     F4. Related to (F1.): Would good IDs, possibly monotonically increasing
>         ones simplify the problems that MAM and SM are solving?
> I would be very happy if people would comment! :)
> Regards,
> Simon