[Standards] Message-IDs

Kevin Smith kevin.smith at isode.com
Wed Feb 28 08:59:01 UTC 2018

On 13 Feb 2018, at 16:57, Simon Friedberger <simon.jabber at a-oben.org> wrote:
> During the discussion on the different ID types at the summit I had an idea for
> a possible solution to the problem but not a sufficient understanding of the
> problem to even discuss it. I tried to find somebody to discuss it with in chat
> afterwards but nobody was available and I forgot about it. To get it off my ToDo
> list, here is my current understanding. I hope it can be a basis for further
> discussion.
> A) Status-Quo:
>     Currently there are
>         A1. stanza-ID: generated by server
>         A2. origin-ID: generated by client
>             from https://xmpp.org/extensions/xep-0359.html and
>         A3. message-ID: this is the ID-attribute on the stanza
>             from https://tools.ietf.org/html/rfc6120#section-8.1.3
>     There are also (4.) SM-IDs in stream management but those are per-stream and
>     unrelated.
> B) Use-cases:
>     B1. MAM https://xmpp.org/extensions/xep-0313.html uses stanza-ID.
>     B2. MUCs require IDs to detect reflections of own messages.
>         And reflection is great because it gives everybody the same view on the
>         MUC in the presence of things like autopastebin or other rewrites.
>     B3. Error responses have the same ID-attribute as the original stanza.
> C) Problems with current situation:
>     C1. People dislike having so many different IDs.
>         This is not a problem per se but it does mean implementation complexity
>         and confusion.

Confusion I buy: we need to be careful to define things properly.

>     C2. According to Daniel it is not clear which ID should be used when
>         referencing things. In other words if he gets a delivery receipt for an
>         ID the client might have based that on the origin-ID or the message-ID.
>         I'm not sure if this should be considered relevant. People can always
>         write broken clients which send back crap. Of course if it happens
>         unintentionally because of (C1.) fewer IDs would help.

I don’t think this is particularly unclear (it’s the id of the stanza; all the other ids are newer inventions with specific contexts), but it would be easy to clarify.
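To make the three IDs concrete, here is a minimal sketch that pulls them out of one message. The stanza, JIDs and ID values are made up for illustration; only the `id` attribute (RFC 6120) and the `urn:xmpp:sid:0` elements (XEP-0359) come from the specs listed above.

```python
import xml.etree.ElementTree as ET

SID_NS = "urn:xmpp:sid:0"  # namespace of XEP-0359 origin-id / stanza-id

# Hypothetical groupchat message as a client might receive it.
STANZA = """
<message xmlns='jabber:client' id='msg-1' type='groupchat'
         from='room@muc.example/alice' to='alice@example.org/phone'>
  <body>hello</body>
  <origin-id xmlns='urn:xmpp:sid:0' id='origin-1'/>
  <stanza-id xmlns='urn:xmpp:sid:0' id='archive-42' by='room@muc.example'/>
</message>
"""

def extract_ids(xml_text):
    """Return the message-ID, origin-ID and all stanza-IDs keyed by 'by'."""
    root = ET.fromstring(xml_text.strip())
    origin = root.find(f"{{{SID_NS}}}origin-id")
    stanza_ids = {
        el.get("by"): el.get("id")
        for el in root.findall(f"{{{SID_NS}}}stanza-id")
    }
    return {
        "message_id": root.get("id"),  # the id attribute on the stanza itself
        "origin_id": origin.get("id") if origin is not None else None,
        "stanza_ids": stanza_ids,      # one entry per assigning entity
    }

ids = extract_ids(STANZA)
```

In this sketch a receipt would reference `ids["message_id"]`, while a MAM query against the room would use the entry in `ids["stanza_ids"]`.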

>     C3. Using origin-ID to detect MUC reflection doesn't always work because MUCs
>         may not reflect it.
>         That's of course unfortunate but should IMHO be considered an error in the
>         MUC implementation (probably a transport) and fixed there.

Mayyyybe. I note that MUCs stripping out non-body payloads is actually a feature in some servers.

>     C4. Clients require a bounce of their messages to learn the stanza-id which
>         is used for MAM.
>         Why do they need to know? Maybe they want to reference their own message.

They need to know where their stanza sits in the ordering of the archive (and its id) if they want to be able to do sync later.

>         Do they require this bounce anyway to make sure that there was no rewriting?


>     C5. Some MUCs rewrite the message-id
>         Why is this allowed? It is even suggested here:
>         https://xmpp.org/extensions/xep-0045.html#message

Mostly it’s allowed because the spec didn’t say not to do it. It then got moved to Draft and was implemented, so the rule of “don’t make breaking changes unless unavoidable” applied and it couldn’t sensibly be changed.

>     C6. A global ID to reference messages might be nice.
>     C7. When referencing a message for example by "liking" it a forgeable ID
>         could get you to like things you didn't intend to like.
>         This is a difficult problem because in many cases it requires malicious
>         clients and servers and those have a lot of power anyway.

Not that much power, relatively. They’re not usually able to rewrite history in a meaningful way, but with this they become able to (look like they) do that.

> D) Possible root cause:
>     People do not trust the message IDs assigned by others and therefore want to
>     assign their own.

I’m not sure what this is saying - the root cause of *what*?

> E) Suggested solutions, including partial solutions:
>     E1. message-ID and origin-ID should always be the same, as proposed by Georg
>         in https://mail.jabber.org/pipermail/standards/2017-September/033415.html
>         Some concerns were voiced in that thread; the only valid one is that due
>         to bad software we need to deal with the situation that they are
>         different anyway.
>         There was a privacy concern about the "by=" attribute but origin-ID does
>         not actually have that.
>         According to Daniel and Georg things currently break down anyway if this
>         does not hold.

>     E2. Make the ID verifiable: This is what I had in mind at the summit and
>         after some discussion yesterday Jonas and Dave basically immediately
>         came up with the same thing, so it might be reasonably straightforward.
>         Basically, the client calculates the ID based on some information that
>         it shares with the server like HASH(stream-id || sm-counter). This would
>         allow the server to verify that the client generated a proper ID. Jonas
>         suggested HMAC(key=stream-id, msg=sm-counter). If the message is in a
>         MUC, the MUC server can provide the user with some salt and then a
>         HASH(message-counter || salt) could be used to ensure that proper unique
>         IDs are generated.
>         This ID is based on there being a party which is in charge of checking
>         the IDs. If you connect to a malicious MUC with malicious clients they
>         can still send you whatever. I don't think that is a problem, is it?

It depends which problem you’re trying to solve.
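For reference, a rough sketch of the HMAC(key=stream-id, msg=sm-counter) variant from E2. SHA-256, the decimal encoding of the counter, and the names `make_id` and `verify_id` are my assumptions; none of that was specified in the proposal.

```python
import hashlib
import hmac

def make_id(stream_id: str, sm_counter: int) -> str:
    """Derive a message ID as HMAC(key=stream-id, msg=sm-counter).

    Assumptions: SHA-256 as the hash and the counter encoded as
    decimal ASCII; the proposal does not fix either choice.
    """
    key = stream_id.encode("utf-8")
    msg = str(sm_counter).encode("ascii")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify_id(stream_id: str, sm_counter: int, claimed_id: str) -> bool:
    """Server-side check: recompute the ID and compare in constant time."""
    return hmac.compare_digest(make_id(stream_id, sm_counter), claimed_id)

# Hypothetical stream ID and counter value.
example_id = make_id("stream-abc123", 7)
ok = verify_id("stream-abc123", 7, example_id)
wrong_counter = verify_id("stream-abc123", 8, example_id)
```

Note that only a party that knows both the stream-id and the counter can run the check, which is why it matters which problem this is meant to solve.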

>     E3. Simply make the ID: FROM-TIMESTAMP.
>         Here FROM needs to be the eventual FROM after possible rewriting. Can
>         that be done?
>         And TIMESTAMP has to be strictly increasing so should have sub-second
>         resolution.
>         I assume this is impossible because otherwise it would be too easy. But
>         why is it impossible? :)

Because timestamps aren’t monotonic? :)
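To illustrate, a small sketch of what a FROM-TIMESTAMP generator would have to do when the clock repeats or steps backwards. The class name and the fake clock readings are hypothetical, and the guard only helps within a single process; it says nothing about restarts or several servers sharing one sender.

```python
import time

class TimestampIdGenerator:
    """Builds FROM-TIMESTAMP IDs while forcing the timestamp part to increase."""

    def __init__(self, clock=time.time_ns):
        self._clock = clock  # wall clock in nanoseconds by default
        self._last = 0

    def next_id(self, sender: str) -> str:
        now = self._clock()
        # A wall clock can repeat a value or jump backwards (e.g. after an
        # NTP adjustment), so never emit a value <= the previous one.
        self._last = max(now, self._last + 1)
        return f"{sender}-{self._last}"

# Fake clock that repeats a reading and then goes backwards.
readings = iter([1000, 1000, 900])
gen = TimestampIdGenerator(clock=lambda: next(readings))
ids = [gen.next_id("alice@example.org") for _ in range(3)]
```

With the raw readings the second and third IDs would collide or go out of order; with the guard the "timestamp" stays strictly increasing but no longer matches the real time.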

> F) Left-overs:
>     F1. Would it be useful to have monotonically increasing IDs?
>         It seems these might be useful if not necessary to query the MAM or
>         some other archive for certain periods? I'm not sure.

I think enforcing a particular scheme on a server for their MAM IDs is probably not a great idea. These are the sorts of things that a server can encode information in to make querying work nicely for it.

>     F2. Discussions about malicious forgery of responses when IDs are predictable
>         ended with the assumption that this is impossible because the receiver
>         needs to be properly verified anyway.
>     F3. Zash wants to use timestamps in the MAM-ID. Why? Because of (F1.)?

For fast access to the library, I believe. I think he uses the MAM-ID as a meta-index on his database.

>     F4. Related to (F1.): Would good IDs, possibly monotonically increasing ones
>         simplify the problems that MAM and SM are solving?

I think we should leave SM out of this, as that’s at a different layer, and applies to all stanzas, not just ones with IDs.

