[Standards] Message-IDs

Simon Friedberger simon.jabber at a-oben.org
Tue Feb 13 16:57:55 UTC 2018


Hello List!


During the discussion on the different ID types at the summit, I had an idea
for a possible solution to the problem, but not a sufficient understanding of
the problem to even discuss it. I tried to find somebody to discuss it with in
chat afterwards, but nobody was available and I forgot about it. To get it off
my ToDo list, here is my current understanding. I hope it can be a basis for
further discussion.


A) Status-Quo:
    Currently there are
        A1. stanza-ID: generated by the server
        A2. origin-ID: generated by the client
            (both from https://xmpp.org/extensions/xep-0359.html), and

        A3. message-ID: this is the id attribute on the stanza
            from https://tools.ietf.org/html/rfc6120#section-8.1.3

    There are also (A4.) SM-IDs in stream management, but those are
    per-stream and unrelated.
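To make the list above concrete, here is a rough Python sketch that pulls the three IDs out of a received message. The sample stanza, the addresses and the helper names (MessageIds, extract_ids) are made up for illustration; only the urn:xmpp:sid:0 namespace and the element names come from XEP-0359.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

# Namespace used by XEP-0359 for both <stanza-id/> and <origin-id/>.
SID_NS = "urn:xmpp:sid:0"


class MessageIds(NamedTuple):
    message_id: Optional[str]  # A3: the 'id' attribute from RFC 6120
    origin_id: Optional[str]   # A2: client-generated (XEP-0359)
    stanza_ids: dict           # A1: server-generated, keyed by the 'by' attribute


def extract_ids(stanza_xml: str) -> MessageIds:
    """Hypothetical helper: collect all three kinds of ID from one message."""
    msg = ET.fromstring(stanza_xml)
    origin = msg.find(f"{{{SID_NS}}}origin-id")
    # There may be several stanza-IDs, one per entity that archived the message.
    stanza_ids = {
        el.get("by"): el.get("id")
        for el in msg.findall(f"{{{SID_NS}}}stanza-id")
    }
    return MessageIds(
        message_id=msg.get("id"),
        origin_id=origin.get("id") if origin is not None else None,
        stanza_ids=stanza_ids,
    )


# Made-up example of a MUC reflection carrying all three IDs.
example = """
<message xmlns='jabber:client' id='abc' to='room@muc.example.org' type='groupchat'>
  <body>Hello</body>
  <origin-id xmlns='urn:xmpp:sid:0' id='abc'/>
  <stanza-id xmlns='urn:xmpp:sid:0' id='5f3a' by='room@muc.example.org'/>
</message>
"""
print(extract_ids(example))
```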


B) Use-cases:
    B1. MAM https://xmpp.org/extensions/xep-0313.html uses stanza-ID.
    B2. MUCs require IDs to detect reflections of one's own messages.
        And reflection is great because it gives everybody the same view on
        the MUC in the presence of things like autopastebin or other rewrites.
    B3. Error responses have the same ID-attribute as the original stanza.
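A minimal sketch of how a client might use these IDs for B2: remember the IDs it sent and treat an incoming message as its own reflection if the origin-ID (or, failing that, the id attribute) matches. The class and method names are hypothetical, and it assumes the client set the id attribute and the origin-ID to the same value. The fallback only helps if the MUC neither drops the origin-ID nor rewrites the id attribute, which is exactly what C3 and C5 are about.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

SID_NS = "urn:xmpp:sid:0"  # XEP-0359 namespace


class ReflectionTracker:
    """Hypothetical client-side helper matching MUC reflections to sent messages."""

    def __init__(self) -> None:
        self._pending: Set[str] = set()

    def sent(self, message_id: str) -> None:
        # Assumption: this value was used for both 'id' and <origin-id/>.
        self._pending.add(message_id)

    def is_own_reflection(self, stanza_xml: str) -> bool:
        msg = ET.fromstring(stanza_xml)
        origin = msg.find(f"{{{SID_NS}}}origin-id")
        candidates: List[str] = []
        # Prefer the origin-ID; fall back to the id attribute for MUCs
        # that do not reflect the origin-ID.
        if origin is not None and origin.get("id"):
            candidates.append(origin.get("id"))
        if msg.get("id"):
            candidates.append(msg.get("id"))
        for cid in candidates:
            if cid in self._pending:
                self._pending.discard(cid)  # each sent message is matched once
                return True
        return False
```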


C) Problems with current situation:
    C1. People dislike having so many different IDs.
        This is not a problem per se, but it does mean implementation
        complexity and confusion.
    C2. According to Daniel, it is not clear which ID should be used when
        referencing things. In other words, if he gets a delivery receipt for
        an ID, the client might have based that on the origin-ID or on the
        message-ID.
        I'm not sure if this should be considered relevant. People can always
        write broken clients which send back crap. Of course, if it happens
        unintentionally because of (C1.), fewer IDs would help.
    C3. Using origin-ID to detect MUC reflection doesn't always work because
        MUCs may not reflect it.
        That's of course unfortunate, but it should IMHO be considered an
        error in the MUC implementation (probably a transport) and fixed
        there. I understand that it might be difficult in some cases
        ( https://lab.louiz.org/louiz/biboumi/issues/3283 ), but as Daniel
        already pointed out yesterday, it is much easier to fix a transport,
        since it knows which protocol it is speaking, than to work around the
        problem at the receiving end.
        In any case, the current situation seems to be bad:
        https://wiki.xmpp.org/web/XEP-Remarks/XEP-0045:_Multi-User_Chat#Matching_Your_Reflected_Message
    C4. Clients require a bounce of their messages to learn the stanza-ID,
        which is used for MAM.
        Why do they need to know? Maybe they want to reference their own
        message.
        Do they require this bounce anyway, to make sure that there was no
        rewriting?
    C5. Some MUCs rewrite the message-id
        Why is this allowed? It is even suggested here:
        https://xmpp.org/extensions/xep-0045.html#message
    C6. A global ID to reference messages might be nice.
    C7. When referencing a message, for example by "liking" it, a forgeable
        ID could get you to like things you didn't intend to like.
        This is a difficult problem because in many cases it requires
        malicious clients and servers, and those have a lot of power anyway.
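To illustrate the ambiguity in C2, here is a sketch of a sender resolving an XEP-0184 delivery receipt (namespace urn:xmpp:receipts) when it cannot know which of its IDs the peer echoed back. The function name, the two lookup tables and the exception are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

RECEIPTS_NS = "urn:xmpp:receipts"  # XEP-0184 delivery receipts


class AmbiguousReceipt(Exception):
    """Raised when the referenced ID maps to different local messages."""


def resolve_receipt(
    receipt_xml: str,
    by_message_id: Dict[str, str],
    by_origin_id: Dict[str, str],
) -> Optional[str]:
    """Hypothetical helper: return the local key of the acknowledged message.

    The peer may have echoed either the id attribute or the origin-ID,
    so both indices have to be consulted.
    """
    msg = ET.fromstring(receipt_xml)
    received = msg.find(f"{{{RECEIPTS_NS}}}received")
    if received is None or received.get("id") is None:
        return None
    ref = received.get("id")
    hits = {
        key
        for key in (by_message_id.get(ref), by_origin_id.get(ref))
        if key is not None
    }
    if len(hits) > 1:
        # The same string is a message-ID of one message and an
        # origin-ID of another: there is no way to tell which was meant.
        raise AmbiguousReceipt(ref)
    return hits.pop() if hits else None
```

If message-ID and origin-ID were always identical (as E1 proposes), both lookups would always agree and the ambiguous case could not occur.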


D) Possible root cause:
    People do not trust the message IDs assigned by others and therefore want
    to assign their own.


E) Suggested solutions, including partial solutions:
    E1. message-ID and origin-ID should always be the same, as proposed by
        Georg in
        https://mail.jabber.org/pipermail/standards/2017-September/033415.html
        Some concerns were voiced in that thread; the only valid one is that,
        due to bad software, we need to deal with the situation that they are
        different anyway.
        There was a privacy concern about the "by=" attribute, but origin-ID
        does not actually have that.
        According to Daniel and Georg, things currently break down anyway if
        this does not hold.
    E2. Make the ID verifiable: This is what I had in mind at the summit, and
        after some discussion yesterday Jonas and Dave came up with the same
        thing almost immediately, so it might be reasonably straightforward.
        Basically, the client calculates the ID based on some information
        that it shares with the server, like HASH(stream-id || sm-counter).
        This would allow the server to verify that the client generated a
        proper ID. Jonas suggested HMAC(key=stream-id, msg=sm-counter). If
        the message is in a MUC, the MUC server can provide the user with
        some salt, and then HASH(message-counter || salt) could be used to
        ensure that proper unique IDs are generated.
        This scheme relies on there being a party which is in charge of
        checking the IDs. If you connect to a malicious MUC with malicious
        clients, they can still send you whatever they like. I don't think
        that is a problem, is it?
    E3. Simply make the ID: FROM-TIMESTAMP.
        Here FROM needs to be the eventual FROM after possible rewriting. Can
        that be done?
        And TIMESTAMP has to be strictly increasing, so it should have
        sub-second resolution.
        I assume this is impossible because otherwise it would be too easy.
        But why is it impossible? :)
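A rough Python sketch of the verifiable ID from E2. SHA-256, the 8-byte big-endian counter encoding and all function names are assumptions; the discussion only fixed the general shape HMAC(key=stream-id, msg=sm-counter) and HASH(message-counter || salt). The server side further assumes it tracks the same counter as the client, so it can simply recompute the ID and compare.

```python
import hashlib
import hmac


def client_message_id(stream_id: str, sm_counter: int) -> str:
    # HMAC(key=stream-id, msg=sm-counter); hash and encoding are assumptions.
    return hmac.new(
        stream_id.encode("utf-8"),
        sm_counter.to_bytes(8, "big"),
        hashlib.sha256,
    ).hexdigest()


def server_verifies(stream_id: str, expected_counter: int, claimed_id: str) -> bool:
    # The server recomputes the ID from its own view of the counter and
    # compares in constant time.
    expected = client_message_id(stream_id, expected_counter)
    return hmac.compare_digest(expected, claimed_id)


def muc_message_id(message_counter: int, salt: bytes) -> str:
    # MUC variant: HASH(message-counter || salt), with the salt handed
    # to the user by the MUC service (hypothetical encoding).
    return hashlib.sha256(message_counter.to_bytes(8, "big") + salt).hexdigest()
```

A client that reuses or makes up an ID is caught because the server's recomputed value no longer matches; as noted above, this only helps if the party doing the check is itself honest.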


F) Left-overs:
    F1. Would it be useful to have monotonically increasing IDs?
        It seems these might be useful, if not necessary, for querying the
        MAM or some other archive for certain periods? I'm not sure.
    F2. Discussions about malicious forgery of responses when IDs are
        predictable ended with the assumption that this is impossible because
        the receiver needs to be properly verified anyway.
    F3. Zash wants to use timestamps in the MAM-ID. Why? Because of (F1.)?
    F4. Related to (F1.): Would good IDs, possibly monotonically increasing
        ones, simplify the problems that MAM and SM are solving?
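For E3 and F1, a sketch of a generator for strictly increasing FROM-TIMESTAMP IDs. It assumes microsecond timestamps and bumps the previous value by one if the clock stalls or goes backwards. The class name is hypothetical, it does not persist its state across restarts, and it sidesteps the open question of what the final FROM after rewriting is by simply taking it as a parameter.

```python
import time
from typing import Callable


class MonotonicIdGenerator:
    """Hypothetical generator for IDs of the form FROM-TIMESTAMP.

    TIMESTAMP is microseconds since the Unix epoch. If the clock has not
    advanced (or went backwards) since the last ID, the previous value is
    bumped by one, so IDs from this sender stay strictly increasing.
    """

    def __init__(self, sender: str, clock: Callable[[], int] = time.time_ns) -> None:
        self._sender = sender
        self._clock = clock  # returns nanoseconds; injectable for testing
        self._last = 0

    def next_id(self) -> str:
        now_us = self._clock() // 1_000
        self._last = max(now_us, self._last + 1)
        # Zero-padding keeps lexical order equal to numeric order.
        return f"{self._sender}-{self._last:020d}"
```

The zero-padding means IDs from one sender sort correctly as plain strings, which is the property F1 would need for range queries.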



I would be very happy if people would comment! :)

Regards,
Simon

