simon.jabber at a-oben.org
Tue Feb 13 16:57:55 UTC 2018
During the discussion on the different ID types at the summit I had an idea for
a possible solution to the problem but not a sufficient understanding of the
problem to even discuss it. I tried to find somebody to discuss it with
afterwards but nobody was available and I forgot about it. To get it off
list, here is my current understanding. I hope it can be a basis for further discussion.
Currently there are
A1. stanza-ID: generated by server
A2. origin-ID: generated by client
from https://xmpp.org/extensions/xep-0359.html and
A3. message-ID: this is the ID-attribute on the stanza
There are also (4.) SM-IDs in stream management but those are
B1. MAM https://xmpp.org/extensions/xep-0313.html uses stanza-ID.
B2. MUCs require IDs to detect reflections of own messages.
And reflection is great because it gives everybody the same view
MUC in the presence of things like autopastebin or other rewrites.
B3. Error responses have the same ID-attribute as the original stanza.
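
To make (B2.) concrete, here is a minimal sketch of how a client could detect reflections of its own messages by origin-ID. The class and parameter names are hypothetical and not taken from any XEP or client. It assumes the MUC hands the origin-ID back unchanged, which is not always the case.

```python
from typing import Optional, Set


class ReflectionTracker:
    """Remembers origin-IDs of sent MUC messages until they are reflected."""

    def __init__(self) -> None:
        self._pending: Set[str] = set()

    def sent(self, origin_id: str) -> None:
        # Call this when a groupchat message leaves the client.
        self._pending.add(origin_id)

    def is_reflection(self, origin_id: Optional[str], sender_is_self: bool) -> bool:
        # A message counts as a reflection only if it comes from our own
        # occupant and carries an origin-ID we are still waiting for.
        if sender_is_self and origin_id is not None and origin_id in self._pending:
            self._pending.discard(origin_id)
            return True
        return False
```

If the MUC drops or rewrites the origin-ID, `is_reflection` returns False and the client ends up showing its own message twice.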
C) Problems with current situation:
C1. People dislike having so many different IDs.
This is not a problem per se but it does mean implementation
C2. According to Daniel it is not clear which ID should be used when
referencing things. In other words if he gets a delivery receipt
ID the client might have based that on the origin-ID or the
I'm not sure if this should be considered relevant. People can
write broken clients which send back crap. Of course if it happens
unintentionally because of (C1.), fewer IDs would help
C3. Using origin-ID to detect MUC reflection doesn't always work
may not reflect it.
That's of course unfortunate but should IMHO be considered an error in the
MUC implementation (probably a transport) and fixed there. I
that it might be difficult in some cases
( https://lab.louiz.org/louiz/biboumi/issues/3283 ) but as Daniel
already pointed out yesterday it is much easier to fix a transport,
since it knows which protocol it is talking to, than to work
around it at the end.
In any case the current situation seems to be bad:
C4. Clients require a bounce of their messages to learn the
is used for MAM.
Why do they need to know? Maybe they want to reference their own
Do they require this bounce anyway to make sure that their was
C5. Some MUCs rewrite the message-id
Why is this allowed? It is even suggested here:
C6. A global ID to reference messages might be nice.
C7. When referencing a message, for example by "liking" it, a forgeable ID
could get you to like things you didn't intend to like.
This is a difficult problem because in many cases it requires
clients and servers and those have a lot of power anyway.
D) Possible root cause:
People do not trust the message IDs assigned by others and therefore
assign their own.
E) Suggested solutions, including partial solutions:
E1. message-ID and origin-ID should always be the same, as proposed
Some concerns were voiced in that thread; the only valid one is
to bad software we need to deal with the situation that they are
There was a privacy concern about the "by=" attribute but
not actually have that.
According to Daniel and Georg things currently break down anyway
does not hold.
E2. Make the ID verifiable: This is what I had in mind at the summit and
after some discussion yesterday Jonas and Dave basically immediately
came up with the same thing, so it might be reasonably
Basically, the client calculates the ID based on some
it shares with the server like HASH(stream-id || sm-counter).
allow the server to verify that the client generated a proper
suggested HMAC(key=stream-id, msg=sm-counter). If the message is
MUC, the MUC server can provide the user with some salt and then a
HASH(message-counter || salt) could be used to ensure that
IDs are generated.
This ID is based on there being a party which is in charge of
the IDs. If you connect to a malicious MUC with malicious
can still send you whatever. I don't think that is a problem, is it?
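
A rough sketch of what (E2.) could look like follows. This is only an illustration under assumptions: SHA-256 as the hash, a NUL byte as separator between the concatenated parts, the counter encoded as decimal text, and all function names are my own choices and not part of any proposal.

```python
import hashlib
import hmac

# Assumption: a separator so that different (id, counter) pairs cannot
# produce the same concatenated input.
SEP = b"\x00"


def hash_id(stream_id: str, sm_counter: int) -> str:
    """HASH(stream-id || sm-counter), here using SHA-256."""
    data = stream_id.encode("utf-8") + SEP + str(sm_counter).encode("ascii")
    return hashlib.sha256(data).hexdigest()


def hmac_id(stream_id: str, sm_counter: int) -> str:
    """HMAC(key=stream-id, msg=sm-counter), here using HMAC-SHA-256."""
    key = stream_id.encode("utf-8")
    msg = str(sm_counter).encode("ascii")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def muc_id(message_counter: int, salt: bytes) -> str:
    """HASH(message-counter || salt) with a salt handed out by the MUC."""
    data = str(message_counter).encode("ascii") + SEP + salt
    return hashlib.sha256(data).hexdigest()


def server_accepts(stream_id: str, sm_counter: int, claimed_id: str) -> bool:
    """Server side: recompute the ID and compare in constant time."""
    expected = hmac_id(stream_id, sm_counter)
    return hmac.compare_digest(expected.encode("utf-8"), claimed_id.encode("utf-8"))
```

The server knows both the stream-id and the counter, so it can recompute the value and reject a stanza whose ID does not match.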
E3. Simply make the ID: FROM-TIMESTAMP.
Here FROM needs to be the eventual FROM after possible
that be done?
And TIMESTAMP has to be strictly increasing so should have
I assume this is impossible because otherwise it would be to
why is it impossible? :)
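
For (E3.), one way to get a strictly increasing TIMESTAMP even when the clock does not advance between two messages is to bump the value by one. This is only a sketch: the class name, the nanosecond clock and the "-" separator are assumptions, and it does not address the question of who knows the eventual FROM.

```python
import time
from typing import Callable


class FromTimestampIds:
    """Generates FROM-TIMESTAMP IDs with a strictly increasing timestamp."""

    def __init__(self, clock: Callable[[], int] = time.time_ns) -> None:
        self._clock = clock
        self._last = 0

    def next_id(self, from_jid: str) -> str:
        now = self._clock()
        # If the clock stood still or went backwards, step past the last value.
        if now <= self._last:
            now = self._last + 1
        self._last = now
        return f"{from_jid}-{now}"
```

The generator has to be the single party in charge of the IDs for a given FROM, otherwise two generators could hand out the same value.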
F1. Would it be useful to have monotonically increasing IDs?
It seems these might be useful if not necessary to query the MAM or
some other archive for certain periods? I'm not sure.
F2. Discussions about malicious forgery of responses when IDs are
ended with the assumption that this is impossible because the
needs to be properly verified anyway.
F3. Zash wants to use timestamps in the MAM-ID. Why? Because of (F1.)?
F4. Related to (F1.): Would good IDs, possibly monotonically
simplify the problems that MAM and SM are solving?
I would be very happy if people would comment! :)