[Standards] Message-IDs

Kevin Smith kevin.smith at isode.com
Wed Feb 28 09:28:01 UTC 2018

On 26 Feb 2018, at 15:59, Simon Friedberger <simon.jabber at a-oben.org> wrote:
> So, lest this discussion just die. Here is a proposal:

Thanks for the proposal. Bashing follows.

>    Client-A generates message-ID based on HASH(connection_counter,
>    server_salt). The connection_counter needs to be maintained only for
>    one connection. The server salt is server generated, anew for each
>    connection and is sent to.
>    Server-A checks that this is correct and uses it for MAM. This
>    should make life easier for clients because they only need to deal
>    with one ID.

I think stopping servers being able to use their own IDs for DB storage is probably disadvantageous.
Although I see the appeal of a client knowing its own MAM IDs, I’m not sure that simply knowing the ID is sufficient. If you’re going to use it for archive sync, you also need to know where it fits into the order of the archive. So I’m not sure this is actually buying anything, and it comes at the cost of flexibility in server implementations.

>  * Two problems need to be considered here:
>      o The client needs to maintain a counter.

Literally having a counter in memory is trivial, although getting the rules for incrementing it right can be difficult - more so than for SM IDs, and there’s another thread at the moment about people not being able to get those right.

>  Even
>        though I called it a counter, it does not need to be contiguous.
>        It just needs to be increasing that the server can easily check
>        that for a given salt value it is unique.

If it’s not contiguous, how is the server going to go about validating the hash of an unknown value?
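
To make the question concrete, here is a rough Python sketch. The hash construction (SHA-256 over the salt plus an 8-byte counter), the function names and the search window are all assumptions for illustration; the proposal doesn't specify any of them.

```python
import hashlib
from typing import Optional


def message_id(counter: int, salt: bytes) -> str:
    # Hypothetical construction: the proposal only says
    # HASH(connection_counter, server_salt).
    return hashlib.sha256(salt + counter.to_bytes(8, "big")).hexdigest()


def validate_contiguous(received_id: str, last_counter: int, salt: bytes) -> bool:
    # Contiguous counter: there is exactly one acceptable next value,
    # so the server does a single hash and a comparison.
    return received_id == message_id(last_counter + 1, salt)


def validate_with_gaps(received_id: str, last_counter: int, salt: bytes,
                       window: int) -> Optional[int]:
    # Non-contiguous counter: the server doesn't know the value, so it
    # has to guess, trying each candidate in an arbitrary window above
    # the last counter it accepted.
    for candidate in range(last_counter + 1, last_counter + 1 + window):
        if received_id == message_id(candidate, salt):
            return candidate
    # Not found: either the ID is bogus or the gap exceeded the window.
    return None
```

With gaps allowed, the server either pays for a search on every message or rejects legitimate IDs that jump further than whatever window it picked.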

>      o The server needs to check the validity of the counter. If the
>        server is actually replicated and consists of multiple machines
>        this is not strictly possible.

I’m not sure I understand this. If the server salt is local to a node, and the connection counter is local to a connection, which is local to a node, even in a split cluster this should be fine?

> However, assuming normal
>        operations the IDs generated by the client will be fine and if
>        the servers have any mechanism for eventual consistency a
>        misbehaving client will be detected.

Will they? If the server can’t check the stanza at submission time, I don’t think it can ever (reasonably) check it later.

>    Server-B gets the message via s2s. It changes the message-ID to a
>    new one and stores the original as "origin-ID".

That’s going to break errors and all sorts, isn’t it? A stanza’s ID needs to be stable or things will break.

>    Client-B gets a message with only TWO IDs. message-ID is for
>    referencing locally for MAM, origin-ID is for referencing when
>    talking to the sender i.e. read receipts.

What happens with MUC? That’s an extra entity that may be doing MAM, and will generate new stanza IDs for the fan-out.

>    If a server generates follow-up messages it makes up a new
>    sender-ID. It should maybe set a “triggered-by-ID” so the client can
>    determine that it triggered this message. Maybe this is unnecessary.
>    The server definitely must send the message it inserted back to the
>    client to ensure a common view of history.

What does ‘generates follow-up messages’ here mean?

>    If a server changes a message it can keep the sender-ID but it MUST
>    notify the client who sent the message to make sure that clients
>    have the same view of the history.

What does ‘changes a message’ here mean? There are situations where a message is modified in flight and the sender can’t be told what it’s modified to.

> In this proposal stanza-IDs are not required. The message-ID is
> authoritative and when rewriting the original message-ID is kept as
> origin-ID.

I’m not sure they’re not required (see comments on MAM).

> From my original mail this solves C1, C2, C3, C4 and C5.

I’m not sure it helps with C1. It only helps with C2 if we go through every XEP that uses a stanza ID and change it to use an origin-ID, I think? I don’t think it makes a difference to C3 at all, does it? It doesn’t help C4, as the client still needs a bounce to get ordering right, and I don’t see how it handles C5.

> Also note, to make this a simpler change the clients could set both
> origin-ID and message-ID. The stanza-ID for MAM would turn out to be the
> same. This would be very similar to what is probably currently the most
> widespread behavior. Except that the origin-ID should be used for
> read-receipts, etc.

I suspect that just saying in message receipts and in LMC etc. “use the origin-id when present” would achieve much the same thing as this proposal?
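
As a rough sketch of what that rule could look like on the receiving side (Python, assuming the XEP-0359 `<origin-id/>` element in the `urn:xmpp:sid:0` namespace; the function name `reference_id` is made up for illustration):

```python
import xml.etree.ElementTree as ET
from typing import Optional

SID_NS = "urn:xmpp:sid:0"  # namespace used by XEP-0359 for <origin-id/>


def reference_id(stanza_xml: str) -> Optional[str]:
    # Pick the ID to quote back to the sender (receipts, LMC, etc.):
    # prefer the origin-id when present, else fall back to the stanza id.
    msg = ET.fromstring(stanza_xml)
    origin = msg.find(f"{{{SID_NS}}}origin-id")
    if origin is not None and origin.get("id"):
        return origin.get("id")
    return msg.get("id")
```

The stanza’s own id attribute is left alone here, so errors and anything else keyed on it keep working.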

