[Standards] MAM ids on new messages to prevent deduping
mwild1 at gmail.com
Mon May 11 15:53:23 UTC 2015
On 11 May 2015 at 16:25, Brian Cully <bcully at gmail.com> wrote:
> In implementing MAM in clients there can be a case where MAM results contain duplicates of already seen messages. In order to prevent such duplication, the MAM ID for a stanza would need to appear on a newly generated non-MAM stanza.
> As background, imagine a client which, when it receives a new stanza from a server, presents a view that renders the new stanza and then queries MAM to provide a chat history between two JIDs. When JID1 sends a message to JID2, it is logged in the MAM store and forwarded on to JID2. JID2 then requests MAM results for JID1, which return the last 50 messages, including the stanza that indirectly triggered the MAM request. This leads to two copies of that stanza in the message view between JID1 and JID2.
> Note that while the common case would be the most recent stanza being duplicated, more than one can be duplicated: the MAM IQ response is asynchronous, so its results may arrive interleaved with new messages.
> By showing the MAM ID on newly generated inbound messages, the client would be able to ask MAM for all messages before that ID, preventing duplication while allowing new messages to be correctly shown in order.
In summary: we know. IDs on messages have been in, out, in, out and
now they're going back in (based on discussion at the last summit).
But we're planning a separate XEP for the message ID part now, as the
IDs are useful even without MAM. Florian Schmaus has been working on
this spec, which will pave the way for the rest of the work in MAM and
Carbons (Carbons is required to receive the IDs of outgoing messages).
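
To illustrate why an ID on live messages matters, here is a minimal sketch of client-side deduplication, assuming every message (live or from the archive) carries the same server-assigned ID. The `Message` class, its `archive_id` field and `merge_history` are hypothetical names for illustration, not part of any XEP or library.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    archive_id: str  # hypothetical: ID the server would stamp on the message
    sender: str
    body: str


def merge_history(live: List[Message], mam_results: List[Message]) -> List[Message]:
    """Combine archive results with messages already shown in the view.

    Archive results are assumed to be in archive order. Live messages whose
    ID already appears in the archive page are dropped; the rest (those that
    arrived while the query was in flight) are appended afterwards.
    """
    archived_ids = {m.archive_id for m in mam_results}
    newer_live = [m for m in live if m.archive_id not in archived_ids]
    return list(mam_results) + newer_live


# Usage: "a3" triggered the query, "a4" arrived while it was in flight.
live = [Message("a3", "jid1", "hello"), Message("a4", "jid1", "still there?")]
page = [Message("a1", "jid1", "hi"), Message("a2", "jid2", "hey"), Message("a3", "jid1", "hello")]
merged = merge_history(live, page)
```

Without a shared ID, the client has nothing reliable to compare "a3" in the view against "a3" in the archive page, which is the duplication described above.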
> Querying MAM by message times also will not work, given the potential differences in clocks between arbitrary clients and the MAM store.
Querying solely by time was never the intention of the XEP (though I
know some clients are currently doing this :( ). The "query by time"
aspect is intended for clients that want to show something like a
history browser, if they don't have local history. It's not intended
for automated sync.
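
For contrast with time-based queries, this is a rough sketch of an ID-anchored request: a MAM query paged with Result Set Management `<before/>`. It assumes the `urn:xmpp:mam:2` namespace of later MAM revisions (the version current at the time of this thread differed), and `build_before_query` is a made-up helper. The `<iq/>` is built without the stream's default namespace for brevity.

```python
import xml.etree.ElementTree as ET

MAM_NS = "urn:xmpp:mam:2"  # assumption: later MAM revision, not the 2015 one
RSM_NS = "http://jabber.org/protocol/rsm"
FORM_NS = "jabber:x:data"


def build_before_query(with_jid, before_id, max_items=50, iq_id="mam-q1"):
    """Build an <iq/> asking for up to max_items messages before before_id."""
    iq = ET.Element("iq", {"type": "set", "id": iq_id})
    query = ET.SubElement(iq, f"{{{MAM_NS}}}query")

    # Data form restricting the query to one conversation partner.
    form = ET.SubElement(query, f"{{{FORM_NS}}}x", {"type": "submit"})
    form_type = ET.SubElement(
        form, f"{{{FORM_NS}}}field", {"var": "FORM_TYPE", "type": "hidden"}
    )
    ET.SubElement(form_type, f"{{{FORM_NS}}}value").text = MAM_NS
    with_field = ET.SubElement(form, f"{{{FORM_NS}}}field", {"var": "with"})
    ET.SubElement(with_field, f"{{{FORM_NS}}}value").text = with_jid

    # RSM paging: anchor on an archive ID instead of a timestamp.
    rsm = ET.SubElement(query, f"{{{RSM_NS}}}set")
    ET.SubElement(rsm, f"{{{RSM_NS}}}max").text = str(max_items)
    ET.SubElement(rsm, f"{{{RSM_NS}}}before").text = before_id
    return iq


iq = build_before_query("jid1@example.org", "a3")
xml_text = ET.tostring(iq, encoding="unicode")
```

Anchoring on the ID sidesteps clock differences between the client and the archive, since no timestamp is compared at all.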