[Standards] XEP-0427: MAM Fastening Collation questions

Andrzej Wojcik andrzej.wojcik at tigase.net
Fri Jun 5 12:23:02 UTC 2020


>  I've started the implementation of XEP-0427 with a goal to use the collation of fastenings (and pseudo-fastenings) to reduce traffic related to MAM history synchronization.
> 
> 
> You're ahead of me; though the Mobile Lead at Pando is begging me to do the same.
> 
> I should warn that we discussed this specification in detail 6 months ago, and those have been really busy for me, so I've let a lot of things slip and not updated this specification.
> 
> Our major problem is that fetching <displayed/> chat markers effectively clogs our connection during resync, and as a result we're using a lot of RTTs, when we have very unreliable bandwidth thanks to people building hospitals as Faraday cages. :-)
> 
> Hence my primary concern is reducing round-trips in large groupchats, though traffic size is a close second.

I share the same concern. Additionally, I want to reduce the load on mobile device storage, as syncing over a week of messages takes some time, and iOS apps sometimes sync only once every few days, especially if you mostly use XMPP on the desktop.

>  
> I was thinking about using `collate` summarizing to retrieve delivery receipts and chat markers. However, I'm not quite sure how it would really work.
> 
> In the section related to Pseudo-Fastening (https://xmpp.org/extensions/xep-0427.html#pseudo) there is the following note:
> 
> > Message Delivery Receipts: Message Delivery Receipts (XEP-0184) [5] "ack messages" - those containing a <received/> element - are considered to be equivalent to a fastening containing just the <received/> element, applying to the message given by the "id" attribute.
> 
> and that is quite clear. However, this means that the result from MAM (if I'm correct) would look like this:
> 
> <message id='aeb213' to='juliet@capulet.lit/chamber'>
>   <result xmlns='urn:xmpp:mam:2' queryid='f27' id='28482-98726-73623'>
>     <forwarded xmlns='urn:xmpp:forward:0'>
>       <delay xmlns='urn:xmpp:delay' stamp='2010-07-10T23:08:25Z'/>
>       <message xmlns='jabber:client'
>         to='juliet@capulet.lit/balcony'
>         from='romeo@montague.lit/orchard'
>         type='chat'>
>         <body>Call me but love, and I'll be new baptized; Henceforth I never will be Romeo.</body>
>       </message>
>     </forwarded>
>     <applied xmlns='urn:xmpp:mamfc:0'>
>         <received xmlns='urn:xmpp:receipts' />
>     </applied>
>   </result>
> </message>
> 
> That looks OK for 1-1 chat. But how about delivery confirmations forwarded by the MUC room or MIX channel? (note: MIX messages are stored in user MAM archive)
> 
> 
> I think you're missing a count there (which is fine in the 1:1 case, probably, as it defaults to '1').
> 
> So you should have, in the MUC case, a count of >1.
>  
> If I'm correct, we would get the same response as soon as at least one of the recipients sent a delivery receipt. If many recipients sent delivery receipts, we would still have one entry and no way to tell who actually sent them. So we only know that someone received this message; it could even be our own client! The only extra information would be how many clients received that message (thanks to the 'count' attribute).
> 
> 
> Yes, and I think that's fine, because if you want to know you can ask (and you might ask consistently for the latest message in the chatroom, or you might only ask on hover or something in the UI).

Well, true. I was thinking of something more in the style of FB Messenger, where you actually see who has read up to where, and with that collation it (I think) would not work out of the box. I was thinking about it because it would be useful for MIX, which would be presence-less: you could still know whether someone has read your question, and so whether you should wait for the answer or just check for the response later and do something else in the meantime.

It would be nice to add some info to the XEP on how to query fastenings for a particular message only. Currently, MAM does not allow that (if I recall correctly) and XEP-0427 does not explain how to do so.
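
To make the difference between a collated count and a per-sender view concrete, here is a minimal sketch in Python. It is not taken from XEP-0427: the archive layout, the example JIDs and the function names `collate_counts` and `collate_by_sender` are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical, simplified archive entries (not the XEP-0427 wire format):
# (stable_id, sender, kind, target_id), where kind is "message" or "receipt"
# and target_id is the id of the message a receipt applies to.
ARCHIVE = [
    ("1", "romeo@montague.lit", "message", None),
    ("2", "juliet@capulet.lit", "receipt", "1"),
    ("3", "nurse@capulet.lit", "receipt", "1"),
]


def collate_counts(archive):
    """Collated view: one count per parent message; sender identity is lost."""
    counts = defaultdict(int)
    for _stable_id, _sender, kind, target in archive:
        if kind == "receipt":
            counts[target] += 1
    return dict(counts)


def collate_by_sender(archive):
    """Per-sender view: larger, but shows who acknowledged each message."""
    senders = defaultdict(set)
    for _stable_id, sender, kind, target in archive:
        if kind == "receipt":
            senders[target].add(sender)
    return {target: sorted(who) for target, who in senders.items()}
```

With the count-only view a client learns that two receipts exist for message '1' but not who sent them; the per-sender view keeps that information at the cost of a result that grows with the number of participants.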

> That said, there's an implication here that for many fastenings, it's particularly important to know if they're from the bare jid of the sender or the bare jid of the recipient; should we collate those separately?

With MIX/MUC it could be the only way to actually know whether anyone else received this message (or displayed it). But it may be tricky: for MIX (most likely) you will archive only messages sent by MIX, so the direction of the message would always be 'incoming', and for an anonymous MIX channel it would not be possible to "check the sender" as there would be no sender jid in it.

>  
> The same issue I think is with Chat Markers. Moreover, in XEP-0427 there is the following statement:
> > Chat Markers: Chat Markers (XEP-0333) [6] A Chat Marker is similarly equivalent to a fastening containing the Chat Marker, but applying to all previous messages (since previous messages can be assumed to have been read and or displayed, etc).
> 
> So, should all messages preceding the message with a chat marker (e.g. <received/>) have a fastening in the summary? Should each of them have the following element in the <result/>:
> 
> <applied xmlns='urn:xmpp:mamfc:0'>
>     <received xmlns='urn:xmpp:chat-markers:0' />
> </applied>
> 
> 
> Hmmm. Chat markers are indeed a bit painful here. I'm open to suggestions here, including "let's not bother applying chat markers to previous messages in the archive", which feels pretty good right now.

I have no other suggestion on how to solve that. Not applying chat markers to all previous messages was the only sane solution I could come up with.

> We also have no solution around clients wishing to see what the latest message seen by a particular occupant in a chatroom is.

This is rather important from my point of view. But the solution to that would be increased traffic, which is not good either.
> 
> I think that there is one more possible issue with XEP-0427, related to 'Last Archive ID'. The XEP states that while this value could be deduced, it suggests that a <latest/> element is added to return the id of the last element in the query (even if it is a fastening message id). That could work, but not when the client wants to use RSM for pagination (i.e. it was not connected for a longer time and wants to sync in batches). Then it is possible that the id of the latest fastening would not even be in the original result set.
> 
> Example 1.
> 
> Let's say that the user has 200 archived messages since the 'start' date. The first message has stable id '1' and its delivery receipt is at position 150, with a stable id of '150', in this archive. A client asking for the first 100 messages (assuming that all of them are messages and not fastenings) will then receive 100 messages and the fastening for the message with stable id '1'. In this case, <latest/> would be set to '150', as that was the latest ID in the returned set. But when the client asks (using RSM) for messages after <latest/>, it would receive messages from positions 151 to 200. Messages from positions 101 to 149 would not be fetched and synced at all.
> 
> 
> Oh... So shouldn't the client be asking for the messages after the last stable identifier for a message that it sees? (ie, 100, so it'll get 101->150). The latest is really not much use until you've got fully synchronised, in which case you can find messages with new fastenings.
> 
> Sorry, this is entirely my fault, it's really not clear at all in the XEP.

Ok, if that is how it should be used, it would be good to mention that in the XEP. I was not sure if it should be used when we ask for the next batch of messages or not.
> 
> Example 2.
> 
> Let's say that the user has 200 archived messages since the 'start' date. The first message has stable id '1' and its delivery receipt is at position 150, with a stable id of '150', in this archive. A client asking for the first 100 messages (assuming that all of them are messages and not fastenings) will then receive 100 messages and the fastening for the message with stable id '1'. Assuming that the client ignores the value of <latest/> and fetches again using <after/> set to the value from the <last/> element of the previous response, it would receive messages for the range 101-200, and instead of the message at position 150 it would get a fastening (no <forwarded/>, just <applied/>).
> 
> Moreover, if the client always used the value of <last/> to fetch the next messages, it could end up in an infinite loop. This could happen if the archive held 300 messages, where the first 100 are normal messages and the next 100 are just "fastenings", each pointing to a different message. That would give us 100 fastenings pointing to 100 different messages. A client asking for messages after stable id 120 (I assume that the stable id is equal to the position of the message in the set) would receive 100 fastenings (nothing more), and the <last/> id would actually match the id sent in the <after/> element, creating an infinite loop.
> 
> 
> OK, so if we assume we have 100 messages, followed by one delivery receipt for each message, followed by further messages (fastenings or not).
> 
> The client asks for 1->*, limit 100. It'll get 1->100, each with a single fastening collated, RSM saying 1->100, and <latest/> set to 200. If it then asks for collated messages 101->* limit 100, it'll get the messages starting at 200, since everything before that is a fastening and therefore ignored, and it'll have latest set to 300.

Really? I had somehow assumed that if I asked for 101->* limit 100 then I would receive only 100 fastenings. I came to that conclusion from section 3.4, Incremental queries (https://xmpp.org/extensions/xep-0427.html#sect-idm45353205399792), which states:
> A MAM query where the MAM summary type is "collate", and where "start" and "end" (or the RSM <after/> element) would exclude the parent message but include the fastening, then the MAM result is sent with the <forwarded/> element omitted but the summary present (including all fastenings, not just those that have changed).

In my example I asked with <after/> equal to 100 (so excluding the 100 parent messages!), so I would end up with 100 fastenings only?

I would agree with your answer if there were no "or the RSM <after/> element" part in that statement.
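
The two readings can be compared with a small simulation. This is only a sketch under the assumptions of the example above: the stable id equals the position in the archive, ids 1-100 are messages, 101-200 are one receipt per message, and 201-300 are further messages. The function names and the tuple layout are hypothetical and are not part of MAM or XEP-0427.

```python
def build_archive():
    """Entries are (stable_id, kind, target_id); stable id == position."""
    archive = [(i, "message", None) for i in range(1, 101)]
    archive += [(100 + i, "receipt", i) for i in range(1, 101)]
    archive += [(i, "message", None) for i in range(201, 301)]
    return archive


def page_summary_only(archive, after, limit):
    """Reading of section 3.4: a fastening after `after` whose parent is at or
    before `after` is returned as a summary-only result (no <forwarded/>)."""
    results = []
    for stable_id, kind, target in archive:
        if stable_id <= after:
            continue
        if kind == "message":
            results.append(("message", stable_id))
        elif target <= after:
            results.append(("summary-only", target))
        # Receipts whose parent is also after `after` would be collated onto
        # that parent; this sketch leaves them out for brevity.
        if len(results) == limit:
            break
    return results


def page_skip_fastenings(archive, after, limit):
    """Reading from the reply: fastenings are ignored for paging purposes,
    so only real messages fill the page."""
    page = [e for e in archive if e[0] > after and e[1] == "message"]
    return [("message", stable_id) for stable_id, _kind, _target in page[:limit]]
```

Under the first reading, asking for everything after 100 with a limit of 100 yields 100 summary-only results and no new messages; under the second, the same query yields messages 201-300.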

> 
> It's then up to date.
> 
> If it subsequently resynchronizes by asking for 301 (latest from its last MAM fetch), it'll get - in effect - any messages since, *and* any messages that have had subsequent fastenings, including all the fastenings. This is certainly somewhat redundant information, but I think trying to optimize it further gets fearsomely complicated.

True, but I think that the clause "or the RSM <after/> element" should be removed. However, that would create a loophole when the client sends (at the beginning of the synchronization) <after/> set to the last synced element, as it would exclude all fastenings received after the <after/> message whose parent was before <after/>. (I'm not sure if that is clear.)

> Do you have any suggestions on what I can write to make this clearer in the specification?

The "or the RSM <after/> element" clause suggests to me that if I asked for 100->* limit 100 I would get only fastenings. I do not see any rule that says they need to be excluded.

>  
> To sum up, I think that the idea of aggregating messages on the server side is good, but the XEP in its current state has some holes in it that make it unusable. I do not see how we could benefit from this XEP, even assuming that it would only be used for 'real' fastenings (removing aggregation of delivery receipts): even if we knew that a message has a 'likes' count of 100, we would need all the actual fastening messages to be able to update that count when a new message is received, because the user may have changed their reaction, so a different reaction should be shown instead of the like, which, if I recall correctly, XEP-0422 (https://xmpp.org/extensions/xep-0422.html#replace) allows.
> 
> 
> Quite, I didn't put in anything about fastening replacement.

Would it be harmful in any way to force MAM to treat a replacement as a "retraction" of the previous "vote"? In this case the new "vote" would be added as a new fastening and the old one would be replaced by a MAM tombstone?

I wonder if ChatMarkers couldn't also work that way. If a new chat marker "higher" in rank than the previous one is sent, then it is recorded and the previous ChatMarker is replaced by a tombstone. That would reduce the number of messages to synchronize even with a plain MAM sync (assuming that ChatMarkers are returned).
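
A rough sketch of how such tombstoning could behave on the archive side is shown below. Everything in it is an assumption for illustration: the entry layout, the `store_marker` helper, and in particular the ordering rule (a marker on a later message outranks one on an earlier message, and for the same message received < displayed < acknowledged). Neither XEP-0333 nor XEP-0427 defines this behaviour.

```python
# Assumed ranking of chat marker types for the same target message.
MARKER_RANK = {"received": 1, "displayed": 2, "acknowledged": 3}


def store_marker(archive, sender, marker, target):
    """Record a chat marker, tombstoning the sender's previous live marker.

    `archive` is a list of dict entries and `target` is the integer position
    of the marked message. Returns True if the marker was stored, False if it
    did not outrank the sender's current marker and was dropped.
    """
    new_key = (target, MARKER_RANK[marker])
    for entry in archive:
        if entry["kind"] == "marker" and entry["sender"] == sender:
            old_key = (entry["target"], MARKER_RANK[entry["marker"]])
            if new_key <= old_key:
                return False  # not "higher" in rank: keep the existing marker
            entry["kind"] = "tombstone"  # superseded marker stays as a tombstone
    archive.append(
        {"kind": "marker", "sender": sender, "marker": marker, "target": target}
    )
    return True
```

With this rule there is at most one live marker per sender, so a client syncing the archive only has to process that one and can skip the tombstones.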

> Maybe I did not understand something in the XEP, or the XEP needs clarification, but I would prefer a XEP that would allow me to fetch messages from date to date (including after) and that would return messages and aggregated fastenings in that time period. That would allow XMPP clients to sync faster (knowing the actual state of the message (received/displayed), the client would update it in the local archive just once) and would reduce load on the server (less data to aggregate). It would also be good to keep details about the sender of a fastening message (even if aggregated). That would allow (in the case of MIX/MUC) showing who actually read and who just received that message.
> 
> 
> The trouble (for us, at least) is that we have >200 people in groupchats quite regularly, and listing them all would create really huge stanzas. This is true whether it's reactions or receipts. So I believe the best option is to ask for the actual fastenings on demand, when the client needs to know.

Or introduce an additional form of the summary, though the XEP would need to be extended for that.

Regards,
Andrzej Wójcik

XMPP: andrzej.wojcik at tigase.org
Email: andrzej.wojcik at tigase.net
