[Standards] XEP-136 and XEP-59 implementation comments

Alexander Tsvyashchenko lists at ndl.kiev.ua
Sat Mar 15 18:12:39 UTC 2008


Hello Peter,

Sorry for the delay in answering :-(

I'll discuss here both your comments and changes to XEP-136 (v 0.15).

To keep the message short (well, kind of ;-) I won't comment on the
changes you've already made to the XEP that I'm completely OK with, so
I'll just say "thanks for listening to my feedback!" for all of them
here ;-)

>> Duplicate items
>> ---------------

... skipped ...

>> Proposal: change "10. Replication" item by removing references to <after>
>> and <last> element and stating that start replication date should be
>> specified
>> using "start" attribute of "modified" command with additional note that
>> the collections with changed time exactly equal to "start" time are NOT
>> included
>> in the result (thus, "start" will effectively work as "after").

... skipped ...

> +1 to that modification.

Looking at XEP-136 v 0.15, I see you've added the "start" element, but
one thing still remains before the "duplicate items" issue is fixed:
"9.4 Replication" still says "The server MUST set the content of the
<last/> element to the UTC time". This is exactly why the duplicate
items issue happens, so I believe this phrase should be removed, so
that the client treats the <last/> element as opaque.

It may be a good idea, though, to say instead that this ID should be
persistent, so that the client can reuse it in the next query even if
that query does not immediately follow the previous one and even if
the output has changed since then. This is quite easy from an
implementation point of view (for example, I believe the current
implementation in mod_archive_odbc already satisfies it even though it
was not written with this requirement in mind), and implementations
that are willing to ignore the possibility of "duplicate items" may
simply use UTC there, as it also satisfies this requirement.
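
As a rough illustration of the opaque, persistent <last/> value
discussed above, here is a minimal sketch in Python. The ChangeLog
class, its method names and the use of a sequence counter as the token
are all assumptions made for this example; they are not taken from
XEP-136 or from mod_archive_odbc.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChangeLog:
    """Hypothetical server-side log of collection changes."""
    _entries: List[Tuple[int, str]] = field(default_factory=list)
    _seq: int = 0

    def record(self, collection_id: str) -> None:
        # Every change gets its own ever-increasing sequence number,
        # even when several changes share the same wall-clock second.
        self._seq += 1
        self._entries.append((self._seq, collection_id))

    def modified(self, after: str = "", limit: int = 2) -> Tuple[List[str], str]:
        """Return one page of changes plus the token for the next query."""
        last_seen = int(after) if after else 0
        page = [e for e in self._entries if e[0] > last_seen][:limit]
        # The client stores the token as-is and never interprets it.
        token = str(page[-1][0]) if page else after
        return [cid for _, cid in page], token
```

Because the token stays valid between sessions, a client can pass it
back much later and receive only the changes recorded after it, with
no item repeated and none skipped.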

>> XEP-59: detecting the change
>> ============================

... skipped ...

>> Proposal: add to RSM result the tag "changed", which, when present,
>> indicates the datetime of the most recent change of the items affected by
>> the query. It typically shouldn't be that problematic to compute this value
>> (certainly it wasn't for XEP-136 implementation), and it can be made
>> optional,
>> as it is done with "index" if in some cases it's hard to calculate it.
>
> Can this be handled via the 'version' attribute in archiving?

Well, for my initial idea of a client-side implementation, which
included indexing collections on the client side, I think this is not
enough. The 'version' attribute does cover changes to a single
collection, but that's not enough for the more general case: the
client still has no way to know that some changes happened other than
by performing replication. That can be quite costly, because
replication then has to be performed after every request to make sure
things didn't change during the request; otherwise the local cache may
be filled with inconsistent info.

However, I now tend to think that supporting local indexing of
collections was not such a good idea anyway, as its implementation is
quite complex and fragile. So I'm not sure this is a real issue for me
personally anymore: if collections aren't indexed locally, there's no
need to keep them consistent.

But the <changed/> element might still be useful if somebody else
decides to go that route, and also for other cases such as the one
you described with searching.

However, if you agree to proceed with some kind of "change
notification" item for RSM, I think my original <changed/> element
proposal should be upgraded to a versioning-like scheme: instead of
UTC, it should contain just an opaque integer that increases whenever
items in the requested range change, though not necessarily by +1 for
each change. That way it's still possible to use UTC by converting it
to an integer first, or to use any more reliable way of versioning if
one is applicable.
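
To make the versioning idea concrete, here is a small sketch in
Python. VersionedStore and has_changed are hypothetical names, and for
brevity the sketch keeps one version for the whole store rather than
one per requested range, which is coarser than the proposal above.

```python
import time


class VersionedStore:
    """Hypothetical item store exposing an opaque change version."""

    def __init__(self) -> None:
        self._items = {}
        self._version = 0

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        # Derive the version from UTC seconds, but force it to grow even
        # when two changes land in the same second or the clock steps back.
        self._version = max(self._version + 1, int(time.time()))

    def query(self):
        """Return a copy of the items plus the version a client can cache."""
        return dict(self._items), self._version


def has_changed(cached_version: int, current_version: int) -> bool:
    # Only the ordering matters; the size of the jump carries no meaning.
    return current_version > cached_version
```

A client would keep the version from its last query and compare it
with the one in the next result; a larger value means the cached
items may be stale.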

>> Resource modification when auto archiving

... skipped ...

>> Of course, the possibility here would be to just drop all resources from
>> JIDs
>> and store only bare JIDs, but that seems to be too limiting and
>> inconvenient.
>
> I'm not so sure.

Well, I don't have much experience with using multiple resources, but
to me dropping the resource altogether looks like a bad idea for at
least two reasons:

1. It increases the possibility of collection collisions. According
to XEP-136, each collection has to be uniquely identified by "with"
and "start", so if we strip the resource there's a higher probability
of two collections colliding on these attributes. It's highly unlikely
that I'll have two different conversations with the same person under
the same client started at the same time, but it's more likely to
happen if resources are not used, because these could then be two or
more different clients.

2. I believe the resource may be an important piece of information
about the conversation in some cases, i.e. a kind of "where exactly
did this conversation happen?"
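
The collision described in point 1 can be shown with a tiny Python
sketch. The collection_key helper and the example JIDs are made up for
illustration; only the ("with", "start") pair as the collection
identity comes from the discussion above.

```python
from typing import Tuple


def collection_key(with_jid: str, start: str, strip_resource: bool) -> Tuple[str, str]:
    """Build the ("with", "start") pair that identifies a collection."""
    if strip_resource:
        # Keep only the bare JID, i.e. everything before the first "/".
        with_jid = with_jid.split("/", 1)[0]
    return (with_jid, start)
```

Two conversations started in the same second with
romeo@example.net/home and romeo@example.net/work get different keys
when the resource is kept, and the same key once it is stripped.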

> There is also the case of sending a message to the bare JID and the
> receiving server sends that message to all resources. Then the recipient
> could reply from multiple resources, thus starting multiple
> conversations! I'm not sure how to handle that. Probably it's best to
> save each conversation separately but each conversation / collection has
> the same start message (however they might have different threads).

Hm, that actually seems to me to be quite a complex case and, most
likely, I don't have enough knowledge to judge what the best option is
here. So everything written below is just some (more or less) random
thoughts ...

From the client side, if <thread/> elements are used, one possibility,
it seems, is to use a different <thread> element for each conversation
and to set the "parent" attribute (per XEP-201) in all child
conversations, pointing to the "root" <thread> element of the original
message.

For XEP-136 this can be mapped to storing all these conversations
separately in different collections, storing the original message in
its own collection, and putting links from all child collections to
this parent collection (so there would probably have to be a "parent"
element in collection linking besides "prev/next"?)

The other possibility seems to be to treat these conversations as a
kind of special "group chat" ;-) Then everything is simply stored in a
single collection, but to differentiate between the parties some
attribute should probably be added to messages, similar to "name" for
groupchats.

To be honest, I think this issue is out of the scope of XEP-136. I
would expect <thread> behavior either to be specified by XEP-201 (or
somewhere else) or to be left implementation-defined; XEP-136, on the
other hand, should probably just take <thread> values into account and
use them for its own purposes as described above.

So, to me it looks like the following could be specified in XEP-136:

1) If no <thread> element exists, the server may use its own
implementation-defined strategies for mapping messages and
conversations to collections, and may also treat resources in an
implementation-defined way.

Maybe some heuristic can be suggested such as the one I described in  
my first letter for "conversations tracking", but I doubt anything  
100% reliable can be proposed.

2) If a <thread> element is present, the mapping is exactly 1 <-> 1
(one thread element to one collection). If the "parent" attribute is
present on the thread, a link of type "parent" should be created to
the appropriate collection.
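
The thread-to-collection mapping in point 2 could be sketched in
Python roughly as follows. The Collection and ThreadArchive classes
are hypothetical, and the "parent" link is stored simply as the parent
thread ID, which is an assumption of this example rather than
something XEP-136 defines.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Collection:
    thread: str
    parent: Optional[str] = None   # thread ID of the parent collection, if any
    messages: List[str] = field(default_factory=list)


class ThreadArchive:
    """Hypothetical archive keeping exactly one collection per thread ID."""

    def __init__(self) -> None:
        self.collections: Dict[str, Collection] = {}

    def store(self, thread: str, body: str, parent: Optional[str] = None) -> Collection:
        coll = self.collections.get(thread)
        if coll is None:
            # First message of this thread: create its collection and
            # remember the "parent" link when the thread carries one.
            coll = Collection(thread=thread, parent=parent)
            self.collections[thread] = coll
        coll.messages.append(body)
        return coll
```

In the multi-resource reply case, the original message would live in
the root collection and each reply thread would get its own collection
pointing back to it.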

Resources can be treated as follows: when the first message with a
full JID is received, it's allowed to overwrite the collection's
previous bare JID with the new, full JID. If the previous JID was
already full and the new one is also full but differs from it, assume
that we have the "multi-resource" case: change the collection's JID
back to the bare one and forbid all further overwrites.
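
The resource rule above is effectively a small state machine, which
the following Python sketch tries to capture. The CollectionJid class
and its observe method are hypothetical names chosen for this
illustration, and the JID handling is simplified to splitting on the
first "/".

```python
def bare(jid: str) -> str:
    """Return the bare JID (everything before the first "/")."""
    return jid.split("/", 1)[0]


def is_full(jid: str) -> bool:
    return "/" in jid


class CollectionJid:
    """Tracks which JID a collection is filed under."""

    def __init__(self, jid: str) -> None:
        self.jid = jid
        self.locked = False   # set once the multi-resource case is detected

    def observe(self, sender: str) -> None:
        # Ignore other contacts, bare senders and already locked collections.
        if self.locked or bare(sender) != bare(self.jid) or not is_full(sender):
            return
        if not is_full(self.jid):
            # First full JID seen: upgrade the bare JID to the full one.
            self.jid = sender
        elif sender != self.jid:
            # A second, different resource: fall back to the bare JID
            # and forbid any further overwrites.
            self.jid = bare(self.jid)
            self.locked = True
```

Without the lock, a later message from either resource would
overwrite the bare JID again, which is why the rule forbids further
overwrites.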

>> Duplicate messages times
>> ------------------------
>>
>> In "5.3 Uploading Messages to a Collection" it's specified that "If the
>> collection already exists then the server
>> MUST append the messages to the existing collection." However, it's not
>> said
>> what should be done if time for some of the messages is equal to time of
>> those
>> messages existing already in collection.
>>
>> I assume that from "append the messages" clause it follows that
>> duplicate entities
>> should be created, but it could be good to mention to avoid ambiguities.
>
> By "duplicate entities" do you mean <from/> or <to/> elements with the
> same dateTime?

Yes. As I said, I think the required behavior here is more or less
deducible, but it would probably be useful to clarify it in the spec,
as I at least had some doubts when thinking about it. Maybe it's just
me, though ;-)

>> List collections for Bare JID / Domain
>> --------------------------------------
>>
>> There seems to be no way to list collections solely for service JID,
>> as according to XEP-136 it's treated as domain JID request.
>>
>> For example, when trying to list all collections for icq.example.com
>> you will get instead all collections of all users at icq.example.com - even
>> if you wanted to receive collections ONLY for icq.example.com
>>
>> I do not think this is major problem, as it can be filtered out on
>> client side -
>> the only drawback is high amount of extra traffic, so, probably, it can
>> be left as it is, but adding some notice in specification on that subject
>> could be nice.
>
> Hmm. That's the matching process we use in Multi-User Chat (XEP-0045)
> and Privacy Lists (XEP-0016) and so on. I don't see this as a big
> problem (you don't really chat with services directly), but it we find
> out that it causes problems in reality we can fix it later.

Well, in fact I think I've already found one case where this is a
problem, not only for listing collections but also for removing them
and for storing preferences; see my message:

http://mail.jabber.org/pipermail/standards/2007-November/017205.html

Basically, the current approach means we have no real control over
messages with bare/domain JIDs: I can neither delete messages from/to
the icq.example.com transport nor disable auto-archiving for them
without affecting all messages to all ICQ users.
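
The matching behavior behind this problem can be sketched in Python
as below. The split_jid and matches helpers are hypothetical and only
model the rule as described in this thread (a domain JID matches every
JID at that domain, a bare JID matches every resource); they are not
code from any implementation.

```python
from typing import Optional, Tuple


def split_jid(jid: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split a JID into (node, domain, resource); missing parts are None."""
    rest, _, resource = jid.partition("/")
    node, sep, domain = rest.rpartition("@")
    return (node if sep else None, domain, resource or None)


def matches(pattern: str, jid: str) -> bool:
    """Return True if 'jid' is selected by the 'with' value 'pattern'."""
    p_node, p_domain, p_resource = split_jid(pattern)
    node, domain, resource = split_jid(jid)
    if p_domain != domain:
        return False
    if p_node is None:
        # Domain pattern: selects the service itself AND all of its users.
        return True
    if p_node != node:
        return False
    return p_resource is None or p_resource == resource
```

With this rule, matches("icq.example.com", "123456@icq.example.com")
is true, so a request aimed at the transport alone also selects every
ICQ contact. A client can filter a listing afterwards, but removal and
preferences are applied on the server, where no such filter exists.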

> Thanks for your feedback, and sorry for taking so long to reply!

NP, thanks for taking care of that!


Good luck!                                     Alexander
