[Standards] XEP-136 and XEP-59 implementation comments

Alexander Tsvyashchenko lists at ndl.kiev.ua
Wed Nov 14 19:28:17 UTC 2007


Hello Olivier,

Quoting Olivier Goffart <ogoffart at kde.org>:

>> Who changed it?
>> ---------------
>> Typically the client will perform replication when it has some local cache
>> for collections / messages, to synchronize its cache with server one.
>> Therefore, it makes sense that client also use this cache for caching those
>> collections client uploads.
>>
>> However, implementing it strictly according to XEP-136 means that client
>> has no way to determine if the changes received in replication were done
>> by this client or not - so, it will have to re-fetch entire collection even
>> if <changed> item in replication results was caused by upload from itself,
>> thus basically downloading the same collection it just uploaded on the
>> server, which is stored already in local cache.
>
> Can't the client first synchronize his cache before uploading ?

No, this is not guaranteed to work. Even if a client checks that there
were no changes to the collection before uploading, some other client
may change it after the upload, and the first client still has no way
to tell whether the "modified" item in a later replication response was
caused by its own upload or by someone else's change.

> But the resource is not a valuable identifier.
> You can connect from another client or elsewhere with the same resource name
> (not at the same time of course)

Yes, but as far as I understand, the same resource cannot be bound to
several different sessions at once, no? If this assumption is correct,
then the resource still guarantees uniqueness within one session. It is
therefore enough to perform replication once, at session start, and to
rely on the "by" attribute for the rest of the session.

But in fact, after thinking about it some more, it seems that even the
"by" attribute solution I proposed is not enough. When a client uploads
a collection, it effectively overwrites the "by" value. So even if it
performed replication right before uploading, another client may modify
the collection between the first client's replication and its upload.
The first client then loses these changes: at the next replication it
sees only a "modified" item carrying its own "by" value.
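
The race described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the Server class, its
append-on-upload behaviour and the resource names "laptop" and "phone"
are hypothetical and not taken from XEP-136.

```python
class Server:
    """Hypothetical archive holding one collection and its last "by" value."""

    def __init__(self):
        self.messages = []
        self.by = None  # resource of whoever modified the collection last

    def upload(self, resource, new_messages):
        # Assumption: an upload appends messages and overwrites "by".
        self.messages.extend(new_messages)
        self.by = resource


server = Server()
server.upload("laptop", ["hello"])

# Client A ("laptop") replicates: its cache now matches the server.
cache_a = list(server.messages)

# Client B ("phone") modifies the collection right after A's replication.
server.upload("phone", ["from B"])

# Client A uploads and mirrors its own upload into its cache.
server.upload("laptop", ["from A"])
cache_a.append("from A")

# At the next replication A sees its own resource in "by" and skips the
# download, so B's message never reaches A's cache.
skip_download = (server.by == "laptop")
missing = [m for m in server.messages if m not in cache_a]
```

Here skip_download ends up true while missing still contains the other
client's message, which is exactly the lost change.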

Besides that, even if we choose to neglect the possibility of a change
between replication and upload (which is risky, if only because of
automatic archiving), performing replication before every upload is
overhead I would personally like to avoid, if possible.

Therefore, it seems that we need another approach to solve this problem.

What about adding very simple & primitive versioning to collections?

Suppose we require that each collection holds its version as an integer,
where the initial upload has version number 0 and every subsequent upload
(or change made by the server due to auto-archiving) increases this
number by 1.

Then, if we include the version number in the "modified" response, such as

<changed with='juliet@capulet.com/chamber'
          start='1469-07-21T02:56:15Z'
          version='3'/>

and in the chat tag of the collection retrieval response, such as

<chat xmlns='http://www.xmpp.org/extensions/xep-0136.html#ns'
         with='juliet@capulet.com/chamber'
         start='1469-07-21T02:56:15Z'
         subject='She speaks!'
         version='3'>

it provides an easy & efficient way to track all changes to a collection
and to determine whether the client has the latest version. This works
even if the client does not know whether the collection it uploads
already exists: the client simply assumes version 0 when uploading,
records this version in its cache, and later, when it sees a "modified"
item for this collection, checks whether the reported version equals the
cached one. If not, it needs to download the collection once more.

The same holds when the client already has the collection in its cache:
it uploads the new version and increments its local version by 1. If it
later sees a "modified" result whose version differs from the local one,
other changes have happened and the collection needs to be downloaded.
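
The client-side bookkeeping for this could look roughly like the Python
sketch below. The ClientCache class and its method names are hypothetical
and only illustrate the comparison; the key is assumed to be the
(with, start) pair that identifies a collection.

```python
class ClientCache:
    """Hypothetical local cache tracking one integer version per collection."""

    def __init__(self):
        self.versions = {}  # (with, start) -> version known locally

    def record_upload(self, key):
        # Unknown collection: assume our upload created it at version 0.
        # Known collection: our upload bumps the version by 1.
        if key in self.versions:
            self.versions[key] += 1
        else:
            self.versions[key] = 0
        return self.versions[key]

    def needs_download(self, key, server_version):
        # A mismatch means another client or auto-archiving changed it.
        return self.versions.get(key) != server_version

    def record_download(self, key, server_version):
        # After re-fetching, the cache is in sync with the server.
        self.versions[key] = server_version


key = ("juliet@capulet.com/chamber", "1469-07-21T02:56:15Z")
cache = ClientCache()
cache.record_upload(key)  # first upload, assumed version 0

# "modified" item reports version 0: only our own upload happened.
own_upload_only = cache.needs_download(key, 0)

# "modified" item reports version 1: someone else changed the collection.
changed_elsewhere = cache.needs_download(key, 1)
```

With this bookkeeping no replication is needed before an upload; the
client only compares two integers when the "modified" list arrives.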

One note here: the version number should also be held internally for
"removed" items (though it need not be shown in the "modified" response
for removed collections) and reused when the collection is re-created.
If someone removes a collection and later re-creates it, versioning
should stay continuous so that other clients can detect the change.
This doesn't seem to be a real problem, however, as info about "removed"
items has to be kept anyway for "modified" responses.
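
On the server side, keeping the counter across removal could be sketched
as below. The ArchiveStore class is hypothetical, and the sketch assumes
that a removal by itself leaves the stored number unchanged, which the
proposal above does not specify.

```python
class ArchiveStore:
    """Hypothetical server store that keeps version numbers after removal."""

    def __init__(self):
        self.versions = {}    # key -> last version, kept for removed items too
        self.removed = set()  # keys of collections that are currently removed

    def upload(self, key):
        # First upload starts at 0; any later upload, including one that
        # re-creates a removed collection, continues from the stored number.
        if key in self.versions:
            self.versions[key] += 1
        else:
            self.versions[key] = 0
        self.removed.discard(key)
        return self.versions[key]

    def remove(self, key):
        # Assumption: removal keeps the version so it can be reused later.
        if key not in self.versions:
            raise KeyError(key)
        self.removed.add(key)


store = ArchiveStore()
key = ("juliet@capulet.com/chamber", "1469-07-21T02:56:15Z")

first = store.upload(key)      # collection created
store.remove(key)              # collection removed, number retained
recreated = store.upload(key)  # collection re-created, numbering continues
```

If the counter restarted at 0 on re-creation, a client that had cached
version 0 from the first upload would wrongly conclude that nothing had
changed.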

To me this seems superior to the "by" solution, as it is simple, involves
no extra overhead and should cover all cases.

What do you think about this solution?

> I think file format is implementation detail, and should NOT be part of that
> XEP at all.
> That section should be removed.
>
> Maybe it can be part of a separate XEP later (there are already other IM log
> specifications elsewhere anyway) or an extension to XEP-0227, but it's not
> really related

Well, I'm not really familiar with the approaches & practices for XEP
standards, so I have no opinion of my own on that. My only point is that
having at least some standard, at least somewhere, is a nice thing, as it
may improve interoperability.

Good luck!                                     Alexander



