[Standards] XEP-0231 (Data Element) - local caching

Peter Saint-Andre stpeter at stpeter.im
Thu Jul 31 09:07:04 CDT 2008


Pavel Simerda wrote:
> On Wed, 30 Jul 2008 07:04:16 -0600
> Peter Saint-Andre <stpeter at stpeter.im> wrote:
> 
>> Pavel Simerda wrote:
>>> On Tue, 29 Jul 2008 19:49:01 -0600
>>> Peter Saint-Andre <stpeter at stpeter.im> wrote:
>>>
>>>> Ahoj Pavle!
>>>>
>>>> Pavel Simerda wrote:
>>>>> Hello,
>>>>>
>>>>> I have some suggestions for XEP-0231 (Data Element).
>>>> Thanks for looking at this spec so thoroughly.
>>>>
>>> I actually have some questions. First, lolek from the jabbim.cz
>>> project is going to propose a XEP for text emoticons. 
>> Similar to XEP-0038? We can bring that back if someone wants to
>> maintain it.
> 
> Similar but more powerful and not file-based but most probably based on
> Data Elements. There may be a lot of other extensive changes. If these
> changes can be made, I believe Martin would maintain it if he gets the
> chance.

OK, great. I'm happy to help.

>>> I like his ideas but I
>>> suggested that he use the Data Element instead of a custom solution.
>> +1
>>
>>> He still has doubts but I promised him to try to sort it out and to
>>> help him with language corrections of his document too.
>> Great, thanks.
>>
>>> I didn't find in the specs what should be used for domain ID in the
>>> CID. The examples apparently use the domain part of JID that is not
>>> unique for the clients. I looked at the RFC and still don't know a
>>> proper mapping to XMPP.
>>>
>>> His original idea was to use a cryptographic hash function and not a
>>> CID.
>> I think your idea of a UUID followed by the domain part of the JID
>> would work well.
>>
>>> He also pointed out he misses a feature that would allow a client to
>>> advertise which mimetypes it supports.
>> Yes we can add a disco feature for that.
>>
>>> This is another question... if it's just emoticons, should we just
>>> support png and mng types or add some accept-advertisement facility?
>> I don't think it hurts to define a way to advertise what MIME types
>> you support. We'll use the data element for things other than
>> emoticons, but IMHO the simplest approach would be to advertise in
>> general which MIME types you support, not "I support these mime types
>> for emoticons" and "I support these other mime types for file
>> transfer thumbnails" etc. Does anyone think that level of complexity
>> is needed?
> 
> I'm not sure. Let's wait for other comments.

Well I'm not a fan of adding complexity if we don't need it.

>>> Is there a written policy for image formats in XMPP extensions?
>> Not yet.
> 
> PNG for static raster images, MNG for animated raster images, SVG for
> vector images? That's something I would expect from every client.

Sure. But some people think JPG and GIF are good too (e.g., I think JPG 
is the default in vCards or LDAP or somesuch).

>>>>> Right now, as the example shows:
>>>>>
>>>>> <message from='ladymacbeth@shakespeare.lit/castle'
>>>>>          to='macbeth@chat.shakespeare.lit'
>>>>>          type='groupchat'>
>>>>>   <body>Yet here's a spot.</body>
>>>>>   <html xmlns='http://jabber.org/protocol/xhtml-im'>
>>>>>     <body xmlns='http://www.w3.org/1999/xhtml'>
>>>>>       <p>
>>>>>         Yet here's a spot.
>>>>>         <img alt='A spot'
>>>>>              src='cid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'/>
>>>>>       </p>
>>>>>     </body>
>>>>>   </html>
>>>>>   <data xmlns='urn:xmpp:tmp:data-element' 
>>>>>         alt='A spot'
>>>>>         cid='f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'
>>>>>         type='image/png'>
>>>>>     iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAABGdBTUEAALGP
>>>>>     C/xhBQAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9YGARc5KB0XV+IA
>>>>>     AAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAF1J
>>>>>     REFUGNO9zL0NglAAxPEfdLTs4BZM4DIO4C7OwQg2JoQ9LE1exdlYvBBeZ7jq
>>>>>     ch9//q1uH4TLzw4d6+ErXMMcXuHWxId3KOETnnXXV6MJpcq2MLaI97CER3N0
>>>>>     vr4MkhoXe0rZigAAAABJRU5ErkJggg==
>>>>>   </data>
>>>>> </message>
>>>>>
>>>>> Note: in this particular example the data is very short. This may
>>>>> not be the case in the real world, where people tend to ignore the
>>>>> size of the data they send.
>>>> Yes, that's just about the smallest image I could find. The spec
>>>> says that the image should not be more than 8k (which is twice the
>>>> suggested size of an IBB chunk) but we don't know if people will
>>>> typically send images that are smaller or larger than 8k -- I think
>>>> smaller but I don't know that yet.
>>>>
>>> Might it be advertised by the client/server? And rejected if the
>>> other party tries to send a bigger one (just to force them to fix
>>> it)?
>> I think that's handled at a different layer (e.g., rate limiting).
>> But we do need to define better handling for stanzas that are too
>> large (there is a proto-XEP about it but the Council didn't accept it
>> and I never incorporated their feedback).
>>
> 
> Hmm. I know that people at jabbim.cz use a roster-renaming utility (for
> ICQ transport). They wait a long time between stanzas and the renaming
> can often take more than several minutes.
> 
>>>>> We send data once for every session (and omit for subsequent
>>>>> messages).
>>>> In this case it's important to define "session" (see rfc3921bis). Is
>>>> it a chat session, a presence session, or something else?
>>>>
>>> Exactly.
>>>
>>>>> This has two important implications:
>>>>>
>>>>> 1) The other entity may or may not cache it for the session and
>>>>> reuse it. That is good.
>>>>>
>>>>> 2) If an entity keeps the data for a longer time (e.g. for weeks
>>>>> or even permanently), this cache will never be used, because the
>>>>> sending entity always resends the data for a new session.
>>>>>
>>>>> What I propose is:
>>>>>
>>>>>  * By default the sending entity would not send the data. It would
>>>>>    merely reference it by its cid url.
>>>>>  * Let the receiving client follow "3.4 Retrieving Uncached Media
>>>>>    Data" if the data is not cached (no real change, this is already
>>>>>    being done).
>>>> I think I like that approach. It introduces a round trip for the
>>>> IQ, which might introduce some latency. But it puts the burden for
>>>> "storing" and "serving" the image on the sender, which might
>>>> discourage abuse of in-band images.
>>>>
>>>>>  * Reserve the possibility of sending the data immediately with
>>>>>    the message for the *specific* case that the sending client
>>>>>    actually knows the receiving party cannot have the data cached
>>>>>    (e.g. the data was never sent before). This behavior should be
>>>>>    considered optional.
>>>> In that case the sender needs to keep a list of every JID to which
>>>> it has ever sent the image. That seems suboptimal.
>>> I didn't write it exactly as I meant it. There may be cases we are
>>> knowingly sending something really new. But we might just as well
>>> drop this feature if you think it's better.
>> If it's optional, it does no great harm. In fact it's not even a 
>> feature, just an implementation note.
> 
> Ok.
> 
>>> I'm afraid some people will object.
>> Don't be afraid -- some people will always object. :)
>>
> 
> :D
> 
>>>> And I suppose the recipient might have received the image from
>>>> another sender at some point, or might have received the image
>>>> through other means (e.g., an emoticon "bundle").
>>> The problem is... that we really want the users to get what we send
>>> them. If they got it from someone else, we need to secure it by a
>>> hash function, not a mere ID. The receiving client would have to
>>> actually check the hash when caching.
>> Isn't that a bit paranoid for something as lightweight as emoticon
>> bundles?
>>
> 
> The problem is that the Data Element could very soon be used for other
> purposes. For me this is a grave security hole that might cause a real
> headache in the future.
> 
> But I'm not only a bit paranoid :). Working privacy and security is
> what originally brought me from ICQ to Jabber... only then I realized
> how cool it actually is in other areas.

Perhaps you could describe the possible attacks?

>>> Another issue would be the particular hash functions. Some client
>>> authors or users may want to prevent using data from third parties
>>> protected by weak hash functions.
>>>
>>> That's why I only considered caching per sender JID.
>> I suppose caching per sender JID makes sense, yes.
>>
> 
> I suggest this if we don't take the cryptographic way. Or we could take
> both ways (let the implementors choose).

No, you're probably right that caching per sender JID is reasonable.
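
To make the per-sender idea concrete, here is a rough Python sketch of
a cache keyed by (sender, cid). The class and function names are
hypothetical, and keying on the bare JID rather than the full JID is an
assumption; nothing in the spec says which to use, and in a groupchat
the bare JID is the room, so the occupant JID may be the better key
there.

```python
from typing import Dict, Optional, Tuple


def bare_jid(jid: str) -> str:
    """Strip the resource part, e.g. 'a@b.lit/castle' -> 'a@b.lit'."""
    return jid.split("/", 1)[0]


class PerSenderCache:
    """Hypothetical in-memory cache keyed by (sender bare JID, cid)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], bytes] = {}

    def put(self, sender: str, cid: str, data: bytes) -> None:
        # Data is stored under the sender it came from, so a cid reused
        # by a different sender never resolves to this entry.
        self._store[(bare_jid(sender), cid)] = data

    def get(self, sender: str, cid: str) -> Optional[bytes]:
        # None means a cache miss; the client would then request the
        # data from the sender as described in section 3.4.
        return self._store.get((bare_jid(sender), cid))
```

With this layout, data received from one contact is never served for a
reference sent by another contact, even if both use the same cid.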

>>> If we want to use hashes... and third party data, we should use some
>>> specific "hostnames", possibly sha256.cid.xmpp.org for sha256 or
>>> something like that.
>> Sure. If desired.
>>
> 
> It would be - for globally-shared data, so the IDs actually match.
> The global-sharing feature should be optional anyway, so it can be
> added at any time. No reason to defer implementations.

Agreed.
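
For the optional globally-shared case, something along these lines
might work. This is only a sketch: the cid layout (hex SHA-256 digest
as the local part) and the function names are assumptions, and the
hostname is just the one floated above, not anything registered.

```python
import hashlib

# Hostname suggested in this thread for SHA-256 based cids (assumption).
HASH_DOMAIN = "sha256.cid.xmpp.org"


def make_cid(data: bytes) -> str:
    """Derive a cid from the content itself (hypothetical layout)."""
    return f"{hashlib.sha256(data).hexdigest()}@{HASH_DOMAIN}"


def verify_cid(cid: str, data: bytes) -> bool:
    """Check that data matches a hash-based cid before caching it."""
    local, _, domain = cid.partition("@")
    if domain != HASH_DOMAIN:
        # Not a hash-based cid, so it cannot be verified this way.
        return False
    return hashlib.sha256(data).hexdigest() == local
```

A receiving client would call verify_cid() before storing data that
arrived from a third party, and fall back to per-sender caching when it
returns False.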

>>>>> I further propose we add some informational section about
>>>>> generation of CIDs. Although it's specified elsewhere, I believe
>>>>> this XEP will be very useful and will be referenced from many
>>>>> future XEPs (and maybe improved as well - possibly some server
>>>>> caching etc). I think the informational section could suggest
>>>>> UUIDs generated by hashing the actual content.
>>>> Yes I think that would be helpful.
>>>>
>>>>> Another thing that could be considered... is to add some sort of
>>>>> caching hint attribute that would suggest how long it's reasonable
>>>>> to cache a particular resource. 
>>>> Do you think that would really be helpful? I'm still thinking about
>>>> it...
>>>>
>>> This feature would be optional, so it's easy to add it when we think
>>> it's useful. Right now I have no idea :).
>>>
>>>>> Maybe we could borrow from HTTP Cookies
>>>>> but allow (suggest) the clients to have some mechanisms for
>>>>> limiting the time, size and number of cached objects.
>>>>>
>>>>> There are many possibilities, I will just describe one of them.
>>>> Do you have examples of these?
>>>>
>>> The attribute values could be stated more abstractly... like...
>>> "session", "short", "medium", "long" with recommended defaults, for
>>> example. But usually the sender knows better.
>> Mimicking HTTP values is OK with me.
>>
> 
> No problem for me either, we can just define the syntax.

OK I'll check the HTTP cookie spec for details.

>>>>> cache="no"
>>>>>  - no reason for caching; the file will not be used again
>>>> Perhaps a thumbnail related to file transfer or some other
>>>> ephemeral image?
>>>>
>>>>> cache="session"
>>>>>  - we suggest the receiving party only caches for this
>>>>>    particular session
>>>> Perhaps also a thumbnail, or an image related to a whiteboarding
>>>> session?
>>>>
>>>>> cache="12"
>>>>>  - we suggest caching for twelve days from the last use of this
>>>>>    cid (!)
>>>>>  - for every use (received reference) the receiving client should
>>>>>    reset the date we count from
>>>> Perhaps images included in an XHTML notification from a blogging
>>>> service or somesuch?
>>>>
>>>>> cache="unlimited"
>>>>>  - we suggest the client picks the longest time it allows (it
>>>>>    could possibly cache some small pieces of data permanently)
>>>> Perhaps a commonly-used emoticon?
>>>>
>>> Good use cases, thanks.
>>>
>>>>> Of course, the client MAY ignore the caching hint. In this case it
>>>>> SHOULD NOT cache at all.
>>>> Why not? My client could ignore caching hints because it has its
>>>> own local policy (e.g. cache images only from people in my
>>>> "Friends" group, but cache those forever because I want to keep
>>>> them in message history). Or my client could ignore caching hints
>>>> because it simply can't cache images (no room on the device, web
>>>> client, etc.).
>>>>
>>> I don't know, really :).
>> Well it seems a bit strong to say you SHOULD NOT cache in those 
>> instances. Just leave it up to the implementation.
> 
> If we mimic HTTP even in this respect, a missing cache attribute would mean
> session-only (possibly other user's online session).
> 
>>>>> If the cache attribute is not specified, we should decide on a
>>>>> reasonable default value ('session' or '1' day both seem good to
>>>>> me).
>>>> I think that's up to the client.
>>>>
>>> A reasonable default does no harm, does it? :)
>> I suppose '1' day is OK, or 'session' if we define what we mean by that.
>>
> 
> If we take the way of HTTP, this is a nonissue.

OK, let's do that then.
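
For reference, the hint values proposed above ("no", "session", a
number of days, "unlimited") could be interpreted roughly as in the
Python sketch below. The function names are hypothetical, the
'session' default is only one of the two candidates discussed, and the
final syntax may well differ once it follows HTTP more closely.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def cache_policy(hint: Optional[str],
                 default: str = "session") -> Tuple[str, Optional[int]]:
    """Map a cache hint to (kind, days).

    kind is 'no', 'session', 'unlimited' or 'days'; days is set only
    for the 'days' kind.
    """
    value = (hint or default).strip().lower()
    if value in ("no", "session", "unlimited"):
        return (value, None)
    if value.isascii() and value.isdigit() and int(value) > 0:
        return ("days", int(value))
    # Unknown or malformed hint: fall back to the default.
    return (default, None)


def expires_at(hint: Optional[str], last_used: datetime) -> Optional[datetime]:
    """Expiry for a numeric hint, counted from the last use of the cid.

    Returns None for hints that are not time based.
    """
    kind, days = cache_policy(hint)
    if kind != "days":
        return None
    return last_used + timedelta(days=days)
```

Because the period counts from the last use, a client would update
last_used every time it receives a reference to the cid. A client with
its own local policy can of course ignore the result entirely.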

Peter