[Standards] XEP-0231 (Data Element) - local caching

Pavel Simerda pavlix at pavlix.net
Tue Jul 29 21:40:15 CDT 2008


On Tue, 29 Jul 2008 19:49:01 -0600
Peter Saint-Andre <stpeter at stpeter.im> wrote:

> Ahoj Pavle!
> 
> Pavel Simerda wrote:
> > Hello,
> > 
> > I have some suggestions for XEP-0231 (Data Element).
> 
> Thanks for looking at this spec so thoroughly.
> 
I actually have some questions. First, lolek from the jabbim.cz project
is going to propose a XEP for text emoticons. I like his ideas but I
suggested that he use Data Element instead of a custom solution.

He still has doubts, but I promised to try to sort it out and to
help him with the language corrections to his document too.

I couldn't find in the specs what should be used for the domain part of
the CID. The examples apparently use the domain part of the JID, which
is not unique per client. I looked at the RFC and still don't know the
proper mapping to XMPP.

His original idea was to use a cryptographic hash function and not a
CID.

He also pointed out that he is missing a feature that would allow a
client to advertise which MIME types it supports.

This is another question... if it's just emoticons, should we just
support the PNG and MNG types or add some accept-advertisement facility?

Is there a written policy for image formats in XMPP extensions?

> > Right now, as the example shows:
> > 
> > <message from='ladymacbeth@shakespeare.lit/castle'
> >          to='macbeth@chat.shakespeare.lit'
> >          type='groupchat'>
> >   <body>Yet here's a spot.</body>
> >   <html xmlns='http://jabber.org/protocol/xhtml-im'>
> >     <body xmlns='http://www.w3.org/1999/xhtml'>
> >       <p>
> >         Yet here's a spot.
> >         <img alt='A spot'
> >              src='cid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'/>
> >       </p>
> >     </body>
> >   </html>
> >   <data xmlns='urn:xmpp:tmp:data-element' 
> >         alt='A spot'
> >         cid='f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'
> >         type='image/png'>
> >     iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAABGdBTUEAALGP
> >     C/xhBQAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9YGARc5KB0XV+IA
> >     AAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAF1J
> >     REFUGNO9zL0NglAAxPEfdLTs4BZM4DIO4C7OwQg2JoQ9LE1exdlYvBBeZ7jq
> >     ch9//q1uH4TLzw4d6+ErXMMcXuHWxId3KOETnnXXV6MJpcq2MLaI97CER3N0
> >     vr4MkhoXe0rZigAAAABJRU5ErkJggg==
> >   </data>
> > </message>
> > 
> > Note: in this particular example the data is very short; this may
> > not be the case in the real world, where people tend to ignore the
> > size of the data they send.
> 
> Yes, that's just about the smallest image I could find. The spec says 
> that the image should not be more than 8k (which is twice the
> suggested size of an IBB chunk) but we don't know if people will
> typically send images that are smaller or larger than 8k -- I think
> smaller but I don't know that yet.
> 

Could the size limit be advertised by the client/server? And the data
rejected if the other party tries to send something bigger (just to
force them to fix it)?

> > We send data once for every session (and omit for subsequent
> > messages).
> 
> In this case it's important to define "session" (see rfc3921bis). Is
> it a chat session, a presence session, or something else?
> 

Exactly.

> > This has two important implications:
> > 
> > 1) The other entity may or may not cache it for the session and
> > reuse it. That is good.
> > 
> > 2) If an entity keeps the data for a longer time (e.g. for weeks
> > or even permanently), this cache will never be used, as the sending
> > entity always resends the data for a new session.
> > 
> > What I propose is:
> > 
> >  * By default the sending entity would not send the data. It would
> >    merely reference it by its cid url.
> >  * Let the receiving client follow "3.4 Retrieving Uncached Media
> >    Data" if the data is not cached (no real change, this is already
> >    being done).
> 
> I think I like that approach. It introduces a round trip for the IQ, 
> which might introduce some latency. But it puts the burden for
> "storing" and "serving" the image on the sender, which might
> discourage abuse of in-band images.
> 
> >  * Reserve the possibility of sending the data immediately with the
> >    message for the *specific* case that the sending client actually
> >    knows the receiving party cannot have the data cached (e.g. the
> >    data was never sent before). This behavior should be considered
> >    optional.
> 
> In that case the sender needs to keep a list of every JID to which it 
> has ever sent the image. That seems suboptimal.

I didn't write it exactly as I meant it. There may be cases where we
know we are sending something really new. But we might just as well
drop this feature if you think it's better.

I'm afraid some people will object.
 
> And I suppose the recipient might have received the image from
> another sender at some point, or might have received the image
> through other means (e.g., an emoticon "bundle").

The problem is... that we really want the users to get what we send
them. If they got it from someone else, we need to secure it with a hash
function, not a mere ID. The receiving client would have to actually
verify the hash when caching.

Another issue would be the particular hash functions. Some client
authors or users may want to refuse third-party data that is protected
only by weak hash functions.

That's why I only considered caching per sender JID.
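
A minimal sketch of what per-sender caching could look like on the
receiving side. The class name and the choice of keying by bare JID
(rather than full JID) are assumptions for illustration only; nothing
here is taken from XEP-0231.

```python
class PerSenderCache:
    """Cache media keyed by (sender bare JID, cid).

    Data received from one sender is never used to resolve a cid
    referenced by a different sender.
    """

    def __init__(self):
        self._store = {}

    @staticmethod
    def _bare(jid):
        # Assumption: drop the resource, 'user@host/resource' -> 'user@host'.
        return jid.split("/", 1)[0]

    def put(self, sender_jid, cid, data):
        self._store[(self._bare(sender_jid), cid)] = data

    def get(self, sender_jid, cid):
        # None means "not cached"; the receiver would then request the data.
        return self._store.get((self._bare(sender_jid), cid))
```

The cost of this choice is that the same image sent by two contacts is
stored twice, but no hash verification is needed.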

If we want to use hashes... and third party data, we should use some
specific "hostnames", possibly sha256.cid.xmpp.org for sha256 or
something like that.
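
To make the idea concrete, here is a rough sketch of deriving a CID from
the content and checking it before caching. The hostname
sha256.cid.xmpp.org is only the hypothetical one suggested above, and
the function names are made up for this example.

```python
import hashlib

# Hypothetical "hostname" marking the CID as a SHA-256 content hash.
HASH_DOMAIN = "sha256.cid.xmpp.org"


def make_cid(data: bytes) -> str:
    """Build a CID whose local part is the SHA-256 hex digest of the data."""
    return f"{hashlib.sha256(data).hexdigest()}@{HASH_DOMAIN}"


def verify_cid(cid: str, data: bytes) -> bool:
    """Return True only if the CID is hash-based and matches the data."""
    local, _, domain = cid.partition("@")
    if domain != HASH_DOMAIN:
        return False  # not a hash-based CID we know how to check
    return local == hashlib.sha256(data).hexdigest()
```

A receiving client would call verify_cid() before storing data obtained
from a third party, and fall back to per-sender caching for any CID it
cannot verify.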

> > I further propose we add some informational section about generation
> > of CIDs. Although it's specified elsewhere, I believe this XEP will
> > be very useful and will be referenced from many future XEPs (and
> > maybe improved as well - possibly some server caching etc). I think
> > the informational section could suggest UUIDs generated by hashing
> > the actual content.
> 
> Yes I think that would be helpful.
> 
> > Another thing that could be considered... is to add some sort of
> > caching hint attribute that would suggest how long it's reasonable to
> > cache a particular resource. 
> 
> Do you think that would really be helpful? I'm still thinking about
> it...
> 

This feature would be optional, so it's easy to add it when we think
it's useful. Right now I have no idea :).

> > Maybe we could borrow from HTTP Cookies
> > but allow (suggest) the clients to have some mechanisms for
> > limiting the time, size and number of cached objects.
> > 
> > There are many possibilities, I will just describe one of them.
> 
> Do you have examples of these?
> 

The attribute values could be stated more abstractly... like...
"session", "short", "medium", "long" with recommended defaults, for
example. But usually the sender knows better.

> > cache="no"
> >  - no reason for caching; the file will not be used again
> 
> Perhaps a thumbnail related to file transfer or some other ephemeral
> image?
> 
> > cache="session"
> >  - we suggest the receiving party only caches for this
> >    particular session
> 
> Perhaps also a thumbnail, or an image related to a whiteboarding
> session?
> 
> > cache="12"
> >  - we suggest caching for twelve days from the last use of this cid
> >    (!)
> >  - for every use (received reference) the receiving client should
> >    reset the date we count from
> 
> Perhaps images included in an XHTML notification from a blogging
> service or somesuch?
> 
> > cache="unlimited"
> >  - we suggest the client picks the longest time it allows (it could
> >    possibly cache some small pieces of data permanently)
> 
> Perhaps a commonly-used emoticon?
> 

Good use cases, thanks.
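
For illustration, the four hint values discussed above could be mapped
to an expiry time roughly like this. It is only a sketch: the MAX_DAYS
cap, the function name, and the handling of unrecognised values are
assumptions, not part of any spec.

```python
from datetime import datetime, timedelta

# Hypothetical local policy: the longest time this client ever caches.
MAX_DAYS = 365


def expiry(hint, last_use, session_end):
    """Return the time until which an entry may be kept, or None for
    "do not cache". Calling this again on every received reference,
    with the new last_use, resets the date we count from.
    """
    if hint == "no":
        return None
    if hint == "session":
        return session_end
    if hint == "unlimited":
        return last_use + timedelta(days=MAX_DAYS)
    try:
        days = int(hint)  # e.g. cache="12" means twelve days from last use
    except ValueError:
        return None  # assumption: unrecognised hints are not cached
    return last_use + timedelta(days=min(days, MAX_DAYS))
```

With last_use on 29 July 2008 and cache="12", the entry would be kept
until 10 August 2008 unless the cid is referenced again in the meantime.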

> > Of course, the client MAY ignore the caching hint. In this case it
> > SHOULD NOT cache at all.
> 
> Why not? My client could ignore caching hints because it has its own 
> local policy (e.g. cache images only from people in my "Friends"
> group, but cache those forever because I want to keep them in message
> history). Or my client could ignore caching hints because it simply
> can't cache images (no room on the device, web client, etc.).
> 

I don't know, really :).

> > If the cache attribute is not specified, we should decide on a
> > reasonable default value ('session' or '1' day both seem good to
> > me).
> 
> I think that's up to the client.
> 

A reasonable default does no harm, does it? :)

> > Cheers,
> > Pavel
> 
> Thanks!
> 
> /psa

Good night,
Pavel

-- 

Web: http://www.pavlix.net/
Jabber & Mail: pavlix(at)pavlix.net
OpenID: pavlix.net

