[Standards] XEP-0231 (Data Element) - local caching

Pavel Simerda pavlix at pavlix.net
Wed Jul 30 13:42:28 CDT 2008


On Wed, 30 Jul 2008 07:04:16 -0600
Peter Saint-Andre <stpeter at stpeter.im> wrote:

> Pavel Simerda wrote:
> > On Tue, 29 Jul 2008 19:49:01 -0600
> > Peter Saint-Andre <stpeter at stpeter.im> wrote:
> > 
> >> Ahoj Pavle!
> >>
> >> Pavel Simerda wrote:
> >>> Hello,
> >>>
> >>> I have some suggestions for XEP-0231 (Data Element).
> >> Thanks for looking at this spec so thoroughly.
> >>
> > I actually have some questions. First, lolek from the jabbim.cz
> > project is going to propose a XEP for text emoticons. 
> 
> Similar to XEP-0038? We can bring that back if someone wants to
> maintain it.

Similar but more powerful and not file-based but most probably based on
Data Elements. There may be a lot of other extensive changes. If these
changes can be made, I believe Martin would maintain it if he gets the
chance.

> > I like his ideas but I
> > suggested that he use Data Element instead of a custom solution.
> 
> +1
> 
> > He still has doubts but I promised him to try to sort it out and to
> > help him with language corrections of his document too.
> 
> Great, thanks.
> 
> > I didn't find in the specs what should be used for domain ID in the
> > CID. The examples apparently use the domain part of JID that is not
> > unique for the clients. I looked at the RFC and still don't know a
> > proper mapping to XMPP.
> > 
> > His original idea was to use a cryptographic hash function and not a
> > CID.
> 
> I think your idea of a UUID followed by the domain part of the JID
> would work well.
> 
> > He also pointed out he misses a feature that would allow a client to
> > advertise which mimetypes it supports.
> 
> Yes we can add a disco feature for that.
> 
> > This is another question... if it's just emoticons, should we just
> > support png and mng types or add some accept-advertisement facility?
> 
> I don't think it hurts to define a way to advertise what MIME types
> you support. We'll use the data element for things other than
> emoticons, but IMHO the simplest approach would be to advertise in
> general which MIME types you support, not "I support these mime types
> for emoticons" and "I support these other mime types for file
> transfer thumbnails" etc. Does anyone think that level of complexity
> is needed?

I'm not sure. Let's wait for other comments.

> > Is there a written policy for image formats in XMPP extensions?
> 
> Not yet.

PNG for static raster images, MNG for animated raster images, SVG for
vector images? That's something I would expect from every client.

> >>> Right now, as the example shows:
> >>>
> >>> <message from='ladymacbeth@shakespeare.lit/castle'
> >>>          to='macbeth@chat.shakespeare.lit'
> >>>          type='groupchat'>
> >>>   <body>Yet here's a spot.</body>
> >>>   <html xmlns='http://jabber.org/protocol/xhtml-im'>
> >>>     <body xmlns='http://www.w3.org/1999/xhtml'>
> >>>       <p>
> >>>         Yet here's a spot.
> >>>         <img alt='A spot'
> >>>              src='cid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'/>
> >>>       </p>
> >>>     </body>
> >>>   </html>
> >>>   <data xmlns='urn:xmpp:tmp:data-element'
> >>>         alt='A spot'
> >>>         cid='f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit'
> >>>         type='image/png'>
> >>>     iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAABGdBTUEAALGP
> >>>     C/xhBQAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9YGARc5KB0XV+IA
> >>>     AAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAF1J
> >>>     REFUGNO9zL0NglAAxPEfdLTs4BZM4DIO4C7OwQg2JoQ9LE1exdlYvBBeZ7jq
> >>>     ch9//q1uH4TLzw4d6+ErXMMcXuHWxId3KOETnnXXV6MJpcq2MLaI97CER3N0
> >>>     vr4MkhoXe0rZigAAAABJRU5ErkJggg==
> >>>   </data>
> >>> </message>
> >>>
> >>> Note: in this particular example the data is very short; this may
> >>> not be the case in the real world, where people tend to ignore the
> >>> size of the data they send.
> >> Yes, that's just about the smallest image I could find. The spec
> >> says that the image should not be more than 8k (which is twice the
> >> suggested size of an IBB chunk) but we don't know if people will
> >> typically send images that are smaller or larger than 8k -- I think
> >> smaller but I don't know that yet.
> >>
> > 
> > Might it be advertised by the client/server? And rejected if the
> > other party tries to send a bigger one (just to force them to fix
> > it)?
> 
> I think that's handled at a different layer (e.g., rate limiting).
> But we do need to define better handling for stanzas that are too
> large (there is a proto-XEP about it but the Council didn't accept it
> and I never incorporated their feedback).
> 

Hmm. I know that people at jabbim.cz use a roster-renaming utility (for
the ICQ transport). They wait a long time between stanzas, and the
renaming can often take more than several minutes.

> >>> We send data once for every session (and omit for subsequent
> >>> messages).
> >> In this case it's important to define "session" (see rfc3921bis). Is
> >> it a chat session, a presence session, or something else?
> >>
> > 
> > Exactly.
> > 
> >>> This has two important implications:
> >>>
> >>> 1) The other entity may or may not cache it for the session and
> >>> reuse it. That is good.
> >>>
> >>> 2) If an entity keeps the data for a longer time (e.g. for weeks
> >>> or even permanently), this cache will never be used, as the
> >>> sending entity always resends the data for a new session.
> >>>
> >>> What I propose is:
> >>>
> >>>  * By default the sending entity would not send the data. It would
> >>>    merely reference it by its cid url.
> >>>  * Let the receiving client follow "3.4 Retrieving Uncached Media
> >>> Data" if the data is not cached (no real change, this is already
> >>> being done).
> >> I think I like that approach. It introduces a round trip for the
> >> IQ, which might introduce some latency. But it puts the burden for
> >> "storing" and "serving" the image on the sender, which might
> >> discourage abuse of in-band images.
> >>
> >>>  * Reserve the possibility of sending the data immediately with
> >>> the message for the *specific* case that the sending client
> >>> actually knows the receiving party cannot have the data cached
> >>> (e.g. the data was never sent before). This behavior should be
> >>> considered optional.
> >> In that case the sender needs to keep a list of every JID to which
> >> it has ever sent the image. That seems suboptimal.
> > 
> > I didn't write it exactly as I meant it. There may be cases we are
> > knowingly sending something really new. But we might just as well
> > drop this feature if you think it's better.
> 
> If it's optional, it does no great harm. In fact it's not even a 
> feature, just an implementation note.

Ok.
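
To make the reference-first flow concrete, here is a rough sketch of the receiving side in Python (standard library only). The function name plan_retrieval and the dict-like cache are made up for illustration, the namespace is the temporary one from the example above, and the actual IQ request from section 3.4 is left out.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DATA_NS = "urn:xmpp:tmp:data-element"  # temporary namespace from the example

CID = "f81d4fae-7dec-11d0-a765-00a0c91e6bf6@shakespeare.lit"

# A message that only references the image and carries no <data/> element.
MESSAGE = """<message from='ladymacbeth@shakespeare.lit/castle'
         to='macbeth@chat.shakespeare.lit' type='groupchat'>
  <body>Yet here's a spot.</body>
  <html xmlns='http://jabber.org/protocol/xhtml-im'>
    <body xmlns='http://www.w3.org/1999/xhtml'>
      <p>Yet here's a spot.
        <img alt='A spot' src='cid:%s'/>
      </p>
    </body>
  </html>
</message>""" % CID


def plan_retrieval(stanza_xml, cache):
    """Split the cids referenced by one stanza into (to_request, available).

    `cache` is any mapping from cid to bytes (hypothetical local cache).
    Data sent inline with the stanza counts as available without a round trip.
    """
    root = ET.fromstring(stanza_xml)
    inline = {d.get("cid") for d in root.iter("{%s}data" % DATA_NS)}
    referenced = []
    for img in root.iter("{%s}img" % XHTML_NS):
        src = img.get("src", "")
        if src.startswith("cid:"):
            referenced.append(src[len("cid:"):])
    to_request = [c for c in referenced if c not in cache and c not in inline]
    available = [c for c in referenced if c in cache or c in inline]
    return to_request, available
```

With an empty cache the cid ends up in the first list (an IQ round trip is needed); once the data is cached, later messages that only reference it need no extra traffic.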

> > I'm afraid some people will object.
> 
> Don't be afraid -- some people will always object. :)
> 

:D

> >> And I suppose the recipient might have received the image from
> >> another sender at some point, or might have received the image
> >> through other means (e.g., an emoticon "bundle").
> > 
> > The problem is... that we really want the users to get what we send
> > them. If they got it from someone else, we need to secure it by a
> > hash function, not a mere ID. It would have to actually check the
> > hash when caching.
> 
> Isn't that a bit paranoid for something as lightweight as emoticon
> bundles?
> 

The problem is that the Data Element could very soon be used for other
purposes. For me this is a grave security hole that might cause a real
headache in the future.

But I'm more than just a bit paranoid :). Working privacy and security
are what originally brought me from ICQ to Jabber... only later did I
realize how cool it actually is in other areas.

> > Another issue would be the particular hash functions. Some client
> > authors or users may want to prevent using data from third parties
> > protected by weak hash functions.
> > 
> > That's why I only considered caching per sender JID.
> 
> I suppose caching per sender JID makes sense, yes.
>

I suggest this if we don't take the cryptographic way. Or we could take
both ways (let the implementors choose).
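
A minimal sketch of what I mean by caching per sender JID, in Python. The class and method names are invented for illustration, and I am assuming the cache is scoped to the bare JID (resource stripped); lowercasing stands in for proper JID normalization.

```python
class SenderScopedCache:
    """Cache keyed by (sender bare JID, cid), so that data received from
    one sender is never served for a reference made by another sender."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _bare(jid):
        # Assumption: scope by bare JID. Lowercasing is a simplification
        # of the real JID normalization rules.
        return jid.split("/", 1)[0].lower()

    def put(self, sender_jid, cid, data):
        self._store[(self._bare(sender_jid), cid)] = data

    def get(self, sender_jid, cid):
        # Returns None when this sender never supplied data for this cid.
        return self._store.get((self._bare(sender_jid), cid))
```

The point of the design is that a cid alone proves nothing about the content, so a third party cannot plant data under someone else's cid; the cost is that identical data from two senders is stored twice.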

> > If we want to use hashes... and third party data, we should use some
> > specific "hostnames", possibly sha256.cid.xmpp.org for sha256 or
> > something like that.
> 
> Sure. If desired.
> 

It would be desired for globally shared data, so that the IDs actually match.
The global-sharing feature should be optional anyway, so it can be
added at any time. No reason to defer implementations.
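
For illustration, the two kinds of CID could be generated roughly like this in Python. This is only a sketch: the function names are made up, the "sha256.cid.xmpp.org" hostname is just the idea floated above and not anything defined, and the hex encoding of the hash is my assumption.

```python
import hashlib
import uuid

GLOBAL_SHA256_HOST = "sha256.cid.xmpp.org"  # hypothetical, from this thread


def per_sender_cid(domain):
    """Random UUID followed by the domain part of the sender's JID."""
    return "%s@%s" % (uuid.uuid4(), domain)


def global_hash_cid(data):
    """Content-derived cid: identical data always yields the same cid."""
    return "%s@%s" % (hashlib.sha256(data).hexdigest(), GLOBAL_SHA256_HOST)


def verify_global_cid(cid, data):
    """Check that data really matches a hash-based cid before caching it."""
    local, _, host = cid.partition("@")
    if host != GLOBAL_SHA256_HOST:
        return False  # not a hash-based cid, nothing to verify against
    return hashlib.sha256(data).hexdigest() == local
```

A client would call verify_global_cid before sharing a cached object across senders, and fall back to per-sender caching for every cid that does not verify.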

> >>> I further propose we add some informational section about
> >>> generation of CIDs. Although it's specified elsewhere, I believe
> >>> this XEP will be very useful and will be referenced from many
> >>> future XEPs (and maybe improved as well - possibly some server
> >>> caching etc). I think the informational section could suggest
> >>> UUIDs generated by hashing the actual content.
> >> Yes I think that would be helpful.
> >>
> >>> Another thing that could be considered... is to add some sort of
> >>> caching hint attribute that would suggest how long it's reasonable
> >>> to cache a particular resource. 
> >> Do you think that would really be helpful? I'm still thinking about
> >> it...
> >>
> > 
> > This feature would be optional, so it's easy to add it when we think
> > it's useful. Right now I have no idea :).
> > 
> >>> Maybe we could borrow from HTTP Cookies
> >>> but allow (suggest) the clients to have some mechanisms for
> >>> limiting the time, size and number of cached objects.
> >>>
> >>> There are many possibilities, I will just describe one of them.
> >> Do you have examples of these?
> >>
> > 
> > The attribute values could be stated more abstractly... like...
> > "session", "short", "medium", "long" with recommended defaults, for
> > example. But usually the sender knows better.
> 
> Mimicking HTTP values is OK with me.
> 

No problem for me either, we can just define the syntax.

> >>> cache="no"
> >>>  - no reason for caching; the file will not be used again
> >> Perhaps a thumbnail related to file transfer or some other
> >> ephemeral image?
> >>
> >>> cache="session"
> >>>  - we suggest the receiving party only caches for this
> >>>    particular session
> >> Perhaps also a thumbnail, or an image related to a whiteboarding
> >> session?
> >>
> >>> cache="12"
> >>>  - we suggest caching for twelve days from the last use of this
> >>> cid (!)
> >>>  - for every use (received reference) the receiving client should
> >>> reset the date we count from
> >> Perhaps images included in an XHTML notification from a blogging
> >> service or somesuch?
> >>
> >>> cache="unlimited"
> >>>  - we suggest the client picks the longest time it allows (it
> >>> could possibly cache some small pieces of data permanently)
> >> Perhaps a commonly-used emoticon?
> >>
> > 
> > Good use cases, thanks.
> > 
> >>> Of course, the client MAY ignore the caching hint. In this case it
> >>> SHOULD NOT cache at all.
> >> Why not? My client could ignore caching hints because it has its
> >> own local policy (e.g. cache images only from people in my
> >> "Friends" group, but cache those forever because I want to keep
> >> them in message history). Or my client could ignore caching hints
> >> because it simply can't cache images (no room on the device, web
> >> client, etc.).
> >>
> > 
> > I don't know, really :).
> 
> Well it seems a bit strong to say you SHOULD NOT cache in those 
> instances. Just leave it up to the implementation.

If we mimic HTTP even in this respect, missing cache would mean
session-only (possibly other user's online session).
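
To show how the hint values discussed above could be interpreted by a receiving client, here is a small Python sketch. Nothing here is specified anywhere: the function names are invented, a missing attribute is treated as session-only as just described, and treating unrecognized values as session-only is my own assumption.

```python
from datetime import datetime, timedelta


def parse_cache_hint(value):
    """Map a cache attribute value to (kind, days).

    kind is one of "no", "session", "unlimited" or "days"; days is set
    only for the numeric form.
    """
    if value is None:
        return ("session", None)  # missing attribute: session-only
    value = value.strip().lower()
    if value in ("no", "session", "unlimited"):
        return (value, None)
    if value.isascii() and value.isdigit() and int(value) > 0:
        return ("days", int(value))
    return ("session", None)  # assumption: unknown values fall back


def expiry_after_use(hint, last_use):
    """Expiry time for a numeric hint, counted from the most recent use.

    Returns None for the non-numeric kinds, which are not time-based.
    """
    kind, days = hint
    if kind == "days":
        return last_use + timedelta(days=days)
    return None
```

The client would call expiry_after_use again on every received reference, which resets the date we count from; local policy can still override whatever the sender suggests.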

> >>> If the cache attribute is not specified, we should decide on a
> >>> reasonable default value ('session' or '1' day both seem good to
> >>> me).
> >> I think that's up to the client.
> >>
> > 
> > A reasonable default does no harm, does it? :)
> 
> I suppose '1' day is OK, or 'session' if we define what we mean by that.
> 

If we take the way of HTTP, this is a nonissue.

> Peter


-- 

Web: http://www.pavlix.net/
Jabber & Mail: pavlix(at)pavlix.net
OpenID: pavlix.net

