[Standards] XEP-0277 "Microblogging over XMPP" and the Atom data format

Guus der Kinderen guus.der.kinderen at gmail.com
Thu May 20 15:08:37 CDT 2010


Hi Bear,

Thanks for taking the time to respond. I'm completely new to Atom, so
getting feedback from someone who has used it intensively before is very
valuable to me.

I've included my replies inline.

 - Guus

On 16 May 2010 20:44, bear <bear42 at gmail.com> wrote:

> On Sun, May 16, 2010 at 09:03, Guus der Kinderen
> <guus.der.kinderen at gmail.com> wrote:
> > Hi all,
> >
> > Recently, I have been working on an XMPP gateway (XEP-0100 "Gateway
> > Interaction" style) that exposes Twitter functionality in a way compliant
> > with XEP-0277 "Microblogging over XMPP". While coding, a number of
> > questions and remarks related to this last XEP popped up.
> >
> > The XEP specifies that pubsub is used to publish and receive microblog
> > posts (the XEP does indicate that for posting, an alternative interface
> > can be used). The pubsub items used in the examples are using an
> > Atom-based data format.
> >
> > My first question: the XEP does not specify explicitly that the Atom data
> > format MUST/SHOULD be used. Can other formats be used as well? I feel that
> > there is room for interpretation here. This can lead to implementations
> > that are XEP compliant, but are not compatible with other implementations.
> > Should the Microblogging XEP specify more exactly which data format should
> > be used?
>
> I think that would be a good change as Atom has become the default
> canonical format for this realm.
>
> > Why was the Atom-based data format chosen? In my opinion, there are a
> > number of characteristics that do not make it "fit" to the purpose:
>
> Atom has been chosen, from what I can gather and also from my own
> opinion, because it is now the format used for ActivityStreams,
> PubSubHubbub, OStatus and the majority of the large consumers and
> providers of feed data.
>
> It is also a "proper" XML format which gives it a lot of advantages in
> the XMPP world, but that's secondary to the prior reasons IMO.
>
>
I'm not a big fan of re-inventing the wheel myself, but I do feel that for
the purpose of simple microblogging, Atom is overkill. It adds a lot of
complexity (us having this conversation as an early result), which we can do
without. XMPP can be complex enough in itself - adding yet another
technology in the mix raises the bar considerably.

Having the microblogs be very portable could be valuable, I agree with you
there completely. However, I think that ensuring portability is not a core
responsibility of the Microblogging XEP itself. Instead, I believe that this
could be the responsibility of some kind of extension to the XEP (or a
gateway implementation perhaps). Such an extension to the Microblogging XEP
could be used to translate a light-weight XMPP representation of a microblog
entry into the more complex, but more portable Atom-based representation,
for example.
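
To make the idea a bit more concrete, here is a minimal sketch in Python of
what such a translation step might look like. Everything in it is an
assumption for illustration only: the light-weight payload, its
urn:example:microblog:0 namespace and the function name light_to_atom are
hypothetical and are not defined by XEP-0277 or any other XEP. It mainly
shows which extra fields (id, updated, author) the translating party would
have to supply to get from the simple form to an Atom entry.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
LIGHT_NS = "urn:example:microblog:0"  # hypothetical namespace, not from any XEP


def light_to_atom(light_xml, entry_id, author_name, updated):
    """Translate a hypothetical light-weight post into an Atom entry element."""
    post = ET.fromstring(light_xml)
    body = post.findtext("{%s}body" % LIGHT_NS, default="")

    entry = ET.Element("{%s}entry" % ATOM_NS)
    # Atom wants a title, an id and an updated timestamp on every entry.
    # The title is left empty here, since the post itself has none.
    ET.SubElement(entry, "{%s}title" % ATOM_NS)
    ET.SubElement(entry, "{%s}id" % ATOM_NS).text = entry_id
    ET.SubElement(entry, "{%s}updated" % ATOM_NS).text = updated
    author = ET.SubElement(entry, "{%s}author" % ATOM_NS)
    ET.SubElement(author, "{%s}name" % ATOM_NS).text = author_name
    # The actual text of the post goes into atom:content.
    content = ET.SubElement(entry, "{%s}content" % ATOM_NS, {"type": "text"})
    content.text = body
    return entry


light = "<post xmlns='urn:example:microblog:0'><body>Hello from XMPP</body></post>"
atom = light_to_atom(
    light, "tag:example.org,2010-05-20:post-1", "Guus", "2010-05-20T15:08:37Z"
)
print(ET.tostring(atom, encoding="unicode"))
```

The point of the sketch is only that the light-weight form carries the body
and nothing else, while the id, timestamp and author are added at the edge.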

The current XEP appears to be open to something like this. It reads: "Romeo
can publish a post via any interface provided by his service, such as a
website, the Atom Publishing Protocol (see RFC 5023
<http://tools.ietf.org/html/rfc5023>), SMS, an IM bot,
or XMPP pubsub. Here we assume that the post is provided via XMPP pubsub."
Then, it continues to wrap Atom in Pubsub in the following examples, but as
I wrote earlier, there doesn't appear to be a requirement that the pubsub
entries are Atom-formatted. I would like to see such a formatting
requirement to be explicitly defined (as that would help interoperability,
but I think we both agree there), and, for the reason mentioned above, be a
lot simpler than the current suggested Atom format.


> > Atom requires a title for each entry. In the context of a microblog, this
> > requirement doesn't make much sense to me. The examples in the XEP use the
> > atom:title element to hold the text of the post. I would argue that this
> > is done more appropriately in an atom:content element instead.
>
> In the case of a post or update that does not have a Title per se, the
> Atom spec says that the content of the post should be placed in the
> Title element and the Content element should be empty.  This rule is
> also listed as a MUST in the ActivityStreams spec:
>
> http://activitystrea.ms/schema/1.0/activity-schema-01.html#article
>
>
I believe you are referring to the definition of a title property of an
Article object, which reads:

"The title of the entry. Included as the content of the atom:title element.
This element MUST be included with empty content if the entry does not have
a title."

I believe that this means something different than what you say. I think the
line above instructs us to include an empty title element (eg:
<title></title>) if the article does not have a title. The usage of the word
"content" is horrible in this definition, as the same word is used to
describe another property of the Article later on in the text.


As for the Atom specs, are you referring to paragraph 4.1.1.1 of RFC4287?

"(...) It is advisable that each atom:entry element contain a non-empty
atom:title element, a non-empty atom:content element when that element is
present, and a non-empty atom:summary element when the entry contains no
atom:content element. (...)"

That's no requirement. To me, this is indicating a SHOULD on 'have a
non-empty title' and another SHOULD on 'have either non-empty content or
summary'. To me, this does not indicate that you should use the content as a
title to avoid an empty title. I would say that that is even worse, as
now you're still breaking a SHOULD-condition, but also misrepresenting data.

A little bit further (section 4.1.2, one-before-last bullet point), the same
spec reads:

"atom:entry elements MUST contain exactly one atom:title element."

Combined, this again leads me to believe that the most correct
interpretation is: "include an empty title and a non-empty content element."
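
To test my own reading, here is a rough Python sketch that applies the rules
quoted above to an entry: the "exactly one atom:title" MUST from section
4.1.2 as an error, and the "advisable" non-empty rules from section 4.1.1.1
as warnings. The function name check_entry and the message strings are my
own invention, and the check is a simplification: it only looks at plain
text inside the elements and ignores XHTML or out-of-line content.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def _is_blank(element):
    """True when the element is missing or holds no non-whitespace text."""
    return element is None or not (element.text or "").strip()


def check_entry(entry_xml):
    """Return (errors, warnings) for the title/content/summary rules."""
    entry = ET.fromstring(entry_xml)
    errors, warnings = [], []

    titles = entry.findall(ATOM + "title")
    if len(titles) != 1:
        errors.append("MUST contain exactly one atom:title")
    elif _is_blank(titles[0]):
        warnings.append("advisable: non-empty atom:title")

    content = entry.find(ATOM + "content")
    if content is not None:
        if _is_blank(content):
            warnings.append("advisable: non-empty atom:content when present")
    elif _is_blank(entry.find(ATOM + "summary")):
        warnings.append("advisable: non-empty atom:summary without atom:content")

    return errors, warnings


NS = "xmlns='http://www.w3.org/2005/Atom'"
empty_title = "<entry %s><title/><content>Hello</content></entry>" % NS
text_in_title = "<entry %s><title>Hello</title></entry>" % NS
print(check_entry(empty_title))
print(check_entry(text_in_title))
```

Under these simplified checks, both variants pass the MUST and each trips
exactly one of the "advisable" rules: the first for the empty title, the
second for the missing content and summary.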


As an alternative, one could perhaps consider adding a semi-hardcoded title
element in cases where the implementation is known never to have a title
otherwise (eg: "Microblogged by Guus"). This would prevent empty title
elements, at the expense of allowing for content to be generated by someone
other than the original content provider. I'm not sure yet if I'll like
this...

> > Atom requires a unique identifier (atom:id) for each entry. Is this
> > appropriate in a use case where content is being created by a client (as
> > opposed to created content being distributed by the service provider)? In
> > my gateway implementation, I can't think of a unique identifier that I can
> > use when a client is generating a microblog post on the legacy service.
> > Instead, the unique identifier is generated by the legacy domain. I feel
> > that this argument holds true, even in a more generic context than my
> > Legacy Gateway implementation: it is not uncommon for service providers to
> > generate and add a unique identifier to an object created as a result of a
> > user request. By using the Atom data format, the XEP is less flexible.
>
> The unique identifier in your example would be something based on date
> and time of receipt or generation of the post *and* you should then
> include a Source element that outlines what the legacy system is using
> to identify the item, including a URL to the item if possible.
>
> In general anything that flows thru a Gateway should do this, see the
> Salmon Protocol and also the Atom Threading protocol
>
> http://salmon-protocol.googlecode.com/svn/trunk/draft-panzer-salmon-00.html
> http://tools.ietf.org/html/rfc4685
>
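
If I understand this suggestion correctly, it could be sketched roughly as
below, in Python. This is only my interpretation: the use of a tag: URI
(RFC 4151) for the gateway-generated id, the function names make_entry_id and
add_source, the nonce parameter and the example.org / example.com addresses
are all assumptions of mine, not something the XEP or the Atom spec
prescribes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def make_entry_id(gateway_domain, received, nonce):
    """Build a tag: URI from the gateway domain and the time of receipt.

    The nonce (hypothetical) keeps two posts received in the same second apart.
    """
    return "tag:%s,%s:%s-%s" % (
        gateway_domain,
        received.strftime("%Y-%m-%d"),
        received.strftime("%H%M%S"),
        nonce,
    )


def add_source(entry, legacy_id, legacy_url):
    """Attach an atom:source holding the legacy service's identifier and URL."""
    source = ET.SubElement(entry, "{%s}source" % ATOM_NS)
    ET.SubElement(source, "{%s}id" % ATOM_NS).text = legacy_id
    ET.SubElement(
        source, "{%s}link" % ATOM_NS, {"rel": "alternate", "href": legacy_url}
    )
    return source


received = datetime(2010, 5, 20, 15, 8, 37, tzinfo=timezone.utc)
entry = ET.Element("{%s}entry" % ATOM_NS)
ET.SubElement(entry, "{%s}id" % ATOM_NS).text = make_entry_id(
    "gateway.example.org", received, "a1"
)
# The legacy identifier is only known once the legacy service has answered.
add_source(
    entry,
    "http://legacy.example.com/statuses/12345",
    "http://legacy.example.com/statuses/12345",
)
print(ET.tostring(entry, encoding="unicode"))
```

With this split, the gateway can assign the atom:id the moment the client
publishes, and fill in the source part after the legacy service has
assigned its own identifier.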
> > Atom requires an author for each entry. This appears redundant to me -
> > the pep service itself is related to the author, and posts on a microblog
> > are not likely to be authored by someone other than the owner of the blog.
> > Nonetheless, Atom spec requires this element to exist.
>
> The Author element is present to allow for downstream consumers of the
> Atom item to be able to have a URL that points to the author without
> having to discover thru web crawling what that author is.  At the
> minimum you just need to provide a URL to the identity url of whoever
> generated the post.
>
> > That's what I ran into so far. I'd be happy to receive your insights,
> > comments and remarks.
>
> IMO the reason to use Atom boils down to the fact that a *lot* of
> activity by some very bright and active people has focused on Atom,
> with some of them extending Atom but that in itself is another reason
> to use it, and they are now generating content that could easily flow
> thru your gateway with minimal processing and still retain a lot of
> the metadata from the source while allowing you to add your own
> metadata.
>
> >
> > Regards,
> >
> >   Guus
> >
>
> p.s. my background in all of this is very behind the scenes and I was
> working on quite a few Atom based services at Seesmic before they
> pulled the plug on the project :(
>
> I'm hoping some of the others in the XSF who are Atom gurus will step
> in and correct or add to my thoughts above.
>
>
> --
> Bear
>
> bear at xmpp.org (email)
> bear42 at gmail.com (xmpp, email)
> bear at code-bear.com (xmpp, email)
> http://code-bear.com/bearlog (weblog)
>
> PGP Fingerprint = 9996 719F 973D B11B E111  D770 9331 E822 40B3 CD29
>