[Standards] Pub/Sub & RSS
kellyrocks at live.ca
Thu May 22 17:52:14 UTC 2008
> Date: Thu, 22 May 2008 10:53:34 -0600
> From: stpeter at stpeter.im
> To: standards at xmpp.org
> Subject: Re: [Standards] Pub/Sub & RSS
>
> On 05/21/2008 10:09 PM, Kelly S wrote:
> > Thanks for the reply!
> >
> > I took a look at what your saying about NodeID and I understand a bit
> > more clearly now. Wow the Pub/Sub spec is large!
>
> Don't be afraid, it's just comprehensive. :)
>
> > Anyways. I'm thinking of writing a service where users can
> > "subscribe" to feeds off the web. A service will be monitoring all
> > "feeds" and pulling the RSS/Atom/whatever off of the web and
> > populating the Pub/Sub nodes in batch so users get notifications.
> >
> > So really the service will create the new node to represent the news
> > feed if its not already found in the Pub/Sub query and then add it to
> > its "queue" of feeds to poll.
> >
> > I'm not quite sure what XEP to use to allow a user to "request"
> > pulling of a feed. I haven't figured that part out yet.
>
> You can request the items in a feed by using this:
>
> http://www.xmpp.org/extensions/xep-0060.html#subscriber-retrieve
Sorry, I don't think I explained that too well. Users will be requesting to add feeds to their account, which in turn will subscribe them to the corresponding pub/sub node.
But if someone requests a feed we are not yet "publishing", we need to begin publishing that new feed and subscribe the user to it.
From then on that feed keeps publishing new entries, so it is already there when additional people start subscribing to it.
I'm not sure how to handle the user requesting a new feed that doesn't exist yet so that we can start pulling/publishing it.
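To make the flow I have in mind concrete, here is a rough Python sketch of the service-side bookkeeping. It only models things in memory: FeedRegistry and request_feed are names I made up for illustration, the feed URL doubles as the NodeID, and nothing here talks to a real pubsub service or answers the question of which XEP carries the request.

```python
from collections import deque


class FeedRegistry:
    """In-memory stand-in for the pubsub nodes plus the polling queue.

    Hypothetical helper for illustration only; a real service would
    create nodes and subscriptions on the pubsub component instead.
    """

    def __init__(self):
        self.subscribers = {}      # node ID (feed URL) -> set of user JIDs
        self.poll_queue = deque()  # feed URLs the poller should fetch

    def request_feed(self, user_jid, feed_url):
        """Subscribe user_jid to feed_url, creating the node on first request.

        Returns True if this request created the node.
        """
        created = feed_url not in self.subscribers
        if created:
            # First request for this feed: create the node and queue it
            # so the poller starts pulling and publishing it.
            self.subscribers[feed_url] = set()
            self.poll_queue.append(feed_url)
        # Whether the node is new or not, the user ends up subscribed.
        self.subscribers[feed_url].add(user_jid)
        return created


registry = FeedRegistry()
registry.request_feed("alice@example.com", "http://example.org/feed.rss")
registry.request_feed("bob@example.com", "http://example.org/feed.rss")
```

The second call finds the node already there, so it only adds the subscriber and the feed is queued for polling once.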
> > I'm also not sure if I want to mix Atom nodes with RSS nodes etc. I
> > could create 1 format to use for the "entry" and transform them to
> > all match but for sites which add extra metadata to entries such as
> > Digg with its DiggCount I would like to maintain that. That is the
> > beauty of XML :)
>
> But RSS isn't XML. :P
RSS isn't XML? I'm not sure what you mean lol. Aren't RSS feeds exposed as XML documents?
> > That way Jabber clients who understand extensible items can display
> > them.
> >
> > I like the idea of the NodeID being the feeds url because then this
> > "polling" service can use the feeds URL easily as the SET after
> > downloading content.
>
> Sure. :)
>
> > Any suggestions / recommendations would be great to hear! Another item
> > I am unclear about is as I am polling for news data, how can I easily
> > check if an "entry" exists already? I'd rather not have to keep a
> > cache somewhere of all the items I have created already.
> >
> > Although performance is going to suck if I have to check every entry
> > before inserting. Is there a way to batch insert, and disallow
> > duplicate entries based on *something* like entry title or something?
> >
> > It's nice to be back in Jabber land :) I'm back into the old mindset
> > where I have a ZILLION ideas rushing into my head about all the crazy
> > things I can do with these XEPs lol.
> >
> > Any help would be great!
> >
> > Thanks so much!
>
> You might want to join this list for discussion:
>
> http://mail.jabber.org/mailman/listinfo/social
>
> "XMPP and Social Networking, Two Great Tastes That Taste Great Together!"
>
> /psa
Also, a bunch of concerns pop into my head about maintaining all this "feed" data that I'm still unclear on.
1. Is there any way I can just publish the latest entries I pulled down and have pubsub discard any duplicates that already exist? Or is the right way to handle this to query every single entry individually to check whether it exists before publishing it?
1.a If I have to query each one individually, would I make the entry ID the URL of the entry so that I can check for duplicates as I pull them down off the web and read the entries? (Example: 4 old entries, but 1 new entry since the last pull.)
2. Entry requesting. Is there any sort of querying we can use against the entries? If we are publishing tons of entries, someone may want to browse/read them but only request X entries at a time, or only those newer than a certain date, etc. I don't think pulling the entire entry history of a couple of months is going to be very efficient.
Some of my concerns for #2 are because we are going to mobilize this, and data plans are expensive for mobile devices. In this country it can be $25/1.5MB/month.
We want to make our mobile client query data efficiently, but do these capabilities exist in Pub/Sub?
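For #1 and #2, this is roughly what I picture sending, sketched in Python with the standard library's ElementTree. If I'm reading XEP-0060 correctly, publishing an item whose ItemID already exists on a persistent node overwrites it rather than adding a duplicate, and the <items/> retrieve request accepts a max_items attribute; I haven't verified either against a real server. The function names, the example URLs, and the choice of the entry URL as the ItemID are my own assumptions, and the enclosing <iq/> stanza is left out.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"
ATOM_NS = "http://www.w3.org/2005/Atom"


def build_publish(node, entry_url, payload):
    """Build a <pubsub/> publish element whose item ID is the entry URL.

    Re-publishing the same entry reuses the same ID, so (assuming the
    service overwrites on a matching ItemID) no separate lookup is needed.
    """
    pubsub = ET.Element(f"{{{PUBSUB_NS}}}pubsub")
    publish = ET.SubElement(pubsub, f"{{{PUBSUB_NS}}}publish", {"node": node})
    item = ET.SubElement(publish, f"{{{PUBSUB_NS}}}item", {"id": entry_url})
    item.append(payload)
    return pubsub


def build_recent_items_request(node, max_items):
    """Build a <pubsub/> retrieve element asking for only the newest items."""
    pubsub = ET.Element(f"{{{PUBSUB_NS}}}pubsub")
    ET.SubElement(
        pubsub,
        f"{{{PUBSUB_NS}}}items",
        {"node": node, "max_items": str(max_items)},
    )
    return pubsub


# Hypothetical feed and entry URLs, used only to show the shape of the XML.
entry = ET.Element(f"{{{ATOM_NS}}}entry")
publish = build_publish("http://example.org/feed", "http://example.org/post/1", entry)
request = build_recent_items_request("http://example.org/feed", 5)
print(ET.tostring(publish, encoding="unicode"))
print(ET.tostring(request, encoding="unicode"))
```

This covers "only X at a time" but not "only newer than a certain date"; I don't know whether the spec has anything for the latter.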
Thanks so much for the help :)