[Operators] netops meeting / radar redux

Pedro Melo melo at simplicidade.org
Tue Jun 9 13:23:21 CDT 2009


Hi,

On May 29, 2009, at 10:22 PM, Peter Saint-Andre wrote:

> A while back we talked about the possibility of building a site like
> http://www.mailradar.com/ but for XMPP. Florian Thiessen and I just had
> a chat about that, so I figured I'd try to restart the conversation.
> Perhaps we can have a groupchat about it soon? I propose to make this
> (and other matters related to network operations) the topic of our next
> "Monthly XMPP Meeting":
>
> Date: Tuesday, June 9, 2009
> Time: 19:00 UTC (check your local time!)

eh, 40 minutes to spare :)

At the time I really liked this idea and I started to write a small  
system to track this. I've been working on it on and off, and it  
mostly works right now. I need to write a small web interface and put  
it online.

I'm pasting my "braindump document" with all the ideas, problems,
questions and doubts that occurred to me while thinking about this. I
don't claim it is complete, but it should cover a lot of ground.

See you in 37 mins.

----8<-----
Notes for an XMPP Radar
=======================

What to collect
---------------

The XMPP radar keeps a list of domains with related bits of information.
For each domain, we should keep:

  * XMPP-relevant DNS records: SRV records for C2S and S2S, and TXT
    records as defined in XEP-0156 (Discovering Alternative XMPP
    Connection Methods);
  * Certificates presented on C2S and S2S connections: in particular, the
    CA, the issue and expiry dates, and the id-on-xmppAddr entries;
  * Advertised features: for both C2S and S2S connections. Some features
    are only advertised after some negotiation (for example, plain-text
    authentication is often offered only after TLS). At a minimum, the
    system should negotiate STARTTLS to see whether other features are
    offered;
  * disco#info information: collected over S2S. We could collect this
    over C2S also if the server offers account creation, but I find that
    a bit intrusive;
  * disco#items information: collected over S2S. Each new domain would be
    added to the database and treated as a new standalone domain;
  * server vCard if available;
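
For the DNS item in the list above, the names to query follow directly
from the domain. A minimal sketch in Python, assuming the standard SRV
names from RFC 3920/6120 and the `_xmppconnect` TXT name from XEP-0156;
the function name is hypothetical, and the lookup itself is left to
whatever resolver the radar ends up using:

```python
# Sketch only: owner names an XMPP radar could query for one domain.
# SRV names per RFC 3920/6120, TXT name per XEP-0156. Performing the
# actual DNS lookups needs a resolver library and is left out here.


def xmpp_dns_queries(domain: str) -> list[tuple[str, str]]:
    """Return (record type, owner name) pairs to look up for `domain`."""
    domain = domain.rstrip(".").lower()
    return [
        ("SRV", f"_xmpp-client._tcp.{domain}"),  # C2S
        ("SRV", f"_xmpp-server._tcp.{domain}"),  # S2S
        ("TXT", f"_xmppconnect.{domain}"),       # XEP-0156 connection methods
    ]


if __name__ == "__main__":
    for rtype, name in xmpp_dns_queries("example.org"):
        print(rtype, name)
```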

We should keep track of the date when we discovered each piece of
information, as well as the date it was last updated. Keeping a
history of the changes would be even better.
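
One way to keep both dates and a change history per domain is sketched
below. The class and field names are hypothetical, and "updated" is
interpreted here as the last time a probe confirmed the value; when a
value changes, the old one is archived into the history:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Observation:
    value: Any
    first_seen: datetime      # when this value was first discovered
    last_confirmed: datetime  # last probe that still saw this value


@dataclass
class DomainRecord:
    domain: str
    current: dict[str, Observation] = field(default_factory=dict)
    # (key, old value, first seen, replaced at) for every superseded value
    history: list[tuple[str, Any, datetime, datetime]] = field(default_factory=list)

    def observe(self, key: str, value: Any, when: Optional[datetime] = None) -> bool:
        """Record a probe result for `key`; return True if the value changed."""
        when = when or datetime.now(timezone.utc)
        old = self.current.get(key)
        if old is not None and old.value == value:
            old.last_confirmed = when  # unchanged: only refresh the timestamp
            return False
        if old is not None:
            self.history.append((key, old.value, old.first_seen, when))
        self.current[key] = Observation(value, when, when)
        return True
```

For example, calling `observe("s2s_starttls", True)` on every crawl only
moves `last_confirmed` forward until the server's answer actually changes.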

This list is based on current XEPs. There is some talk about a new
format for server contact information; if those efforts pan out, we
should support it as well.


Access methods
--------------

The information collected would be stored in a database, and the
database could be queried via:

  * an HTTP interface;
  * Ad-Hoc Commands (XEP-0050);
  * the Jabber Search protocol (XEP-0055).

Also interesting would be a light REST-based API, providing JSON, JSONP,
and XML results.
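
A light sketch of how one query result could be rendered in the three
formats mentioned, using only the standard library. The `render`
function, the `result` root element and the flat-dictionary shape of a
result are assumptions, not a proposed schema. The JSONP callback is
checked against a plain identifier pattern so that a query parameter
cannot inject script:

```python
import json
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Accept only (dotted) JavaScript identifiers as JSONP callback names.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def render(result: dict, fmt: str = "json",
           callback: Optional[str] = None) -> tuple[str, str]:
    """Return (content type, body) for a flat result dictionary."""
    if fmt == "json":
        return "application/json", json.dumps(result, sort_keys=True)
    if fmt == "jsonp":
        if callback is None or not _CALLBACK_RE.fullmatch(callback):
            raise ValueError(f"invalid JSONP callback: {callback!r}")
        body = json.dumps(result, sort_keys=True)
        return "application/javascript", f"{callback}({body});"
    if fmt == "xml":
        root = ET.Element("result")
        for key, value in sorted(result.items()):
            # Assumes keys are valid XML element names.
            child = ET.SubElement(root, key)
            child.text = json.dumps(value) if isinstance(value, bool) else str(value)
        return "application/xml", ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt!r}")
```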


Collection Methods
------------------

How can we grow the database? These are my initial ideas:

  * use the search interface: if someone queries the domain X, and
    we don't have that in our database, add it - is this kosher,
    privacy-wise?
  * create small scripts to parse common XMPP server logs: we could
    ask big server operators to run some scripts daily and send us
    the domains found - again, privacy issues, but each server has
    its own privacy rules. Also, we wouldn't be disclosing bare
    JIDs, only domains;
  * roster analysis: scripts provided for the most common XMPP servers
    to scan the rosters - same issues as the previous method;
  * on some servers, disco#items lists the S2S connections;
  * list of public servers kept by the XSF.
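
The log and roster scripts only need to reduce JIDs to their domain
part, which keeps bare JIDs out of anything sent back to us. A sketch,
assuming the scripts already have the JIDs as strings (extracting them
from a particular server's log format is not shown); the function names
are hypothetical:

```python
from typing import Iterable, Optional


def jid_domain(jid: str) -> Optional[str]:
    """Return the domain part of a JID ([localpart@]domain[/resource])."""
    bare = jid.split("/", 1)[0]       # resource may itself contain '@' or '/'
    domain = bare.rsplit("@", 1)[-1]  # drop the localpart, if any
    domain = domain.strip().rstrip(".").lower()
    return domain or None


def domains_from_jids(jids: Iterable[str], exclude: Iterable[str] = ()) -> set[str]:
    """Collect remote domains only; usernames never leave the script."""
    skip = {d.lower() for d in exclude}
    found = set()
    for jid in jids:
        domain = jid_domain(jid)
        if domain and domain not in skip:
            found.add(domain)
    return found
```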

A more aggressive approach would be to ask top-level domain maintainers
for dumps of their databases, and crawl common host names:

   * domain;
   * xmpp.domain;
   * im.domain;
   * jabber.domain.
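
Turning a zone list into hosts to probe is mechanical. A sketch,
assuming a plain list of registered domain names, one per line (the real
dump format is unknown); the prefixes are the ones listed above and the
function names are hypothetical. Each candidate would then be looked up
in DNS before any connection is attempted:

```python
from typing import Iterable, Iterator

PREFIXES = ("xmpp", "im", "jabber")


def candidate_hosts(domain: str) -> list[str]:
    """The bare domain plus the common XMPP host names under it."""
    domain = domain.strip().rstrip(".").lower()
    return [domain] + [f"{prefix}.{domain}" for prefix in PREFIXES]


def candidates_from_zone_list(lines: Iterable[str]) -> Iterator[str]:
    """Yield each candidate host once, skipping blank lines."""
    seen: set[str] = set()
    for line in lines:
        if not line.strip():
            continue
        for host in candidate_hosts(line):
            if host not in seen:
                seen.add(host)
                yield host
```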

So far I haven't been able to find out whether the list of (for example)
.{com,org,net} domains is public. A request from a non-profit
foundation like the XSF has a better chance of getting these lists.

A final approach, currently being pursued: crawl the IP address space.
I got a BGP dump (about 300k networks), split them all into blocks of
/24 or smaller, and generated a list of IP addresses per network. The
next step is to shuffle this set of networks. Then we would pick one
address per network and try ports 5222, 5223 and 5269.

If a TCP connection succeeds, we can try to "fingerprint" the server on
the other side, tricking it into giving us one of the domains it hosts.
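
Which trick the fingerprinting uses isn't spelled out here. One
plausible approach, offered purely as an assumption: open a stream
addressed to a name the server cannot host (the reserved `.invalid` TLD)
and read the `from` attribute of the stream header it sends back, since
some servers may fill it with a domain they host before returning an
error. A sketch of the probe and reply parsing with plain sockets and a
regex (a real crawler would want an incremental XML parser); all names
are hypothetical:

```python
import re
import socket
from typing import Optional

PROBE_NAME = "radar-probe.invalid"  # hypothetical; .invalid never resolves

_STREAM_OPEN_RE = re.compile(r"<stream:stream\b[^>]*>")
_FROM_RE = re.compile(r"""\sfrom\s*=\s*(['"])([^'"]+)\1""")


def probe_header(namespace: str = "jabber:client") -> str:
    """Opening stream header; use 'jabber:server' when probing port 5269."""
    return ("<?xml version='1.0'?>"
            f"<stream:stream xmlns='{namespace}' "
            "xmlns:stream='http://etherx.jabber.org/streams' "
            f"to='{PROBE_NAME}' version='1.0'>")


def domain_from_reply(reply: str) -> Optional[str]:
    """Extract the server's 'from' domain from its stream header, if any."""
    header = _STREAM_OPEN_RE.search(reply)
    if header is None:
        return None
    attr = _FROM_RE.search(header.group(0))
    if attr is None:
        return None
    domain = attr.group(2).lower()
    return None if domain == PROBE_NAME else domain  # ignore a mere echo


def probe(host: str, port: int = 5222, timeout: float = 10.0) -> Optional[str]:
    """Plain-TCP probe; port 5223 (legacy TLS) would need ssl wrapping."""
    namespace = "jabber:server" if port == 5269 else "jabber:client"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(probe_header(namespace).encode("utf-8"))
        # Simplification: assumes the header arrives in a single recv().
        reply = sock.recv(4096).decode("utf-8", errors="replace")
    return domain_from_reply(reply)
```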

Preliminary tests indicate that some common servers are susceptible to
this fingerprinting process and would give us a decent number of domains.

My preliminary calculations come to less than one probe per minute per
network, so DoS protections shouldn't be a problem. I expect to be able
to scan 10k IPs in parallel, with a 10-second timeout. I have no idea
how long it would take to scan the entire Internet.
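
A back-of-the-envelope worst case for the figures above, assuming every
connection attempt waits the full timeout and each of the three ports
counts as a separate attempt. Responsive hosts answer sooner, so this is
an upper bound on probing time only (fingerprinting is not included);
the function name is hypothetical:

```python
def worst_case_scan_hours(targets: int, ports_per_target: int = 3,
                          parallel: int = 10_000, timeout_s: float = 10.0) -> float:
    """Hours to probe `targets` addresses if every attempt hits the timeout."""
    attempts = targets * ports_per_target
    attempts_per_second = parallel / timeout_s  # 10k slots / 10 s = 1000 per second
    return attempts / attempts_per_second / 3600


# One address per /24 across all of IPv4 (2**24 blocks, reserved space included):
print(f"{worst_case_scan_hours(2**24):.1f} hours")  # → 14.0 hours
```

The BGP-derived list covers only routed space, so the real number of
targets should normally be smaller than this bound.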

Prior art on scanning IP networks for mail and DNS servers:
http://cr.yp.to/surveys.html


Database Ownership
------------------

Who owns the database? Who has raw access to it?

I don't have answers here. I would prefer that ownership of the
database stay with the XSF, but I don't know whether the XSF is the
proper place for this.


Privacy issues
--------------

Some domain operators might not want their domain listed or searchable.

There is no XMPP-based protocol to specify privacy policy for a domain.

For now my best solution (not great, but workable until something
better comes along) is this: publish a TXT record at _xmpp-policy.domain
containing an HTTP URL for an XML document (schema TBD) with the privacy
policy.

As a fallback, try the well-known URL http://domain/xmpp-policy.xml
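
The lookup order proposed above, as a sketch. `lookup_txt` is a
hypothetical stand-in for a real resolver (returning the TXT strings for
a name, or nothing), and since the policy document format is still TBD,
only URL discovery is shown:

```python
from typing import Callable, Iterable


def policy_url(domain: str, lookup_txt: Callable[[str], Iterable[str]]) -> str:
    """Prefer the URL in the _xmpp-policy TXT record, else the well-known URL."""
    domain = domain.rstrip(".").lower()
    for txt in lookup_txt(f"_xmpp-policy.{domain}"):
        txt = txt.strip()
        if txt.startswith(("http://", "https://")):
            return txt
    return f"http://{domain}/xmpp-policy.xml"  # fallback well-known location


if __name__ == "__main__":
    records = {"_xmpp-policy.example.org": ["http://example.org/policy.xml"]}
    print(policy_url("example.org", lambda name: records.get(name, [])))
    print(policy_url("example.net", lambda name: records.get(name, [])))
```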

------8<-----

Best regards,


