[Standards-JIG] bot-challenge proto-JEP

Sander Devrieze s.devrieze at pandora.be
Wed Aug 31 15:35:45 UTC 2005


On Wednesday 31 August 2005 13:37, Bart van Bragt wrote:
> Sander Devrieze wrote:
> > I have not yet read the whole proto-JEP, but I think image and audio
> > recognition should be removed because of accessibility problems.
>
> Clients that are used by blind people can specify that they don't want
> the image.

I guess spimmers will just make their bots blind then! ;-)

> Using just text questions is going to lead to a lot of 
> problems, most importantly i18n issues.

I don't think this is a problem:
* If you send a message to someone, you will write it in a language he
knows, so you will also understand the language of his question. If
you only know Dutch, for example, it is no problem that your question is only
in Dutch. In fact, it is extra protection against Chinese spimmers ;-) If
you know several languages, you can make your question multilingual
(xml:lang).
* If you are the admin of a public server with in-band registration enabled,
you are probably targeting only a few language markets. In that case you
just need to make questions available in those languages. If you
target many languages, like Google, and have enough resources to make
your whole website multilingual, I guess it will be no problem to also
translate a list of questions...
* Also note that languages are "open protocols": if you really want to
register on a Dutch server without knowing the language, you can try to use a
dictionary. ;-)
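
As a rough illustration of the multilingual idea, here is a minimal Python sketch that picks the question text matching the languages of the person being challenged. The question table, the function name pick_question and the fallback rule are assumptions made up for this example; nothing here is taken from the proto-JEP.

```python
# Hypothetical sketch: one challenge question stored in several languages,
# keyed by xml:lang-style tags. None of these names come from the proto-JEP.
QUESTION = {
    "nl": "Hoeveel poten heeft een kat?",
    "en": "How many legs does a cat have?",
}

def pick_question(translations, preferred, default="en"):
    """Return (lang, text) for the first preferred language that is available."""
    for lang in preferred:
        # Match "nl-BE" against "nl" by comparing the primary subtag only.
        primary = lang.split("-")[0].lower()
        if primary in translations:
            return primary, translations[primary]
    if default in translations:
        return default, translations[default]
    # Assumed fallback: any available language rather than no question at all.
    lang = sorted(translations)[0]
    return lang, translations[lang]

print(pick_question(QUESTION, ["nl-BE", "fr"]))
print(pick_question(QUESTION, ["zh"]))
```

A user who only knows Dutch would simply store a single "nl" entry, and everyone would then get the Dutch question.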

> > About "Text Question and Answer".
> > 1) Common parts:
> > * The easiest will be multiple-choice.
>
> Which will also be fairly easy to guess. You can't present 20 choices to
> a human user so spammers can just try it until they have a hit.

Yes and no. You are right that the spimmer can guess, but if the spimmer
sends a wrong answer:
* He will get a new, different question to answer when this is used for
in-band registration (note that this also helps humans who accidentally
entered the wrong answer :D ). So he faces another set of possibilities, not
the same one.
* He will at least get a different set of answers (e.g. the server picks
5 answers from a list of 100 answers).
* It will cost him more bandwidth/CPU usage.
* The server that the spimmer is brute-force attacking can set an interval:
if the answer was wrong, it will not accept new answers from that IP (in-band
registration) or user (privacy lists) for a while.
* If an IP (in the in-band registration case) generates, let's say, 1000
question requests in an hour because of wrong answers, the server can
automatically blacklist that IP (for some time).
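
The countermeasures above could look roughly like this in Python. This is only a sketch under my own assumptions: the names (make_choices, Throttle) and all the numbers are invented for illustration, except "5 answers from a list" and "1000 per hour", which come from the points above.

```python
import random
import time

# All names and numbers below are illustrative assumptions, not part of the proto-JEP.
CHOICES_SHOWN = 5        # answers presented per challenge
RETRY_DELAY = 2.0        # seconds an address must wait after a wrong answer
MAX_FAILURES = 1000      # wrong answers per window before blacklisting
WINDOW = 3600.0          # length of the counting window in seconds

def make_choices(correct, wrong_pool, rng=random):
    """Pick the correct answer plus random wrong ones, in shuffled order."""
    choices = rng.sample(wrong_pool, CHOICES_SHOWN - 1) + [correct]
    rng.shuffle(choices)
    return choices

class Throttle:
    """Tracks wrong answers per address (an IP or a JID)."""

    def __init__(self):
        self.failures = {}   # address -> timestamps of recent wrong answers

    def record_failure(self, addr, now=None):
        now = time.time() if now is None else now
        self.failures.setdefault(addr, []).append(now)

    def may_answer(self, addr, now=None):
        now = time.time() if now is None else now
        # Keep only failures inside the counting window.
        recent = [t for t in self.failures.get(addr, []) if now - t < WINDOW]
        self.failures[addr] = recent
        if len(recent) >= MAX_FAILURES:
            return False     # blacklisted until old failures age out
        if recent and now - recent[-1] < RETRY_DELAY:
            return False     # still inside the retry interval
        return True
```

A server would call may_answer() before accepting an answer, call record_failure() on every wrong one, and build a fresh set with make_choices() for each new attempt.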

But posing problems might indeed be even better:

> > * Problems should maybe also be possible. E.g.: "There are three people
> > jabbering to each other. Jef arrives in his car and hits two of
> > them. The car also contains his child and a cat. The eldest one he hit
> > died. How many living human beings do we have at the end of this story?"
>
> I have absolutely no clue, you lost me :)

<off-topic>
Maybe these questions can be a killer feature for Jabber! A teacher can
require his pupils, as homework, to get a Jabber ID on the school's Jabber
server. To verify their homework they need to send a subscription request to
the teacher's Jabber ID and answer another question :-D

PS: of course schools and teachers will make more difficult questions ;-)
</off-topic>

> > * The *user* can set his own question and answers.
>
> 99% of the users are not going to do this and they are going to stick
> with the defaults which is going to make the life of spammers _very_
> easy.

* If you get a lot of spim, you have a very good motivation to invest some
time in it...
* Client developers should make it very easy (much easier than setting up a
Bayesian filter on your mail server, for example).
* At least I will be protected! :D

> Same with default questions on server installations.

There will be no default questions on server installations.

> Also keep in  
> mind that this method is not going to work if the CAPTCHA isn't really
> close to perfect. Spammers can abuse the system as soon as they can get
> a few percent hits...

Breaking an image CAPTCHA only needs good OCR.
Breaking these questions (especially the problems that need interpretation and
abstract thinking) will need extremely strong A.I., and bots will also need to
be multilingual if they want to be usable against all servers/users...

-- 
Mvg, Sander Devrieze.

xmpp:sander at devrieze.dyndns.org ( http://jabber.tk/ )


