

2 Description of the 01/02/2005 enquiry

2.1 A larger experiment, 01/02/2005, 23h00

To obtain a larger basis for investigation, we asked for "all" answers. This new request was launched on 01/02/2005 at 23h00. With a loop over &start=x00, we collected all of the "496 pertinent answers among 16100" given by Google. alg:Cutting shows how to cut Google's answers into lines, one line per page quoted.


[Algorithm alg:Cutting]
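The loop over &start=x00 can be sketched as below. This is only an illustration, not the original alg:Cutting listing: the base URL, the num parameter and the name page_urls are assumptions, and only the &start=x00 offsets and the total of 496 answers come from the text.

```python
from urllib.parse import urlencode


def page_urls(query, total, per_page=100, base="http://www.google.com/search"):
    """Build one result-page URL per block of `per_page` answers.

    `base` and the `num` parameter are assumptions made for this sketch.
    """
    urls = []
    for start in range(0, total, per_page):
        # start takes the values 0, 100, 200, ... (the "x00" of the text)
        params = {"q": query, "num": per_page, "start": start}
        urls.append(base + "?" + urlencode(params))
    return urls


urls = page_urls("inces incest ncest", 496)
```

With 496 answers and 100 answers per page, the loop produces five URLs, the last one carrying start=400. Fetching the pages and cutting them into lines is left out, since the layout of the answers is not described here.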

The most striking observation was that between our two queries, i.e. between 12h00 and 23h00, many web sites had started to publish their February statistics... and many of them appear to have been under attack.

2.2 Classification according to titles

alg:Title shows how to extract the title part of a Google line. Among the titles of the 496 pages responding to "inces+incest+ncest", it appears that:


300
pages started with "Usage Statistics for" (selected by a simple grep)
035
pages were customized Webalizer pages (selected by hand according to their title)
161
other pages were almost all related to pornography... and mostly located at the bottom of Google's list of 496 items. In the first page of 100 items, only 3 sites were not Webalizer pages.


[Algorithm alg:Title]
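The "simple grep" on the titles can be imitated in a few lines. This is a minimal sketch under the assumption that the titles have already been extracted into a list; the function name select_by_prefix and the sample titles are invented for illustration.

```python
def select_by_prefix(titles, prefix="Usage Statistics for"):
    """Split titles into those starting with `prefix` and all the others."""
    matching = [t for t in titles if t.startswith(prefix)]
    rest = [t for t in titles if not t.startswith(prefix)]
    return matching, rest


# Made-up titles, only to show the call.
sample = [
    "Usage Statistics for www.example.org - January 2005",
    "Some customized statistics page",
    "Usage Statistics for stats.example.net - February 2005",
]
webalizer, others = select_by_prefix(sample)
```

The customized Webalizer pages cannot be found this way, which is consistent with their having been selected by hand.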

In a Webalizer page, the month is given by default. Among the 300+35 hand-recognized "incest tagged" Webalizer pages,


259
pages were explicitly January related
069
pages were explicitly February related
007
pages had no month in the title
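The month tally can be sketched in the same spirit. The sketch assumes that a month name, when present, appears literally in the title; the names month_of and tally_months and the sample titles are hypothetical.

```python
from collections import Counter

MONTHS = ("January", "February")


def month_of(title):
    """Return the first month of MONTHS named in the title, or None."""
    lowered = title.lower()
    for month in MONTHS:
        if month.lower() in lowered:
            return month
    return None


def tally_months(titles):
    """Count the titles per detected month (None means no month found)."""
    return Counter(month_of(t) for t in titles)


# Made-up titles, only to show the call.
tally = tally_months([
    "Usage Statistics for a.example - January 2005",
    "Usage Statistics for b.example - February 2005",
    "Statistics",
])
```

As a consistency check on the figures above, 259 + 69 + 7 = 335 = 300 + 35.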

2.3 Classification according to addresses

A better classification can be obtained from the Internet addresses of the pages. alg:Address shows how to extract the address part of a Google line. It appears that:


220
pages were named usage_200501*
070
pages were named usage_200502*
044
pages were named search_200501* (and none search_200502*)
162
pages were named otherwise (collected in aaa_addresses, using some "grep -v")


[Algorithm alg:Address]
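The classification by page name can be sketched as follows. This is an illustration rather than the original alg:Address listing: it assumes that the addresses are available as plain URLs, and the names classify_address and tally_addresses, like the sample URLs, are invented.

```python
import re
from collections import Counter

# Last path component starting with usage_YYYYMM or search_YYYYMM.
NAME = re.compile(r"/((?:usage|search)_\d{6})[^/]*$")


def classify_address(url):
    """Return e.g. 'usage_200501' or 'search_200501', else 'other'."""
    match = NAME.search(url)
    return match.group(1) if match else "other"


def tally_addresses(urls):
    """Count the addresses per class."""
    return Counter(classify_address(u) for u in urls)


# Made-up addresses, only to show the call.
counts = tally_addresses([
    "http://stats.example/usage_200501.html",
    "http://stats.example/a/usage_200502.html",
    "http://stats.example/search_200501.html",
    "http://www.example/index.html",
])
```

The "other" class plays the role of the file obtained with "grep -v". The four classes reported above add up to 220 + 70 + 44 + 162 = 496 pages.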

To make a selection among the 162 others, it is useful to analyse the words used to build the site names. alg:Sensible describes how to obtain this list of words.


[Algorithm alg:Sensible]
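One plausible way to build such a word list is to split each host name on everything that is not a letter. This is an assumption of the sketch, not the method of alg:Sensible; the name site_words and the sample URLs are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def site_words(urls):
    """Count the letter-only fragments found in the host names."""
    counts = Counter()
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        # "www.free-stats.example" -> www, free, stats, example
        counts.update(w for w in re.split(r"[^a-z]+", host) if w)
    return counts


# Made-up addresses, only to show the call.
words = site_words([
    "http://www.free-stats.example/a",
    "http://stats.example/b",
])
```

Words glued together without a separator are not split by this approach, so the resulting list still needs a reading by hand.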

Among the 346 words obtained, we selected a sublist of 53 potentially sensitive words. alg:Non-explicitely shows how to select, among the 162 "other" pages, those that do not contain potentially sensitive words. We obtained:


025
"other" pages with ordinary looking names
137
"other" pages were named using at least one potentially sensitive word. Direct examination showed that no false positives occurred... only false negatives (cf. § 2.4).


[Algorithm alg:Non-explicitely]
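The selection itself amounts to a substring test of each site name against the word list. The sketch below is an assumption about how this could be done, not the original alg:Non-explicitely listing; split_by_words is a hypothetical name, and the word list and URLs are neutral placeholders.

```python
from urllib.parse import urlsplit


def split_by_words(urls, flagged_words):
    """Separate URLs whose host contains a flagged word from the others."""
    ordinary, flagged = [], []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if any(word in host for word in flagged_words):
            flagged.append(url)
        else:
            ordinary.append(url)
    return ordinary, flagged


# Placeholder words and addresses, only to show the call.
ordinary, flagged = split_by_words(
    ["http://www.flagged-site.example/", "http://www.plain.example/x"],
    ["flagged"],
)
```

A name built only from words missing from the list lands in the ordinary group, which is how false negatives can arise.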


2.4 Among the ordinary looking

We loaded all 25 "ordinary looking" pages for direct inspection. In our opinion, these pages can be classified as follows:


06
Webalizer related pages
03
empty pages (when loaded from the address given by Google).
10
avatars of some of the 137 former "explicit" looking pages.
03
"active, but not malicious" pages.
03
potential threats.

Among the Webalizer related pages, there was a message from a webmaster describing the appearance of the "incest" query in his own statistics. It is good to see that at least one other webmaster (www.radioring.de) is not asleep. There were two Webalizer.current pages (the internal state of the Webalizer program) and one customized Webalizer page, where the "bad" Referrers occur in the "User Agent" page. We have added these sites to our basis. Two other pages, created by awstats and likewise under the incest attack, should also be mentioned.

The 10 "avatars" were not detected because of ambiguous words. For example, latex is not only a typesetting program, and it seems that smoking could be latex related. Moreover, some of these Internet addresses were sharing ip addresses with others of the 496 sites. As a rule of thumb, when several Internet addresses happen to share the same ip address, it is better to discard the corresponding pages than to keep them.

The "active, but not malicious" pages were a trap for typing errors (a page listing many common words, each with many misspelled variants) and two javascript-scrambled pages that, after execution, turn out to be related to several explicit sites. This use of lowbrow cryptography appears to be more about masquerading than about threatening.

The remaining three pages are more suspect. They were respectively the 1st, 130th and 131st of the global list. Besides a cryptic title "wmf?" and a redirection towards an offending web site,
<script>document.location='http://x-incest.net/inces/';</script>
these pages seem to contain a replication mechanism. Many similar pages are embedded in the reply chain of the forum starting at:
http://www.angeltowns.com/members/nopsetwar/wwwboard/index.html
where these pages are inserted despite an apparent disorder in numbering and dates.

2.5 Among the explicit names

A lookup of the ip addresses of the explicitly named sites shows that:


028
sites have their own ip address (not shared with another of the 137)
050
different ip addresses appear in total: the remaining 22 ip addresses are shared by 109 sites.
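The grouping behind these figures can be sketched as below, assuming the site-to-address mapping has already been resolved (for instance with socket.gethostbyname, which is not called here to keep the sketch offline). The name group_by_ip, the host names and the addresses are invented for illustration.

```python
from collections import defaultdict


def group_by_ip(site_to_ip):
    """Return (sites alone on their ip, {ip: sites} for shared ips)."""
    by_ip = defaultdict(list)
    for site, ip in site_to_ip.items():
        by_ip[ip].append(site)
    alone = sorted(sites[0] for sites in by_ip.values() if len(sites) == 1)
    shared = {ip: sorted(sites) for ip, sites in by_ip.items() if len(sites) > 1}
    return alone, shared


# Made-up hosts and documentation-range addresses, only to show the call.
alone, shared = group_by_ip({
    "a.example": "192.0.2.1",
    "b.example": "192.0.2.2",
    "c.example": "192.0.2.2",
    "d.example": "192.0.2.2",
})
```

The counts above are consistent with such a partition: 28 + 22 = 50 ip addresses and 28 + 109 = 137 sites.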




douillet@ensait.fr
2005-02-25