2 Description of the 01/02/2005 enquiry
To obtain a larger basis for investigation, we asked Google
for "all" the answers. This new request was launched
on 01/02/2005 at 23h00. With a loop over &start=x00, we collected
all the "496 pertinent answers among 16100" given
by Google. alg:Cutting shows how to cut Google's
answers into lines, one line per page quoted.
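The loop over &start=x00 can be sketched as follows. This is only an illustration, not the script we used: the base URL and the num parameter are assumptions, and nothing is fetched here (only the list of result-page addresses is built).

```python
from urllib.parse import urlencode

def result_page_urls(query, total, page_size=100,
                     base="http://www.google.com/search"):
    """Yield one URL per result page, stepping the `start` offset.

    `base` and the `num` parameter are assumptions made for this sketch.
    """
    for start in range(0, total, page_size):
        params = {"q": query, "num": page_size, "start": start}
        yield base + "?" + urlencode(params)

# 496 answers at 100 per page give the offsets 0, 100, 200, 300, 400.
urls = list(result_page_urls("inces incest ncest", 496))
```

Each returned address would then be downloaded and cut into one line per quoted page.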
The most apparent change was that between our two queries, i.e. between
12h00 and 23h00, many web sites had started to publish their statistics
for February... and many of them appeared to be under
attack.
alg:Title shows how to extract the title part of a Google
line. Among the titles of the 496 pages responding to "inces+incest+ncest",
it appears that:
| 300 pages started with "Usage Statistics for" (selected by a simple grep) |
| 35 pages were customized Webalizer pages (hand-selected according to their title) |
| the other pages were almost all related to pornography... and mostly located at the bottom of Google's list of 496 items. Among the first 100 items, only 3 sites weren't Webalizer pages. |
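The "simple grep" on titles amounts to a prefix test, which can be sketched as below. The sample titles are hypothetical; the customized Webalizer pages were selected by hand and are not covered by this sketch.

```python
from collections import Counter

WEBALIZER_PREFIX = "Usage Statistics for"

def classify_title(title):
    """Return 'webalizer_default' for default-looking Webalizer titles."""
    if title.strip().startswith(WEBALIZER_PREFIX):
        return "webalizer_default"
    return "other"

# Hypothetical titles, for illustration only.
sample_titles = [
    "Usage Statistics for www.example.org - February 2005",
    "Usage Statistics for stats.example.net - January 2005",
    "Site traffic report",
]
title_counts = Counter(classify_title(t) for t in sample_titles)
```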
In a Webalizer page, the month is given by default. Among the 300+35
hand-recognized "incest tagged" Webalizer pages,
| pages were explicitly January related |
| pages were explicitly February related |
| pages had no month in the title |
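A minimal sketch of the month test on titles is given below. It assumes that the month appears in the title as a plain English month name; the sample titles used to exercise it are hypothetical.

```python
import re

# Only the two months of interest here; an assumption of this sketch.
MONTH_RE = re.compile(r"\b(January|February)\b", re.IGNORECASE)

def title_month(title):
    """Return 'January', 'February', or None when no month is found."""
    match = MONTH_RE.search(title)
    return match.group(1).capitalize() if match else None
```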
A better classification can be obtained from the Internet
address of the pages. alg:Address shows how to extract
the address part of a Google line. It appears that:
| pages were named usage_200501* |
| pages were named usage_200502* |
| pages were named search_200501* (and none search_200502*) |
| 162 pages were named otherwise (collected in aaa_addresses, using some "grep -v") |
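The classification by page name can be sketched as below, as an alternative to chained "grep -v" calls. The prefixes come from the list above; the example addresses are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

PREFIXES = ("usage_200501", "usage_200502", "search_200501", "search_200502")

def classify_address(url):
    """Return the matching file-name prefix, or 'other' if none matches."""
    name = posixpath.basename(urlparse(url).path)
    for prefix in PREFIXES:
        if name.startswith(prefix):
            return prefix
    return "other"
```

For instance, classify_address("http://example.org/stats/usage_200502.html") falls in the usage_200502 class, while an address ending in index.html falls in "other".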
To make a selection among the 162 others, an analysis of the words
used to build the site names proves useful. alg:Sensible
describes how to obtain this list of words.
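One possible way to obtain such a list of words is sketched below. The splitting rule (lower-case the host name, cut at every non-letter) is an assumption of this sketch, and the address used in the note is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

def host_words(url):
    """Split the host name of `url` into lower-case alphabetic words."""
    host = urlparse(url).hostname or ""
    return [w for w in re.split(r"[^a-z]+", host.lower()) if w]

def word_counts(urls):
    """Count how many times each word occurs over all host names."""
    counts = Counter()
    for url in urls:
        counts.update(host_words(url))
    return counts
```

With this rule, "http://www.radio-ring.example/stats/" gives the words www, radio, ring and example; concatenated words such as "radioring" are not split.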
Among the 346 words obtained, we selected a sublist of 53 potentially
sensitive words. alg:Non-explicitely shows how to select,
among the 162 "other" pages, those that do not
contain potentially sensitive words. We obtained:
| 25 "other" pages with ordinary looking names |
| 137 "other" pages named using at least one potentially sensitive word. Direct examination showed that no false positive occurred... only false negatives (cf. § sub:Among-the-ordinary). |
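The split of the "other" pages can be sketched as a substring filter, in the spirit of "grep -v". The word list and the addresses in the usage lines are placeholders, not the 53 words actually selected.

```python
def split_by_words(addresses, flagged_words):
    """Split addresses into (ordinary, flagged) by substring matching."""
    ordinary, flagged = [], []
    for address in addresses:
        lowered = address.lower()
        if any(word in lowered for word in flagged_words):
            flagged.append(address)
        else:
            ordinary.append(address)
    return ordinary, flagged

# Placeholder data, for illustration only.
ordinary, flagged = split_by_words(
    ["http://www.foo-site.example/", "http://www.quiet.example/"],
    ["foo", "bar"],
)
```

Substring matching yields false negatives whenever a word is ambiguous or missing from the list, which is the behaviour described above.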
2.4 Among the ordinary looking
We loaded all 25 "ordinary looking" pages
for direct inspection. In our opinion, these pages can be classified
as follows:
| empty pages (when loaded from the address given by Google). |
| 10 avatars of some of the 137 former "explicit" looking pages. |
| "active, but not malicious" pages. |
Among the Webalizer related pages, there was a message from a webmaster
describing the appearance of the "incest" query
in his own statistics. It is good to see that at least one other webmaster (www.radioring.de)
is not asleep. There were two Webalizer.current pages (the internal
state of the Webalizer program) and one customized Webalizer page,
where the "bad" Referrers occur in the "User
Agent" page. We have added these sites to our basis. Two
other pages, created by awstats and likewise under the incest attack,
should also be mentioned.
The 10 "avatars" were not detected because of ambiguous
word detection. For example, latex is not only a typesetting program,
and it seems that smoking could be latex related. Moreover, some of
these Internet addresses were sharing ip addresses with others of
the 496 sites. As a rule of thumb, when several Internet addresses
happen to share the same ip address, it is better to discard the
corresponding pages than to keep them.
The "active, but not malicious" pages were a trap
for keyboard errors (a page listing many common words, each with
many misspelled variants) and two javascript-scrambled
pages that, after execution, appear to be related to several explicit
sites. This use of lowbrow cryptography appears more related to masquerading
than to threatening.
The remaining three pages are more suspect. They were respectively
the 1st, 130th and 131st of the global list. Besides a cryptic title
"wmf?" and a redirection towards an offending web
site,
<script>document.location='http://x-incest.net/inces/';</script>
these pages seem to contain a mechanism for replicating. Many
similar pages are embedded inside the reply chain of the forum starting
at:
http://www.angeltowns.com/members/nopsetwar/wwwboard/index.html
where these pages are inserted despite an apparent disorder in number
and date.
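Pages carrying this kind of redirection can be spotted with a pattern search, sketched below. Only the literal form document.location='...' (single or double quotes) is matched; other redirection idioms, and scrambled scripts, are outside the scope of this sketch. The test address is hypothetical.

```python
import re

# Matches document.location = '...' or "...", and captures the target.
REDIRECT_RE = re.compile(
    r"""document\.location\s*=\s*(['"])(.*?)\1""", re.IGNORECASE
)

def find_redirects(html):
    """Return the list of redirection targets found in `html`."""
    return [m.group(2) for m in REDIRECT_RE.finditer(html)]

sample = "<script>document.location='http://example.org/target/';</script>"
targets = find_redirects(sample)
```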
A query about the ip addresses of the explicitly named sites shows
that:
| sites have their own ip address (not shared with another of the 137) |
| different ip addresses appear altogether: the remaining 22 ip addresses are shared by 109 sites. |
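Once each site name has been resolved (for instance with a DNS lookup, not performed here), the sharing of ip addresses can be summarized as sketched below. The site names and addresses in the usage lines are placeholders taken from documentation ranges.

```python
from collections import defaultdict

def group_by_ip(site_to_ip):
    """Map each ip address to the list of sites that resolve to it."""
    groups = defaultdict(list)
    for site, ip in site_to_ip.items():
        groups[ip].append(site)
    return dict(groups)

def sharing_summary(site_to_ip):
    """Return (sites alone on their ip, {ip: sites} for shared ips)."""
    groups = group_by_ip(site_to_ip)
    alone = [sites[0] for sites in groups.values() if len(sites) == 1]
    shared = {ip: sites for ip, sites in groups.items() if len(sites) > 1}
    return alone, shared

# Placeholder data, for illustration only.
alone, shared = sharing_summary({
    "a.example": "192.0.2.1",
    "b.example": "192.0.2.2",
    "c.example": "192.0.2.2",
})
```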
douillet@ensait.fr
2005-02-25