3 Infestation of the January Usage pages
Let us now consider the January usage pages of all the Webalizer sites
found among the 496 answers to our Google query. Taken together,
the Webalizer pages found in the preceding section belonged
to 311 different directories. Algorithm alg:Collect describes a
PHP script that requests the usage_200501.html pages of all these
directories in parallel (collecting them with a for loop would process
them serially and take far too long).
Carried out on 03/02/2005, this collection yielded:
  - 4 pages not available (while the web site was still responding);
  - 2 restricted pages, namely one "forbidden (404)" and one password
    protected;
  - 305 ordinary January "Usage Statistic Pages".
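The parallel collection described above can be sketched as follows. This is only an illustration in Python, not the PHP script of Algorithm alg:Collect. The function names, the thread-pool approach, the timeout and the number of workers are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import URLError
from urllib.request import urlopen

PAGE = "usage_200501.html"  # January 2005 usage page named in the text


def fetch(url, timeout=20):
    """Return (url, body), or (url, None) when the page cannot be read."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            # latin-1 never fails to decode; the real encoding is unknown
            return url, resp.read().decode("latin-1")
    except (URLError, OSError, ValueError):
        return url, None


def collect(directories, fetcher=fetch, workers=32):
    """Request the usage page of every directory concurrently."""
    urls = [d.rstrip("/") + "/" + PAGE for d in directories]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map runs the requests in parallel and keeps the input order
        return dict(pool.map(fetcher, urls))
```

Passing a different `fetcher` makes it possible to try the collection logic without touching the network.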
A standard Webalizer monthly page contains a Referrer table that lists
the top referrers of the site, i.e. the sites from which a visitor
followed a link to the site under study. Among the 305 usage pages,
we found:
  - 3 pages without Referrer table (the Webalizer program can be
    configured to suppress this table). This can be checked by the
    absence of the string "#TOPREFS" in the page;
  - 292 pages with a regular Referrer table, the presence of which can
    be checked by the appearance of the string "Total Referrers";
  - 10 pages without a regular Referrer table, but with a link to this
    table in the top line of the page.
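The two string checks mentioned above are enough to sort a page into one of the three categories. A minimal sketch, assuming the page body is already available as a string (the function and label names are illustrative):

```python
from collections import Counter


def classify_page(html):
    """Classify a usage page with the two string tests given in the text."""
    if "Total Referrers" in html:
        return "regular table"   # the table itself is present
    if "#TOPREFS" in html:
        return "link only"       # link in the top line, but no table
    return "no table"            # Referrer table disabled in the configuration


def tally(pages):
    """Count how many pages fall into each category."""
    return Counter(classify_page(html) for html in pages)
```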
In our opinion, these ten strange pages could result from a manual
removal of the Referrer table in response to the infestation we
are describing in this section.
Among the 292 pages displaying a Referrer table, there were:
  - pages with 35 referrers or more, the maximum being 100;
  - pages with 30 referrers (the Webalizer standard);
  - pages with 10 referrers or fewer, the minimum being 1.
Using two recognizable tags, the Referrer tables themselves can be
extracted from the usage pages. Algorithm alg:Table-ref describes
how to proceed.
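The extraction between two tags can be sketched as below. The text does not say which two tags are used, so the start and end markers here are assumptions (an anchor named TOPREFS and the first closing TABLE tag after it) and may need adjusting to the actual pages.

```python
START = '<A NAME="TOPREFS">'  # assumed opening marker
END = "</TABLE>"              # assumed closing marker


def extract_referrer_table(html, start=START, end=END):
    """Return the text from `start` through the next `end`, or None."""
    i = html.find(start)
    if i < 0:
        return None            # no Referrer table on this page
    j = html.find(end, i)
    if j < 0:
        return None            # opening marker without a closing one
    return html[i:j + len(end)]
```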
A rough diagnosis of how far the infestation has spread across the
Internet can be obtained by organizing these collected tables in a
"ring", as we have done at:
http://www.douillet.info/~douillet/ansecpb/ring_01/
The next step deals with what we call the Referrer lines. These lines
are the entries of any Referrer table that do not correspond
to a direct request (i.e. a request coming from the web site itself).
The best way to detect direct requests... is an efficient configuration
of the Webalizer. After the fact, one can only guess which requests
are local, a process that cannot easily be automated.
For example, gumicsizma.hu uses a special web address to display its
statistics (and six variants of its web name). The same happens
for www.skynet.ie and marigold.cz. This can be detected by comparing
the effective address with the name given at the top of the
page. Another hint is the number of hits: a seven-digit number almost
certainly denotes an internal reference.
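The two hints above (matching names and a seven-digit hit count) can be combined into a rough heuristic. This is only a sketch: the function names are illustrative, and, as noted, such guesses cannot replace a manual check.

```python
from urllib.parse import urlsplit


def bare_host(url_or_name):
    """Lower-case host name without a leading 'www.'."""
    name = url_or_name if "//" in url_or_name else "//" + url_or_name
    host = urlsplit(name).hostname or ""
    return host[4:] if host.startswith("www.") else host


def looks_local(referrer, site_names, hits, hit_threshold=1_000_000):
    """Guess whether a Referrer line is an internal (direct) request.

    site_names: every known name of the site (page title, effective address).
    hit_threshold: seven digits or more is taken as a sign of an internal line.
    """
    own = {bare_host(n) for n in site_names}
    return bare_host(referrer) in own or hits >= hit_threshold
```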
In any case, a fairly efficient selection can be made using the Internet
names, which can be corrected using the ip addresses when something
unusual is detected. Some pre-processing is needed to correct a few
"errors" appearing in the 7181 lines collected, namely:
  - web sites designated by a numerical (ip) address. This is, for
    example, the case for many Google sites with an ip address ending
    in 104. This defect can be fixed using resolveip... and
    a direct translation for the remaining cases;
  - lines, corresponding to four web sites named hotel*.go.ro, that
    started with http://http://. We have not discovered how this
    happens, and have only fixed the result.
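Both corrections can be sketched in a few lines. Here the standard-library call socket.gethostbyaddr stands in for the resolveip tool mentioned above, which is an assumption about what that step does. The function names are illustrative.

```python
import ipaddress
import re
import socket
from urllib.parse import urlsplit


def fix_double_scheme(url):
    """Collapse a repeated 'http://' prefix into a single one."""
    return re.sub(r"^(?:http://){2,}", "http://", url)


def is_numeric_host(url):
    """True when the host part of the URL is a literal ip address."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def reverse_name(ip, lookup=socket.gethostbyaddr):
    """Reverse-resolve an ip address; None when no name is registered."""
    try:
        return lookup(ip)[0]
    except OSError:
        return None  # left for the manual translation step
```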
After these corrections, let us consider the ip addresses
of the referring sites.
  - different web names, some of them appearing twice (with and without
    the www prefix). This could be the result of different configurations
    of the Webalizer program;
  - of these web names appear to be alive and can be resolved into an
    ip address;
  - different sites can be found (grouping names that share the same ip,
    and adding the 13 unresolved web names).
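The site count described above (names grouped by ip, plus the unresolved names) can be sketched as follows. The function names are illustrative, and socket.gethostbyname is assumed to be an acceptable substitute for whatever resolver was actually used.

```python
import socket
from collections import defaultdict


def resolve(name):
    """Return the ip address of a web name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def group_sites(names, resolver=resolve):
    """Group distinct web names by ip; keep unresolved names apart."""
    by_ip = defaultdict(set)
    unresolved = set()
    for name in set(names):
        ip = resolver(name)
        if ip is None:
            unresolved.add(name)
        else:
            by_ip[ip].add(name)   # www and non-www variants end up together
    return dict(by_ip), unresolved


def count_sites(names, resolver=resolve):
    """Number of different sites: one per ip plus one per unresolved name."""
    by_ip, unresolved = group_sites(names, resolver)
    return len(by_ip) + len(unresolved)
```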
Let us recall that our figures have been obtained from the Referrer
tables of about 300 web sites across the world. If we consider separately
the Referrers occurring at most twice and the others, we obtain:
  - Referrers occurring three times or more, generating nearly all
    the lines (6796 out of 7106);
  - Referrers occurring at most twice, leading to 310 lines: roughly
    one per site.
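The split between frequent and rare Referrers amounts to counting occurrences. A minimal sketch, assuming each Referrer line has already been reduced to one identifier per referring site (the function name is illustrative):

```python
from collections import Counter


def split_by_frequency(referrers, threshold=3):
    """Separate referrers seen `threshold` times or more from the others.

    referrers: one entry per Referrer line.
    Returns two dicts mapping each referrer to its number of lines.
    """
    counts = Counter(referrers)
    frequent = {r: n for r, n in counts.items() if n >= threshold}
    rare = {r: n for r, n in counts.items() if n < threshold}
    return frequent, rare
```

The line totals of the two groups are `sum(frequent.values())` and `sum(rare.values())`. In the figures above they are 6796 and 310, which add up to the 7106 lines considered.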
Work in progress.
Sorted by ip, by number of lines, by number of names and by hits per
line.
http://www.douillet.info/~douillet/ansecpb/ring_01/final_table_lin.html
douillet@ensait.fr
2005-02-25