3 Infestation of the January Usage pages

3.1 Collecting the usage pages

Let us now consider the January usage pages of all the Webalizer sites found among the 496 answers to our Google query. Taken together, the Webalizer pages found in the preceding section belong to 311 different directories. Algorithm alg:Collect describes a PHP script that requests the usage_200501.html pages of all these directories in parallel (collecting them with a for loop would process the requests serially and take far too long).


[Algorithm alg:Collect: parallel collection of the usage pages]
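The parallel collection can be sketched as follows. This is not the PHP script mentioned above but a minimal Python illustration, assuming a thread pool is an acceptable substitute for its parallel calls. The names fetch and collect, the worker count and the timeout are assumptions made for this sketch; only the page name usage_200501.html comes from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

PAGE = "usage_200501.html"  # January 2005 page name, as given in the text


def fetch(url, timeout=30):
    """Return (status, body); status is an HTTP code or an error class name."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("latin-1")
    except Exception as exc:  # HTTPError carries .code; other errors do not
        return getattr(exc, "code", type(exc).__name__), ""


def collect(directories, fetcher=fetch, workers=32):
    """Request the January page of every directory concurrently."""
    urls = [d.rstrip("/") + "/" + PAGE for d in directories]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetcher, urls))  # keeps the order of urls
    return dict(zip(urls, results))
```

The fetcher is passed as a parameter so that the collection logic can be exercised without network access, and so that unavailable or restricted pages end up as status values rather than exceptions.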

Carried out on 3 February 2005, this collection yielded:


004
pages not available (although the web site was still responding);
002
restricted pages, namely one "forbidden (404)" and one password-protected page;
305
ordinary January "Usage Statistic Pages".

3.2 Collecting the Referrer tables

A standard Webalizer monthly page contains a Referrer table that lists the top referrers of the site, i.e. the sites from which a visitor followed a link to the site under study. Among the 305 usage pages, we found:


003
pages without a Referrer table (the Webalizer program can be configured to suppress this table). This can be checked by the absence of the string "#TOPREFS" in the page;
292
pages with a regular Referrer table, whose presence can be checked by the occurrence of the string "Total Referrers";
010
pages without a regular Referrer table, but with a link to this table in the top line of the page.

In our opinion, these ten strange pages could be the result of a manual removal of the Referrer table in response to the infestation described in this section.
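The three-way classification above can be sketched in a few lines of Python. The two test strings are those given in the text; the function names, the category labels and the order in which the tests are applied are assumptions of this sketch.

```python
from collections import Counter


def classify(page):
    """Classify one usage page using the two strings named in the text."""
    if "#TOPREFS" not in page:
        return "no table"        # table disabled in the configuration
    if "Total Referrers" in page:
        return "regular table"
    return "link only"           # link in the top line, but no table


def summarize(pages):
    """Count how many pages fall into each category."""
    return Counter(classify(p) for p in pages)
```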

Among the 292 pages displaying a Referrer table, there were:


010
pages with 35 referrers or more, the maximum being 100
235
pages with 30 referrers (the Webalizer standard)
031
pages with 20 referrers
016
pages with 10 or fewer referrers, the minimum being 1

Using two recognizable tags, the Referrer tables themselves can be extracted from the usage pages. Algorithm alg:Table-ref describes how to proceed.


[Algorithm alg:Table-ref: extraction of the Referrer tables]
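A possible form of this extraction is sketched below in Python. The text does not say which two tags are used, so the markers here are assumptions: the anchor NAME="TOPREFS" as the start and the next closing </TABLE> as the end. The helper referrer_urls, which reads the HREF attributes inside the extracted block, is likewise hypothetical.

```python
import re

START = 'NAME="TOPREFS"'  # assumed start marker (anchor before the table)
END = "</TABLE>"          # assumed end marker (first table close after it)


def extract_referrer_table(page, start=START, end=END):
    """Return the text between the two markers (inclusive), or None."""
    pattern = re.escape(start) + r".*?" + re.escape(end)
    match = re.search(pattern, page, flags=re.IGNORECASE | re.DOTALL)
    return match.group(0) if match else None


def referrer_urls(table):
    """List the link targets found inside an extracted table."""
    return re.findall(r'href="([^"]*)"', table, flags=re.IGNORECASE)
```

The non-greedy match stops at the first closing tag after the anchor, so tables that follow the Referrer table on the same page are left out.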

A rough diagnosis of how far the infestation spreads across the whole Internet can be obtained by organizing the collected tables in a "ring", as we have done at:
http://www.douillet.info/~douillet/ansecpb/ring_01/

3.3 Extracting and sorting

The next step deals with what we call the Referrer lines. These lines are the entries of any Referrer table that do not correspond to a direct request (i.e. one coming from the web site itself). The best way to detect the direct requests... is an efficient configuration of the Webalizer. After the fact, one can only guess which requests are local, a process that cannot easily be automated.

For example, gumicsizma.hu uses a special web address to display its statistics (and six variants of its web name). The same happens for www.skynet.ie and marigold.cz. This can be detected by comparing the effective address with the name given at the top of the page. Another hint is the number of hits: a seven-digit number almost certainly denotes an internal reference.
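The two hints just mentioned can be combined into a rough heuristic, sketched here in Python. It is only a guess-making aid, not an automatic decision procedure: the list of names belonging to a site still has to be assembled by hand, and the function names and the exact threshold of 1,000,000 hits (the smallest seven-digit number) are assumptions of this sketch.

```python
def strip_www(host):
    """Lower-case a host name and drop a leading 'www.' prefix."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def looks_internal(referrer_host, site_names, hits):
    """Guess whether a Referrer line is an internal reference.

    site_names: names known (by inspection) to designate the site itself.
    hits: number of hits reported for this Referrer line.
    """
    own = {strip_www(name) for name in site_names}
    return strip_www(referrer_host) in own or hits >= 1_000_000
```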

In any case, a fairly efficient selection can be made using the Internet names, which can be corrected using the IP addresses when something unusual is detected. Some pre-processing is needed to correct a few "errors" appearing in the 7181 lines collected, namely:


017
web sites are designated by a numerical (IP) address. This is, for example, the case with many Google sites whose IP address ends in 104. This defect can be solved using resolveip... and a direct translation for the remaining cases.
061
lines, corresponding to four web sites named hotel*.go.ro, started with http://http://. We have not discovered how this happens; we have only fixed the result.
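Both corrections can be sketched in Python as follows. Instead of the resolveip utility, this sketch uses the standard socket.gethostbyaddr call for the reverse lookup; the manual dictionary stands for the "direct translation" of the remaining cases. All function names are assumptions of this sketch.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def fix_scheme(url):
    """Collapse a doubled 'http://http://' prefix into a single one."""
    while url.startswith("http://http://"):
        url = url[len("http://"):]
    return url


def host_of(url):
    """Host part of a referrer URL, lower-cased ('' if there is none)."""
    return (urlsplit(fix_scheme(url)).hostname or "").lower()


def is_numeric(host):
    """True if the host is written as an IP address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def reverse_lookup(ip):
    """Name registered for an IP address, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def normalize(url, lookup=reverse_lookup, manual=None):
    """Web name for a referrer URL, translating numerical addresses."""
    host = host_of(url)
    if is_numeric(host):
        return lookup(host) or (manual or {}).get(host, host)
    return host
```

A numerical address that neither the lookup nor the manual table can translate is returned unchanged, so no line is lost at this stage.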

After these corrections, let us consider the IP addresses of the referring sites. We find:


618
different web names, some of them appearing twice (with and without the www prefix). This could be the result of different configurations of the Webalizer program.
595
of these web names appear to be alive and can be resolved into an IP address.
403
different sites can be found (grouping together the names that share the same IP, and adding the 13 unresolved web names).
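The grouping of web names into sites can be sketched in Python as below, assuming that a "site" is either one IP address shared by several names or one unresolved name. The function names are assumptions of this sketch, and the resolver is a parameter so that the grouping can be tried without network access.

```python
import socket
from collections import defaultdict


def resolve(name):
    """IPv4 address of a web name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def group_by_ip(names, resolver=resolve):
    """Group distinct web names by IP; return (by_ip, unresolved)."""
    by_ip = defaultdict(set)
    unresolved = set()
    for name in set(names):
        ip = resolver(name)
        if ip is None:
            unresolved.add(name)
        else:
            by_ip[ip].add(name)
    return dict(by_ip), unresolved


def count_sites(by_ip, unresolved):
    """One site per distinct IP, plus one per unresolved name."""
    return len(by_ip) + len(unresolved)
```

With this convention, www.example.org and example.org count as a single site whenever they resolve to the same address.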

Let us recall that our figures have been obtained from the Referrer tables of about 300 web sites across the world. If we consider separately the Referrers occurring at most twice and the others, we obtain:


142
Referrers occur three times or more, generating almost all the lines (6796 out of 7106).
253
Referrers occur at most twice, leading to 310 lines, roughly one per site.
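The split between frequent and occasional Referrers amounts to counting occurrences and applying a threshold of three, as in the following Python sketch (the function name and the toy data in the usage example are assumptions). The two line totals quoted above are consistent with each other: 6796 + 310 = 7106.

```python
from collections import Counter


def split_by_frequency(referrers, threshold=3):
    """Separate referrers seen at least `threshold` times from the others.

    referrers: one entry per Referrer line (after grouping names into sites).
    Returns two dicts mapping each referrer to its number of lines.
    """
    counts = Counter(referrers)
    frequent = {r: n for r, n in counts.items() if n >= threshold}
    rare = {r: n for r, n in counts.items() if n < threshold}
    return frequent, rare
```

For instance, split_by_frequency(["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]) places a and b among the frequent Referrers (8 lines) and c and d among the others (3 lines).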

3.4 Who are these "ubiquitous" sites?

Work in progress.

Sorted by IP, by number of lines, by number of names and by hits per line:

http://www.douillet.info/~douillet/ansecpb/ring_01/final_table_lin.html




douillet@ensait.fr
2005-02-25