HTML version of the RSS feed

Introduction to website mirrors / useless RSS

Link

What's new?

Date of publication: Sat, 08 Jan 2005 18:45:25 GMT

Last update: Tue, 1 Jul 2013 09:11:57 GMT

Statistics update

Link

User Agents of browsers seen fewer than three times before July 1, 2011 have been deleted, as have User Agents that have not requested a page since October 1, 2009.

Date of publication: Tue, 1 Jul 2013 09:11:57 GMT

Spammers

The site is now visited by hundreds of spammers. They usually request a page, then try to post an advertising link, sometimes using the same IP, sometimes changing IP. The list is displayed on the page about webbot traps.

Link

Date of publication: 08 Oct 2012 19:31:39 GMT

New contact page

Link

The script used by the experimental contact page (tested since April) has replaced the www.free.fr Perl script, which had begun to let spammers send a few messages.

Date of publication: Tue, 28 Aug 2007 09:53:37 GMT

Vulnerabilities interesting robots / botnets (continuation)

Link

Since the page vulnerabilites.php was removed from the indexes of Google and other search engines, the number of botnet visits has dropped from between 500 and 1000 a day in July to about 10 to 20 a day one month later.

Date of publication: Tue, 28 Aug 2007 09:48:33 GMT

Vulnerabilities interesting robots / botnets (continuation)

Link

Since the page vulnerabilites.php was removed from the indexes of Google and other search engines, the number of botnet visits has been divided by 2 or 3.

Date of publication: Sun, 22 Jul 2007 19:30:32 GMT

BEP and BAC PRO

Link

The Hot Potatoes exercises on the BEP and BAC PRO examinations have been moved to http://danzcontrib.free.fr/index.php

Date of publication: Thu, 12 Jul 2007 09:54:42 GMT

Vulnerabilities interesting robots / botnets (continuation)

Link

This site is visited every day by at least 500 robots trying to install malware (at least 1000 requests from 300 different IPs). The User Agent string is libwww-perl most of the time, but other User Agents such as IE 6 or Firefox ones are also used. The number of vulnerabilities targeted is impressive, and so is the number of scripts. The requests are now split into two parts: the page requested and the name of the variable used. The addresses of the scripts are normally removed. Multiple slashes and subdirectories have also been removed.
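The splitting described above could be sketched as follows. This is only an illustration in Python, not the site's own script: the function name `summarize_request` and the sample URI are hypothetical, and the sketch assumes that "removing subdirectories" means keeping only the last path segment.

```python
import posixpath
import re
from urllib.parse import parse_qsl


def summarize_request(request_uri):
    """Return (page, variable_names) for a raw request URI.

    Assumption: only the page name and the names of the query variables
    are kept; the values (addresses of remote scripts) are dropped.
    """
    # Split by hand so that a leading "//" is not mistaken for a host name.
    path, _, query = request_uri.partition("?")
    # Collapse runs of slashes, then keep only the last path segment.
    path = re.sub(r"/{2,}", "/", path)
    page = posixpath.basename(path)
    # Keep the variable names only, even when their value is empty.
    variables = [name for name, _ in parse_qsl(query, keep_blank_values=True)]
    return page, variables
```

For example, a request such as `//forum//includes/config.php?phpbb_root_path=http://evil.example/cmd.txt?` would be reduced to the page `config.php` and the variable `phpbb_root_path`.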

Date of publication: Thu, 12 Jul 2007 09:46:19 GMT

IEAutodiscovery is an add-on for IE by DeskShare

Link

This utility runs in the background when a user loads a page and looks for RSS feeds by prefetching all the links on the page, thus behaving like a badly behaved robot...

Date of publication: Thu, 14 Jun 2007 19:38:54 GMT

Contact form being tested (continuation)

Link

The contact form seems to work as expected. So far, only one robot, coming from India and managing sessions, has succeeded in sending a message. Since then, a line has been added to block any message containing the following string: [url
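A filter of this kind can be very small. The sketch below is in Python rather than the language of the actual form, the function name `is_spam` is hypothetical, and the case-insensitive comparison is an assumption that the entry does not confirm.

```python
def is_spam(message: str) -> bool:
    """Return True when the message contains the BBCode-style marker '[url'.

    Assumption: the match ignores case, so '[URL=' is blocked as well.
    """
    return "[url" in message.lower()
```

A form handler would call `is_spam(body)` before sending the message and silently drop anything that matches.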

Date of publication: Wed, 13 Jun 2007 16:48:46 GMT

Contact form being tested

Link

A contact form from an abandoned project is being tested. It should limit SPAM and allow people blocked by my ISP to send messages.

Date of publication: Wed, 18 Apr 2007 20:46:34 GMT

Vulnerabilities interesting robots

Link

List of pages requested by robots in order to replace some parts of the site by a remote page.

Date of publication: Wed, 14 Feb 2007 17:15:40 GMT

Debugging a page

Link

An introduction to debugging a page that uses a CSS layout is online.

Date of publication: Fri, 25 Aug 2006 15:58:51 GMT

IE7 RC1 and new problems

Link

IE7 Beta 3 (standalone version) fixed a few bugs, but IE7 RC1 introduced new problems: the links are not active on the full width of a few menus (the linked one).
The issue has been fixed by adding:
#menugauche ul li a{display:inline-block;}
#menugauche ul li a{display:block;}
The problem has been fixed in almost all the pages by triggering hasLayout with inline-block and then setting the value back to block.
The standalone procedure for Beta 3 works for me.

Date of publication: Fri, 25 Aug 2006 16:10:06 GMT

Random UAs and IPs

Link

A list of IPs used by robots declaring a random User Agent string is now online. These robots are interested in pages containing "email" or "@" in their names or in their content and do not follow redirections.

Date of publication: Fri, 21 Jul 2006 20:14:40 GMT

Microsoft Data Access Internet Publishing Provider DAV and REQUEST method

Link

The User Agent string "Microsoft Data Access Internet Publishing Provider DAV" uses all sorts of request methods (PUT, LOCK...).

Date of publication: Mon, 12 Jun 2006 17:26:29 GMT

Google Analytics

Link

The site has been using Google Analytics for a few days. The results do not differ from the site stats for visitors with JavaScript enabled. The precision for cities is not very good.

Date of publication: Thu, 25 May 2006 10:45:45 GMT

Crash with ie 7 and div:first-letter

Link

A few pages using div:first-letter crashed the last msie7 beta 2 preview.
The instruction has been moved until it can be tested with ie 7.

Date of publication: Sat, 13 May 2006 19:57:01 GMT

Standalone IE7 preview beta 2

Link

You can have a standalone version of IE7 preview beta and keep IE6 on your system.
1. Download IE7 preview using the link above.
2. Extract to a folder with 7zip or any uncompressing utility.
3. Delete the file named shlwapi.dll in this folder.
4. Create an empty file named iexplore.exe.local.
Now you can use this standalone copy by clicking on iexplore.exe.
To test conditional comments, replace IE by zIE in the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\Version Vector
The 3 standalone versions I have all show different and new bugs.
The last version (March 17, 2006) needs HasLayout for a span, still doesn't accept div:first-letter and p:first-letter in a few contexts etc...
The file styleie7.css tries to fix those problems.

Date of publication: Mon, 10 Apr 2006 15:01:01 GMT

request_method

Link

As of today, Googlebot uses HEAD requests for indexed pages. It used to do that only for sitemap.gz, 404 errors and identity checks.

Date of publication: Sat, 25 Mar 2006 15:45:31 GMT

IP and HTTP_X_FORWARDED_FOR

Link

The site uses HTTP_X_FORWARDED_FOR and REMOTE_ADDR to find out the most probable IP of a visitor. A few explanations and the code are now online even if REMOTE_ADDR is fine for almost 100% of the hits.
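One possible way to combine the two values is sketched below in Python. It is not the code published on the site: the function name `probable_ip` and the rule used (take the first public address listed in the forwarded header, otherwise fall back to REMOTE_ADDR) are assumptions made for illustration.

```python
import ipaddress
from typing import Optional


def probable_ip(remote_addr: str, forwarded_for: Optional[str]) -> str:
    """Guess the visitor's IP from REMOTE_ADDR and X-Forwarded-For.

    Assumption: the first syntactically valid, publicly routable address
    in the forwarded header is preferred; REMOTE_ADDR is the fallback.
    """
    if forwarded_for:
        for candidate in forwarded_for.split(","):
            candidate = candidate.strip()
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                # Values such as "unknown" are not addresses; skip them.
                continue
            if address.is_global:
                return candidate
    return remote_addr
```

Because the forwarded header is supplied by the client or by a proxy, it can be forged; the result is only a "most probable" address, which matches the cautious wording of the entry.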

Date of publication: Sun, 01 Jan 2006 15:30:11 GMT

request_method

Link

The method used to request a page has been tested since March in order to know if it is a good way to detect a user of a "well behaved" bot.
As of today:
Send a HEAD request:
- Xenu, Powermarks, MS FrontPage 6.0, InternetSeer, PostFavorites, Yooda Checkbot, NuSearch Spider, W3C-checklink/4.2 [4.20] libwww-perl/5.803, PeerFactor 404 crawler, PHP version tracker, GT::WWW/1.025
- Mozilla/2.0 (compatible; MSIE 3.02; Update a; Windows 95)
- Mozilla/4.0 (compatible; Netcraft Web Server Survey)
- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) for sitemap
- dcs
- Firefox extension LinkChecker
- a few log spammers
Send a HEAD request then, depending on the options chosen by their user, a GET if necessary:
- e-SocietyRobot, HTTrack, IRLbot, Speedy spider, sygol, Wget, Xenu
Send a HEAD then a GET request:
- Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
- libwww-perl/5.803
Sometimes use HEAD:
- cfetch (depending on the user)
- Mozilla/5.0 (Windows; U; Windows NT 5.0; fr-FR; rv:1.7.10) Gecko/20050717 Firefox/1.0.6 from a Yahoo proxy (proxy3.search.scd.yahoo.net) before a GET to translate the page
- Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) (but can also use GET)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) (from dirt*.nescape.com)
- a few empty UAs
"Forged" UAs used by a few robots detected by the request method:
- Mozilla/4.5 [en] (Win98; I)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
- Mozilla/5.0 (Windows; U; Windows NT 5.1; fr-FR; rv:1.7.6) Gecko/20050318 Firefox/1.0.2
- Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3.1) Gecko/20030425 (Dmoz link checker)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) C2DB
- Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; COM+ 1.0.2204)
UAs used by humans wrongly detected by the request method:
- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20050920 MultiZilla/1.8.1.0n
*************
Note: e-SocietyRobot sends HEAD requests to find out whether index.html, default.html... exist if they are not sent by default.
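The detection by request method can be sketched as below. This Python fragment is an illustration only: the function names are hypothetical, and treating any User Agent that starts with "Mozilla/" as browser-like is a simplifying assumption.

```python
from collections import defaultdict


def methods_by_user_agent(hits):
    """Group the request methods seen for each User Agent string.

    `hits` is an iterable of (user_agent, method) pairs, for instance
    taken from an access log.
    """
    methods = defaultdict(set)
    for user_agent, method in hits:
        methods[user_agent].add(method.upper())
    return methods


def suspicious_browsers(hits):
    """Return browser-like UAs that sent at least one HEAD request.

    Assumption: a UA starting with 'Mozilla/' claims to be a browser.
    """
    return sorted(
        user_agent
        for user_agent, seen in methods_by_user_agent(hits).items()
        if user_agent.startswith("Mozilla/") and "HEAD" in seen
    )
```

As the lists above show, such a test produces false positives (Googlebot, Netcraft, or a human using MultiZilla), so the result is a list of candidates to review rather than a verdict.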

Date of publication: Wed, 16 Nov 2005 13:54:11 GMT

Googlebot cache and CSS

Link

It seems that Googlebot uses a CSS rendering for its cache that doesn't support height:1% (hack for msie).
Some of the cache problems are now fixed.
A test with MSIE 6 style sheets may fix the other problems for the next visit.

Date of publication: Tue, 08 Nov 2005 21:15:39 GMT

Using SWFRIP to extract links from a flash file and uncompress it

Link

SWFRIP can replace swf2html when you have to extract the links from a Flash file to complete a mirror.
How to use SWFRIP is online.

Date of publication: Sun, 06 Nov 2005 10:09:28 GMT

Redirection 301 and indexing...

Link

Requests from many proxies do not follow a 301 redirection.
Thus 10% of the requests coming from forwarded IPs are not redirected.
The majority of these hits come from robots (spam and link checking with forged User Agents).

Date of publication: Sun, 06 Nov 2005 10:09:21 GMT

Using conditional comments rather than hacks

Link

The page http://danzcontrib2.free.fr/en/pagecss6b.php has been modified (http://danzcontrib2.free.fr/en/pagecss6bie.php) so that it uses conditional comments for MSIE.
The page http://css.tests.free.fr/en/hacks2.php about conditional comments for MSIE has been added.

Date of publication: Thu, 27 Oct 2005 14:39:40 GMT

Using conditional comments rather than hacks

Link

The page http://css.tests.free.fr/pagecss6h.php has been modified so that it uses conditional comments for MSIE.
Explanations will come soon.

Date of publication: Sun, 16 Oct 2005 20:26:36 GMT

rss editor 0.0.9 and the feed stylesheet

Link

rss editor allows a stylesheet for the rss feed.
A lot of rendering differences between MSIE and Firefox/Opera (mostly because of the HTML tags that I use) justify the HTML version of the feed.
The stylesheet is named rss.css

Date of publication: Thu, 13 Oct 2005 17:21:18 GMT

request_method

Link

PeerFactor 404 crawler seems to be a forum dead link checker.
In fact, it must have followed a link from a forum. They claim here that you can earn money by fighting content piracy.
http://www.peerfactor.info/
http://web14.viux.com/retspan-info/

Date of publication: Sun, 09 Oct 2005 17:46:51 GMT

Google Desktop and robot activity

Link

Google Desktop activity (series of 3 requests in 5 to 7 minutes followed by requests every 30 minutes and 1 second) should appear now once per IP every 48 hours.

Date of publication: Sun, 09 Oct 2005 15:15:05 GMT

Redirection 301 and indexing...

Link

The Bloglines robot no longer follows a 301 redirection.

Date of publication: Mon, 03 Oct 2005 17:44:59 GMT

Who asks for robots.txt?

Link

A list of the User Agent strings of robots that ask for robots.txt before crawling the site css.tests.free.fr is online. I started collecting the UAs yesterday. The list is updated automatically.

Date of publication: Wed, 28 Sep 2005 20:06:03 GMT

Redirection 301 and indexing... Updates

Link

A few updates:
Robots that did not follow the redirection:
- in php and HTML : boitho, Spurlbot, Voilabot, ia_archiver, cfetch, cometsystems, IP*Works, Szukacz and antibot.
- in php only: Bloglines, grub, Xenu, envolk, almaden, larbin, FAST, a few log spammers and email grabbers
Robots that updated the link:
- msn, ZyBorg, WebSense, EasyDL and IRLBot
The number of robots still visiting the previous URL is increasing in spite of the redirection; it is above the average of 45 a day.
A 301 redirection is not handled correctly by search engines: after 7 months, 80% of the visitors are still redirected from the previous address.
As access and connection problems are frequent, the site disappears from Google directory from time to time and the number of visitors is divided by 3 or 4 until the next deep crawl (about 1 month later).

Date of publication: Sat, 24 Sep 2005 16:58:53 GMT

Strange robot : IEAutoDiscovery

Link

This webbot or utility triggered all the protections against email grabbers, followed the redirection from the old site, modified the variables in the URLs to get the protected pages and did not ask for robots.txt (during its first visit, it tried to grab 41 pages in 1 minute 10 seconds).

Date of publication: Sat, 24 Sep 2005 16:42:26 GMT

Detection of the visitor's country

Link

The script detecting the visitor's country has been modified.
It relied on ip-to-country except for aol, tele2 and chello.
An increasing number of Africans, Israelis and aol users made the statistics more and more unreliable (less than 90%).
The script, inspired by the one on the other site, still uses ip-to-country, but only after trying the domain. If no answer is given, the hostip.info service is used, then the accepted language, then the UA language.
The testing period has been reduced because connecting to MySQL has been impossible for a few hours.
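The fallback order described in this entry can be expressed as a chain of resolvers tried one after the other. The Python sketch below is not the site's script; all names are hypothetical, and only two simple resolvers (domain and accepted language) are implemented, since the ip-to-country and hostip.info lookups depend on external data.

```python
from typing import Callable, Iterable, Optional

Resolver = Callable[[dict], Optional[str]]


def country_from_domain(visitor: dict) -> Optional[str]:
    """Use a two-letter top-level domain of the reverse hostname, if any."""
    host = visitor.get("hostname", "")
    tld = host.rsplit(".", 1)[-1].lower() if "." in host else ""
    return tld.upper() if len(tld) == 2 and tld.isalpha() else None


def country_from_language(visitor: dict) -> Optional[str]:
    """Use the region part of the first accepted language (e.g. en-GB)."""
    first = visitor.get("accept_language", "").split(",")[0].split(";")[0].strip()
    parts = first.split("-")
    return parts[1].upper() if len(parts) == 2 and len(parts[1]) == 2 else None


def detect_country(visitor: dict, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Return the first answer given by the resolvers, in order."""
    for resolve in resolvers:
        country = resolve(visitor)
        if country:
            return country
    return None
```

In a fuller version, resolvers for the IP database and the remote service would sit between `country_from_domain` and `country_from_language` in the list passed to `detect_country`.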

Date of publication: Mon, 15 Aug 2005 17:08:01 GMT

Getting and displaying information about the visitor

Link

Most of the time, visitors are looking for a specific answer. A script allowing tracking and logging is now online on a specific site.

Date of publication: Sat, 16 Jul 2005 08:28:03 GMT

Dynamic CSS, using cookies and PHP

Link

A few explanations of a way to change a style sheet with PHP and cookies (or a form or variables) are now online on the site about CSS.
The same page is also on this site.
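The idea of choosing a style sheet from a cookie can be sketched as below. The page itself uses PHP; this Python version is only an illustration, and the cookie name `style` and the file names are hypothetical.

```python
from http.cookies import SimpleCookie

# Hypothetical mapping from cookie values to style sheet files.
ALLOWED_STYLES = {"default": "default.css", "print": "print.css"}


def stylesheet_for(cookie_header):
    """Pick the style sheet named by the 'style' cookie.

    Unknown or missing values fall back to the default sheet, so the
    cookie can never point at an arbitrary file.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    choice = cookie["style"].value if "style" in cookie else "default"
    return ALLOWED_STYLES.get(choice, ALLOWED_STYLES["default"])
```

Checking the value against a fixed list matters because the cookie is sent by the visitor and cannot be trusted.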

Date of publication: Sat, 09 Jul 2005 08:10:03 GMT

Setting an RSS Feed

Link

A page with a few explanations on how to implement an RSS feed and how to parse the XML to offer an HTML version is online.
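Parsing a feed to produce an HTML version can be sketched with a standard XML parser. The Python fragment below is an illustration, not the script used for this page; it assumes a plain RSS 2.0 layout (`channel` containing `item` elements), and the function name `rss_to_html` is hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def rss_to_html(rss_text: str) -> str:
    """Render each item of an RSS 2.0 feed as a small HTML fragment."""
    channel = ET.fromstring(rss_text).find("channel")
    parts = []
    for item in channel.findall("item"):
        # Escape every value so that feed content cannot inject markup.
        title = html.escape(item.findtext("title", default=""))
        link = html.escape(item.findtext("link", default=""), quote=True)
        description = html.escape(item.findtext("description", default=""))
        date = html.escape(item.findtext("pubDate", default=""))
        parts.append(
            f'<h2><a href="{link}">{title}</a></h2>\n'
            f"<p>{description}</p>\n"
            f"<p>Date of publication: {date}</p>"
        )
    return "\n".join(parts)
```

The output mirrors the layout of this page: a linked title, the description, then the publication date.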

Date of publication: Mon, 27 Jun 2005 13:28:08 GMT

LTH Browser 3.02a / www.learntohack.nil / learntohack.org/

Link

The challenge organized by learntohack.org has found candidates. You have to send a fake User Agent, LTH Browser 3.02a (a browser that doesn't exist and that you can't download), with www.learntohack.nil as referrer; the whole thing looks like a SPAM technique.
User Agent Switcher (to control all the parameters) and Multizilla to spoof the referrer (with a few changes in prefs.js) should be enough to succeed (I did not test this).

Date of publication: Thu, 16 Jun 2005 06:50:20 GMT

Cerberian Drtrs Version-3.2-Build-0

Link

Cerberian Drtrs Version-3.2-Build-0, announced as a proxy for content control set by parents, also seems to be used as a (not very efficient) link checker.
It is identified as Mozilla 4 - Mozilla/4.0 (compatible; Cerberian Drtrs Version-3.2-Build-0) - and so gets a page that is not suited to a recent browser.

Date of publication: Wed, 15 Jun 2005 17:14:09 GMT

Redirection 301 and indexing...

Link

Updates:
Robots that updated the link:
- Powermarks, ichiro (from dmoz), REBOL View (from dmoz), Pompos, Nutch.
Robots that did not follow the redirection:
- in php and HTML : Baiduspider, Nusearch Spider, larbin, MnoGoSearch, Indy Library, psbot.

Date of publication: Sat, 11 Jun 2005 07:36:07 GMT

HTML version of the RSS feed

Link

An HTML version of the RSS feed is now linked to the site map.

Date of publication: Sun, 15 May 2005 20:05:05 GMT

Redirection 301 and indexing

Link

After a month, the results:
Google represents 90% of the referrers of the site. Disappearing from the lists divided the number of visits by 10.
Following Google's recommendations does not help.
Once I removed the redirection for Google, the site reappeared in the lists and visitors (the patient ones, as the previous server is really slow or down) were redirected to the new site (60% as of today). However, the number of visitors has been divided by 2.
All the other bots did not update their links or did not even accept the redirection:
- php and HTML: Websense, boitho, Spurlbot, Voilabot, Xenu, grub, ia_archiver, antibot and the log spammers (but they will come back soon!)
- php: Bloglines and email grabbers
Updated the link:
- msn, zyborg
The number of robots still visiting the previous URL fell from 35 to 20 a day.
A change of domain name must be carefully prepared if the number of visitors depends on search engines.
Having incoming links updated is difficult as contacting webmasters is not always easy.

Date of publication: Sun, 15 May 2005 20:25:47 GMT

RSS feed and webbots

Link

3 robots are interested in this page:
Bloglines (RSS feed in French only)
newsg8 (RSS feed in French only)
Jakarta Commons-HttpClient/3.0-rc1 (RSS feed in English only)

Date of publication: Mon, 25 Apr 2005 15:32:10 GMT

Indexing and redirection

Link

Moving the site and having a redirection to the new URL divided the number of visitors by 5.
Robots are still coming but, as I disappeared from Google, I only see a few new visitors.
Next time, I will only redirect visitors and wait for the new site to appear in the search results before redirecting bots...

Date of publication: Mon, 25 Apr 2005 15:35:59 GMT

New address for the site

Link

As the server danzcontrib.free.fr is down, or nearly so, most of the time, the site has been moved to http://danzcontrib2.free.fr/.

Date of publication: Sat, 09 Apr 2005 08:24:34 GMT

A part of the site has moved

Link

As the server danzcontrib.free.fr is down, or nearly so, most of the time, the pages about CSS are on their way to css.tests.free.fr.

Date of publication: Tue, 05 Apr 2005 17:38:23 GMT

sophisticated SPAM

Link

A robot grabs the page from an address (69.93.108.202 - USA for the last one spotted), transfers it towards other servers (Uruguay and Spain in this case) where a complete email is inserted into the form with a CarbonCopy and sent.
These messages arrive with a wrong MIME type and I cannot read them!!!

Date of publication: Thu, 24 Mar 2005 21:36:32 GMT

Poker online and spam

Link

Online poker sites use gamblers to spam log files: the gamblers open indexed pages (behind their backs, perhaps through iframes?), and the site is bombarded by requests from many countries, IPs and User Agents, with the gambling site as referrer...

Date of publication: Wed, 16 Mar 2005 11:16:47 GMT

log spam and redirections

Link

The confirmation requested since last month allowed me to notice that most robots (with forged UAs and/or referrers) do not download anything (they send HEAD requests); they just pollute the logs with their modified UAs and log spam referrers...

Date of publication: Wed, 02 Feb 2005 20:40:03 GMT

email grabbers are asked for a confirmation

Link

Visitors asking for pages with "@" or the word "email" or "mailto", with a random or a fake UA and no referrer, are now redirected to a page where they must confirm their request, so that webbots can no longer grab the pages.
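The rule can be sketched as a single test. In the Python fragment below, the function name is hypothetical, and deciding that a UA is "random or fake" by checking it against a set of known UA strings is an assumption; the entry does not say how the site makes that decision.

```python
TRIGGERS = ("@", "email", "mailto")


def needs_confirmation(path, user_agent, referrer, known_user_agents):
    """Decide whether a request should be sent to the confirmation page.

    Assumption: a UA is treated as random or fake when it is absent from
    `known_user_agents`, a set of UA strings already seen and trusted.
    """
    asks_for_address = any(trigger in path.lower() for trigger in TRIGGERS)
    unknown_agent = user_agent not in known_user_agents
    return asks_for_address and unknown_agent and not referrer
```

A human visitor who lands on the confirmation page only has to follow one more link, while a webbot that does not confirm never reaches the protected page.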

Date of publication: Sat, 08 Jan 2005 18:39:10 GMT

lwp now redirected

Link

The many abnormal requests made by the lwp utility from many IPs are now redirected.

Date of publication: Mon, 27 Dec 2004 09:53:55 GMT

Unidentified robots redirected

Link

Requests for useragents.php?tous=1 (780 KB) from unidentified robots or with a missing referrer are now redirected to useragents_select.php.
Fixed Netscape 4 crash when it encountered <td><div>...</div></td>

Date of publication: Tue, 14 Dec 2004 21:48:26 GMT