Introduction to website mirrors / useless RSS
What's new?
Date of publication: Sat, 08 Jan 2005 18:45:25 GMT
Last update: Sat, 13 Jan 2024 09:11:57 GMT
Site update
The page about bot traps has been updated.
Date of publication: Sat, 13 Jan 2024 09:11:57 GMT
Site update
The site has been updated. It now uses PHP 7.4.3 and MySQL 8.0. Many bugs have been fixed and some new ones remain to be fixed. The site is mainly visited by robots and spammers.
Date of publication: Sat, 27 Mar 2021 09:11:57 GMT
Statistics update
The User Agents of browsers seen fewer than three times before July 1, 2011 have been deleted, as have the User Agents that have not requested a page since October 1, 2009.
Date of publication: Tue, 1 Jul 2013 09:11:57 GMT
Spammers
The site is now visited by hundreds of spammers. They usually request a page, then try to post an advertising link, sometimes from the same IP, sometimes from a different one. The list is displayed on the page about bot traps.
Date of publication: 08 Oct 2012 19:31:39 GMT
New contact page
The script used by the experimental contact page (tested since April) has replaced the www.free.fr Perl script, which had begun to let spammers send a few messages.
Date of publication: Tue, 28 Aug 2007 09:53:37 GMT
Vulnerabilities interesting robots / botnets (continuation)
Since the page vulnerabilites.php was removed from the index of Google and other search engines, the number of botnet visits has dropped from 500 to 1000 a day in July to about 10 to 20 a day one month later.
Date of publication: Tue, 28 Aug 2007 09:48:33 GMT
Vulnerabilities interesting robots / botnets (continuation)
Since the page vulnerabilites.php was removed from the index of Google and other search engines, the number of botnet visits has been divided by 2 or 3.
Date of publication: Sun, 22 Jul 2007 19:30:32 GMT
BEP and BAC PRO
The Hot Potatoes exercises for the BEP and BAC PRO examinations have been moved to http://danzcontrib.free.fr/index.php
Date of publication: Thu, 12 Jul 2007 09:54:42 GMT
Vulnerabilities interesting robots / botnets (continuation)
This site is visited every day by at least 500 robots trying to install malware (at least 1000 requests from 300 different IPs). The User Agent string is libwww-perl most of the time, but other User Agents such as those of IE 6 or Firefox are also used. The number of vulnerabilities targeted is impressive, and so is the number of scripts. The requests are now split in two parts: the page requested and the name of the variable used. The addresses of the scripts are normally removed. Multiple slashes and subdirectories have also been removed.
Date of publication: Thu, 12 Jul 2007 09:46:19 GMT
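The clean-up described in the entry above (splitting a request into the page and the variable names, dropping the script addresses, multiple slashes and subdirectories) could be sketched roughly as follows. This is a minimal Python illustration, not the site's actual script: the function name split_request is hypothetical, and keeping only the last path segment is an assumption about what "subdirectories removed" means.

```python
import re
from urllib.parse import parse_qsl


def split_request(request_uri):
    """Return (page, variable_names) for a logged request URI."""
    path, _, query = request_uri.partition("?")
    # Collapse runs of slashes, then keep only the last path segment
    # (assumption: this is what removing subdirectories amounts to).
    path = re.sub(r"/{2,}", "/", path)
    page = path.rsplit("/", 1)[-1]
    # Keep the variable names only; the values carry the remote script address.
    names = [name for name, _ in parse_qsl(query, keep_blank_values=True)]
    return page, names
```

For a request such as /forum//admin///index.php?page=http://example.invalid/evil.txt?&cmd=id, the sketch keeps index.php and the variable names page and cmd.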
IEAutodiscovery is an add-on for IE by DeskShare
This utility runs in the background when a user loads a page and looks for RSS feeds by prefetching all the links on the page, thus behaving like a badly behaved robot...
Date of publication: Thu, 14 Jun 2007 19:38:54 GMT
Contact form being tested (continuation)
The contact form seems to work as expected. So far, only one robot, coming from India and able to manage sessions, has succeeded in sending a message. Since then, a line has been added to block any message containing the following string: [url
Date of publication: Wed, 13 Jun 2007 16:48:46 GMT
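The filter mentioned in the entry above amounts to a one-line test. A minimal Python sketch, assuming a hypothetical function name is_rejected and a case-insensitive comparison (the entry does not say whether case is ignored):

```python
BLOCKED_MARKER = "[url"  # the string named in the entry


def is_rejected(message):
    """Return True when the message contains the BBCode-style link marker."""
    # Lower-casing is an assumption, so that "[URL=" is caught as well.
    return BLOCKED_MARKER in message.lower()
```

A check this simple only stops robots that post forum-style links; it does nothing against plain http:// addresses.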
Contact form being tested
A contact form from an abandoned project is being tested. It should limit SPAM and allow people blocked by my ISP to send messages.
Date of publication: Wed, 18 Apr 2007 20:46:34 GMT
Vulnerabilities interesting robots
A list of the pages requested by robots trying to replace some parts of the site with a remote page.
Date of publication: Wed, 14 Feb 2007 17:15:40 GMT
Debugging a page
An introduction to debugging a page that uses a CSS layout is online.
Date of publication: Fri, 25 Aug 2006 15:58:51 GMT
IE7 RC1 and new problems
IE7 Beta 3 (standalone version) fixed a few bugs, but IE7 RC1 introduced new problems: the links are not active across the full width of a few menus (such as the linked one).
The issue has been fixed by adding:
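/* first rule: inline-block triggers hasLayout on the menu links */
#menugauche ul li a{display:inline-block;}
/* second, separate rule: back to block, hasLayout stays set */
#menugauche ul li a{display:block;}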
The problem has been fixed in almost all the pages by triggering hasLayout with inline-block and then setting the value back to block.
The standalone procedure for Beta 3 works for me.
Date of publication: Fri, 25 Aug 2006 16:10:06 GMT
Random UAs and IPs
A list of IPs used by robots declaring a random User Agent string is now online. These robots are interested in pages containing "email" or "@" in their names or in their content, and they do not follow a redirection.
Date of publication: Fri, 21 Jul 2006 20:14:40 GMT
Microsoft Data Access Internet Publishing Provider DAV and REQUEST method
The User Agent string "Microsoft Data Access Internet Publishing Provider DAV" uses all sorts of request methods (PUT, LOCK...).
Date of publication: Mon, 12 Jun 2006 17:26:29 GMT
Google Analytics
The site has been using Google Analytics for a few days. For visitors with JavaScript enabled, the results do not differ from the site's own statistics. The city-level location is not very accurate.
Date of publication: Thu, 25 May 2006 10:45:45 GMT
Crash with ie 7 and div:first-letter
A few pages using div:first-letter crashed the latest MSIE 7 beta 2 preview.
The rule has been moved until it can be tested with IE 7.
Date of publication: Sat, 13 May 2006 19:57:01 GMT
Standalone IE7 preview beta 2
You can have a standalone version of IE7 preview beta and keep IE6 on your system.
1. Download IE7 preview using the link above.
2. Extract to a folder with 7zip or any uncompressing utility.
3. Delete the file named shlwapi.dll in this folder.
4. Create an empty file named iexplore.exe.local.
Now you can use this standalone copy by clicking on iexplore.exe.
To test conditional comments, replace IE by zIE in the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\Version Vector
The 3 standalone versions I have all show different and new bugs.
The last version (March 17, 2006) needs HasLayout for a span, still doesn't accept div:first-letter and p:first-letter in a few contexts etc...
The file styleie7.css tries to fix those problems.
Date of publication: Mon, 10 Apr 2006 15:01:01 GMT
request_method
As of today, Googlebot uses HEAD requests for indexed pages. Until now it did so only for sitemap.gz, 404 errors and identity checks.
Date of publication: Sat, 25 Mar 2006 15:45:31 GMT
IP and HTTP_X_FORWARDED_FOR
The site uses HTTP_X_FORWARDED_FOR and REMOTE_ADDR to find out the most probable IP of a visitor. A few explanations and the code are now online, even though REMOTE_ADDR alone is fine for almost 100% of the hits.
Date of publication: Sun, 01 Jan 2006 15:30:11 GMT
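As an illustration of the idea in the entry above, here is a minimal Python sketch of picking a probable visitor IP from the two values. It is not the code published on the site: the function name probable_ip, the list of skipped networks and the "first usable public address wins" rule are all assumptions.

```python
import ipaddress

# Private and loopback ranges that cannot be the visitor's public address.
SKIPPED_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")
]


def probable_ip(remote_addr, forwarded_for=None):
    """Return the first public address listed in X-Forwarded-For, else REMOTE_ADDR."""
    for candidate in (forwarded_for or "").split(","):
        try:
            ip = ipaddress.ip_address(candidate.strip())
        except ValueError:
            continue  # "unknown", host names, empty or garbled entries
        if not any(ip in net for net in SKIPPED_NETWORKS):
            return str(ip)
    return remote_addr
```

For example, probable_ip("192.0.2.10", "unknown, 10.0.0.5, 203.0.113.7") gives 203.0.113.7. The forwarded header is sent by the client or its proxy and can be forged, so the result is only "most probable", which matches the wording of the entry.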
request_method
The method used to request a page has been tested since March to find out whether it is a good way to tell a human user from a "well-behaved" bot.
As of today:
Send a HEAD request:
- Xenu, Powermarks, MS FrontPage 6.0, InternetSeer, PostFavorites, Yooda Checkbot, NuSearch Spider, W3C-checklink/4.2 [4.20] libwww-perl/5.803, PeerFactor 404 crawler, PHP version tracker, GT::WWW/1.025
- Mozilla/2.0 (compatible; MSIE 3.02; Update a; Windows 95)
- Mozilla/4.0 (compatible; Netcraft Web Server Survey)
- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) for sitemap
- dcs
- Firefox extension LinkChecker
- a few log spammers
Send a HEAD request then, depending on the options chosen by their user, a GET if necessary:
- e-SocietyRobot, HTTrack, IRLbot, Speedy spider, sygol, Wget, Xenu
Send a HEAD then a GET request:
- Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
- libwww-perl/5.803
Sometimes use HEAD:
- cfetch (depending on the user)
- Mozilla/5.0 (Windows; U; Windows NT 5.0; fr-FR; rv:1.7.10) Gecko/20050717 Firefox/1.0.6 from a Yahoo proxy (proxy3.search.scd.yahoo.net) before a GET to translate the page
- Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) (but can also use GET)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) (from dirt*.nescape.com)
- a few empty UAs
"Forged" UAs used by a few robots, detected by the request method:
- Mozilla/4.5 [en] (Win98; I)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
- Mozilla/5.0 (Windows; U; Windows NT 5.1; fr-FR; rv:1.7.6) Gecko/20050318 Firefox/1.0.2
- Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3.1) Gecko/20030425 (Dmoz link checker)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) C2DB
- Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; COM+ 1.0.2204)
UAs used by humans and wrongly flagged by the request method:
- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20050920 MultiZilla/1.8.1.0n
*************
Note: e-SocietyRobot sends HEAD requests to find out whether index.html, default.html... exist if they are not served by default.
Date of publication: Wed, 16 Nov 2005 13:54:11 GMT
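The grouping used in the entry above (HEAD only, HEAD then GET, sometimes HEAD) can be derived from a request log. The following Python sketch is only an illustration under stated assumptions: the function name classify_by_method, the label strings and the (client, method) log format are hypothetical, and a client would in practice be identified by IP and User Agent together.

```python
from collections import defaultdict


def classify_by_method(hits):
    """Group clients by how they use HEAD; hits is a chronological list of (client, method)."""
    methods = defaultdict(list)
    for client, method in hits:
        methods[client].append(method.upper())

    labels = {}
    for client, seen in methods.items():
        if "HEAD" not in seen:
            labels[client] = "no HEAD"         # typical of a browser driven by a human
        elif "GET" not in seen:
            labels[client] = "HEAD only"       # link checkers, surveys, log spammers
        elif seen[0] == "HEAD":
            labels[client] = "HEAD then GET"
        else:
            labels[client] = "sometimes HEAD"
    return labels
```

As the list of wrongly flagged UAs shows, a HEAD request is a hint rather than a proof: some browser extensions send HEAD requests on behalf of a human.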
Googlebot cache and CSS
It seems that Googlebot uses a CSS rendering for its cache that doesn't support height:1% (a hack for MSIE).
Some of the cache problems are now fixed.
A test with MSIE 6 style sheets may fix the other problems for the next visit.
Date of publication: Tue, 08 Nov 2005 21:15:39 GMT
Using SWFRIP to extract links from a flash file and uncompress it
SWFRIP can replace swf2html when you have to extract the links from a Flash file to complete a mirror.
Instructions for using SWFRIP are online.
Date of publication: Sun, 06 Nov 2005 10:09:28 GMT
Redirection 301 and indexing...
Requests from many proxies do not follow a 301 redirection.
Thus 10% of the requests coming from forwarded IPs are not redirected.
The majority of these hits come from robots (spam and link checking with forged User Agents).
Date of publication: Sun, 06 Nov 2005 10:09:21 GMT
Using conditional comments rather than hacks
The page http://danzcontrib2.free.fr/en/pagecss6b.php has been modified (http://danzcontrib2.free.fr/en/pagecss6bie.php) so that it uses conditional comments for MSIE.
The page http://css.tests.free.fr/en/hacks2.php about conditional comments for MSIE has been added.
Date of publication: Thu, 27 Oct 2005 14:39:40 GMT
Using conditional comments rather than hacks
The page http://css.tests.free.fr/pagecss6h.php has been modified so that it uses conditional comments for MSIE.
Explanations will come soon.
Date of publication: Sun, 16 Oct 2005 20:26:36 GMT
rss editor 0.0.9 and the feed stylesheet
rss editor allows a stylesheet for the RSS feed.
A lot of rendering differences between MSIE and Firefox/Opera (mostly because of the HTML tags that I use) justify the HTML version of the feed.
The stylesheet is named rss.css
Date of publication: Thu, 13 Oct 2005 17:21:18 GMT
request_method
PeerFactor 404 crawler seems to be a dead link checker for forums.
It must in fact have followed a link from a forum. The sites below claim that you can earn money by fighting content piracy.
http://www.peerfactor.info/
http://web14.viux.com/retspan-info/
Date of publication: Sun, 09 Oct 2005 17:46:51 GMT
Google Desktop and robot activity
Google Desktop activity (a series of 3 requests in 5 to 7 minutes followed by requests every 30 minutes and 1 second) should now appear once per IP every 48 hours.
Date of publication: Sun, 09 Oct 2005 15:15:05 GMT
Redirection 301 and indexing...
The Bloglines robot no longer follows a 301 redirection.
Date of publication: Mon, 03 Oct 2005 17:44:59 GMT
Who asks for robots.txt?
A list of the User Agent strings of robots that ask for robots.txt before crawling the site css.tests.free.fr is online. I started collecting the UAs yesterday, and the list is updated automatically.
Date of publication: Wed, 28 Sep 2005 20:06:03 GMT
Redirection 301 and indexing... Updates
A few updates:
Robots that did not follow the redirection:
- in php and HTML : boitho, Spurlbot, Voilabot, ia_archiver, cfetch, cometsystems, IP*Works, Szukacz and antibot.
- in php only: Bloglines, grub, Xenu, envolk, almaden, larbin, FAST, a few log spammers and email grabbers
Robots that updated the link:
- msn, ZyBorg, WebSense, EasyDL and IRLBot
The number of robots still visiting the previous URL is increasing in spite of the redirection; it is now above the average of 45 a day.
A 301 redirection is not handled correctly by search engines: after 7 months, 80% of the visitors are still redirected from the previous address.
As access and connection problems are frequent, the site disappears from the Google directory from time to time, and the number of visitors is divided by 3 or 4 until the next deep crawl (about 1 month later).
Date of publication: Sat, 24 Sep 2005 16:58:53 GMT
Strange robot : IEAutoDiscovery
This webbot or utility triggered all the protections against email grabbers, followed the redirection from the old site, modified the variables in the URLs to get the protected pages and did not ask for robots.txt. During its first visit, it tried to grab 41 pages in 1 minute 10 seconds.
Date of publication: Sat, 24 Sep 2005 16:42:26 GMT
Detection of the visitor's country
The script detecting the visitor's country has been modified.
It relied on ip-to-country except for aol, tele2 and chello.
A growing number of African, Israeli and AOL visitors made the statistics more and more unreliable (less than 90% reliable).
The script, inspired by the one used on the other site, still uses ip-to-country, but only after trying the domain. If neither gives an answer, the hostip.info service is queried, then the accepted language and finally the UA language.
The testing period was shortened because connecting to MySQL was impossible for a few hours.
Date of publication: Mon, 15 Aug 2005 17:08:01 GMT
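The order of lookups described in the entry above (domain, ip-to-country, hostip.info, accepted language, UA language) is a simple fallback chain. A minimal Python sketch follows. It is not the site's script: every function name and the visitor dictionary are hypothetical, and the ip-to-country, hostip.info and UA-language lookups are left as placeholders because their real interfaces are not described in the entry.

```python
def country_from_domain(visitor):
    """Use a two-letter country TLD of the visitor's host name, if any."""
    host = visitor.get("host", "")
    if "." not in host:
        return None
    tld = host.rsplit(".", 1)[-1]
    return tld.upper() if len(tld) == 2 and tld.isalpha() else None


def country_from_accept_language(visitor):
    """Use the region subtag of the first Accept-Language entry, e.g. fr-FR."""
    first = visitor.get("accept_language", "").split(",")[0].split(";")[0].strip()
    _, _, region = first.partition("-")
    return region.upper() or None


def no_answer(visitor):
    """Placeholder for a lookup that is not implemented in this sketch."""
    return None


LOOKUPS = [
    ("domain", country_from_domain),
    ("ip-to-country", no_answer),
    ("hostip.info", no_answer),
    ("accept-language", country_from_accept_language),
    ("ua-language", no_answer),
]


def detect_country(visitor, lookups=LOOKUPS):
    """Return (country, source) from the first lookup that gives an answer."""
    for source, lookup in lookups:
        country = lookup(visitor)
        if country:
            return country, source
    return None, None
```

Recording which lookup answered makes it possible to measure how often each fallback is actually needed.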
Getting and displaying information about the visitor
Most of the time, visitors are looking for a specific answer. A script for tracking and logging visitors is now online on a dedicated site.
Date of publication: Sat, 16 Jul 2005 08:28:03 GMT
Dynamic CSS, using cookies and PHP
A few explanations of a way to change a style sheet with PHP and cookies (or a form or variables) are now online on the site about CSS.
The same page is also on this site.
Date of publication: Sat, 09 Jul 2005 08:10:03 GMT
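The principle described in the entry above, choosing a style sheet from a cookie or a request variable, can be sketched as follows. The page itself uses PHP; this is a Python illustration of the same logic, and the cookie name style, the function name choose_stylesheet and the three style sheet names are assumptions.

```python
from http.cookies import SimpleCookie

# Allowed choices; anything else falls back to the default sheet.
STYLES = {"default": "style.css", "contrast": "contrast.css", "print": "print.css"}


def choose_stylesheet(cookie_header, query_style=None):
    """Return (stylesheet, Set-Cookie value or None) for one request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    stored = cookie["style"].value if "style" in cookie else None

    # A valid request variable wins over the stored cookie.
    wanted = query_style if query_style in STYLES else stored
    if wanted not in STYLES:
        wanted = "default"

    set_cookie = None
    if wanted != stored:
        out = SimpleCookie()
        out["style"] = wanted
        out["style"]["path"] = "/"
        out["style"]["max-age"] = 30 * 24 * 3600  # remember the choice for 30 days
        set_cookie = out["style"].OutputString()
    return STYLES[wanted], set_cookie
```

Looking the name up in a fixed table, rather than using the cookie value as a file name, keeps a forged cookie from pointing the page at an arbitrary file.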
Setting up an RSS Feed
A page with a few explanations on how to implement an RSS feed and how to parse the XML to offer an HTML version is online.
Date of publication: Mon, 27 Jun 2005 13:28:08 GMT
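To illustrate the parsing step mentioned in the entry above, here is a minimal Python sketch that turns an RSS 2.0 document into a simple HTML listing. It is not the method explained on the site's page: the function name rss_to_html and the HTML layout are assumptions, and descriptions are escaped as plain text, which would need changing for a feed whose descriptions contain HTML tags.

```python
import xml.etree.ElementTree as ET
from html import escape


def rss_to_html(rss_text):
    """Render the channel title and each item of an RSS 2.0 feed as HTML."""
    channel = ET.fromstring(rss_text).find("channel")
    parts = ["<h1>%s</h1>" % escape(channel.findtext("title", ""))]
    for item in channel.findall("item"):
        title = escape(item.findtext("title", ""))
        link = escape(item.findtext("link", ""))
        description = escape(item.findtext("description", ""))
        date = escape(item.findtext("pubDate", ""))
        parts.append('<h2><a href="%s">%s</a></h2>' % (link, title))
        parts.append("<p>%s</p>" % description)
        parts.append("<p>Date of publication: %s</p>" % date)
    return "\n".join(parts)
```

Each item becomes a linked title, a description paragraph and a publication date, in feed order.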
LTH Browser 3.02a / www.learntohack.nil / learntohack.org/
The challenge organized by learntohack.org has found candidates. You have to send the fake User Agent LTH Browser 3.02a (a browser that does not exist and that you cannot download) with www.learntohack.nil as referrer; the whole thing looks like a SPAM technique.
User Agent Switcher (to control all the parameters) and MultiZilla (to spoof the referrer, with a few changes in prefs.js) should be enough to succeed; I have not tested this.
Date of publication: Thu, 16 Jun 2005 06:50:20 GMT
Cerberian Drtrs Version-3.2-Build-0
Cerberian Drtrs Version-3.2-Build-0, announced as a parental content-control proxy, also seems to be used as a link checker (an inefficient one).
It is identified as Mozilla 4, from its string Mozilla/4.0 (compatible; Cerberian Drtrs Version-3.2-Build-0), and gets a page that is not designed for a recent browser.
Date of publication: Wed, 15 Jun 2005 17:14:09 GMT
Redirection 301 and indexing...
Updates:
Robots that updated the link:
- Powermarks, ichiro (from dmoz), REBOL View (from dmoz), Pompos, Nutch.
Robots that did not follow the redirection:
- in php and HTML : Baiduspider, Nusearch Spider, larbin, MnoGoSearch, Indy Library, psbot.
Date of publication: Sat, 11 Jun 2005 07:36:07 GMT
HTML version of the RSS feed
An HTML version of the RSS feed is now linked from the site map.
Date of publication: Sun, 15 May 2005 20:05:05 GMT
Redirection 301 and indexing
The results after a month:
Google represents 90% of the referrers of the site. Disappearing from its lists divided the number of visits by 10.
Following Google's recommendations does not help.
Once I removed the redirection for Google, the site reappeared in the lists and visitors (the patient ones, as the previous server is really slow or down) were redirected to the new site (60% as of today). However, the number of visitors has been divided by 2.
All the other bots did not update their links or did not even accept the redirection:
- php and HTML: Websense, boitho, Spurlbot, Voilabot, Xenu, grub, ia_archiver, antibot and the log spammers (but they will come back soon!)
- php: Bloglines and email grabbers
Updated the link:
- msn, zyborg
The number of robots still visiting the previous URL fell from 35 to 20 a day.
A change of domain name must be carefully prepared if the number of visitors depends on search engines.
Having incoming links updated is difficult as contacting webmasters is not always easy.
Date of publication: Sun, 15 May 2005 20:25:47 GMT
RSS feed and webbots
Three robots are interested in this page:
Bloglines (RSS feed in French only)
newsg8 (RSS feed in French only)
Jakarta Commons-HttpClient/3.0-rc1 (RSS feed in English only)
Date of publication: Mon, 25 Apr 2005 15:32:10 GMT
Indexing and redirection
Moving the site and redirecting to the new URL divided the number of visitors by 5.
Robots are still coming but, as I have disappeared from Google, I see only a few new visitors.
Next time, I will redirect only visitors and wait for the new site to appear in the search results before redirecting bots...
Date of publication: Mon, 25 Apr 2005 15:35:59 GMT
New address for the site
As the server danzcontrib.free.fr is down most of the time, the site has been moved to http://danzcontrib2.free.fr/.
Date of publication: Sat, 09 Apr 2005 08:24:34 GMT
A part of the site has moved
As the server danzcontrib.free.fr is down most of the time, the pages about CSS are on their way to css.tests.free.fr.
Date of publication: Tue, 05 Apr 2005 17:38:23 GMT
sophisticated SPAM
A robot grabs the page from one address (69.93.108.202, USA, for the last one spotted) and transfers it to other servers (Uruguay and Spain in this case), where a complete email with a carbon copy is inserted into the form and sent.
These messages arrive with a wrong MIME type and I cannot read them!!!
Date of publication: Thu, 24 Mar 2005 21:36:32 GMT
Poker online and spam
Online poker sites use their gamblers to spam log files: the gamblers open indexed pages without knowing it (through iframes?), and the site is bombarded by requests from many countries, IPs and User Agents with the gambling site as referrer...
Date of publication: Wed, 16 Mar 2005 11:16:47 GMT
log spam and redirections
The confirmation step introduced last month allowed me to notice that most robots (with forged UAs and/or referrers) do not download anything (they send HEAD requests); they just pollute the logs with their modified UAs and log spam referrers...
Date of publication: Wed, 02 Feb 2005 20:40:03 GMT
email grabbers are asked for a confirmation
Visitors asking for pages with "@" or the word "email" or "mailto", with a random or fake UA and no referrer, are now redirected to a page where they are asked to confirm, so that webbots no longer grab the pages.
Date of publication: Sat, 08 Jan 2005 18:39:10 GMT
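The rule in the entry above combines three conditions: a sensitive page, a suspicious User Agent and a missing referrer. A minimal Python sketch follows; the function name needs_confirmation is hypothetical, and the pattern used to decide that a User Agent is "random or fake" is a crude placeholder, since the entry does not say how the site recognises such strings.

```python
import re

# Placeholder heuristic (an assumption): a plausible UA starts like a browser string.
PLAUSIBLE_UA = re.compile(r"^(Mozilla|Opera)/\d")


def needs_confirmation(path, user_agent, referrer):
    """Return True when the request should be sent to the confirmation page."""
    target = path.lower()
    sensitive = "@" in target or "email" in target or "mailto" in target
    suspicious_ua = PLAUSIBLE_UA.match(user_agent or "") is None
    return sensitive and suspicious_ua and not referrer
```

A human who is wrongly caught only has to confirm once, whereas a robot that ignores the confirmation page never reaches the addresses.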
lwp now redirected
The many abnormal requests made by the lwp utility from many IPs are now redirected.
Date of publication: Mon, 27 Dec 2004 09:53:55 GMT
Unidentified robots redirected
Requests for useragents.php?tous=1 (780 KB) from unidentified robots or with a missing referrer are now redirected to useragents_select.php.
Fixed Netscape 4 crash when it encountered <td><div>...</div></td>
Date of publication: Tue, 14 Dec 2004 21:48:26 GMT