Website Mirrors
This page links to a few examples of site copies that had problems. With most mirrors, the default options give good results once the browser identity has been set; if your problem is not listed below, have a look at the HTTrack help page. These issues can be caused by a recent visit, by limitations of WinHTTrack (or other website copiers), by the design of the site, or by authors wanting to protect their work, and they can sometimes be fixed.
When a site is protected, you can try again later. Because search engines rely on robots to index pages, protections and limitations are often removed.
The examples should help you determine whether a website using PHP (Personal Home Page or Hypertext Preprocessor - .php), Perl (.pl), CGI (Common Gateway Interface - .cgi), ColdFusion (.cfm), Active Server Pages (.asp), Java (.class), JavaScript (.js), CSS (.css or .htc) or Flash (.swf and .dir) can be mirrored.
As I couldn't find out how to use the Netscape, Konqueror or Mozilla cache, I used the Internet Explorer cache to get the MIME type or to fetch missing, corrupted or filtered files, since MSIE stores all files by default in the folder Temporary Internet Files.
Since 30/09/2007, however, the MozillaCacheView utility or the CacheViewer add-on can be used as well: both let you read and extract the files in the Firefox or Mozilla cache.
- Use the Internet Explorer cache if you have a gray box.
- For each "option value", find the name of the file given by WinHTTrack, replace the value with that file name, and use the function described here after naming or renaming the form.
* JavaScript, Flash, PHP, ASP and CFM often prevent the capture of a whole site.
- When a capture fails because the website hasn't been visited recently, or fails on another PC, you can use the Internet Explorer cache.
- For menus, add or modify functions.
- For image galleries, use Temporary Internet Files.
- If external js (or css, htc) files are missing, add the file names to the Web Addresses to mirror. Convert absolute links into relative links (if the version of HTTrack you use did not do it).
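The conversion of absolute links into relative links can be sketched in Python. This is only an illustration, not what HTTrack does internally: the function name make_links_relative, the example.com addresses and the choice of index.html for directory URLs are all assumptions, and links carrying a query string are left alone because their local file names cannot be guessed.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Matches href="..." and src="..." attributes (single or double quotes).
LINK_ATTR = re.compile(
    r'(?P<attr>\b(?:href|src)\s*=\s*)(?P<quote>["\'])(?P<url>.*?)(?P=quote)',
    re.IGNORECASE,
)


def make_links_relative(html, site_root, page_path):
    """Rewrite absolute links to site_root as paths relative to page_path.

    site_root: address of the mirrored site (hypothetical example below).
    page_path: location of this page inside the mirror, e.g. "docs/index.html".
    """
    root_host = urlsplit(site_root).netloc.lower()
    page_dir = posixpath.dirname("/" + page_path.lstrip("/"))

    def rewrite(match):
        url = urlsplit(match.group("url"))
        same_site = url.scheme in ("http", "https") and url.netloc.lower() == root_host
        if not same_site or url.query:
            return match.group(0)  # external or dynamic link: keep as-is
        target = url.path or "/"
        if target.endswith("/"):
            target += "index.html"  # assumption: directories map to index.html
        relative = posixpath.relpath(target, page_dir)
        if url.fragment:
            relative += "#" + url.fragment
        return match.group("attr") + match.group("quote") + relative + match.group("quote")

    return LINK_ATTR.sub(rewrite, html)
```

Run it on a copy of the page first and compare the result with the original before overwriting anything in the mirror.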
If files are still missing, here are a few methods:
1. Visit the missing pages. Then use the Internet Explorer cache (Temporary Internet Files) to find the missing files, copy them into the local site and delete the number in brackets ([1]) that Internet Explorer adds to the file name.
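When many files have to be recovered this way, the renaming can be scripted. The sketch below assumes the cached files have already been copied out of Temporary Internet Files into an ordinary folder; the function names clean_cache_name and copy_from_cache are made up for this example.

```python
import re
import shutil
from pathlib import Path

# "[1]", "[2]", ... placed just before the extension or at the end of the name.
CACHE_SUFFIX = re.compile(r"\[\d+\](?=\.[^.]+$|$)")


def clean_cache_name(name):
    """Return the file name without the bracketed number added by the cache."""
    return CACHE_SUFFIX.sub("", name, count=1)


def copy_from_cache(cache_dir, mirror_dir):
    """Copy every file of cache_dir into mirror_dir under its cleaned name.

    Files that already exist in the mirror are never overwritten.
    Returns the list of names that were actually copied.
    """
    mirror = Path(mirror_dir)
    mirror.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(Path(cache_dir).iterdir()):
        if not source.is_file():
            continue
        destination = mirror / clean_cache_name(source.name)
        if destination.exists():
            continue
        shutil.copy2(source, destination)
        copied.append(destination.name)
    return copied
```

The files still have to be moved to the right sub-folder of the mirror by hand, since the cache does not keep the directory layout of the site.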
2. After downloading a swf file, check in the address bar of your browser whether its links are absolute or relative.
If they are absolute and the Flash file is protected or compressed, the capture will be incomplete unless you can save the file uncompressed with SWFRIP.
If the links are relative, copy the URLs that are called and add them to the scan rules.
If the animation calls .asp, .php or .cfm files that exist in the mirror with an html extension,
- copy the file and rename it with an asp, php or cfm extension,
- or write an HTML file with an asp, php or cfm extension that redirects to the mirror html file.
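The second option (a small HTML file with the expected extension that redirects to the captured page) could be generated as follows. This is a sketch under my own assumptions: the function name write_redirect_stub is hypothetical, and the redirection uses a plain meta refresh with a fallback link.

```python
from html import escape
from pathlib import Path


def write_redirect_stub(mirror_dir, html_path, wanted_ext):
    """Create e.g. page.asp next to page.html, redirecting to page.html.

    mirror_dir: root folder of the local copy.
    html_path:  path of the captured html file, relative to mirror_dir.
    wanted_ext: extension the Flash animation asks for ("asp", "php", "cfm").
    """
    html_file = Path(mirror_dir) / html_path
    stub = html_file.with_suffix("." + wanted_ext.lstrip("."))
    # Both files sit in the same folder, so the bare file name is enough.
    target = escape(html_file.name, quote=True)
    page = (
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="0; url={target}">\n'
        "</head><body>\n"
        f'<a href="{target}">{target}</a>\n'
        "</body></html>\n"
    )
    stub.write_text(page, encoding="utf-8")
    return stub
```

For example, write_redirect_stub("mirror", "news/page.html", "asp") would create mirror/news/page.asp pointing at page.html in the same folder.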
3. Use swf2html.exe to extract the links from the Flash file. The html file it produces gives the list of the links (command line: swf2html.exe -o outputfilename.html inputfilename.swf).
It extracts some php and html links (use the option -s "txt|js|php|any_extension" between outputfilename and inputfilename). For asp or cfm, use method 1 and/or 4 if the links are not extracted.
If they are relative, add them to the scan rules or Web addresses (with full path name). When the extension is not html or htm, use method 4 after the capture.
If they are absolute, add them and modify the links in the Flash file with method 4.
When the links do not exist (for example, Ratanga), use method 1 or 2.
4. A daredevil can use a hex-editor.
If the links are relative, open the swf file and search for .htm, .html, .asp, .php and .cfm. Then modify the extensions if the files are in the capture, or note the file names to add them to the scan rules.
If they are absolute, replace them with relative links if there is enough room (replace unwanted characters with spaces). Alternatively, you can redirect to an html file that will redirect to the captured link.
- Optimized and protected .swf files do not show the links. If saving them uncompressed with SWFRIP does not allow you to modify them, copy the .htm files of the capture and rename them with the extension found in the Internet Explorer cache. A bit of luck, and...
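For those who prefer a script to a hex editor, the search and the in-place replacement of method 4 can be sketched in Python. The function names are hypothetical and the example only handles uncompressed files: it checks the first three bytes, which are FWS for an uncompressed swf and CWS or ZWS for a compressed one. The replacement is padded with spaces so that the file keeps exactly the same length.

```python
import re

# Runs of at least four printable ASCII characters (no spaces).
PRINTABLE_RUN = re.compile(rb"[\x21-\x7e]{4,}")
LINK_EXTENSIONS = (b".htm", b".html", b".asp", b".php", b".cfm")


def check_uncompressed(data):
    """Raise an error if the swf data is compressed (CWS or ZWS signature)."""
    if data[:3] in (b"CWS", b"ZWS"):
        raise ValueError("compressed swf: save it uncompressed first")


def find_link_strings(data, extensions=LINK_EXTENSIONS):
    """Return (offset, text) for each printable run that contains a link extension."""
    found = []
    for match in PRINTABLE_RUN.finditer(data):
        text = match.group()
        if any(ext in text.lower() for ext in extensions):
            found.append((match.start(), text))
    return found


def patch_link(data, old, new):
    """Replace old with new, padding with spaces so the length does not change."""
    if len(new) > len(old):
        raise ValueError("not enough room for the new link")
    padded = new + b" " * (len(old) - len(new))
    return data.replace(old, padded)
```

Work on a copy of the swf file: read it with open(name, "rb"), call check_uncompressed, list the links with find_link_strings, then write the result of patch_link to a new file and test it in the browser.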
- Select "no robots.txt rules".
- Sometimes, authors give wrong extensions.
- Use filters.
- There are a few methods to get rid of ads and banners blocking page display or forcing you to click when you want to read another page.