A website copy with HTTrack


Tested with version WinHTTrack Website Copier 3.22 & 3.33-RC5 (+swf)

Ratanga November 2002 / January 2005

project name: Ratanga
Web(URL) address: www.ratanga.co.za
amount of time: 10 minutes (56k modem)
in the scan rules, add:
+www.ratanga.co.za/rates.php
+www.ratanga.co.za/ridesattractions.php
+www.ratanga.co.za/restaurantsretail.php
+www.ratanga.co.za/functionsevents.php
+www.ratanga.co.za/playground.php
+www.ratanga.co.za/contactus.php
+www.ratanga.co.za/ratangarangersclub.php
+www.ratanga.co.za/location.php
+www.ratanga.co.za/rigthtop2.htm

problems:

Javascript, Flash, PHP, site completely redesigned each year...

Other examples with similar difficulties: Discovery Cove | The Engine Room | Wild Waters Park | Rapids Water Park | Gulliver's Theme Park | Camelot Theme Park

solutions:

This site is redesigned each year. It was first written in HTML, then partly rewritten in Javascript, and this year the home page states the requirements: Flash installed and a recent version of Internet Explorer.
The menu is in Flash and calls php files, the animations are in Flash, the style sheets are in CSS 2, and several Javascript routines manage the links and the opening of windows.
After the capture, even if you add the missing file names to the scan rules, you cannot browse the copy offline.
In November, the site was not finished yet, but the method for downloading the missing files is described below.

1. How to find the links to add to the scan rules:

During the capture with the default options, WinHTTrack examines the menu, reads the API and tries to download the files it calls. Here, the file names it finds do not match those on the site: function&events.php instead of functionevents.php, etc.
One way to find the file names, when they are not displayed in the address or status bar, is to visit the site with Internet Explorer, click on all the links and wait until each page is entirely loaded. The file names then appear in the cache (Temporary Internet Files).
[Screenshot: Temporary Internet Files]
Another solution is to use SWFRIP. For themenu.swf, the file actions.txt contains:
getURL(ridesattraction.htm,ridesattraction.htm);
getURL(rates.htm,rates.htm);
getURL(functionevents.htm,functionevents.htm);
getURL(restretail.htm,restretail.htm);
getURL(location.htm,location.htm);
getURL(contacts.htm,contacts.htm);
Add the missing file names in the scan rules.
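When actions.txt is long, the rules can be generated rather than typed. The following is only a sketch in Python, assuming that actions.txt has the getURL(target,window) layout shown above; the function name scan_rules and the default host are illustrative choices, not part of HTTrack or SWFRIP. The names found in the Flash menu should still be checked against the live site, since they may differ from the real file names.

```python
import re

# First argument of getURL(...), with or without quotes (layout assumed
# from the actions.txt excerpt above).
GETURL_RE = re.compile(r'getURL\(\s*["\']?([^,"\')\s]+)')

def scan_rules(actions_text, host="www.ratanga.co.za"):
    """Return one '+host/file' scan rule per distinct getURL target."""
    seen = []
    for name in GETURL_RE.findall(actions_text):
        if name not in seen:  # keep first occurrence, preserve order
            seen.append(name)
    return [f"+{host}/{name}" for name in seen]

sample = """getURL(ridesattraction.htm,ridesattraction.htm);
getURL(rates.htm,rates.htm);"""

# Print the rules, ready to paste into the scan rules box.
print("\n".join(scan_rules(sample)))
```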
WinHTTrack then downloads the pages and gives them an .html extension.
Because the swf file cannot be modified, even with a hex editor, you have to copy these files and rename the copies with a .php extension. The local site then contains the same file as .php, .html and sometimes .htm.
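The copy-and-rename step can be done by hand or with a small script. The sketch below is one possible way to do it in Python; the function name copy_as_php is hypothetical, and the folder and the list of page names are whatever applies to your local mirror.

```python
import shutil
from pathlib import Path

def copy_as_php(site_dir, names):
    """Copy <name>.html to <name>.php for each page name.

    The .html originals are kept, so both extensions exist side by side.
    Names without a matching .html file are skipped.
    Returns the list of .php files created.
    """
    created = []
    for name in names:
        src = Path(site_dir) / f"{name}.html"
        if src.is_file():
            dst = src.with_suffix(".php")
            shutil.copyfile(src, dst)
            created.append(dst)
    return created
```

For example, copy_as_php(folder, ["rates", "location", "contactus"]) would create rates.php, location.php and contactus.php next to the downloaded .html files, where folder is the directory of the local copy.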

2. How to reach the windows opened by Javascript:

In the files downloaded and renamed with a php extension, the image links opened by window.open generate an error when they are called by main.htm. The simplest fix is to modify the php files so that each one becomes a redirection to the html file with the same name:

In the file ridesattractions.php, add <meta http-equiv="refresh" content="0;URL=ridesattractions.html"> before the </head> tag. Do the same with the other php files, and offline browsing is now possible with Internet Explorer.
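If there are many php files, the redirection can be inserted by a script. This is a minimal sketch, assuming each php file contains a </head> tag and sits next to an html file of the same name; add_redirect is a hypothetical helper name, and latin-1 is an assumed encoding chosen because it reads any byte sequence without error.

```python
import re
from pathlib import Path

def add_redirect(php_path):
    """Insert a meta refresh to the same-named .html file before </head>.

    Returns True if the file was changed, False if the redirect was
    already present or no </head> tag was found.
    """
    path = Path(php_path)
    target = path.with_suffix(".html").name
    meta = f'<meta http-equiv="refresh" content="0;URL={target}">'
    html = path.read_text(encoding="latin-1")
    if meta in html:
        return False
    # Case-insensitive match, because old pages often write </HEAD>.
    new_html, count = re.subn(
        r"</head>",
        lambda m: meta + "\n" + m.group(0),
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        return False
    path.write_text(new_html, encoding="latin-1")
    return True
```

Running add_redirect on every .php file in the local folder gives the same result as the manual edit described above.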
For other browsers, you can modify every occurrence of window.open:
in location.htm, replace
<A href="#"><IMG onclick="window.open('map.htm','Map','status=yes,width=786,height=590')" height=110 src="../../images/map.gif" width=150 border=0>
with <a href="map.htm"><img src="../../images/map.gif">.
Do the same in all the files of the local site, and offline browsing is now possible with a few other browsers.
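The same replacement can be applied to many files with a regular expression. The sketch below is written in Python and only covers links shaped like the location.htm example: an <A href="#"> immediately followed by an <IMG> whose onclick attribute comes before src. The function name simplify_popup_links is hypothetical, and links written differently would need their own pattern.

```python
import re

# <a href="#"> followed by an <img> that opens a page with window.open.
# Group 1 is the page opened, group 2 is the image source.
POPUP_LINK_RE = re.compile(
    r'''<a\s+href="#">\s*<img\b[^>]*?onclick="window\.open\('([^']+)'[^"]*"[^>]*?\bsrc="([^"]+)"[^>]*>''',
    re.IGNORECASE,
)

def simplify_popup_links(html):
    """Replace window.open image links with plain links to the same page."""
    return POPUP_LINK_RE.sub(
        lambda m: f'<a href="{m.group(1)}"><img src="{m.group(2)}">',
        html,
    )

before = '''<A href="#"><IMG onclick="window.open('map.htm','Map','status=yes,width=786,height=590')" height=110 src="../../images/map.gif" width=150 border=0>'''
print(simplify_popup_links(before))
```

As in the manual replacement, the width, height and border attributes of the image are dropped; text that does not match the pattern is left unchanged.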

You can also redirect and modify!