A website copy with HTTrack
Areaparks October 2002
project name: areaparks
Web (URL) address: www.areaparks.com
do not tick: DOS names or ISO9660 names
amount of time: 10 hours (56k modem)
in the scan rules, add:
+*[name].areaparks.com/*
To limit the size of the capture, also add to the scan rules:
-ad.doubleclick.net/* -ad2.doubleclick.net/*
-aeraguides.webbanners.net/*
-*.exe -*.zip
-forums.*
-*/spyellow/* (this part of the site can be mirrored later)
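How the scan rules above combine can be sketched in Python. This is only an approximation of HTTrack's matching, not its actual implementation. It assumes that rules are compared against the address without the http:// prefix, that * matches any run of characters, that *[name] matches any run of characters other than /, ? and ;, and that the last matching rule wins. The function names are hypothetical, and the ad2 rule is written with a trailing * here.

```python
import re


def rule_to_regex(pattern):
    """Translate one scan-rule pattern into a regex (approximation only)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("*[name]", i):
            # Assumed meaning: any name, i.e. no '/', '?' or ';' characters.
            parts.append(r"[^/?;]*")
            i += len("*[name]")
        elif pattern[i] == "*":
            parts.append(".*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def decide(address, rules):
    """Return True (accepted), False (refused) or None (no rule matched)."""
    verdict = None
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if rule_to_regex(pattern).match(address):
            # Later rules override earlier ones (assumed priority order).
            verdict = (sign == "+")
    return verdict


RULES = [
    "+*[name].areaparks.com/*",
    "-ad.doubleclick.net/*",
    "-ad2.doubleclick.net/*",
    "-aeraguides.webbanners.net/*",
    "-*.exe",
    "-*.zip",
    "-forums.*",
    "-*/spyellow/*",
]

if __name__ == "__main__":
    # Hypothetical addresses, used only to show the effect of the rules.
    for address in (
        "www.areaparks.com/index.html",
        "forums.areaparks.com/viewtopic.php",
        "www.areaparks.com/spyellow/index.html",
        "ad.doubleclick.net/banner.gif",
    ):
        print(address, decide(address, RULES))
```

A result of None means that none of the listed rules applied, in which case HTTrack's own default behaviour decides.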
problems:
Ads and a huge website
Other examples with similar difficulties: Marian High | Kakadu | Travel West
solutions:
At the end of the capture, use a tool such as InfoRapid Search & Replace to find the string http:// in all the files and replace it with #. When browsing online, you can then no longer load missing pages or look at ads.
When browsing the website offline, no external page is called and no sponsor image is displayed, and you don't have to click 4 or 5 times when changing a page.
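If no search-and-replace tool is at hand, the same replacement can be sketched in Python. This is a minimal sketch, not the method used for the original capture. The function name is hypothetical, and limiting the change to .html and .htm files is an assumption (the text above says all the files). Work on a copy of the mirror, since the files are rewritten in place.

```python
import sys
from pathlib import Path


def neutralize_links(mirror_dir, old=b"http://", new=b"#",
                     suffixes=(".html", ".htm")):
    """Replace `old` with `new` in every matching file under mirror_dir.

    Files are handled as raw bytes so that their character encoding
    does not matter. Returns the number of files that were changed.
    """
    changed = 0
    for path in Path(mirror_dir).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        raw = path.read_bytes()
        patched = raw.replace(old, new)
        if patched != raw:
            path.write_bytes(patched)
            changed += 1
    return changed


if __name__ == "__main__":
    # Usage: python neutralize_links.py <folder containing the mirror>
    if len(sys.argv) == 2:
        print(neutralize_links(sys.argv[1]), "file(s) changed")
```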
or
Use a tool such as InfoRapid Search & Replace to find the string src="http:// in all the files and replace it with src="#http://. Then replace window.open with windowopen, or with any other instruction the JavaScript interpreter doesn't know. When browsing, no ad is displayed and you don't have to click 4 or 5 times when changing a page.
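The two replacements of this second solution can also be sketched in Python. Again this is only an illustration under assumptions: the function name is hypothetical, and the choice of .html, .htm and .js files is a guess at where the strings occur. Both replacements are safe to run twice, because the replaced text no longer contains the searched string.

```python
import sys
from pathlib import Path

# (search, replace) pairs taken from the description above.
REPLACEMENTS = [
    (b'src="http://', b'src="#http://'),   # external images no longer load
    (b"window.open", b"windowopen"),       # pop-up calls become unknown names
]


def disable_ads(mirror_dir, suffixes=(".html", ".htm", ".js")):
    """Apply REPLACEMENTS to every matching file under mirror_dir.

    Files are handled as raw bytes and rewritten in place.
    Returns the number of files that were changed.
    """
    changed = 0
    for path in Path(mirror_dir).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        raw = path.read_bytes()
        patched = raw
        for old, new in REPLACEMENTS:
            patched = patched.replace(old, new)
        if patched != raw:
            path.write_bytes(patched)
            changed += 1
    return changed


if __name__ == "__main__":
    # Usage: python disable_ads.py <folder containing the mirror>
    if len(sys.argv) == 2:
        print(disable_ads(sys.argv[1]), "file(s) changed")
```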