ICE v1.5 beta3 rev2


Try it first!

Then download the files (right-click and save):
ice2-for.pl
ice2-idx.pl
and rename their extensions from .txt to .pl
(you could not have downloaded them if I had
put them up as .pl)


Installation:

1. Edit ice2-for.pl to look like this:

#! /usr/local/bin/perl
This is the first line, called the shebang line. Ask your ISP for the exact path to the Perl interpreter on their server!

local($title)="Wild Boar Search Engine";
Replace "Wild Boar Search Engine" with the name you want to give your search engine. It will appear as the title when the script is called.

$indexfile='/www/htdocs/.../www.wildboar.net/webdocs/cgi/ice/ice1-5/index.idx';
Put the physical path (not the URL/web path!) to your index.idx file inside the single quotes! If you don't know what it is, ask your ISP or install & run the hello.pl script! The index file should sit in the same place where you will upload your two Perl files: your /cgi or /cgi-bin directory. The ice2-idx.pl program (which you will have to run first) will create the index.idx file and place it right next to them. If you don't have Telnet access to your website on your ISP's server, you will have to run ice2-idx.pl on the local copy of your website on your hard drive and then edit & upload the generated index.idx file to this location. This is the place (on your own or your ISP's server) where ice2-for.pl will look for your index file to read the results from.
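If you have no hello.pl at hand, a few lines of Perl can report the same kind of information. This is only a rough sketch and not the hello.pl script itself: the file name whereami.pl is made up, and it assumes that your web server sets the DOCUMENT_ROOT environment variable (many do, some don't). Upload it next to the other scripts and call it from your browser.

```perl
#!/usr/local/bin/perl
# whereami.pl (hypothetical name): prints physical paths useful for the ICE settings.
use strict;
use warnings;
use Cwd qw(getcwd);

# Collect the values in one place so they are easy to print.
sub path_report {
    my %info = (
        'Perl interpreter'  => $^X,         # candidate for the shebang line
        'Current directory' => getcwd(),    # physical directory the script runs in
        'Document root'     => $ENV{DOCUMENT_ROOT} || '(not set by this server)',
    );
    return join '', map { "$_: $info{$_}\n" } sort keys %info;
}

print "Content-type: text/plain\n\n";   # CGI header followed by a blank line
print path_report();
```

On some systems $^X only shows the bare name perl instead of a full path; in that case you still have to ask your ISP.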

$docroot = '/www/htdocs/.../www.wildboar.net/webdocs';
%aliases = (
);

Inside the single quotes of $docroot, put the physical path (not the URL/web path!) to the root directory of your website on your ISP's server.

2. Edit ice2-idx.pl to look like this:

@SEARCHDIRS=( 
'E:\wildboar',
);

If you're creating the index file from the copy of your website on your hard drive, then this should be your website's root directory on your hard drive. This is the area that ice2-idx.pl will search through in order to generate the index file.

@SEARCHDIRS=( 
'/www/htdocs/domains/domain3/00095/www.wildboar.net/webdocs',
);

But if you do have Telnet access to your website on your ISP's server (or you run a web server yourself and do everything in one place), then the physical path (not the URL/web path!) to your website's root directory on the server should go between the single quotes instead. This is the area that ice2-idx.pl will search through in order to generate the index file.

$INDEXFILE='E:\wildboar\cgi\ice\ice1-5\index.idx';
If you're going to index the copy of your website on your hard drive, then the above line should look something like this. This is where your index file will appear after it has been generated.

$INDEXFILE='/www/htdocs/.../www.wildboar.net/webdocs/cgi/ice/ice1-5/index.idx';
But if you're going to index your website on your ISP's server, then the above line should look more like this. This is where your index file will appear after it has been generated.
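A mistyped path is the most likely thing to go wrong here, so it can pay to test the paths before running the indexer. The following is just a sketch, not part of ICE: check_paths is a made-up helper, and the values are the local examples from above, so substitute your own.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Returns a list of human-readable problems; an empty list means the paths look usable.
sub check_paths {
    my ($indexfile, @searchdirs) = @_;
    my @problems;
    for my $dir (@searchdirs) {
        push @problems, "search directory not found: $dir" unless -d $dir;
    }
    # The index file itself need not exist yet, but its directory must.
    my $target = dirname($indexfile);
    push @problems, "index directory not found: $target" unless -d $target;
    return @problems;
}

# Example values copied from the settings above (local hard drive case).
my @SEARCHDIRS = ('E:\wildboar');
my $INDEXFILE  = 'E:\wildboar\cgi\ice\ice1-5\index.idx';

my @problems = check_paths($INDEXFILE, @SEARCHDIRS);
print @problems ? join("\n", @problems) . "\n" : "Paths look fine.\n";
```

Run it on the same machine where you will run ice2-idx.pl, since the paths only mean something there.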

3. Upload (by using an FTP program such as WS_FTP for Windows or Fetch for Macintosh) or place the two Perl files into the /cgi or /cgi-bin directory of your website on your (ISP's) server!

4. Generate the index file: change to the directory where ice2-idx.pl resides, type perl ice2-idx.pl at the (DOS, UNIX or Telnet) prompt and press ENTER! If you're doing this locally, the Perl interpreter must already be installed on your hard drive.

If you need one for Windows 95/98/NT, download the free Win32 Intel version of ActivePerl from ActiveState at http://www.activestate.com/Products/ActivePerl/Download.html (link no longer active)! Choose the APi522e.exe file at the bottom (the other version is a hassle, since it also requires a copy of Windows Installer)!

On UNIX machines Perl is usually already there, because most UNIX installations include it.

5. If you generated the index file on the UNIX server (directly or by Telnet), then it needs no editing. However, if you generated it locally on your hard drive, then you should edit it now, before uploading it to your /cgi or /cgi-bin directory, as follows:

a) First completely remove all sections with a _vti_cnf or _vti_pvt directory or a _vti_inf.html file in the second/bottom half of your index.idx file! These have been placed there by Microsoft FrontPage in order to keep track of (manage) your website. If you leave them in, they make the index file three times as large as it is without them, and that slows down every search a lot!

Even after you have removed them from the second half of index.idx, they still come up in the search results: not as the full file name any more, but as a link labeled with a 3-digit hexadecimal number. If you click on it, you'll just get a "The page cannot be found" error. So you'll also have to remove them from the first/top half of the index.idx file, but there's no easy way to do that: you'll have to go through each keyword and remove all those 3-4 digit hexadecimal numbers which used to refer to the _vti files. However, these hex numbers are NOT the same as the ones displayed in a search result!

Example: an online search for "Japanese" listed "132" as the 18th link found, while in the first half of index.idx I found "440B" as the 18th number on the line for "japanese". So "132" corresponded to "440B", and after I removed "440B", uploaded index.idx again and repeated the search for "Japanese", that "132" wasn't there any more. How does "132" become "440B"? Don't ask me, but you'll have to do this for all dead _vti links now!

b) Now remove all references to your local hard drive from the beginning of each path! In WordPad (Windows) you can do this with
Edit > Replace... > Find what: /E:\wildboar (or whatever your path was) > Replace with: (leave this empty) > Replace All. Be sure that every remaining path or file still starts with a forward slash /!
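If you'd rather not do this by hand, the same replacement can be sketched in Perl. This is only an illustration under assumptions: strip_local_prefix is a made-up helper, the prefix is the example from above, and the sample line is invented (your index.idx lines may look different).

```perl
use strict;
use warnings;

# Remove every literal occurrence of $prefix from $line.
# \Q...\E makes Perl treat the backslash and colon in the prefix as plain characters.
sub strip_local_prefix {
    my ($line, $prefix) = @_;
    $line =~ s/\Q$prefix\E//g;
    return $line;
}

# The prefix is the example from step b); the sample line is made up for illustration.
my $prefix = '/E:\wildboar';
print strip_local_prefix('/E:\wildboar/links/index.html', $prefix), "\n";
# prints: /links/index.html
```

To treat the whole file, you would read index.idx line by line, pass each line through the helper and write the result to a new file, keeping the original as a backup until the search works.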

c) Next remove all keywords that are indexed unnecessarily! They are useless: nobody would search for them, so they just take up space and slow the search down.

Irrelevant conjunctions/words: and, the, with, for, from, when, this, that, these, those, just, over, upon, throughout, please, thank, you, him, her, plus, same, test, then, than, via, will, your, his, their, small, other, next, back, home, here, there, worked, clicking, wonderful, studied, available, full, now, circa, two, none, last, page, welcome, all, also, always, came, carried, finally, gotten, happened, have, honestly, like, many, mostly, only, but, not, our, today, been, has.

Irrelevant URL, file extension & e-mail remainders: com, net, org, html, cgi, aol, att, compuserve.

Programming elements (JavaScript): appName, var, timerTwo, substring, MSIE, navigator, cmd, ctrl, scrollit, setTimeout, window, else, for, alert, emptyField, formObj, textObj, function, true, style, onMouseover.

HTML labels & character entities: name, select, value, submit, reset, nbsp, quot, amp, times.

and so on, you get the idea.
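The idea of such a stop list can be sketched in Perl as well. Since the layout of the keyword lines in index.idx isn't described here, this made-up example (keep_useful is not part of ICE) only filters a plain list of words; the stop words are a small selection from the lists above.

```perl
use strict;
use warnings;

# A few stop words taken from the lists above; extend as needed.
my %STOP = map { lc($_) => 1 } qw(and the with for from html com net var function);

# Keep only the words that are not in the stop list (case-insensitive).
sub keep_useful {
    return grep { !$STOP{ lc $_ } } @_;
}

my @sample = qw(Japanese the boar html search);
print join(' ', keep_useful(@sample)), "\n";   # prints: Japanese boar search
```

A hash lookup is used instead of searching a long list each time, which keeps the check quick even when the stop list grows.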