ICE v1.31


Try it out first here!

Then download the files (right-click & save):
ice-form.pl
ice-idx.pl
and rename their extensions from .txt to .pl
(you couldn't have downloaded them if I had put them up as .pl).


Installation:

1. Edit ice-form.pl to look like this:

#! /usr/local/bin/perl
This is the first line, called the shebang line. Ask your ISP for the exact path to the Perl interpreter on their server!

local($title)="ICE Indexing Gateway";
Replace "ICE Indexing Gateway" with the name you want to call your search engine. It will appear as the title when the script is called.

$indexfile='/www/htdocs/domains/domain3/00095/www.wildboar.net/webdocs/cgi/ice/ice1-31/index.idx';
Put the physical path (not the URL/web path!) to your index.idx file inside the single quotes! If you don't know what it is, ask your ISP or install & run the hello.pl script! The directory should be the same place where you will upload your two Perl files: your /cgi or /cgi-bin directory. The ice-idx.pl program (which you have to run first) generates the index.idx file and places it right next to them. If you don't have Telnet access to your website on your ISP's server, you will have to run ice-idx.pl on the local copy of your website on your hard drive, then edit the generated index.idx file and upload it to this location. This is where ice-form.pl (on your or your ISP's server) will look for the index file to read the results from.
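If you can't find out the physical path any other way, a tiny script like the one below might help. It is only a sketch and it is not the hello.pl script mentioned above: the helper names script_dir and index_path are made up for this example, and it assumes index.idx will sit right next to the script. Upload it to your /cgi or /cgi-bin directory, make it executable and call it from your browser.

```perl
#!/usr/local/bin/perl
# Hypothetical helper (NOT the hello.pl mentioned above): reports where
# this script lives on the server, so the path can be copied into $indexfile.
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Basename qw(dirname);
use File::Spec;

# Return the physical directory that contains the given script file.
sub script_dir {
    my ($script) = @_;
    return dirname(abs_path($script));
}

# Build the path ICE would need if index.idx sits next to the scripts
# (an assumption taken from the description above).
sub index_path {
    my ($script) = @_;
    return File::Spec->catfile(script_dir($script), 'index.idx');
}

# When called through the web server, print the findings as plain text.
print "Content-type: text/plain\n\n";
print "Perl interpreter: $^X\n";
print "Script directory: ", script_dir($0), "\n";
print "Suggested \$indexfile: ", index_path($0), "\n";
```

The "Perl interpreter" line is what would go after #! on the shebang line, but do check it with your ISP, since the web server may run a different Perl than the one you expect.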

%urltopath = (
'/www/htdocs/domains/domain3/00095/www.wildboar.net/webdocs',
);
Inside the single quotes this should be the physical path (not the URL/web path!) to the root directory of your website on your ISP's server.

2. Edit ice-idx.pl to look like this:

@SEARCHDIRS=( 
'E:\wildboar',
);
If you're generating/creating the index file from the hard drive copy of your website, then this should be your website's root directory on your hard drive. This is the area ice-idx.pl will search through in order to generate the index file.

@SEARCHDIRS=( 
'/www/htdocs/domains/domain3/00095/www.wildboar.net/webdocs',
);
But if you do have Telnet access to your website on your ISP's server (or you run a web server yourself and do everything in one place), then the physical path (not the URL/web path!) to your website's root directory on the server should go between the single quotes instead. This is the area ice-idx.pl will search through in order to generate the index file.

$INDEXFILE='E:\wildboar\cgi\ice\ice1-31\index.idx';
If you're going to index the copy of your website on your hard drive, then the above line should look something like this. This is where your index file will appear after it has been generated.

$INDEXFILE='/www/htdocs/domains/domain3/00095/www.wildboar.net/webdocs/cgi/ice/ice1-31/index.idx';
But if you're going to index your website on your ISP's server, then the above line should look more like this. This is where your index file will appear after it has been generated.
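If you index both locally and on the server, you could keep both sets of settings in one place and let Perl pick the right one. The following is just a sketch of that idea, not something ice-idx.pl does by itself: the function name ice_idx_settings is made up, the paths are the example values from this guide, and it assumes your local machine runs Windows (where Perl's $^O variable is 'MSWin32') and the server does not.

```perl
use strict;
use warnings;

# Return the two ice-idx.pl settings for a given platform name.
# Both path sets are the example values from this guide; replace them.
sub ice_idx_settings {
    my ($os) = @_;
    if ($os eq 'MSWin32') {
        # Local hard-drive copy of the website (Windows).
        return (
            searchdirs => ['E:\wildboar'],
            indexfile  => 'E:\wildboar\cgi\ice\ice1-31\index.idx',
        );
    }
    # Anything else is treated as the ISP's UNIX server (an assumption).
    return (
        searchdirs => ['/www/htdocs/domains/domain3/00095/www.wildboar.net/webdocs'],
        indexfile  => '/www/htdocs/domains/domain3/00095/www.wildboar.net/webdocs/cgi/ice/ice1-31/index.idx',
    );
}

my %cfg = ice_idx_settings($^O);    # $^O is Perl's name for the current OS
our @SEARCHDIRS = @{ $cfg{searchdirs} };
our $INDEXFILE  = $cfg{indexfile};
print "Indexing @SEARCHDIRS into $INDEXFILE\n";
```

Inside single quotes the backslashes of the Windows paths stay as they are, which is why the guide uses single quotes everywhere.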

3. Upload (by using an FTP program such as WS_FTP for Windows or Fetch for Macintosh) or place the two Perl files into the /cgi or /cgi-bin directory of your website on your (ISP's) server!

4. Generate the index file by typing perl ice-idx.pl at the (DOS, UNIX or Telnet) prompt, in the directory where ice-idx.pl resides, and pressing ENTER! If you're doing it locally, the Perl interpreter must already be installed on your hard drive. If you need one for Windows 95/98/NT, download the free Win32 Intel version from ActiveState's website at http://www.activestate.com/Products/ActivePerl/Download.html! Choose the APi522e.exe file at the bottom (the other version is a hassle, since it also requires a copy of Windows Installer). On UNIX machines Perl is usually already there, because it's part of most UNIX installations.

5. If you generated the index file on the UNIX server (directly or by Telnet), then it needs no editing. However, if you generated it locally on your hard drive, then you should edit it now before uploading it to your /cgi or /cgi-bin directory as follows:

            a) First completely remove all sections (starting) with a _vti_cnf directory or a _vti_inf.html file! These have been placed there by Microsoft FrontPage in order to keep track of (manage) your website. If you leave them in, they make the index file three times as large as it would be without them, and that slows down every search a lot!

            b) Now remove all references to your local hard drive from the beginning of each path! In WordPad (Windows) you can do this by Edit > Replace... > Find what: /E:\wildboar (or whatever your path was) > Replace with: (leave this empty) > Replace All. Be sure to leave a forward slash / in front of all remaining paths or files (to start them with)!

            c) Next remove all keywords that nobody would search for! They are useless: they just take up space and slow the search down. For example:

irrelevant conjunctions/words: and, the, with, for, from, when, this, that, these, those, just, over, upon, throughout, please, thank, you, him, her, plus, same, test, then, than, via, will, your, his, their, small, other, next, back, home, here, there, worked, clicking, wonderful, studied, available, full, now, circa, two, none, last, page, welcome, all, also, always, came, carried, finally, gotten, happened, have, honestly, like, many, mostly, only, but, not, our, today, been, has
irrelevant URL, file extension & e-mail remainders: com, net, org, html, cgi, aol, att, compuserve
programming elements (JavaScript): appName, var, timerTwo, substring, MSIE, navigator, cmd, ctrl, scrollit, setTimeout, window, else, for, alert, emptyField, formObj, textObj, function, true, style, onMouseover
HTML labels & character elements: name, select, value, submit, reset, nbsp, quot, amp, times

            and so on, you get the idea.
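If you'd rather not do steps a) to c) by hand, they could be sketched in Perl roughly like this. Treat it as an illustration only: the function name clean_index is made up, the prefix E:\wildboar and the short keyword list are just the examples from above, and it assumes the index file has exactly the layout shown in step 6 (an @f line starts each section, keyword lines look like "count keyword"). Turning backslashes into forward slashes is my own addition. Try it on a copy of your index file first!

```perl
use strict;
use warnings;

# Clean a locally generated index, given as one string.
#   $prefix - local path prefix to strip (an assumption; use your own)
#   $stop   - hash ref whose keys are the keywords to drop
sub clean_index {
    my ($text, $prefix, $stop) = @_;
    my @out;
    my $skip = 0;    # true while inside a FrontPage section
    for my $line (split /\n/, $text) {
        if ($line =~ /^\@f\s+(.*)$/) {
            my $path = $1;
            # a) drop whole sections for FrontPage bookkeeping files
            $skip = ($path =~ /_vti_cnf|_vti_inf\.html/) ? 1 : 0;
            next if $skip;
            # b) strip the local prefix, keep a leading forward slash
            $path =~ s/^\/?\Q$prefix\E//i;
            $path =~ tr{\\}{/};    # assumption: remaining backslashes become /
            $path = "/$path" unless $path =~ m{^/};
            push @out, "\@f $path";
            next;
        }
        next if $skip;
        # c) drop keyword lines ("count keyword") for unwanted words
        next if $line =~ /^\d+\s+(\S+)$/ && $stop->{lc $1};
        push @out, $line;
    }
    return join("\n", @out) . "\n";
}

# Usage with a tiny made-up sample.
my %stop = map { $_ => 1 } qw(and the with for from html com);
my $sample = join "\n",
    '@f /E:\wildboar/index.html',
    '@t The Wild Boar Home Page',
    '1 boar',
    '3 the',
    '',
    '@f /E:\wildboar/_vti_cnf/index.html',
    '1 vti',
    '';
print clean_index($sample, 'E:\wildboar', \%stop);
```

The sample keeps the first section (without the keyword "the" and with the path shortened to /index.html) and throws the _vti_cnf section away completely.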

6. Instead of generating the index file with ice-idx.pl, you might prefer to do it manually. Just create an index.txt file which you will rename later to index.idx. Inside the file you have to create an entry for each file you want to be found in a search. Example:

@f /index.html
@t The Wild Boar Home Page
@m 956849738
1 boar
1 home
10 wild

@f /multilingual/chinese/self-defense/kungfu.html
@t Kung Fu
@m 928010944
1 arts
1 bruce
2 chinese
1 defense
1 kung
5 lee
3 martial
1 self

@f means file (don't use the same file name again in another section further down, because that repeated section will be ignored!).
@t means title (whatever text is between your <TITLE></TITLE> tags).
@m means last modified, in seconds since the epoch (it is displayed as a date with day, month and year, but it can be left out entirely).
1 means the keyword on the right came up 1 time on this page when indexing (it could be any number, even more than 10, but you have to put something here, otherwise that keyword won't come up in the results).
bruce is a keyword (always write it in lowercase, because uppercase will be ignored!); ice-idx.pl puts keywords into alphabetical order, but if you do it manually, you don't have to. It's just easier to look through them when they're alphabetized, if you have many.
Sections (of files) don't have to be separated by a blank line (ice-idx.pl does that); I just did it to keep them apart.
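To check a hand-made index before uploading it, you could read it back with a few lines of Perl. The sketch below is not taken from ice-form.pl (I don't claim it parses the file the same way). The function names parse_index and mtime_to_date are made up, and it simply follows the rules described above: @f starts a section, a repeated file name is skipped, and @m is a number of seconds since the epoch.

```perl
use strict;
use warnings;

# Parse index text into a list of entries:
#   { file => ..., title => ..., mtime => ..., words => { keyword => count } }
sub parse_index {
    my ($text) = @_;
    my (@entries, %seen, $cur);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\@f\s+(\S.*)$/) {
            my $file = $1;
            # A repeated file name is skipped, as described above.
            if ($seen{$file}++) { $cur = undef; next; }
            $cur = { file => $file, words => {} };
            push @entries, $cur;
        }
        elsif (!$cur)                      { next }
        elsif ($line =~ /^\@t\s+(.*)$/)    { $cur->{title} = $1 }
        elsif ($line =~ /^\@m\s+(\d+)$/)   { $cur->{mtime} = $1 }
        elsif ($line =~ /^(\d+)\s+(\S+)$/) { $cur->{words}{$2} = $1 }
    }
    return @entries;
}

# Format an @m value (seconds since the epoch) as a UTC date.
sub mtime_to_date {
    my ($secs) = @_;
    my ($day, $mon, $year) = (gmtime($secs))[3, 4, 5];
    return sprintf '%04d-%02d-%02d', $year + 1900, $mon + 1, $day;
}

# Usage with the first example section from above.
my $index = <<'END';
@f /index.html
@t The Wild Boar Home Page
@m 956849738
1 boar
1 home
10 wild
END
for my $e (parse_index($index)) {
    my $date = defined $e->{mtime} ? mtime_to_date($e->{mtime}) : 'no date';
    print "$e->{file} ($date): ", join(', ', sort keys %{ $e->{words} }), "\n";
}
```

For the example section this prints the file name, the date 2000-04-27 and the three keywords, which is a quick way to see that the @m number really is a date in seconds.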

7. Now change the files' permissions! Windows users can use WS_FTP, Macintosh users can use Fetch to do that.

For WS_FTP (Pro), Fetch, Telnet and UNIX users (in WS_FTP first highlight the file & right-click on it > FTP Commands > SITE > type in the command from the left-hand column below > OK):

chmod 755 ice-form.pl    -rwxr-xr-x
chmod 755 ice-idx.pl     -rwxr-xr-x
chmod 664 index.idx      -rw-rw-r--

For WS_FTP LE (Limited Edition) users: after making the changes, you (like all WS_FTP users) have to log out completely & log back in to see them in DirInfo, which will show you the right-hand column above.

               Owner                 Group           Other
ice-form.pl    read-write-execute    read-execute    read-execute
ice-idx.pl     read-write-execute    read-execute    read-execute
index.idx      read-write            read-write      read
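In case you wonder how the numbers 755 and 664 turn into the letters above: each digit stands for owner, group and other, and is the sum of read (4), write (2) and execute (1). Here is a small Perl sketch of that arithmetic. The function name mode_string is made up for the example, and the chmod call (Perl's built-in function) is commented out so nothing gets changed by accident.

```perl
use strict;
use warnings;

# Convert a numeric mode such as 0755 into the "rwxr-xr-x" form.
sub mode_string {
    my ($mode) = @_;
    my @bits = qw(r w x);
    my $out  = '';
    # Walk the nine permission bits from owner-read down to other-execute.
    for my $i (reverse 0 .. 8) {
        $out .= ($mode & (1 << $i)) ? $bits[(8 - $i) % 3] : '-';
    }
    return $out;
}

# The three modes from this step (the leading 0 makes the number octal).
my %modes = ('ice-form.pl' => 0755, 'ice-idx.pl' => 0755, 'index.idx' => 0664);
for my $file (sort keys %modes) {
    printf "%-12s %o  -%s\n", $file, $modes{$file}, mode_string($modes{$file});
    # Uncomment to apply the mode to a file in the current directory:
    # chmod $modes{$file}, $file or warn "chmod $file failed: $!";
}
```

Don't forget the leading 0 in Perl: chmod 755 without it would be read as a decimal number and set something quite different.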

8. Now create a link on an HTML page to call the ice-form.pl script to test it!

<a href="../../../../cgi/ice/ice1-31/ice-form.pl">Try out ICE here!</a>

or with the full path like this:
<a href="http://www.wildboar.net/cgi/ice/ice1-31/ice-form.pl">Try out ICE here!</a>

9. Click on the link and if the search works, you're done!