Try it first!
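Then download the two files (right-click and save):
ice2-for.pl
ice2-idx.pl
and rename their extensions from .txt to .pl.
(You couldn't have downloaded them if I had put them up as .pl,
because the web server would have tried to run them as scripts instead of sending them to you.)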
Installation:
1. Edit ice2-for.pl to look like this:
#! /usr/local/bin/perl
This first line, called the shebang line, tells the server which program runs the script. Ask your ISP for the exact path
to the Perl interpreter on their server!
local($title)="Wild Boar Search Engine";
Replace "Wild Boar Search Engine" with the name you want to give
your search engine. It appears as the page title whenever the script runs.
$indexfile='/www/htdocs/.../www.wildboar.net/webdocs/cgi/ice/ice1-5/index.idx';
Put the physical path (not the URL/web path!) to your index.idx file inside the single quotes!
If you don't know it, ask your ISP or install and run the
hello.pl script!
This should be the same place where you will upload your two Perl files: your /cgi or /cgi-bin directory.
The ice2-idx.pl program (which you will have to run first) generates the index.idx
file and places it right next to them. If you don't have Telnet access to your
website on your ISP's server, you will have to run ice2-idx.pl on the local copy
of your website on your hard drive, then edit the generated index.idx file and
upload it to this location. This is where ice2-for.pl (on your own or your ISP's
server) looks for the index file it reads the search results from.
$docroot = '/www/htdocs/.../www.wildboar.net/webdocs';
%aliases = (
);
Inside the single quotes of $docroot, put the physical path (not the URL/web path!) to your website's root directory
on your ISP's server.
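If you can't get hold of hello.pl, a tiny CGI script along the following lines can show you the physical paths step 1 asks for. This is a hypothetical sketch: the name where.pl is made up, it is not the hello.pl script, and it assumes your server runs Perl CGI scripts from /cgi or /cgi-bin and sets the DOCUMENT_ROOT environment variable (Apache does).

```perl
#!/usr/local/bin/perl
# where.pl - hypothetical helper (not the original hello.pl):
# prints the physical paths you need for ice2-for.pl.
use strict;
use warnings;
use FindBin qw($Bin);    # $Bin = physical directory this script lives in

sub script_dir { return $Bin }

print "Content-type: text/plain\n\n";
print "This script lives in: ", script_dir(), "\n";
print "DOCUMENT_ROOT:        ",
    (defined $ENV{DOCUMENT_ROOT} ? $ENV{DOCUMENT_ROOT} : '(not set)'), "\n";
```

Upload it next to your two Perl files and open its URL in your browser. The first line shows the directory to use in $indexfile (plus the file name index.idx), the second one is normally what belongs in $docroot. Delete the script again afterwards.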
2. Edit ice2-idx.pl to look like this:
@SEARCHDIRS=(
'E:\wildboar',
);
If you're generating the index file from the copy of your website on your hard
drive, this should be your website's root directory on your hard
drive. This is the area ice2-idx.pl searches through to build the
index file.
@SEARCHDIRS=(
'/www/htdocs/domains/domain3/00095/www.wildboar.net/webdocs',
);
But if you do have Telnet access to your website on your ISP's server (or you run a web server yourself,
doing everything in one place), then the physical path (not the URL/web path!) to your website's root
directory on that server goes between the single quotes instead, so that ice2-idx.pl
searches the live copy of your site.
$INDEXFILE='E:\wildboar\cgi\ice\ice1-5\index.idx';
If you're going to index the copy of your website on your hard drive, the line should look
something like the one above. This is where your index file will appear once it has been generated.
$INDEXFILE='/www/htdocs/.../www.wildboar.net/webdocs/cgi/ice/ice1-5/index.idx';
But if you're going to index your website on your ISP's server, the line should look more
like the one above. Again, this is where your index file will appear once it has been generated.
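Before running ice2-idx.pl, you can check that the values you chose in step 2 actually point somewhere. The following is a hypothetical check script (not part of ICE; the name checkpaths.pl and the sub check_paths are made up). It reports every @SEARCHDIRS entry that is not an existing directory or looks like a URL, and whether the directory meant to receive $INDEXFILE exists and is writable. It assumes you run it on the same machine as ice2-idx.pl.

```perl
#!/usr/local/bin/perl
# checkpaths.pl - hypothetical pre-flight check for the step 2 settings.
use strict;
use warnings;
use File::Basename qw(dirname);

# Copy your own values from ice2-idx.pl here.
my @SEARCHDIRS = ('E:\wildboar');
my $INDEXFILE  = 'E:\wildboar\cgi\ice\ice1-5\index.idx';

# Returns a list of problems; an empty list means the paths look usable.
sub check_paths {
    my ($indexfile, @dirs) = @_;
    my @problems;
    for my $dir (@dirs) {
        push @problems, "looks like a URL, not a physical path: $dir"
            if $dir =~ m{^https?://}i;
        push @problems, "not an existing directory: $dir" unless -d $dir;
    }
    # dirname() follows the path conventions of the system this runs on.
    my $outdir = dirname($indexfile);
    push @problems, "cannot write the index file into: $outdir"
        unless -d $outdir && -w $outdir;
    return @problems;
}

my @found = check_paths($INDEXFILE, @SEARCHDIRS);
if (@found) { print join("\n", @found), "\n" }
else        { print "All paths look OK.\n" }
```

Because dirname() uses the conventions of the machine it runs on, check Windows paths on Windows and UNIX paths on the server.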
3. Upload the two Perl files (with an FTP program such as WS_FTP for Windows or Fetch for Macintosh), or copy them,
into the /cgi or /cgi-bin directory of your website on your own or your ISP's server!
4. Generate the index file: in the directory where ice2-idx.pl resides, type perl ice2-idx.pl
at the DOS, UNIX or Telnet prompt and press ENTER!
If you're doing this locally, the Perl interpreter has to be installed on your hard drive.
If you need one for Windows 95/98/NT, download the free Win32 Intel version of ActivePerl from ActiveState at
http://www.activestate.com/Products/ActivePerl/Download.html (link no longer active)!
Choose the APi522e.exe file at the
bottom of the page (the other version is a hassle, since it also requires
Windows Installer)!
On UNIX machines it is usually already there, since Perl comes with most UNIX installations.
5. If you generated the index file on the UNIX server (directly or via Telnet), it needs no editing.
If you generated it locally on your hard drive, however, edit it now, before uploading it
to your /cgi or /cgi-bin directory, as follows:
a) First, completely remove all sections in the second (bottom) half of your index.idx file that contain a
_vti_cnf or _vti_pvt directory or a _vti_inf.html file! Microsoft FrontPage puts these on
your site to keep track of (manage) it. If you leave them in, they make the index file
about three times as large, which slows down every search a lot!
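If you are comfortable with Perl, a small script can do step a) instead of deleting by hand. This is only a sketch under an assumption I can't check against every index.idx: that each FrontPage entry sits on a line of its own containing _vti_cnf, _vti_pvt or _vti_inf.html. It drops every such line from the whole file; if an entry spans several lines in your index, the rest has to go by hand. The names drop_vti.pl and drop_frontpage are made up. Always work on a copy of index.idx.

```perl
#!/usr/local/bin/perl
# drop_vti.pl - hypothetical helper; usage: perl drop_vti.pl index.idx new.idx
use strict;
use warnings;

# The FrontPage bookkeeping names listed in step a).
my $FRONTPAGE = qr{_vti_cnf|_vti_pvt|_vti_inf\.html};

# Keeps only the lines that do not mention a FrontPage file or directory.
sub drop_frontpage {
    return grep { !/$FRONTPAGE/ } @_;
}

if (@ARGV == 2) {
    my ($in, $out) = @ARGV;
    open my $ifh, '<', $in  or die "Cannot read $in: $!\n";
    my @lines = <$ifh>;
    close $ifh;
    open my $ofh, '>', $out or die "Cannot write $out: $!\n";
    print $ofh drop_frontpage(@lines);
    close $ofh or die "Cannot finish $out: $!\n";
}
```

Compare the new file with the old one before you rename it to index.idx and upload it.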
Even after you have
removed them from the second half of index.idx, they still come up in the
search results: no longer as the full file name, but as a link labeled with a 3-digit hexadecimal
number, and clicking it only gives a "The page cannot be
found" error. So you'll also have to remove them from the first (top) half of
the index.idx file, and there's no easy way to do that: you'll have to go through
each keyword and remove all the 3- to 4-digit hexadecimal numbers that used to
refer to the _vti files. Note, however, that these hex numbers are NOT the same as
the ones displayed in a search result!
Example:
A search for "Japanese" listed "132" as the 18th link
found online, and on the line for "japanese" in the first half of index.idx I found
"440B" as the 18th number. So "132" corresponded to "440B": when I removed "440B",
uploaded index.idx again and repeated the search for "Japanese",
the "132" link was gone. How does "132" become
"440B"? Don't ask me, but you'll have to do this for every dead _vti
link!
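Once you have worked out which numbers on a keyword line are dead, removing them can be scripted. Again a hypothetical sketch (the name drop_ids is made up): it assumes a keyword line is the keyword followed by its numbers, separated by spaces or tabs, and it rewrites the line with single spaces. Check your own index.idx first; if it uses a different separator, the split has to change. Finding the right numbers, as in the example, is still up to you.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Removes the given dead numbers from one keyword line.
# Assumes: keyword first, then whitespace-separated hex numbers.
sub drop_ids {
    my ($line, @dead) = @_;
    my %is_dead = map { lc($_) => 1 } @dead;   # compare case-insensitively
    my ($keyword, @ids) = split ' ', $line;
    return $line unless defined $keyword;       # leave empty lines alone
    return join(' ', $keyword, grep { !$is_dead{lc $_} } @ids) . "\n";
}

print drop_ids("japanese 1A2 440B 3F0\n", '440B');   # prints "japanese 1A2 3F0"
```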
b) Now remove the reference to your local hard drive from the beginning of each path!
In WordPad (Windows) choose
Edit > Replace..., enter /E:\wildboar (or whatever your path was) under Find what, leave Replace with empty, and click Replace All.
Make sure every remaining path or file name still starts with a forward slash /!
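The same replacement can be done with a few lines of Perl if you'd rather not repeat it by hand every time you regenerate the index. A hypothetical sketch (the names strip_prefix.pl and strip_prefix are made up): like Replace All, it removes every literal occurrence of your local prefix. \Q...\E makes Perl treat the backslash in E:\wildboar as an ordinary character instead of a regex code. It assumes, as step b) does, that what is left after the prefix already starts with a forward slash.

```perl
#!/usr/local/bin/perl
# strip_prefix.pl - hypothetical; usage: perl strip_prefix.pl index.idx new.idx
use strict;
use warnings;

my $PREFIX = '/E:\wildboar';   # or whatever your path was

# Removes every literal occurrence of $prefix from $line (like Replace All).
sub strip_prefix {
    my ($line, $prefix) = @_;
    $line =~ s/\Q$prefix\E//g;   # \Q...\E: no regex meaning for \ or :
    return $line;
}

if (@ARGV == 2) {
    my ($in, $out) = @ARGV;
    open my $ifh, '<', $in  or die "Cannot read $in: $!\n";
    open my $ofh, '>', $out or die "Cannot write $out: $!\n";
    while (my $line = <$ifh>) {
        print $ofh strip_prefix($line, $PREFIX);
    }
    close $ifh;
    close $ofh or die "Cannot finish $out: $!\n";
}
```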
c) Next, remove all keywords nobody would ever search for! They are
useless: they just take up space and slow
the search down.
Irrelevant conjunctions and other common words: and, the, with, for, from,
when, this, that, these, those, just, over, upon, throughout, please, thank,
you, him, her, plus, same, test, then, than, via, will, your, his, their, small,
other, next, back, home, here, there, worked, clicking, wonderful, studied,
available, full, now, circa, two, none, last, page, welcome, all, also, always,
came, carried, finally, gotten, happened, have, honestly, like, many, mostly,
only, but, not, our, today, been, has.
Leftovers from URLs, file extensions and e-mail addresses: com, net, org,
html, cgi, aol, att, compuserve.
Programming elements (JavaScript): appName, var, timerTwo, substring,
MSIE, navigator, cmd, ctrl, scrollit, setTimeout, window, else, for, alert,
emptyField, formObj, textObj, function, true, style, onMouseover.
HTML labels & character entities: name, select, value, submit, reset,
&quot;, &amp;, &times;
and so on; you get the idea.
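If the index file is big, a short script can weed out these keywords for you. This is a hypothetical sketch (the names drop_words.pl and drop_stopword_lines are made up) that assumes each keyword line in the first half of index.idx starts with the keyword itself, followed by whitespace; any line whose first word is on your list is dropped entirely. @STOPWORDS holds only a sample of the words listed, so fill in your own.

```perl
#!/usr/local/bin/perl
# drop_words.pl - hypothetical; usage: perl drop_words.pl index.idx new.idx
use strict;
use warnings;

# A sample of the useless keywords listed above; add your own.
my @STOPWORDS = qw(and the with for from com net org html cgi var function);
my %is_stop   = map { lc($_) => 1 } @STOPWORDS;

# Drops every line whose first word is a stopword (case-insensitive).
sub drop_stopword_lines {
    my @kept;
    for my $line (@_) {
        my ($keyword) = split ' ', $line;
        next if defined $keyword && $is_stop{lc $keyword};
        push @kept, $line;
    }
    return @kept;
}

if (@ARGV == 2) {
    my ($in, $out) = @ARGV;
    open my $ifh, '<', $in  or die "Cannot read $in: $!\n";
    my @lines = <$ifh>;
    close $ifh;
    open my $ofh, '>', $out or die "Cannot write $out: $!\n";
    print $ofh drop_stopword_lines(@lines);
    close $ofh or die "Cannot finish $out: $!\n";
}
```

File paths in the second half start with a slash, so they never match a bare word on the list and are left untouched.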