This file contains detailed instructions for the Perl script ExplorerIndex.pl, written by Nigel Swinson. ExplorerIndex creates a tree control on a web page, where the tree control mirrors either the file structure below a named directory or the contents of a specific html file.
Configuration is through a text file, with html templates used to describe the visual output.
ExplorerIndex, and the most recent version of this file, can be obtained from the explorerindex homepage found at http://scripts.carrubbers.org/scripts/explorerindex.
ExplorerIndex is an indexing script that will index both a directory, and the contents of a file. Where it indexes a directory, it will produce a tree control that mimics the directory structure of the directory. Where it indexes a file, it will capture each instance of the tags of interest, at the levels that you specified, and construct a tree control to represent the structure of the file.
The Perl script produces an html page that contains Javascript, and it is the Javascript that drives the tree control using Cookies. The Perl script was written for Linux (Apache), but will probably work on Unix too, and works fine under Windows (IIS).
To configure the script, you must supply 4 lists of files and directories:
- what to index
- what not to index
- what to content index
- what not to content index
Everything that you list in what to index will appear as a root node in the tree control. If one of the entries is a directory, then it will be recursively searched to produce the expandable node, but will miss out any files/directories that you name in the what not to index list.
If at any time it comes across a file that is named in the what to content index list, then this file will appear not as a leaf node in the tree, but as a sub tree where the sub tree represents the contents of the file. If it comes across a directory which you have named to be content indexed, then it will content index every file that is held below this directory, with the exception of those files/directories that you have listed in the what not to content index list. An example of content indexing is the contents tree of this file.
To configure the content indexing, you supply a list of tags, and a corresponding level at which to index each of these tags.
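The way the four lists interact can be sketched in a few lines. This is an illustration in Python, not the script's own Perl code. The function names are hypothetical, and the rule that the most specific (longest) matching entry wins is an assumption, made so that an included sub-directory can sit inside an excluded one.

```python
def match_depth(path, entries):
    """Length of the longest entry that names path or a directory above it.

    Returns -1 when nothing matches. The special entry "." matches everything.
    """
    best = -1
    for entry in entries:
        if entry == ".":
            best = max(best, 0)
        elif path == entry or path.startswith(entry + "/"):
            best = max(best, len(entry))
    return best


def classify(path, index, no_index, content, no_content):
    """Return 'skip', 'leaf' or 'subtree' for a path relative to the document root."""
    # Assumption: the most specific entry decides, and a tie means exclusion.
    if match_depth(path, index) <= match_depth(path, no_index):
        return "skip"
    if match_depth(path, content) > match_depth(path, no_content):
        return "subtree"  # the file's contents become a sub tree
    return "leaf"
```

With "." and "admin/public" in the what to index list and "admin" in the what not to index list, this sketch skips admin/secret.html but keeps admin/public/a.html as a leaf.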
The files contained in this release are as follows:
ExplorerIndex can be run from either the command line, or from the internet. If you supply any argument that it doesn't understand, you will be given a usage message to remind you of the information presented here.
ExplorerIndex.pl [config=<file>] [debug]
This is the method for describing which file contains the configuration details. It is with respect to the execution directory of the script itself.
Example
If the script was stored as: /home/username/htdocs/scripts/ExplorerIndex.pl, and the execution directory was the same directory, then with the config parameter of ../Config.txt , it would be looking for the file /home/username/htdocs/Config.txt .
If no config parameter is given, then it will default to Config.txt in the execution directory.
If you supply the debug parameter, then the output will include debugging information. Use this option if you want to find out more accurately what the script is processing.
This output will include information on how the script has interpreted the configuration information, both from the command line, and from the file. It will also describe its actions as it indexes the files/directories to give you an idea of what is going wrong should the script not perform as you would expect. At the end of the output, it will give the file that was generated by this call to the script as plain text. That is to say, if you use this parameter from the internet, the html will be displayed as plain text rather than as interpreted markup.
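The argument handling described above can be sketched as follows. This is a Python illustration, not the script's Perl code. The function name parse_args is hypothetical, and raising an error stands in for printing the usage message.

```python
import posixpath


def parse_args(argv, exec_dir):
    """Return (config_path, debug) for the arguments after the script name.

    The config path is resolved against the execution directory and
    defaults to Config.txt there. Any unknown argument raises ValueError,
    standing in for the usage message.
    """
    config = "Config.txt"
    debug = False
    for arg in argv:
        if arg.startswith("config="):
            config = arg[len("config="):]
        elif arg == "debug":
            debug = True
        else:
            raise ValueError("usage: ExplorerIndex.pl [config=<file>] [debug]")
    return posixpath.normpath(posixpath.join(exec_dir, config)), debug
```

Run from /home/username/htdocs/scripts with config=../Config.txt, the sketch resolves the configuration file to /home/username/htdocs/Config.txt, matching the example above.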
The configuration file is a powerful and easy way of using this same script more than once on your site. By using multiple configuration files you can have several different indexes on your site, each with its own configuration file, each driven by the same script. This may be useful for example if you want a full index of your site, a partial index for users who are not allowed into the restricted areas of the site, or a partial index of only those secure areas. You could even have an index that displays only the images on the site, and an index that contains only the scripts on a site.
I currently use the script to index this readme file, and others like it, and also to create an index into marked-up RFCs that I use often. Additionally it is used on the websites that I help to maintain in more than one place. The use of the configuration file makes this relatively straightforward.
This config file is split into 6 sections, each described below.
This section contains essential pathnames that you must set if you want this script to work. It's the hardest section to configure, as anyone with any script installation experience will know.
Before we go for a variable by variable description, here's the overview. The script is either producing a file to disk, or a file to STDOUT. We must consider the environment from which this file will be viewed. For example, suppose you produce an index to STDOUT; then the page that is produced will behave as though it was a normal page within the directory that the script lives in. So if the script lives in:
\home\username\public_html\scripts\
then the file produced will be relative to this directory. So any link in the produced file to a resource called "images/image1.gif" will be a link to the file:
\home\username\public_html\scripts\images\image1.gif
If however you run the script but ask it to produce an index called:
\home\username\public_html\docs\index.html
Then this same link to "images/image1.gif" is now the file:
\home\username\public_html\docs\images\image1.gif
In order to conveniently handle this, you provide what are called the DocumentRoot and DocumentRootURL. The links in the index will be in relation to DocumentRoot, as opposed to the directory where the script ran from, and with a prefix of DocumentRootURL. So in the above example if the actual resource was:
\home\username\public_html\docs\images\image1.gif
but you wanted to produce a dynamic index to STDOUT, then you could provide DocumentRootURL as ../docs/. This means that the link to "images/image1.gif" becomes "../docs/images/image1.gif". This is the file:
\home\username\public_html\scripts\..\docs\images\image1.gif
which of course is the intended resource.
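The prefixing above can be checked with a small sketch. It is written in Python for illustration and uses forward slashes, whereas the paths above are shown with backslashes. Both function names are hypothetical.

```python
import posixpath


def link_for(document_root_url, entry):
    """Prefix an index entry (a path relative to DocumentRoot) with DocumentRootURL."""
    return document_root_url + entry


def resolve_from(page_dir, link):
    """Work out which file a relative link reaches when followed from page_dir."""
    return posixpath.normpath(posixpath.join(page_dir, link))


link = link_for("../docs/", "images/image1.gif")
target = resolve_from("/home/username/public_html/scripts", link)
```

Here link is "../docs/images/image1.gif" and target is /home/username/public_html/docs/images/image1.gif, the intended resource.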
ScriptRoot
All template files, javascript files, and output files will be with respect to this directory.
It is used as a prefix to the files it accesses so for example if you have defined this as
\home\username\public_html\scripts
and the IndexTemplateFile is
templates\Index.html
then it will open the
\home\username\public_html\scripts\templates\Index.html
file, regardless of where the script is executed from.
It is strongly advised that this is an absolute address, not a relative, ie it begins with /.
Example:
ScriptRoot /home/username/htdocs/scripts/explorerindex
Note that you do not need to use this property, and indeed I have come across one situation so far where this property is unusable. When using the script in Windows, and you are indexing a file on one drive with the script living on another drive, then this property just messes things up as there is no common root to the Index file, the javascript file and the template files.
DocumentRoot
This is the directory from which the indexing will occur. That is to say all links in the index will be with respect to this directory, and preceded by DocumentRootURL.
When you are building a site index, you will want this to point to the directory that you are indexing. When you are building a table of contents, you will want this to point to the directory that the files live in.
This value will be used to do a chdir command so that the script runs from the document root. The paths to all the files to be indexed are with respect to document root so this will mean that the links in the Index will actually work.
Use absolute addressing, ie begin the entry with /, as this is least prone to bugs and confusion.
To build a complete site index, you will find it most helpful for this to point to the wwwroot directory of your website. So if the script lives in:
/home/username/htdocs/scripts/explorerindex/
then point this to
/home/username/htdocs/
Example:
DocumentRoot /home/ccc/htdocs/
DocumentRootURL
This is the url to DocumentRoot. So if you go to DocumentRootURL, you are accessing the files in DocumentRoot. This property will prefix every entry in the produced HTML index.
If you provide this property as a complete url, ie including the http:// bit, then you will have produced a portable index that can be placed anywhere in your site or on the net. This is the easiest format to make work.
If you provide this property as an absolute url, ie beginning with /, then it will be a portable index usable only on this site.
If you provide this property as a relative url, ie not beginning with a /, then you will have created a semi-portable table of contents that only requires that the table of contents is in the same place with respect to the files it links to. This is the most suitable form for a table of contents.
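The three forms of DocumentRootURL can be told apart mechanically. This is a minimal Python sketch. The function name and the three labels are hypothetical, chosen to match the wording above.

```python
def url_kind(document_root_url):
    """Classify a DocumentRootURL value as 'complete', 'absolute' or 'relative'."""
    if "://" in document_root_url:
        return "complete"   # portable anywhere on the net
    if document_root_url.startswith("/"):
        return "absolute"   # portable within this site only
    return "relative"       # tied to its position relative to the files
```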
Example:
DocumentRootURL http://www.carrubbers.org/
IndexFile
If you want, you can have this script write its output to a file. To do this, provide an IndexFile parameter which describes the location of the file that you would like this script to produce. This is an internal path, so it is with respect to ScriptRoot. If you omit this parameter then the output will be written to STDOUT, so it will go straight to the browser if called from the internet.
This facility could be useful if you don't have a separate development server, only a production server, and are worried about exposing access to files that are currently under construction, or that contain sensitive information, and you have not yet got round to excluding them from the index. You would then run the indexing script every so often to produce a "static" index that you could check to ensure it contained only those entries that you wanted it to.
Beware when using the script in this mode, as all the index links will be activated when run from the IndexFile page. You will therefore need to configure DocumentRootURL correctly to be sure that the links work.
This is an internal path, and when used will be preceded by ScriptRoot.
Example:
IndexFile utils/SiteIndex.html
IndexTemplateFile
This parameter describes the location of the IndexTemplateFile. Template files are a means of easily tailoring the formatting of the output of the script. For more information on the template files, and what they should contain, please see the section on Template Files.
This is an internal path, and when used will be preceded by ScriptRoot.
Example:
IndexTemplateFile scripts/explorerindex/Template.html
Example
As this is the most complicated section to configure, and bytes are cheap, here are some examples of how these work together.
Example 1: Dynamic Index
A site index produced as a dynamic index to STDOUT.
ScriptRoot /home/username/htdocs/scripts/explorerindex/
DocumentRoot /home/username/htdocs/
DocumentRootURL ../../
IndexTemplateFile Template.html
(IndexFile not set)
The template file lives as /home/username/htdocs/scripts/explorerindex/Template.html. All links in the index are prefixed by ../../ to get to /home/username/htdocs/ (DocumentRoot) from the executable root /home/username/htdocs/scripts/explorerindex (ScriptRoot).
Example 2: Static Index
A site index, produced as a static index to file /docs/siteindex.html
ScriptRoot /home/username/htdocs/scripts/explorerindex/
DocumentRoot /home/username/htdocs/
DocumentRootURL ../
IndexTemplateFile Template.html
IndexFile ../../docs/siteindex.html
The template file lives as /home/username/htdocs/scripts/explorerindex/Template.html as before. The output file is /home/username/htdocs/scripts/explorerindex/../../docs/siteindex.html which is /home/username/htdocs/docs/siteindex.html. All links in the index are prefixed by ../ to get to /home/username/htdocs/ (DocumentRoot) from the executable root /home/username/htdocs/docs (IndexFile path).
Example 3: Static portable index.
A site index that once generated can be used anywhere on any site.
ScriptRoot /home/username/htdocs/scripts/explorerindex/
DocumentRoot /home/username/htdocs/
DocumentRootURL http://www.example.com/
IndexTemplateFile Template.html
IndexFile /home/username/htdocs/docs/siteindex.html
The template file lives as /home/username/htdocs/scripts/explorerindex/Template.html as before. The output file is /home/username/htdocs/docs/siteindex.html. All links in the index are prefixed by http://www.example.com/ to get to /home/username/htdocs/ (DocumentRoot) from anywhere at all on any website in any directory.
Example 4: Static Table of Contents
A content index of a file where the table of contents is placed inside the file that is content indexed itself. Exactly like this index.html in fact. The file is initially primed with the template tag (more about this later) to say where the index has to go in the file. The file is /home/username/htdocs/articles/article1.html
ScriptRoot
DocumentRoot /home/username/htdocs/articles/
(DocumentRootURL not set)
IndexTemplateFile /home/username/htdocs/articles/article1.html
IndexFile /home/username/htdocs/articles/article1.html
Here the template file is the same as the Index file, so the template file will actually be replaced by the new file with index inserted. All links in the index are with respect to /home/username/htdocs/articles/ (DocumentRoot), which is the same directory as our file, so our index will be portable and independent of its location on the web.
Note in this example we have left ScriptRoot blank, as we have chosen to explicitly describe the complete path to IndexTemplateFile and to IndexFile. We could have defined ScriptRoot as /home/username/htdocs/articles/ and defined IndexTemplateFile and IndexFile as article1.html, but this would have made it difficult to define the JavascriptFile property that we will come to later.
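The four examples all follow one rule for internal paths, which can be sketched as below. This is a Python illustration, not the script's Perl code. The name internal_path is hypothetical, and the behaviour that an absolute path ignores ScriptRoot is inferred from Example 3 rather than confirmed from the source.

```python
import posixpath


def internal_path(script_root, path):
    """Resolve an internal path (IndexFile, IndexTemplateFile, ...) against ScriptRoot.

    Assumption: a path that is already absolute is used as given, and a
    blank ScriptRoot leaves the path unchanged.
    """
    return posixpath.normpath(posixpath.join(script_root, path))


root = "/home/username/htdocs/scripts/explorerindex/"
template = internal_path(root, "Template.html")                   # Examples 1 to 3
static_index = internal_path(root, "../../docs/siteindex.html")  # Example 2
```

Under these assumptions static_index comes out as /home/username/htdocs/docs/siteindex.html, the output file named in Example 2.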
What you want to index.
Directory and file indexing
List of directories and files to index. Paths are internal and with respect to DocumentRoot. If the entry is preceded by:
- exclude - then it will not be included in the index.
- index content - then its content will be indexed.
- exclude content - don't content index these files/directories.
Do not include a trailing / for directories. The special file "." means include every file below DocumentRoot. Apart from this, the use of "." and ".." is not allowed.
Example:
<DIRECTORYINDEX>
################################################
# File indexing
# Include everything
.
# But exclude the following directories
exclude:scripts
exclude:testarea
exclude:stylesheets
exclude:images
exclude:forum/private_forum57263
exclude:javaclasses
exclude:wap
exclude:admin
# With the exception of..
admin/public
# image directories
exclude:images
# and Individual pages
exclude:index.html
exclude:error401.html
exclude:error404.html
exclude:error500.html
################################################
# Content indexing
# But please content index the following directories
index content:docs
# And the following files.
index content:articles/article23.html
# But do not content index these directories
exclude content:docs/private
# And not these files
exclude content:docs/long.html
</DIRECTORYINDEX>
IncludeExtensions
"." separated list of extensions to include. If not defined, all extensions will be included. This value takes precedence over those entries in the DirectoryIndex. That is to say if you ask for a .txt file to be included in the index, but then do not include .txt in the list here, then it will not be in the index.
Example:
IncludeExtensions html.htm.gif
ExcludeExtensions
"." separated list of extensions to exclude. This is only valid if the IncludeExtensions parameter is not defined. This value takes the highest precedence. If an extension appears here then it will not be indexed even if its extension is in the IncludeExtensions list, or if it is listed in the DirectoryIndex.
Example:
ExcludeExtensions class.txt.css.jar
DirectoryBrowsingOK
If your webserver allows you to browse directories, ie allows "http://www.example.com/other/" where no default document exists, then include this flag. If you include this flag then the pictures in the toc to the actual directories will link to the corresponding directory on the server.
Example:
DirectoryBrowsingOK
Content indexing
(This section was taken from the htmltoc readme.html written by Earl Hood)
The ContentIndexMap allows you to tell the script what significant elements to include in the index when we are content indexing a file. You specify what level they should appear in the index, and any text to include before and/or after the ToC entry. The format of the map file is as follows:
SignificantHtmlElement:Level:SigElementEnd:BeforeText,AfterText
SignificantHtmlElement:Level:SigElementEnd:BeforeText,AfterText
...
Each line of the map file contains a series of fields separated by the `:' character. The definition of each field is as follows:
- SignificantHtmlElement
The tag name of the significant element. Example values are H1, H2, H5. This field is case-insensitive.
- Level
What level the significant element occupies in the Index. This value must be numeric, and non-zero. The number specifies the extra level depth from the depth of the file itself in the index. So if your file is at index level 4 and the element is to be at level 3, then the matching tags will be at level 4 + 3 = 7 in the index.
- SigElementEnd (Optional)
The tag name that signifies the termination of the SignificantElement.
Example: The DT tag is a marker in HTML and not a container. However, one can index DT sections of a definition list by using the value DD in the SigElementEnd field (this does assume that each DT has a DD following it).
If the SigElementEnd is empty, then the corresponding end tag of the specified SignificantElement is used. Example: If H1 is the SignificantElement, then the script looks for a "</H1>" for terminating the SignificantElement.
Caution: the SigElementEnd value should not contain the `<' and `>' tag delimiters. If you want the SigElementEnd to be the end tag of an element other than the SignificantElement, then use "/element_name".
The SigElementEnd field is case-insensitive.
- BeforeText, AfterText (Optional)
This is literal text that will be inserted before and/or after the index entry for the given SignificantElement. The BeforeText is separated from the AfterText by the `,' character (which implies a comma cannot be contained in the before/after text). See examples following for the use of this field.
In the map file, the first two fields MUST be specified.
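The field rules above can be made concrete with a short parser. This is a Python sketch, not the script's Perl code. The function names are hypothetical, and treating everything after a '#' as a comment is an assumption that also rules out '#' inside the before/after text.

```python
def parse_map_line(line):
    """Parse one map line into (element, level, end_tag, before, after).

    Returns None for blank or comment-only lines.
    """
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = line.split(":", 3)
    if len(fields) < 2:
        raise ValueError("the first two fields must be specified")
    element = fields[0].strip().upper()   # tag names are case-insensitive
    level = int(fields[1])
    if level == 0:
        raise ValueError("Level must be non-zero")
    end_tag = fields[2].strip().upper() if len(fields) > 2 else ""
    if not end_tag:
        end_tag = "/" + element           # default: the element's own end tag
    before, after = "", ""
    if len(fields) > 3:
        before, _, after = fields[3].partition(",")
    return element, level, end_tag, before.strip(), after.strip()


def entry_depth(file_depth, level):
    """Depth of an entry in the index: the file's own depth plus the mapped level."""
    return file_depth + level
```

For a file at index level 4 and an element mapped to level 3, entry_depth gives 7, as in the Level description above.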
Following are a few examples to help illustrate how a ToC map file works.
Example 1
The following map file reflects the default mapping the index uses if no map file is explicitly specified:
<CONTENTINDEXMAP>
# Default mapping for the script
# Comments can be inserted in the map file via the '#' character
H1:1 # H1 are level 1 index entries
H2:2 # H2 are level 2 index entries
</CONTENTINDEXMAP>
Example 2
The following map file makes use of the before/after text fields:
<CONTENTINDEXMAP>
# A map file that adds some formatting
H1:1::<STRONG>,</STRONG> # Make level 1 index entries <STRONG>
H2:2::<EM>,</EM> # Make level 2 entries <EM>
H3:3 # Make level 3 entries as is
</CONTENTINDEXMAP>
Example 3
The following map file tries to index definition terms:
<CONTENTINDEXMAP>
# An index map file that can work for Glossary type documents
H1:1
H2:2
DT:3:DD:<EM>,</EM> # Assumes document has a DD for each DT, otherwise ToC
                   # will get entries with a lot of text.
</CONTENTINDEXMAP>
Example 4
The following map file demonstrates how you can use HTML elements:
<CONTENTINDEXMAP>
# A ToC map file that wraps index entries in header tags. This is illegal
# HTML, but it looks pretty good in Mosaic.
H1:1::<H3>,</H3>
H2:2::<H4>,</H4>
H3:3::<H5>,</H5>
</CONTENTINDEXMAP>
ContentIndexExtensions
"." separated list of extensions that can be content indexed. When you specify that you want every file in a directory to be content indexed, obviously some files are not suitable for content indexing such as images so list the extensions of the files that you want to be content indexed.
Example:
ContentIndexExtensions html.htm.shtm.shtml
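A "." separated extension list such as the ContentIndexExtensions value can be handled as in the sketch below. This is a Python illustration with hypothetical function names. Matching extensions case-insensitively is an assumption.

```python
def parse_extensions(value):
    """Split a "." separated list such as "html.htm.shtm.shtml" into a set."""
    return {ext.lower() for ext in value.split(".") if ext}


def has_extension(filename, extensions):
    """True if the filename's extension appears in the parsed set."""
    _, dot, ext = filename.rpartition(".")
    return bool(dot) and ext.lower() in extensions


content_exts = parse_extensions("html.htm.shtm.shtml")
```

With this list, docs/guide.HTML qualifies for content indexing and images/logo.gif does not.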
How the indexed entries will appear in the tree control.
FileTitleStartTag and FileTitleEndTag
The FileTitleStart/EndTag describes the patterns that will be searched for to determine what to put in the index. If it fails to find start-text, end-text then the filename itself will be used. Typically of course you will be looking for the <TITLE> tag.
Example:
FileTitleStartTag <Title>
Example:
FileTitleEndTag </Title>
You may prefix the text in the TITLE tag with the name of the site, so you could define instead a special comment such as
<!--IndexTitle Title goes here IndexTitle-->
and then use these instead:
Example:
FileTitleStartTag <!--IndexTitle
Example:
FileTitleEndTag -->
FileTitleUrlFormat
The format for the url of each index entry. If you want each to trigger a script with the file path as a parameter for example then you would use this feature. Each occurrence of <url> (case insensitive) will be replaced with the url for this entry. ie if this is:
MyScript?MyParam1=<url>&MyParam2=Value1
...then each entry will execute the MyScript script with the url as MyParam1, and Value1 as MyParam2. The default is:
<url>
...which provides a link to the file associated with the url by clicking on the description.
Example:
FileTitleUrlFormat <url>
AllowableIndexTags
The script will look for all the text between the specified tags in the content index map or the FileTitleStartTag/EndTag. It may be however that there is an unsuitable tag such as <br> or <hr> between the delimiters. This property allows you to specify those tags which are appropriate for use around some or all of the text in an entry in the index.
Tags are comma separated and case-insensitive.
Example:
AllowableIndexTags b,i,font,strong,em
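The three properties in this section can be read as three small text transformations. This is a Python sketch for illustration, not the script's Perl code. All function names are hypothetical, and matching the start and end tags case-insensitively is an assumption.

```python
import re


def extract_title(html, start_tag, end_tag, filename):
    """Text between the start and end patterns, or the filename if not found."""
    pattern = re.escape(start_tag) + r"(.*?)" + re.escape(end_tag)
    match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else filename


def strip_disallowed_tags(text, allowed):
    """Drop every tag whose name is not in the comma separated allowed list."""
    names = {name.strip().lower() for name in allowed.split(",")}

    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in names else ""

    return re.sub(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>", keep_or_drop, text)


def format_url(url_format, url):
    """Replace each <url> placeholder (case insensitive) with the entry's url."""
    return re.sub(r"<url>", lambda m: url, url_format, flags=re.IGNORECASE)
```

For example, strip_disallowed_tags("Hello<br> <b>world</b>", "b,i,font,strong,em") keeps the <b> pair and drops the <br>.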
Configuration of things in the generated javascript.
JavascriptFile
The location of the javascript file that contains the code to drive the explorer index. Internal path with respect to ScriptRoot. This file is copied in its entirety into the generated file. No checking or parsing is done at all, so if you find that the index does not work, it may be the javascript as opposed to the Perl that contains the bug. At the time of writing this paragraph, it was known that Netscape 6 would freeze up when viewing the javascript, yet it works fine on Netscape 4.7 and IE 5.
Example:
JavascriptFile scripts/explorerindex/ExplorerIndex.js
ImagesPath
The path to the images for the navigation. Make this an absolute external path. Defaults to "images/"
Example:
ImagesPath /scripts/explorerindex/images/
How the script will cope with errors.
ErrorTemplateFile
This parameter describes the location of the ErrorTemplateFile. Template files are a means of easily tailoring the formatting of the output of the script. For more information on the template files, and what they should contain, please see the section on Template Files.
This is an internal path with respect to ScriptRoot.
Example:
ErrorTemplateFile scripts/explorerindex/Error.html
ErrorReturnURL
On the error page we will have a
<A HREF="ReturnURL">ReturnTo </A>
link. Define these two if you wish to use these tags on the error page.
Example:
ErrorReturnTo Main page.
Example:
ErrorReturnURL http://www.carrubbers.org/
Things that apply only to the running of the script server side.
KeepBackups
Keep the backup files when you are finished running the script. How much do you trust this script? Backup files will be called ".org".
Example:
KeepBackups
Any script is only useful if you can tailor the look of the output. You tailor the output of this script by means of template files.
A template file looks like a normal html file, but it contains special tags that the Perl script references. The script acts as a preprocessor which reads in the template page, and replaces these tags, and repeats the html code between some of them. It is the resulting processed page that is sent to the browser or to the output file. The mechanism is much the same as what I believe ASP, PHP and JSP to use.
This is the template page to use to customise what the user will see if an error occurs whilst using the script.
The error template accepts the following special tags:
Example:
<html>
<head><title>Error</title>
</head>
<body bgcolor="#ffffff">
<!---- HEADER ---->
<h1>I'm sorry, you encountered an error with the ExplorerIndex script.</h1>
<hr>
<!---- END HEADER ---->
<B>The reason reported was:</B>
<!--Error#ErrorMessage-->
<A HREF="<!--Error#ReturnURL-->"><!--Error#ReturnTo--></A>
Note you could use the parameters more than once, and even inside script elements:
<TABLE>
<SCRIPT LANGUAGE="JAVASCRIPT">
<!--
ReturnTo = '<!--Error#ReturnTo-->';
ReturnURL = '<!--Error#ReturnURL-->';
if (ReturnTo && ReturnURL) {
document.writeln('\t<TR>');
document.writeln('\t\t<TD><B>Go back to: </B></TD> ');
document.writeln('\t\t<TD><A HREF="' + ReturnURL + '">' + ReturnTo + '</A></TD> ');
document.writeln('\t</TR>');
}
// -->
</SCRIPT>
</TABLE>
<HR>
<!---- FOOTER ---->
<hr>
This is my not very exciting footer..
<!---- End FOOTER ---->
</body>
</html>
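The substitution of the Error# tags in a template like the one above can be sketched as follows. This is a Python illustration of the preprocessing idea, not the script's Perl code. The function name is hypothetical, and replacing an unknown tag with an empty string is an assumption.

```python
import re


def fill_error_template(template, values):
    """Replace each <!--Error#Name--> tag with values[Name], wherever it occurs."""

    def substitute(match):
        # Assumption: a tag with no supplied value becomes an empty string.
        return values.get(match.group(1), "")

    return re.sub(r"<!--Error#(\w+)-->", substitute, template)


page = fill_error_template(
    '<A HREF="<!--Error#ReturnURL-->"><!--Error#ReturnTo--></A>',
    {"ReturnURL": "http://www.carrubbers.org/", "ReturnTo": "Main page."},
)
```

Ordinary comments such as the HEADER and FOOTER markers do not match the pattern, so they pass through untouched.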
This is the template page to use to customise what the index will look like. This is where you customise what background colour, fonts, headers and footers you wish to use.
The index page template accepts the following tags:
Note please do NOT nest this parameter inside a table. It will result in a drastic reduction in the speed at which the page is rendered. ie do not do this:
<TABLE>
<TR><TD>This banner will probably take ages to load</TD></TR>
<TR><TD>
<!--ExplorerIndex#Index--!>
</TD></TR>
<TR><TD>
<!--ExplorerIndex#IndexStart--!>
This is the text of the really old index that is going to get replaced by the script when it is run.
<!--ExplorerIndex#IndexEnd--!>
</TD></TR>
<TR><TD>This will take even longer</TD></TR>
</TABLE>
Instead do this:
<TABLE>
<TR><TD>This banner will take much less time to load</TD></TR>
</TABLE>
<!--ExplorerIndex#Index--!>
<!--ExplorerIndex#IndexStart--!>
This is the text of the really old index that is going to get replaced by the script when it is run.
<!--ExplorerIndex#IndexEnd--!>
<TABLE>
<TR><TD>This will take less to load too</TD></TR>
</TABLE>
Nigel Swinson nigel@swinson.com
Nigel graduated from Edinburgh University in 2000 with a first class joint honours degree in Electronics & Computer Science. He currently works for Rockliffe Systems Inc, www.rockliffe.com programming in C++ and lives in Edinburgh, Scotland, UK.
Copyright (C) 2000-2001 Nigel Swinson, nigel@swinson.com
Copyright (C) 1994-1997 Earl Hood, ehood@medusa.acs.uci.edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
The original program, called htmltoc (http://www.oac.uci.edu/indiv/ehood/htmltoc.html), was written by Earl Hood. It generated a <UL> based toc from the index of a file, so most of the code that generates the table of contents data structure in the content indexing section is his original code. The remaining code, including all of the indexing and generated Javascript, is my own work.
The main features that ExplorerIndex.pl has added to htmltoc are as follows:
Version 1.1.1 - 15th June 2001
Version 1.1 - 9th June 2001
Last updated: 17 April 2008 02:15:13.
© 2008 Carrubbers Christian Centre | Registered Charity No. SC011455