Edit these files to suit your needs.
These words aren't indexed.
This file may be created and then used to customise the output from the search form:
    #ifndef CGS_CUSTOM_H
    #define CGS_CUSTOM_H 1
    #include <stdio.h>
    #define CGS_STYLE_SHEET "Path/File.css"
    #define CGS_HTML_FOOT "\nSome_String\n\n"
    #endif /* custom.h */
'custom.h' is included in 'cgi-search.c':
    #if __has_include ( "custom.h" )
    #include "custom.h"
    #endif
'__has_include' does not work on older C compilers. In that case, you may need to change this bit of code to:
    #include "custom.h"
Or remove this bit altogether. It's just used to modify the appearance of the results a bit.
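For compilers that lack '__has_include', one possible middle road is to test for the operator itself first and to give the two macros fallback values. This is only a sketch: the fallback values and the helper 'cgs_print_foot()' are made up for illustration and are not taken from 'cgi-search.c'.

```c
#include <stdio.h>

/* Nested on purpose: when '__has_include' is not defined, the outer
   test is false and the inner '#if' line is skipped without being
   evaluated, so an older preprocessor does not choke on it. */
#if defined(__has_include)
#  if __has_include("custom.h")
#    include "custom.h"
#  endif
#endif

/* Hypothetical fallbacks, used when custom.h is absent or incomplete. */
#ifndef CGS_STYLE_SHEET
#define CGS_STYLE_SHEET ""
#endif
#ifndef CGS_HTML_FOOT
#define CGS_HTML_FOOT "\n"
#endif

/* Hypothetical helper: write the footer string to 'out'.
   Returns the number of bytes written, or a negative value on error. */
static int cgs_print_foot(FILE *out)
{
    return fprintf(out, "%s", CGS_HTML_FOOT);
}
```

With this arrangement the rest of the code can use both macros unconditionally, whether or not 'custom.h' exists.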
Default word-list.
Latin combining non spacing marks.
When combined with ASCII, these form accented letters.
Based on UnicodeData.txt.
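To illustrate what such marks look like in UTF-8, here is a small sketch that removes them from a string, so that 'e' followed by a combining acute accent becomes a plain 'e'. It assumes the marks are those of the Unicode block U+0300 to U+036F (Combining Diacritical Marks); which marks the header really lists, and what the indexer does with them, may differ. The function name 'strip_combining()' is hypothetical.

```c
#include <stddef.h>

/* Remove combining marks U+0300..U+036F from a UTF-8 string, in place.
   In UTF-8 these are the two-byte sequences CC 80..BF and CD 80..AF.
   Returns the new length in bytes. */
static size_t strip_combining(char *s)
{
    const unsigned char *in = (const unsigned char *)s;
    char *out = s;

    while (*in) {
        if ((in[0] == 0xCC && in[1] >= 0x80 && in[1] <= 0xBF) ||
            (in[0] == 0xCD && in[1] >= 0x80 && in[1] <= 0xAF)) {
            in += 2;                /* skip the two-byte mark */
        } else {
            *out++ = (char)*in++;   /* keep everything else */
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

Called on the bytes "cafe" followed by CC 81 (combining acute accent), this leaves "cafe" in the buffer.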
Characters which aren't considered alpha-numeric.
As far as ASCII is concerned, the indexer treats everything as a word delimiter
except ASCII alnum (0 to 9 and a to z), alnum.alnum, alnum:alnum, '::', alnum::,
::alnum and alnum::alnum.
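The ASCII rules above could be sketched roughly as follows. This is not the indexer's own code: the function names are hypothetical, accepting upper case letters is an assumption, and runs of three or more colons are simply treated like '::'.

```c
#include <stddef.h>
#include <string.h>

/* ASCII letters and digits only. Accepting upper case is an assumption. */
static int ascii_alnum(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z');
}

/* Return 1 if byte s[i] (of n bytes) belongs to a word, 0 if it is a
   delimiter. */
static int is_word_byte(const char *s, size_t n, size_t i)
{
    unsigned char c = (unsigned char)s[i];
    unsigned char prev = i > 0 ? (unsigned char)s[i - 1] : 0;
    unsigned char next = i + 1 < n ? (unsigned char)s[i + 1] : 0;

    if (ascii_alnum(c))
        return 1;
    if (c == '.')                        /* alnum.alnum */
        return ascii_alnum(prev) && ascii_alnum(next);
    if (c == ':') {
        if (prev == ':' || next == ':')  /* '::', alnum::, ::alnum, alnum::alnum */
            return 1;
        return ascii_alnum(prev) && ascii_alnum(next);  /* alnum:alnum */
    }
    return 0;                            /* everything else, non-ASCII included */
}

/* Count the words in s by counting delimiter-to-word transitions. */
static size_t count_words(const char *s)
{
    size_t n = strlen(s), i, words = 0;
    int in_word = 0;

    for (i = 0; i < n; i++) {
        int w = is_word_byte(s, n, i);
        if (w && !in_word)
            words++;
        in_word = w;
    }
    return words;
}
```

For "foo.bar baz::qux a:b end." this counts four words; the final '.' is a delimiter because no alnum follows it.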
With the '-u' option it has to make decisions about non-ASCII as well.
This header file is based on UnicodeData.txt.
Everything except the following is in this header file:
So all of the above are considered alpha-numeric and therefore not word
delimiters.
I hope these criteria are correct. I know very little about non-Latin
scripts.
Note: Without the '-u' option, the indexer will consider all non-ASCII to be
a word delimiter.
Converts HTML, SGML and XML entities into UTF-8.
Based on W3C's https://www.w3.org/2003/entities/2007xml/unicode.xml.zip.
It runs from 'AElig' ('Æ') to 'zwnj' (zero width non-joiner) and
contains 2408 entities.
A complete list here.
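A lookup of this kind might be sketched as a sorted table searched with 'bsearch()'. The struct layout, the names and the four-entry excerpt below are assumptions for illustration; the header's real format is not shown here.

```c
#include <stdlib.h>
#include <string.h>

struct entity {
    const char *name;   /* entity name without '&' and ';' */
    const char *utf8;   /* UTF-8 bytes of the character */
};

/* A tiny excerpt, kept in strcmp() order so bsearch() can be used. */
static const struct entity entities[] = {
    { "AElig",  "\xC3\x86"     },  /* U+00C6 */
    { "amp",    "&"            },  /* U+0026 */
    { "eacute", "\xC3\xA9"     },  /* U+00E9 */
    { "zwnj",   "\xE2\x80\x8C" },  /* U+200C */
};

/* bsearch() comparator: the key is a name, the element a table entry. */
static int cmp_entity(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct entity *)elem)->name);
}

/* Return the UTF-8 string for an entity name, or NULL if unknown. */
static const char *entity_to_utf8(const char *name)
{
    const struct entity *e = bsearch(name, entities,
                                     sizeof entities / sizeof entities[0],
                                     sizeof entities[0], cmp_entity);
    return e ? e->utf8 : NULL;
}
```

Note that entity names are case-sensitive, so 'AElig' sorts before 'amp' in byte order.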
ASCII equivalents for non-ASCII (before 2021-11-29 this used to be wc2asc.h).
Based on field 5 in UnicodeData.txt.
Added are;
Manually added are;
So, if you look for 'uF', the search will find 'µF' as well.
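The 'uF' example can be sketched like this. The table is a three-entry stand-in, not the header's contents, and both function names are hypothetical. Only ASCII and two-byte UTF-8 sequences are handled, to keep the sketch short.

```c
#include <stddef.h>

struct ascii_equiv {
    unsigned long cp;    /* Unicode code point */
    const char *ascii;   /* ASCII replacement */
};

/* Hypothetical excerpt; a real table would be far larger. */
static const struct ascii_equiv equivs[] = {
    { 0x00B5, "u" },  /* MICRO SIGN, the example from the text */
    { 0x00E9, "e" },  /* e with acute, decomposes to 'e' + U+0301 */
    { 0x00FC, "u" },  /* u with diaeresis, decomposes to 'u' + U+0308 */
};

/* Return the ASCII equivalent of a code point, or NULL if none is listed. */
static const char *ascii_equivalent(unsigned long cp)
{
    size_t i;

    for (i = 0; i < sizeof equivs / sizeof equivs[0]; i++)
        if (equivs[i].cp == cp)
            return equivs[i].ascii;
    return NULL;
}

/* Copy 'in' to 'out' (at most outsz bytes including the terminator),
   replacing listed two-byte characters with their ASCII equivalent.
   Unlisted non-ASCII is dropped in this sketch. */
static void fold_to_ascii(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;

    if (outsz == 0)
        return;
    while (*p && o + 1 < outsz) {
        if (*p < 0x80) {                     /* plain ASCII */
            out[o++] = (char)*p++;
        } else if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            unsigned long cp = ((unsigned long)(p[0] & 0x1F) << 6) |
                               (unsigned long)(p[1] & 0x3F);
            const char *a = ascii_equivalent(cp);

            p += 2;
            while (a && *a && o + 1 < outsz)
                out[o++] = *a++;
        } else {
            p++;                             /* longer sequences: skipped */
        }
    }
    out[o] = '\0';
}
```

Folding both the indexed text and the query this way is what would let 'uF' and 'µF' meet on the same ASCII form.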
Numeric equivalents for non-ASCII.
Based on UnicodeData.txt.
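As a rough illustration of the idea, the sketch below maps a few non-ASCII digits to their value. The selection of characters is an assumption (they all carry a numeric value in UnicodeData.txt, but the header may cover more or fewer), and 'digit_value()' is a hypothetical name.

```c
/* Return the digit value (0..9) of a few non-ASCII digits,
   or -1 if the code point is not covered by this sketch. */
static int digit_value(unsigned long cp)
{
    if (cp >= 0x0660 && cp <= 0x0669)   /* ARABIC-INDIC DIGIT ZERO..NINE */
        return (int)(cp - 0x0660);
    if (cp >= 0xFF10 && cp <= 0xFF19)   /* FULLWIDTH DIGIT ZERO..NINE */
        return (int)(cp - 0xFF10);
    switch (cp) {
    case 0x00B9: return 1;              /* SUPERSCRIPT ONE */
    case 0x00B2: return 2;              /* SUPERSCRIPT TWO */
    case 0x00B3: return 3;              /* SUPERSCRIPT THREE */
    }
    return -1;
}
```

Adding '0' to a non-negative result gives the matching ASCII digit.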