Creating Search Applications in One Step
Frequently Asked Questions
REVISED VERSION

Stephen P. Morse, San Francisco

This page has been completely rewritten and reformatted to make it more user-friendly.  The original version appears here.

Sections:

 100 General
 200 Search Form
 300 Search Engine
 400 Data Base
 500 Examples
 600 Third Party Software


100 GENERAL

101. What is the purpose of this tool and who is it intended for?

This tool is intended for anyone who has a collection of data and wants to make it searchable from the web.  It enables you to easily generate the search form and search engine, create a database, and add soundex codes to the database.

There are no fees or licensing involved.  It is all being offered free of charge.  You can use it and modify it in any way you like.  At the bottom of the search form that it generates is a reference to this tool.  You are free to remove that reference if you desire.

I am not offering a service to host your search form for you.  You'll need to have your own access to a server on which to place the generated search form, generated search engine, and your database.  There are many free servers out there for you to choose from, although they usually don't let you run search engines from such servers.  If you are willing to pay a little, you can get a server for only $5/mo that will let you run a search engine.  But if you really want to do it for free, this tool even offers you a way to have the search engine run in the browser instead of the server (see the description of "Search Engine Language" in question 103), in which case you can host it on one of the free servers.
 

102. What is the difference between a search form and a search engine?

A search form is a form on which you enter values to be searched for.  It typically has a button that says "Search".  Pressing that button submits the values you entered to a search engine located out on a server.  That search engine goes through a database and finds the entries that satisfy the values you entered.  The search engine then generates the results as a web page that it displays on your screen.
 

103. How are the various entries under "Describe the Form" used?

The various entries that describe the form are:

Author: The author's name will appear below the heading line at the top of the resulting search form.  A copyright notice bearing the author's name will appear at the bottom of the form.

E-Mail Address: If an e-mail address is specified, the author's name at the top of the form becomes a link.  Clicking on that link starts an e-mail message addressed to the author.

Database Name: The name will appear in the title of the search form and also on the top line of the search form.  Specifically, it will say "Searching the xxx Database in One Step" where xxx is the Database Name you entered.  This name will also be used to find the database itself.  Specifically, the database must reside in a file named xxx.txt (for javascript search engines, the file name is xxx.js -- see question 403).

Search Engine is at: This gives the URL at which the search engine is located.  The search form needs this information in order to know where to submit the search request when the user presses the "Search" button.

This field is not needed if the Search Engine language (see below) is js.  It is needed for the other languages, including js-2.  See question 303 for more details.

Button text/value: Up to two buttons will appear at the top of the search form.  A typical use for at least one of the buttons is to link to a frequently-asked-questions page.  The button text is the wording that will appear on the button.  The button value is the URL that will be linked to when the button is pressed.

Soundex Type: Soundex is a method of finding matches that sound like the item being searched for although they might be spelled differently.  There are several different soundex encodings available.  American Soundex is typically used by the National Archives for census records.  Daitch Mokotoff soundex is useful when dealing with eastern European names.  See question 408 for additional information on the soundex systems.

Search Engine Language: There are various languages that the search engine can be written in.  The search engine typically runs on the server.  Typical languages supported by servers are "Perl" and "PHP".  Not all servers support these languages so you'll need to check with your system administrator to find out which language is supported.  If neither is supported, you can run the search engine entirely in the browser.  In that case select javascript ("js" or "js-2") as the language.  See question 303 for the difference between js and js-2.  All of these search engines use what is known as a flat database.  In addition, a search engine written in PHP can use an SQL database.  See question 411 for more details on SQL databases.

Location of raw tab-delimited data: This is an advanced feature useful for large databases.  For small databases, leave this blank.  See question 410 for more details.



200 SEARCH FORM

201. How do I define the searchable fields that I want on my search form?

There is a dropdown list that shows all the fields.  Initially a single field is defined.  You can create additional fields by clicking on the "Create New Field" button.

For each field you can specify a set of properties.  You do so by selecting the field in the dropdown list, and then entering the properties of that field into the appropriate boxes.  The properties of the selected field are:

Field Name: This is the text that will precede the box for the field on the search form.

Database Column: The database consists of one line for each record in the database.  Each line consists of columns containing the fields of that record.  This property tells which column contains the selected field.  The column numbering starts with 1.

Screen Column: Once a search is completed, the records in the database that match the search criteria are displayed on the screen.  Each line of the display consists of columns corresponding to the fields of one record.  This property specifies in which column the selected field will be placed.  The column numbering starts with 1.  If you leave this property blank, the corresponding field will not be displayed.  See question 407 for a discussion on how the user can view such fields.

Search By: Specifies the type of search that can be done on the selected field.  The first six choices (is exactly, sounds like, is phonetically, starts with, contains, ends with) are used with text string fields such as "name", whereas the last choice (is between) is used with numeric fields such as "age".  If no entry is checked, this field will not appear on the search form.  See question 207 for a description of "is phonetically".

Select value from dropdown list: Allows the user to select the desired value from a dropdown list rather than typing it in.  See question 204 for more details.

Use indexing on this field: Permits faster searches by allowing you to search through a single letter rather than the whole alphabet.  Only one field can be selected for indexing unless the search engine is SQL.  See question 404 for more details about indexing.

Place details on this field: If there are more fields than can be displayed on the results page, a link is needed to be able to get to those fields not displayed.  That link by default appears at the end of the result line.  However you can have that link appear on any field of the results by checking off this box for that field.  See question 407 for more details.  (This feature is not implemented for a js search engine -- it is implemented for all other search engines, including js-2.)
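
To make the "Search By" choices concrete, here is a minimal Python sketch of how the simpler ones could be evaluated against a field value.  It is an illustration only, not the code this tool generates.  The function names are made up for the example, and two behaviors are assumptions: text comparisons ignore case, and "is between" includes both end points.  The "sounds like" and "is phonetically" choices are left out because they depend on the soundex and phonetic codes.

```python
def text_match(method: str, field_value: str, wanted: str) -> bool:
    """Evaluate one of the text search-by choices (case-insensitive by assumption)."""
    value = field_value.lower()
    target = wanted.lower()
    if method == "is exactly":
        return value == target
    if method == "starts with":
        return value.startswith(target)
    if method == "contains":
        return target in value
    if method == "ends with":
        return value.endswith(target)
    raise ValueError("unsupported search-by choice: " + method)


def between_match(field_value: str, low: float, high: float) -> bool:
    """Evaluate "is between" for a numeric field (inclusive range by assumption)."""
    try:
        number = float(field_value)
    except ValueError:
        return False  # a non-numeric value can never fall inside the range
    return low <= number <= high


print(text_match("starts with", "Roosevelt", "roo"))  # True
print(between_match("25", 20, 30))                    # True
```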


202. How do I create my search form?

Once you've described your form and defined your fields, you are ready to create the search form.  You do this by pressing the button that says "Search Form".  A page will appear with the html code for the form.  Select the code and copy/paste it into a text editor.  The detailed steps for accomplishing this are shown below:

WARNING!  Netscape's Composer and Microsoft's Front Page are not text editors, they are html editors.  As such, they will modify your file in strange and unexpected ways.  Do NOT use them for creating the search form.  Use a vanilla text editor such as notepad.

Select the html text by using the edit menu or by typing ctrl-a

Copy the selected text by using the edit menu or by typing ctrl-c

Open a text editor such as notepad

Paste the selected text into the text editor by using the edit menu or by typing ctrl-v

Save the file making sure that the name ends in .htm or .html

At this point you can examine the search form by opening the file in your browser.  If you are familiar with html code, you can do some tweaking of the search form at this time by hand-modifying the html code.  Once you are satisfied with it, you can upload it to your server to make it accessible to others.

After you are finished copying the text, you can return to the main page by reloading the current page in your browser.

 

203. What does the "Test Drive" button do?

The "Test Drive" button is similar to the "Search Form" button in that both create the html code of the search form.  However the "Search Form" button displays the generated html code whereas the "Test Drive" button executes that code.  In other words, when you press the "Search Form" button you will see the generated html code, and when you press the "Test Drive" button you will see the form that the generated html code produces.

You can use the test drive to quickly test out your search form since it saves you the step of having to copy your html code to a file and then bringing up that file in your browser.  Once you are satisfied with the generated search form and are convinced that it does the search properly, you can then save the search form to a file so that you can bring up that file in your browser in the future.

The test drive lets you both see your test form and perform an actual search using the test form.  But in order to do the search, you will first have to create the search engine and database, and upload both of them to your server.

After you are finished with your test drive, you can return to the main page by reloading the current page in your browser.

There is a problem if you have selected js as your search-engine language.  The test drive will allow you to view the search form, but you will not be able to see your database file.  So you will not be able to do an actual search.  Even though you can't do a search, you can at least see what the resulting search form looks like and determine if it meets your needs.  This limitation does not apply to the other search-engine languages (Perl, PHP, or js-2).

And there is another limitation of the test drive.  If you are using a Netscape 4.x browser, you cannot do more than one search during each test drive.  You will get a warning message if you try.
 

204. How can I create a field on my search form that is a dropdown list from which the user selects a value?

Assume that you have a field named "Month" and you would like the possible values to be restricted to the months of the year rather than being arbitrary text.

When you define the field properties for Month, you would define the search-by method to be "is exactly" and check the "Select value from dropdown list" box.  Then generate the search form in the normal manner.  Next look at the generated search form and find the lines that read

<select name='MonthMax'>
    <option selected></option>
    <!-- Insert your values here -->
</select>
Insert the following additional lines
<select name='MonthMax'>
    <option selected></option>
    <!-- Insert your values here -->
    <option>January</option>
    <option>February</option>
    <option>March</option>
    <option>April</option>
    <option>May</option>
    <option>June</option>
    <option>July</option>
    <option>August</option>
    <option>September</option>
    <option>October</option>
    <option>November</option>
    <option>December</option>
</select>
You can use dropdown lists with the search-by method of "is between" as well.  In that case two dropdown lists will be generated.  For example, suppose the field name is "Age" and the possible values of age are from 1 to 10.  Suppose that the user wants to search for all matches within a range of ages, say 5 to 8.  In this case you would look in the generated search form for the following two dropdown lists:
<select name='AgeMin'>
    <option selected></option>
    <!-- Insert your values here -->
</select>
<select name='AgeMax'>
    <option selected></option>
    <!-- Insert your values here -->
</select>
and modify them to be the following
<select name='AgeMin'>
    <option selected></option>
    <!-- Insert your values here -->
    <option>1</option>
    <option>2</option>
    <option>3</option>
    <option>4</option>
    <option>5</option>
    <option>6</option>
    <option>7</option>
    <option>8</option>
    <option>9</option>
    <option>10</option>
</select>

<select name='AgeMax'>
    <option selected></option>
    <!-- Insert your values here -->
    <option>1</option>
    <option>2</option>
    <option>3</option>
    <option>4</option>
    <option>5</option>
    <option>6</option>
    <option>7</option>
    <option>8</option>
    <option>9</option>
    <option>10</option>
</select>
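
If a dropdown list has many values, the option lines can be generated rather than typed by hand.  The following Python sketch prints lines that you could paste below the "Insert your values here" comment.  The function name is made up for this example.  Note that only the blank first option in the generated form carries the "selected" attribute, so the lines produced here leave it off.

```python
from html import escape


def option_lines(values, indent="    "):
    """Return one <option> line per value, ready to paste into a <select>."""
    return "\n".join(
        f"{indent}<option>{escape(str(value))}</option>" for value in values
    )


# Ages 1 through 10, as in the AgeMin/AgeMax example.
print(option_lines(range(1, 11)))
```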


205. How can I specify the ordering of the fields on my search form?

The ordering of the fields on the search form is the same as the order that they appear in the dropdown list associated with Select a field to Edit.  You specified that ordering as you inserted new fields into the dropdown list (each new field is inserted after the currently selected field).  If you wish to change the ordering, you will need to delete any fields that are not in the position you would like them to be in, and reinsert them where you want them.

For example, suppose you would like to change the "United States Presidents" demo application so that the president's last name appears after (rather than before) his first and middle names on the search form.  In that case you would select the last-name field, make a note of the properties, press the "Delete This Field" button, select the middle-name field, press the "Insert New Field" button, and reenter the properties for the last name.

To have a field not appear on the search form, don't check off any of the search-by choices for that field.
 

206. Won't I get a lot of spam e-mail if my e-mail link is on the search form?

Normally you would.  Spammers use robot programs to read webpages and extract out anything that looks like an e-mail address.  So if a website has an e-mail link of the form

   <a href="mailto:johndoe@aol.com">John Doe</a>

a robot would find johndoe@aol.com and add it to a spam list.  But this tool intentionally does not put the e-mail address intact into the html code of the search form.  Instead it puts in code that generates the e-mail link when the user clicks on your name, but nowhere in the html code does your e-mail address explicitly appear.  This keeps robots that simply scan the html code from harvesting your e-mail address.
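
To illustrate the general idea, here is a small Python sketch that builds such a link.  It is not the code that this tool actually generates.  It is just one way of assembling the address from pieces at click time, and the function name is invented for the example.  A robot that actually runs the page's scripts could still recover the address, so treat this as a deterrent rather than a guarantee.

```python
import html
import json


def obfuscated_mail_link(display_name: str, address: str) -> str:
    """Build a link whose e-mail address is only put together when clicked."""
    user, domain = address.split("@", 1)
    # json.dumps yields valid javascript string literals for the two halves.
    script = (
        "location.href='mai'+'lto:'+"
        + json.dumps(user) + "+'@'+" + json.dumps(domain)
        + ";return false;"
    )
    # Escape the script so it can sit safely inside the onclick attribute.
    return '<a href="#" onclick="{}">{}</a>'.format(
        html.escape(script, quote=True), html.escape(display_name)
    )


link = obfuscated_mail_link("John Doe", "johndoe@aol.com")
print(link)
print("johndoe@aol.com" in link)  # False
```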
 

207. What is the "is phonetically" search-by choice all about?

Soundex matching has the disadvantage of generating too many false hits.  A phonetic matching algorithm has the advantage of reducing the number of false hits.  The phonetic matching system used here is the Beider-Morse Phonetic Matching (BMPM), developed by Alexander Beider and Stephen Morse.  However such a system has the possibility of losing valid hits.  So it doesn't replace soundex matching but rather complements it.  That is, your search form can let your users select which type of search they want to do -- phonetic or soundex.

Phonetic matching can be used with the PHP or SQL search engines only.  It cannot be used with a Perl or javascript search engine.

If you wish to do phonetic matching, you will need to obtain files containing the phonetic engine and the phonetic tables.  The files and the conditions for their use can be found here.



300 SEARCH ENGINE

301. How do I create my search engine?

The details for doing this are exactly the same as for creating the search form (see question 202), except now you press the button that says "Search Engine".  The file that you save the search engine to should end in one of the following suffixes depending on the language your search engine is written in:

PHP: Suffix is .php
SQL: Suffix is .php (the search engine is written in PHP but uses an SQL database -- see question 411 for more details)
Perl: Suffix is .cgi
js-2: Suffix is .html
js: There is no search engine.  See question 303 for more details.
If your search engine is written in Perl or PHP, it cannot be tested out on the client.  Instead it must be uploaded to a server before it can be tested.

If your search engine is written in Perl, you may have to perform some of the following steps as well.

You might need to remove the carriage return characters from the text file of your search engine.  That's because some servers have a nasty habit of requiring that line separators be a single line feed (LF) character without any carriage return (CR) character.  Unfortunately most windows-based text editors create a CR-LF pair at the end of each line.  One way to remove the CR characters is to use an editor that does not insert them.  Another is to use a utility that strips them out.  Another is to use an upload facility (such as a file manager provided by your server) which removes the CR characters as it uploads the file.  You might need to consult your system administrator for advice on how to do that.

You'll need to modify the first line of the file to point to the location of the Perl interpreter.  Check with your system administrator to find out where that is.

You might have to put your search engine into a special directory such as cgi-bin.  Check with your system administrator about this.

Files on a server have attributes indicating whether or not they are executable.  You'll probably need to have the executable attribute set.  The means of accomplishing this will depend on your server, so you should consult your system administrator to find out how to do it.
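
If you happen to have Python available, two of the Perl steps above can be scripted.  The sketch below is only an illustration under assumptions: the function names are invented, the first function replaces CR-LF pairs with a single LF, and the second sets the executable attribute on a Unix-style system.  The second function is only meaningful when run on the server itself, since uploading a file does not necessarily preserve its attributes.

```python
import os
import stat


def strip_carriage_returns(path: str) -> None:
    """Rewrite the file in place so every CR-LF pair becomes a single LF."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(data.replace(b"\r\n", b"\n"))


def make_executable(path: str) -> None:
    """Add the executable attribute for owner, group and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For example, you would call strip_carriage_returns("search.cgi") before uploading, where search.cgi stands for whatever name you gave your search engine file.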


302. If my server supports both Perl and PHP, which one should I use?

Use PHP.

There are numerous pitfalls when using Perl but not when using PHP.  These include removing carriage return characters, putting the scripts in a special directory on the server, modifying the first line of the script to point to the Perl interpreter on your server, and modifying the script's access rights so that it is executable.  If you don't know how to do any of this, you certainly don't want to use a Perl search engine.  See question 301 for more details.
 

303. What is the difference between "js" and "js-2" as the search-engine language?

Both js and js-2 will result in a search engine written in javascript and running in the browser.  The difference is that js will result in the search engine being embedded in the search form itself.  In that case the search engine will not be in a separate file.  That is why you do not need to specify the location of the search engine when using js as the search-engine language.

So it would seem that js would be preferred since there is only one file involved, and there would never be a reason to use js-2.  However there were some unorthodox tricks used to allow the search engine to run inside the search form.  These tricks cause the resulting search tool to be lacking some functionality.  For example they prohibit the use of a link on the results page to display the next or previous set of matches.  And there are other limitations as well.  For these reasons, js-2 is actually more desirable to use even though it means that you will have two separate files -- one for the search form and the other for the search engine.
 

304. Do PHP and Perl stand for anything?

Yes, but you really don't want to know.  Neither is very meaningful.

Perl stands for Practical Extraction and Report Language.

PHP stands for PHP: Hypertext Preprocessor.  So replacing the "PHP" that is in the definition of PHP, we conclude that it stands for PHP: Hypertext Preprocessor: Hypertext Preprocessor.  Isn't recursion wonderful.  [Second thoughts: further research revealed that PHP originally stood for Personal Home Page.  So it is not so recursive after all, but also not very descriptive any more.]

Now aren't you sorry you asked.
 

305. Can I protect my search engine so that other people can't put up their own search form and link to my engine?

Yes you can.  Rather than presenting all the details on this page, I've put them in a separate document.  Click here to see the technical details.



400 DATA BASE

401. What format must my database be in?

The database consists of one line for each record in the database.  Each line consists of tab-separated columns containing the fields of that record.  The determination of which column corresponds to which field is specified by the "Database Column" (see question 201).

Note that adjacent fields are separated by a single tab character.  The tabs are not used to create visual alignment.  You must not add additional tabs to make the alignment nicer because such extra tabs will alter the meaning of the subsequent fields on the line.
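
The effect of the tabs can be seen in a short Python sketch.  The record shown is made up for illustration (last name, first name, middle name, year), and the function names are likewise invented.  Notice that the empty middle name still occupies its own column, which is exactly why an extra tab added for alignment would shift every field after it.

```python
def parse_record(line: str) -> list:
    """Split one database line into its tab-separated fields."""
    return line.rstrip("\r\n").split("\t")


def field(record: list, database_column: int) -> str:
    """Return the field in the given column (numbering starts with 1)."""
    index = database_column - 1
    return record[index] if 0 <= index < len(record) else ""


record = parse_record("Washington\tGeorge\t\t1789\n")
print(record)            # ['Washington', 'George', '', '1789']
print(field(record, 4))  # 1789
```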
 

402. My database is in a spreadsheet.  How do I get it to be in the tab-separated format that you require?

Your spreadsheet program has a "save as" menu item under the file menu.  You can specify a type for the file being saved.  Specify the type to be "tab-delimited text file".

Another way to do it is to select all the cells of the database (left click on upper left-hand cell, then hold the shift key down and left-click on the lower right-hand cell), copy the selected text (use the edit menu or type ctrl-c), paste it into a text editor such as notepad (use the edit menu or type ctrl-v), and then save the file from the text editor (use the file menu or type ctrl-s).

Note that the second method is preferable because the first will sometimes put quotes around the contents of certain cells.  Also the second method will preserve non-standard characters (diacriticals and accented characters).
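
If you have already used "save as" and ended up with quotes around some cells, the quotes can be stripped afterwards.  The following Python sketch uses the standard csv module to read the quoted tab-delimited lines and write them back without quotes.  The function name is invented, and the sketch assumes the usual spreadsheet convention in which a quote inside a quoted cell is written as two quotes.  Any tab or line break found inside a cell is replaced by a space, since it would otherwise break the one-line-per-record format.

```python
import csv


def unquote_tab_delimited(lines):
    """Return tab-separated lines with the spreadsheet's cell quoting removed."""
    cleaned = []
    for row in csv.reader(lines, delimiter="\t"):
        cells = [cell.replace("\t", " ").replace("\n", " ") for cell in row]
        cleaned.append("\t".join(cells))
    return cleaned


print(unquote_tab_delimited(['Smith\t"Doe, John"\t25\n']))
```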
 

403. What does the "Database" button do and how is it used?

If you are generating a javascript-based search engine, the data has to be reformatted slightly in order to accommodate the javascript syntax.  Also, in order to do a sounds-like search, soundex codes must be inserted into the database.  If you are not using a javascript search engine and you do not intend to do any sounds-like searches, you can skip this step.

You can reformat the data for javascript and/or add the soundex codes to the database as follows:

Press the "Database" button

A page will appear with a box into which you can insert your database.  Copy-and-paste your database into this box.

Click on the button that says "Process".

A new database will appear in the bottom box.  This database has all the fields that were in the original database, as well as additional fields that contain the soundex codes.

Copy and paste this resulting database into a file.  (Note that on the Netscape 4.x browser, you cannot use ctrl-a to select this data for copying.  Instead you must right-click the mouse and then do a "select all".)

Name the database xxx.txt where xxx is the name you selected for your database.  However if you are generating a javascript search engine and not using indexing, you must name this database xxx.js; if you are generating a javascript search engine and are using indexing, you must name this database xxx.html (see question 404 for details).

Upload the xxx.txt or xxx.js or xxx.html file to the server. Place the file in the same directory as the search engine.  If you have selected js (but not js-2) as your search-engine language, then the search engine is embedded in the search form and you need to place the xxx.js file in the same directory as the search form.

If the number of records in your database is large, it might not be practical to perform the copy-and-paste operations described above.  See question 410 for what to do in that case.

If you are using phonetic matching (see question 207 for a description of phonetic matching) on any of your fields, you cannot use the copy-and-paste method described here and must use the method described in question 410.
 

404. What is indexing and how do I use it?

Indexing is a means of speeding up the searches by looking under a specific letter rather than the whole alphabet.  For example, if the last name of the person being searched for starts with R, it would be quicker to search only through the R's rather than search through the entire database.

If you are not using an SQL search engine then:

Only one field can be selected for indexing.  You make this selection by checking off the "Use indexing on this field" box under the field's properties (see question 201).  You will also need to subdivide your data into 27 files -- one for each letter of the alphabet and one catch-all for any values that do not start with a letter.  You can still use the "Database" button for creating your database (see question 403), but you will have to create each of the 27 files separately.

If you do a search and do not specify at least the first letter of the item in the field being indexed, the search engine will not know which of the 27 files to look in.  For this reason, it is required that the first letter of an index field be specified.

If you are using a Perl or PHP search engine, name your database files xxx_a.txt, xxx_b.txt, etc. where xxx is the name you assigned to your database.  Name the catch-all file xxx_0.txt (that's the digit zero).

If you are using a js-2 search engine, name your files xxx_a.html, xxx_b.html, etc., and name the catch-all file xxx_0.html (that's the digit zero).  Note that this differs from the non-indexed case in which the database file had a .js suffix instead.

You cannot use indexing if you are using a js (as opposed to js-2) search engine.
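
Subdividing the data by hand is tedious, so here is a Python sketch of how the lines could be sorted into the 27 groups.  It is an illustration under assumptions, not part of this tool: the function names are invented, upper and lower case letters are treated alike, and anything not starting with one of the 26 unaccented letters goes to the catch-all.  The sketch only groups the raw lines.  It does not add soundex codes or do any reformatting, so each group would still need to go through the "Database" button where question 403 calls for it.

```python
import string


def index_file_name(database_name: str, value: str, suffix: str = ".txt") -> str:
    """Pick the file for a value: xxx_a ... xxx_z, or xxx_0 as the catch-all."""
    first = value[:1].lower()
    key = first if first and first in string.ascii_lowercase else "0"
    return f"{database_name}_{key}{suffix}"


def split_for_indexing(lines, database_name, indexed_column, suffix=".txt"):
    """Group lines by the first letter of the indexed column (numbered from 1)."""
    files = {
        f"{database_name}_{key}{suffix}": []
        for key in string.ascii_lowercase + "0"
    }
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        value = fields[indexed_column - 1] if indexed_column <= len(fields) else ""
        files[index_file_name(database_name, value, suffix)].append(line)
    return files


groups = split_for_indexing(
    ["Roosevelt\tTheodore\n", "Adams\tJohn\n", "3M\tCompany\n"], "presidents", 1
)
print(len(groups))                 # 27
print(groups["presidents_r.txt"])  # ['Roosevelt\tTheodore\n']
```

Each list in the result would then be written to the file named by its key.  For a js-2 search engine you would pass suffix=".html" instead.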

If you are using an SQL search engine (see question 411 for more details on SQL) then:
All aspects of indexing are handled automatically by SQL.  You do not need to subdivide your data into separate files as was described above for the non-SQL case.

You can select as many fields for indexing as you desire.  You make this selection by checking off the "Use indexing on this field" box under the field's properties (see question 201).

The more fields you index on, the faster your searches will be.  However the storage requirement for your tables will increase.  So you should index only on those fields that you expect users to search on frequently.


405. I have a less-than symbol in one of the entries in my database.  Why doesn't it display properly on my results page?

The less-than symbol has special meaning as an html tag.  For example, suppose you want to put an e-mail link into one of the fields of your database.  In that case you would enter the value in your database as:

<a href="mailto:john@doe.com">John Doe</a>

If you would like to enter a real less-than symbol into one of your database fields, you need to enter it as a pair of less-thans.  For example, to enter x<y into a field, you would put it in as x<<y.

See also question 406.
 

406. I have many ampersand symbols in my database.  Some display fine but others do not.  How can I get them all to display?

Unfortunately ampersand (&) has a special meaning in html code.  Specifically it denotes certain characters.  For example "&lt;" denotes the less-than character (<).  So if you happen to be unfortunate enough to have an ampersand in your database and it is followed by "lt;", it will display as "<" instead of the string "&lt;".

For the most part that shouldn't present any problems because it is unlikely that you would have a valid character designation following the ampersand.  But if you do, you will not get the display that you expected.  To avoid this problem, you can double up your ampersands.  Two ampersands together will always display as a single ampersand and will never look like the beginning of a character designation.

See also question 405.
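
The doubling rules from this question and question 405 are easy to apply mechanically.  The Python sketch below (the function name is invented) doubles every ampersand and less-than in a value that is meant to be displayed literally.  Do not run it on fields that deliberately contain html tags, such as the e-mail link in question 405, because it would double their less-than symbols too.

```python
def escape_literal(text: str) -> str:
    """Double each & and < so the value displays exactly as written."""
    return text.replace("&", "&&").replace("<", "<<")


print(escape_literal("x<y"))   # x<<y
print(escape_literal("AT&T"))  # AT&&T
```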
 

407. What do I do if my data has more fields than can be displayed neatly on one line?

The results of a search are displayed to the user in tabular form, where each line of the table corresponds to a match that was found and each column on the line is a field of that match.  For example, if one of the matches corresponds to a person named John Smith age 25, one line of the table would contain the fields "John", "Smith", and "25".

You can specify which fields are to be displayed in the table and in what order.  You do this with the "screen column" property (see question 201).  If you leave that property blank, the corresponding field will not be displayed.  This is useful when you have more fields than will fit on a single-line display.

If there are any non-displayed fields, a link entitled "details" will appear at the end of each line displayed.  If the user clicks on one of those links, a separate display corresponding to the particular match will appear.  That display will list all fields of that match.  This allows the user to see the values of the fields that are not displayed in the results table.

Rather than having a separate "details" link, it's possible to designate a specific field on the results line to be the link to the additional details.  For example, the first field on each line might be the person's name.  It might be convenient to have the additional details appear when the name is clicked on.  You can designate the field for which you would like to have this link, and you do so by checking the "Place details on this field" box when you define the field.  See question 201 for more details on defining a field.  (This feature is not implemented for a js search engine -- it is implemented for all the other search engines, including js-2.)
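
The relationship between the database column and the screen column can be sketched in a few lines of Python.  This is only a model of the behavior described above, not the generated search engine.  The dictionary keys and function names are invented for the example, and a blank screen column is represented by None.

```python
def display_row(record, fields):
    """Return the values to show on one results line, in screen-column order."""
    shown = [f for f in fields if f.get("screen_column")]
    shown.sort(key=lambda f: f["screen_column"])
    return [record[f["database_column"] - 1] for f in shown]


def needs_details_link(fields):
    """A "details" link is needed when at least one field is not displayed."""
    return any(not f.get("screen_column") for f in fields)


fields = [
    {"name": "Last Name", "database_column": 1, "screen_column": 2},
    {"name": "First Name", "database_column": 2, "screen_column": 1},
    {"name": "Age", "database_column": 3, "screen_column": None},
]
print(display_row(["Smith", "John", "25"], fields))  # ['John', 'Smith']
print(needs_details_link(fields))                    # True
```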
 

408. Who developed the various soundex systems?

The original idea for soundexing is credited to Robert Russell in 1918.  He was the developer of what is now known as the Russell Soundex System.  His system was later improved upon and became the American Soundex system that is used by the US government in such applications as indexing the US census records.  More recently the Daitch-Mokotoff (DM) Soundex System developed by Randy Daitch and Gary Mokotoff in 1985 has become the standard for use in indexing Eastern European names.

Algorithms for computing Russell Soundex codes and American Soundex codes are fairly straightforward.  DM Soundex codes are much more difficult to implement.  Michael Tobias is credited with having done some of the early work in developing the algorithms for computing DM soundex codes and for the original implementation of those algorithms.  The DM implementation used in this tool is based on Tobias's original implementation.
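
To show how straightforward the American Soundex computation is, here is a compact Python sketch of the commonly published rules: keep the first letter, code the remaining consonants as digits, collapse adjacent letters that share a code (including across an H or W), and pad or cut the result to four characters.  This is an independent illustration and not the implementation used in this tool.  It makes no attempt at Daitch-Mokotoff codes, and letters outside A-Z are simply treated like vowels.

```python
def american_soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for letter in group:
            codes[letter] = digit

    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    previous = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = codes.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


print(american_soundex("Robert"))    # R163
print(american_soundex("Ashcraft"))  # A261
```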

Click here for more details on the various soundex systems.
 

409. Can I include links and/or images in the search results?

Yes.  To do so, you need to put the correct html tags into the database.

For example, suppose you wish the result to be not just the name "John Smith" but rather a link to a biography of John Smith.  Instead of entering "John Smith" into a field in the database, you would enter the following into that field:

<a href='http://myserver.com/JohnSmith.html'>John Smith</a>
That will generate a result with a link field having the text "John Smith"; if the user clicks on that link he will be taken to the page at http://myserver.com/JohnSmith.html.

If you wish a result field to be an image of John Smith, you would enter the following into the database entry for that field:

<img src='http://myserver.com/JohnSmith.jpg'>
In this case the image itself would appear in the generated results.

Note the use of the apostrophe (') characters in the html link and image tags above.  Normally html code works equally well with quote (")  characters instead such as in

<img src="http://myserver.com/JohnSmith.jpg">
However my javascript search engine will not function correctly if you use quotes in the link or image tags.  If you are using my Perl, PHP, or SQL search engines, then you can use quotes.  But since apostrophes work in all cases, I recommend you stick with apostrophes in the link and image tags.  Outside of the html tags, there is no restriction against your using quotes in the data values, even with the javascript search engine.
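
If your data already contains tags written with quotes, you could convert them before building the database.  The following is a minimal sketch of such a conversion.  The function name useApostrophesInTags is hypothetical, and the sketch assumes that none of the attribute values themselves contain an apostrophe.

```javascript
// Hypothetical helper: replaces quotes with apostrophes inside html tags only.
// Assumes attribute values do not themselves contain apostrophes.
function useApostrophesInTags(value) {
  return value.replace(/<[^>]*>/g, function (tag) {
    return tag.replace(/"/g, "'");
  });
}
```

Quotes that appear outside of a tag are left alone, so a data value such as John "Jack" Smith is not changed.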

A word of caution.  There appears to be a bug in the Mozilla family of browsers (mozilla, netscape 6x, netscape 7x, firefox) that prevents a relative link from being passed to a new window.  To demonstrate this, copy-and-paste the following into the address field of such a browser:

     javascript:'<a href="index.htm">Link Test</a>'

In the Internet Explorer browser, this displays the text "Link Test" as a link.  But in the mozilla browsers, the text appears but it is not a link.  The consequence of this is that if you have a relative link in one of your data fields, it will not appear as a link when the user opens the separate display showing all the details (see question 407 for a description of the separate detailed display).  The work-around is to avoid the use of relative links.  So instead of the link being "index.html", use "http://yourserver.com/index.html" instead.  In that case the link will do what it is supposed to do.
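
If you have many relative links to convert, the conversion can be scripted.  Here is a small sketch that uses the standard URL object found in current browsers and in Node.js.  The function name toAbsoluteLink and the server name are only placeholders for this example.

```javascript
// Hypothetical helper: resolves a link against the address of your site.
// A link that is already absolute is returned unchanged.
function toAbsoluteLink(link, siteAddress) {
  return new URL(link, siteAddress).href;
}

var fixed = toAbsoluteLink("index.html", "http://yourserver.com/");
```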
 

410. What do I do if my database is too large for me to perform the copy-and-paste operations required?

Question 403 described how to get your database in the desired format.  It involves copying-and-pasting your data into a window, pressing a button which sprinkles the appropriate soundex codes into your database, and then copying-and-pasting the results back into a file.  If the number of records in your database is large, this might not be practical.

If you have a server that is capable of executing PHP scripts (regardless of what language you have selected for your search engine), there is an alternate approach.  In that case you will put your database file on your server and fill in that server location in the field that says "Location of raw tab-delimited data".  Now when you press the Database button, you will be presented with a PHP script that can do the reformatting.   You simply copy-and-paste this script into a file (similar to the instructions in question 202).  Give that file a suffix of .php and upload it to your server in the same directory as the database.  Run that PHP script.  If you have not specified SQL for the search-engine language (see question 411), the reformatted database will be generated in that same directory and the name of that reformatted database will be displayed on your screen when you run the php script.  If you have specified SQL, no output file will be generated but instead the reformatted data will be entered in the SQL database.

Caution: do not use the same name for your raw database file as will be used for your final database file.  If your database name is X, your final database will be in X.txt (assuming you are using a PHP or PERL search engine).  So do not place your raw file in X.txt -- instead use some name like Xraw.txt.

To reiterate, this method of reformatting a large database has nothing to do with the language that you have selected for your search engine.  You can generate a search engine in PERL or in javascript, and still use the PHP script generated here to reformat your data.

If you are using phonetic matching (see question 207 for a description of phonetic matching) on any of your fields, the method described in question 403 cannot be used and you must use the alternate approach described here.  Furthermore, the phonetic tables must be in the same directory as the database.
 

411. What is the SQL choice under "search engine language" all about?

Regardless of the language you select for your search engine (javascript, PERL, or PHP), the search will be performed sequentially using what is called a flat file.  That is, each record of the database is examined, one at a time, in order to determine if it matches the search parameters that the user specified.  That means that if you double the number of records in your database, you will double the amount of time required to perform the search.  Eventually you will reach a point where the time required to  perform the search is unacceptable to the users.  (See question 412 for how many records can be searched in a reasonable amount of time.)
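
To illustrate what a sequential search looks like, here is a simplified sketch in javascript.  It is not the generated search engine.  The function name and parameters are invented for this example, and it handles only a single field.  The point is the loop: every record is examined once, so the work grows in step with the number of records.

```javascript
// Simplified illustration of a flat-file search; not the generated engine.
// lines: array of tab-delimited records; column: 1-based database column.
function sequentialSearch(lines, column, searchBy, value) {
  var wanted = value.toLowerCase();
  var matches = [];
  for (var i = 0; i < lines.length; i++) {      // one pass over every record
    var fields = lines[i].split("\t");
    var field = (fields[column - 1] || "").toLowerCase();
    var hit;
    if (searchBy === "is exactly") {
      hit = field === wanted;
    } else if (searchBy === "starts with") {
      hit = field.startsWith(wanted);
    } else if (searchBy === "ends with") {
      hit = field.endsWith(wanted);
    } else {                                    // treat anything else as "contains"
      hit = field.indexOf(wanted) !== -1;
    }
    if (hit) {
      matches.push(lines[i]);
    }
  }
  return matches;
}
```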

For very large databases, such a sequential search is unacceptable.  An alternative is to use a relational database such as mySQL.  You do that by selecting SQL as the search-engine language.  The search engine will be written in PHP (same as if you chose PHP as the language) but in this case the search engine will make calls into an SQL database rather than doing the search sequentially.  Of course you can do this only if your server supports SQL databases as well as PHP scripts.

The steps for using SQL are as follows:

Fill in the information on the form pertaining to SQL databases.  Specifically
username: This is the username that you have created for accessing your SQL database.  Details for creating the username depend on your hosting provider.

password: This is the password that you have created for accessing your SQL database.  Details for creating the password depend on your hosting provider.

host: This is the name of the machine on which your SQL database will be hosted.  For example, acme.com

database: This is the name of your sql database.  Some hosting providers force you to use a non-descriptive cryptic name.  If you have no such requirement, leave this field blank and the descriptive name that you entered previously (see question 103) will be used.

table: This is an arbitrary name that you make up.  It can be anything.

The steps for generating the search form and the search engine when using SQL are no different than they were before.  So follow the steps as already outlined in questions in the 200 section for the search form and the 300 section for the search engine.

Using the facilities provided by your server (such as a control panel), create an SQL database and enable it.  The details for doing so depend on your hosting provider and you might have to check with them for more specifics.

Create your raw database file as per the steps in question 401 and 402.  Then reformat it as described in question 410 (you cannot use the copy-and-paste methods described in question 403 when dealing with SQL).  The step described in question 410 will not only reformat the data but will enter the values into the SQL database.


412. Is there a limit on how large my database can be?

That depends on which language your search engine is written in.

If you write your search engine in javascript, the search is performed in the user's browser and the entire database is downloaded to his browser for each search.  That can be time consuming, especially if the user is on a dial-up line.  A database of 10,000 records is probably the largest that you can have and still get it downloaded to the user's browser in a reasonable amount of time.

If you write your search engine in PHP or PERL, the search is performed on the server and the entire database remains on the server.  Since the database is not downloaded to the user's browser, the speed of downloading is not the limiting factor.  What is the limiting factor is how quickly the search engine on the server can read through the entire database looking for the records that match the search request.  The PHP and PERL search engines use what is known as a flat database which means that they go through each record sequentially.  The time to search is directly proportional to the number of records.  Assuming you don't want your user to wait more than say ten seconds before getting the result, you can probably have up to maybe 100,000 records in your database.  Furthermore, if you use indexing, you can increase that number to about 2,000,000 records (see question 404 for a discussion of indexing).

If your database grows to more than 2 million records, you'll need to do something other than a sequential search.  The solution is to use a relational database such as SQL (see question 411).  If you select SQL as the language for your search engine, the search engine will be written in PHP (same as if you chose PHP as the language) but it no longer does its searches sequentially.  So the number of records in the database can be considerably larger and the limiting factor now will probably be the amount of space that you have on your server.
 

413. Can I have fields containing non-latin characters (for example, Hebrew) in my database?

Yes but there is one precaution that you need to take.  You must save your database file (the one you created per the instructions in question 403) as a unicode file (UTF-8 with no signature or BOM character) rather than as an ascii file.  Many text editors (e.g., Microsoft's notepad) will warn you when you save the file that it contains non-latin characters and will tell you how to save it as unicode.
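
If you are not sure whether your editor added a signature, you can check for it.  When a UTF-8 file that has a signature is read into a javascript string without special handling (for example, by Node.js), the signature shows up as the single character U+FEFF at the very beginning.  The sketch below, with the made-up name stripBom, removes that character if it is present.

```javascript
// Hypothetical helper: drops a leading U+FEFF (the BOM) if one is present.
function stripBom(text) {
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}
```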

You cannot do a sounds-like search on a field that contains non-latin characters.  If this is something that is important to you, then you'll have to devise a soundex system for your character set and then customize the generated search engine to use that soundex system.  Of course this requires considerable programming knowledge.

If you'd like to see a demonstration of a search application that uses Hebrew characters, click here.
 

414. How can I have my application do a search on either of two fields, for example last name or maiden name?

Create a third field called name.  The name field will appear on the form.  But leave the screen column for this field blank so that it will not appear on the results page (see question 201).  And have the search-by for this field be contains.  In the database you need to have the value for this field be a concatenation of the values of the maiden and married name fields.

For example, suppose that for a particular record the maiden name is "Jones" and the married name is "Smith".  The database entry for this record would have "Jones" in the maiden-name field, "Smith" in the married-name field, and "SmithJones" in the name field.

This has some limitations, such as not being able to do exact matches.  But there is a clever way of overcoming that.  Instead of entering the value in the database as "SmithJones", enter it as "!Smith!Jones!".  And modify the search engine so that it appends exclamation marks to the value that the user specified on the search form.  For example, if the user specified an exact search for "Smith", have the search-engine do a contains search for "!Smith!".  Similarly, if he specified a starts-with search for "Smi" or an ends-with search for "ith", have the search engine do a contains search for "!Smi" or "ith!" respectively.

Yes, this does require that you customize the search engine and means that you will need to have some programming knowledge.  Unless you are familiar with the particular programming language that the search engine is written in, you should probably not attempt this.
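
For those who do want to attempt it, the exclamation-mark trick described above can be sketched as follows.  This is only an outline in javascript.  The function names are invented, and the place where you would hook this into the generated search engine depends on which engine you are using.

```javascript
// Sketch of the delimiter trick; all names here are hypothetical.
var DELIM = "!";

// Builds the combined database value, e.g. "!Smith!Jones!".
function combineNames(names) {
  return DELIM + names.join(DELIM) + DELIM;
}

// Turns the user's search type and value into a value for a contains search.
function toContainsPattern(searchBy, value) {
  if (searchBy === "is exactly") {
    return DELIM + value + DELIM;
  }
  if (searchBy === "starts with") {
    return DELIM + value;
  }
  if (searchBy === "ends with") {
    return value + DELIM;
  }
  return value; // an ordinary contains search needs no delimiters
}

// Performs the contains search against the combined value, ignoring case.
function matchesEitherName(combined, searchBy, value) {
  var pattern = toContainsPattern(searchBy, value).toLowerCase();
  return combined.toLowerCase().indexOf(pattern) !== -1;
}
```

Note that this assumes the exclamation mark never occurs in a real name.  If it might, pick some other delimiter character.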



500 EXAMPLES

501. Can you give some examples?

EXAMPLE 1:

Suppose you have a database consisting of last-name, first-name, age, and comments in that order.  However you would like your results to appear in the order first-name, last-name, age.  You do not want comments displayed.

Suppose the raw database is in the file named rawdata.txt.  It looks as follows:

Smith<tab>John<tab>21<tab>President of the food committee
Doe<tab>Jane<tab>35<tab>Lives in Brooklyn
Jones<tab>Henry<tab>18<tab>Convicted felon
The string "<tab>" above designates the tab character.  The actual file must contain the tab character and not the string "<tab>".  Also the above lines are indented to make them easy to see in this text.  The actual file must not be indented.

You would now make the following entries in the "Describe the Form" section on the "Creating Search Applications in One Step" form.

Author: Your name
E-Mail Address: Your e-mail address if you would like people to be able to contact you
Database Name: Family
Search Engine is at: http://yourserver.com/FamilySearch.php
Button 1 Text: Frequently Asked Questions
Button 1 Value: faq.html
Button 2 Text: Useful Webpages
Button 2 Value: http://stevemorse.org/index.html
Soundex Type: American
Search Engine: PHP
Next you would define your fields as follows:
Edit the first field so that it contains the following properties:
Field Name: Last Name
Database Column: 1
Screen Column: 2
Search By: is exactly, sounds like, starts with, contains, ends with
Press "Insert New Field" and edit the new field so that it contains the following properties:
Field Name: First Name
Database Column: 2
Screen Column: 1
Search By: is exactly, starts with, contains, ends with
Press "Insert New Field" again and edit the new field so that it contains the following properties:
Field Name: Age
Database Column: 3
Screen Column: 3
Search By: is between
Press "Insert New Field" one more time and edit the new field so that it contains the following properties:
Field Name: Comments
Database Column: 4
Screen Column: 0
Search By:
Now you are ready to create your search form and search engine.
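
To make the effect of the column settings above concrete, here is a small sketch showing how one raw record would be turned into a results line.  It is an illustration only and not the generated code.  The object layout and the function name resultLine are assumptions made for this example, and a screen column of 0 is taken to mean that the field is not displayed.

```javascript
// Illustration of the field definitions above; not the generated code.
var fields = [
  { name: "Last Name",  databaseColumn: 1, screenColumn: 2 },
  { name: "First Name", databaseColumn: 2, screenColumn: 1 },
  { name: "Age",        databaseColumn: 3, screenColumn: 3 },
  { name: "Comments",   databaseColumn: 4, screenColumn: 0 }
];

// Returns the displayed values of one tab-delimited record in screen order.
function resultLine(record, fieldList) {
  var values = record.split("\t");
  return fieldList
    .filter(function (f) { return f.screenColumn > 0; })   // drop hidden fields
    .sort(function (a, b) { return a.screenColumn - b.screenColumn; })
    .map(function (f) { return values[f.databaseColumn - 1]; });
}

var line = resultLine("Smith\tJohn\t21\tPresident of the food committee", fields);
```

Here the result holds the first name, then the last name, then the age, and the comments are left out.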

Press the "Search Form" button.  The html code for it will appear.  Select all of it (ctrl-a) and copy it to the clipboard (ctrl-c).  Use notepad to open a new file.  Paste (ctrl-v) the contents of the clipboard into that file.  Save it to your disk and give it a name ending in .htm or .html.  Then upload it to a publicly-accessible place on your server.

Similarly create the search engine by pressing the "Search Engine" button.  Save it in the file named FamilySearch.php.  Upload that file to an appropriate location on your server.

Now you can append the soundex codes to the database.  Press the "Database" button.  A screen with two large boxes will appear.  Copy your entire rawdata.txt file to the clipboard (ctrl-a, ctrl-c) and then paste it (ctrl-v) into the upper box.  Press the "Process" button.  The database will now appear in the lower box with the soundex code for the lastname field added.  Note that only the lastname field gets a soundex code because it was the only field that you designated as being searchable by "sounds like".  Now copy this resulting database to the clipboard (ctrl-a, ctrl-c) and paste it (ctrl-v) into a new file named Family.txt.  Upload this new file to your server in the same directory that you uploaded your search engine.

Finally you'll want a frequently-asked-questions page so you won't have to keep responding to the same questions over and over again.  Use whatever html editor you have available to create such a file, name it faq.html, and upload it to the server in the same directory that you uploaded the search form.
 

EXAMPLE 2:

Same as example 1 but your server doesn't support PHP or Perl.  So you'll want to create a javascript search engine instead.  Make the following changes to the steps for example 1.

Under the "Describe the Form" section:

Search Engine is at: http://yourserver.com/FamilySearch.html
When you create your search engine, you will save it in the file FamilySearch.html instead of FamilySearch.php.

When you create your final database file, you will save it in a new file called Family.js instead of Family.txt.
 

EXAMPLE 3:

Same as example 2 except you want to use indexing to speed up your searches.  Make the following changes to the steps for example 2.

Subdivide your rawdata.txt file into 27 files, one file for each letter of the alphabet and one catch-all file for all other characters.  Place those records having a last name starting with A into the A file, those with B into the B file, etc.  Place any records for which the last name does not start with a letter into the catch-all file.  If there are no such records, the catch-all file will be empty.

When you create your final database files, you will repeat the process 27 times, once for each of the raw database files.  And when you save the final files, you will name them Family_a.html, Family_b.html, etc.  The catch-all file you will name Family_0.html (that's the digit zero).
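
If your raw file is large, you may prefer to do the subdividing with a script rather than by hand.  The sketch below shows the idea in javascript.  The function names are made up, and it assumes that the last name is the first column of each tab-delimited record, as it is in example 1.

```javascript
// Hypothetical helpers for splitting the raw data into the 27 groups.
// Returns "a" through "z", or "0" for anything that is not a letter.
function indexSuffix(lastName) {
  var first = lastName.charAt(0).toLowerCase();
  return /^[a-z]$/.test(first) ? first : "0";
}

// Groups tab-delimited records by the suffix of their last name (column 1).
function splitByInitial(lines) {
  var groups = {};
  for (var i = 0; i < lines.length; i++) {
    var suffix = indexSuffix(lines[i].split("\t")[0]);
    if (!groups[suffix]) {
      groups[suffix] = [];
    }
    groups[suffix].push(lines[i]);
  }
  return groups;
}
```

Each group would then be saved to its own raw file and processed separately, with the suffix used in the final file name (Family_a, Family_b, and so on, and Family_0 for the catch-all).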
 

502. Can I see some websites that were generated by this tool?

Here is a partial list.  More websites will be added as I learn about them.  If you have successfully used this tool to build your search application and would like to have it listed here, drop me an e-mail.

First is a sample search application that I wrote to demonstrate the use of this tool.

Searching the United States Presidents Database in One Step
     Search Form using javascript search engine: http://stevemorse.org/create/presidents-js.html
     Search Form using PHP search engine: http://stevemorse.org/create/presidents-php.html
     Search Form using Perl search engine: http://stevemorse.org/create/presidents-perl.html
     Database: http://stevemorse.org/create/United States Presidents.xls
And here is a partial list of some actual search applications that people have developed with this tool.  To find more, do a web search for "One-Step Search Tool generator".
Searching the Bari Database in One Step
     Introduction: http://www.rootsweb.com/~itappcnc/bari
     Search Form (surnames): http://www.rootsweb.com/~itappcnc/bari/barisearch.htm
     Search Form (microfilm): http://www.rootsweb.com/~itappcnc/bari/barifilm1.htm
Searching the Chicago Streets Database in One Step
     Introduction: http://www.rootsweb.com/~itappcnc/pipcnstreet.htm
     Search Form: http://www.rootsweb.com/~itappcnc/pipcnstreetfind.htm
Searching the Chicago Catholic Parish Database in One Step
     Search Form: http://www.rootsweb.ancestry.com/~itappcnc/cathsearch.htm
Searching the Raseinia 1816 Revision Lists in One Step
     Introduction: http://www.jewishfamilyhistory.org
     Search Form: http://www.jewishfamilyhistory.org/searchform.htm
Searching the Mass Declarations Database in One Steps
     Search Form: http://users.starpower.net/drkehs/nara/mdsf.htm
Search the KENEDER ODLER OBITUARY Database
     Introduction: http://www.jewishfamilyhistory.org/1816%20RL.htm
     Search Form: http://www.jewishfamilyhistory.org/searchform.htm
Search Existing Headstones in the Ste Luce Cemetery
     Introduction: https://www.royandboucher.com/maine/cemeteries/ste_luce/database.php
     Search Form: https://www.royandboucher.com/maine/cemeteries/ste_luce/form_cemetery.php
St. Joseph, Sinclair Maine, Search Cemetery
     Search Form: https://www.royandboucher.com/maine/cemeteries/sinclair/form_cemetery.php
Search the Fleets & Ships' Histories
     Search Form: http://www.theshipslist.com/ships/lines
Searching the Brooklyn 1925 Census in One Step
     Introduction: http://stevemorse.org/brooklyn/faqb.htm
     Search Form: http://stevemorse.org/brooklyn/brooklyn.html
Ste Luce Baptisms 1843-1860 -- Searching the Database
     Introduction: http://www.upperstjohn.com/steluce/baptisms.htm
     Search Form: http://www.upperstjohn.com/steluce/form_baptisms.php
Ste Luce Marriages 1843-1860 -- Searching the Database
     Introduction: http://www.upperstjohn.com/steluce/marriages.htm
     Simple Search Form: http://www.upperstjohn.com/steluce/simple_marriages.php
     Advanced Search Form: http://www.upperstjohn.com/steluce/form_marriages.php
Roseville Genealogical Society Member Surname Database
     Search Form: http://www.rgsca.org/surname_search.shtml
Sephardic Genealogy Databases
     Search Form: http://www.sephardicgen.com/databases/databases.html
Searching Naturalization Records on Footnote in One Step
     Search Form: http://stevemorse.org/footnote/footnote.html
List of Gura Humorului former Jewish Residents
     Search Form: http://humora.tripod.com/guramorasearch.htm
Searching Dachau Concentration Camp Records in One Step
     Introduction: http://stevemorse.org/dachau/intro.htm
     Search Form: http://stevemorse.org/dachau/dachau.html
Dawkins Genealogy Surname Database
     Main Page: http://www.dawkinsgenealogy.org
Searching the Adarphotos Database in One Step
     Search Form: http://www.belinkoff.com/photos/aform.html
Jewish Genealogy in Italy, Surname Database
     Search Form: http://www.italian-family-history.com/jewish/Familynames.html
Searching the Tilden High School Yearbooks in One Step
     Search Form: http://www.stevemorse.org/sjtilden/yearbooksearch.html

Searching the Jefferson High School Yearbooks in One Step
     Search Form: http://www.museumoffamilyhistory.com/Jefferson/yearbooksearch.html

Searching Kings County (Brooklyn NY) Naturalization Index in One Step
     Introduction: http://www.jgsny.org/brooklyn-naturalizations-1907-1924
     Search Form: http://www.jgsnydb.org/brooknats.htm
Searching the Russian Phonebooks in One Step
     Search Form: http://www.stevemorse.org/russian/phonebooks.html
Langley Essex Family History Pages
     Search Form: http://www.langleyessex.net/extrapgs/1901_search_frm.php
Searching the Aufbau Database
     Search Form: http://www.calzareth.com/aufbau/search.html
Krements Concordance Database Index
     Search Form: http://www.shtetlinks.jewishgen.org/Kremenets/web-pages/database/krem_search_frm.html
1898 Wilmette Directory Search Page
     Search Form: http://www.wilmettehistory.org/directories/1898/search1898.html
Searching the ANC Jewish Burials Database in One Step
     Introduction: http://anc.jgsgw.org/index.htm
     Search Form: http://anc.jgsgw.org/ANCSearchForm.htm
Searching the Database of Witbank's Jews in One Step
     Search Form: http://www.barrymann.net/Search/Form.htm
Searching the Zeta Psi Bloomsburg Alumni Database in One Step
     Search Form: http://home.comcast.net/~aaronrb/zetesearch3.html
Searching the Estonian Jews Database in One Step
     Search Form: http://eja.pri.ee/allname/search.html

Geissenhainer Church Records
     Introduction: http://www.magsgen.com/recordindexes.html
     Baptisms: http://www.magsgen.com/GeissenhainerWeb/Baptismsearchform.html
     Confirmations: http://www.magsgen.com/GeissenhainerWeb/Confirmsearchform.html
     Marriages: http://www.magsgen.com/GeissenhainerWeb/Marriagesearchform.html

Illinois Jewish Cemetery Records
     Search Form: http://jgs.jgsi.org/acjd/

Searching the Dunilovichi Cemetery Database in One Step
     Search Form: http://kehilalinks.jewishgen.org/dunilovichi/Searches/CemeterySearchForm.html

Searching the 1820 St. Croix Residents Database in One Step
     Search Form: http://stx.visharoots.org/1820form.html

Searching St. Peter's Grave Yard Internment Database in One Step
     Search Form: http://www.nycnuts.net/essex/cemetery/stpeters/

1890 Wilmette Directory
     Search Form: http://www.wilmettehistory.org/directories/1890/search1890.html

Search the Clark County Genealogical Society Library Catalog
     Search Form: http://www.ccgs-wa.org/catalog_form.html


503. Can you give some examples of customizing a searchform after it has been generated by this tool?

The search application that this tool generates will return every record in the database if the user enters no information on the search form.  The reason is as follows:

Whenever you leave a particular field blank it means you are willing to accept any match for that field.  For example, if you enter the last name of Smith and leave the first name blank, you want all Smiths regardless of the first name.  So if you leave all fields blank, you are saying that you don't care what's in any of them which means that every record in the database will match.
If that's not the behavior that you want, you can customize the search application so that it rejects an all-blank form.  You can do this by modifying either the search form or the search engine.  The simplest way would be to modify the search form as follows:

   After the line that reads:

     function StartSearch() {

   add the following lines:

        // Assume the form is blank until a filled-in field is found
        var blank = true;
        for (var x in document.searchform) {
          var field = document.searchform[x];
          if (field && field.name && field.name.length>3) {
            // Only examine fields whose names end in "Max" or "Min"
            var suffix = field.name.substr(field.name.length-3,3);
            if (suffix == "Max" || suffix == "Min") {
              if (field.type == "select-one" && field[field.selectedIndex].text != "") {
                blank = false;
              } else if (field.type == "text" && field.value != "") {
                blank = false;
              }
            }
          }
        }
        // Stop here, without searching, if nothing was entered
        if (blank) {
          alert("No values were entered on form");
          return;
        }

As another example, suppose that you want to enforce a three character minimum on the last-name field, and suppose that the name you have given to that field is "surname".  This too can be done in either the search form or the search engine, and the simplest way would be to modify the search form as follows:

   After the line that reads:

     function StartSearch() {

   add the following lines: 

        if (document.searchform.surnameMax.value.length < 3) {
          alert("Surname must be at least three characters long");
          return;
        }

Of course this requires that you are familiar with javascript programming.  And these are just two examples.  There are an infinite number of customizations possible and I can't possibly give examples for all of them.
 



600 THIRD PARTY SOFTWARE

601. How can I combine this with Wordpress?

If you have a Wordpress website and you want to add the search application you just created to your website, here is a step-by-step tutorial for accomplishing that.  My thanks to Ron Miller for recognizing the need for this, and for documenting the procedure.

Adding your One-Step PHP database to a Wordpress themed website


602. Are there problems with using phpMyAdmin for managing my sql database?

One user has reported that characters in the database that have diacritics (accent marks) work fine in his search application but they do not display properly when managing his search application with phpMyAdmin.  The problem is that phpMyAdmin was not recognizing the character encoding.  He researched it and found that the solution was to add the following as the second line of his search engine, just after the "<?php" line.

header('Content-type: text/html; charset=utf8mb4');

I could have modified my search-application generator so that this line gets automatically included in the resulting search engine.  But I didn't know if this would have any side effect, so I chose instead to leave it up to the user to include the line manually.

My thanks to Martin Abramov for figuring this out.


-- Steve Morse