How to Search the World Wide Web: A Tutorial for Beginners and Non-Experts

D. P. Habib and R. L. Balliot
September 19, 1999
Updated August 1, 2000

This revision of our July 24, 1998 edition updates the tutorial to keep it current and make it more informative. Also, it adds a new section on conducting searches that explains how to use operators to focus your search and how to compose a search question or query.

Introduction

Conducting a search can be time-consuming and frustrating for the non-expert. This is not surprising given the enormous amount of information available on the World Wide Web and the different ways it is stored and retrieved. The search process is made all the more difficult because of the large number of search tools, their differing information content and the lack of industry standards.

This work is prompted by the inherent difficulties in searching the World Wide Web. To keep the Tutorial simple, we have eliminated unnecessary information and explanations and placed the more complex material under the Advanced Information part. Our aim is to add to your knowledge and understanding of the search process and to help improve your skills in conducting searches.

Basics of Conducting a Search

A. Search Tools and Methods. Describes the means used in conducting a search.
B. Keyword Search Operators. Describes use of operators to compose queries.
C. Preferred Search Tools. Lists preferred search tools and their keyword operators.
D. Planning and Conducting a Search. Provides a guide for conducting searches.
E. Hints and Information. Useful facts about the workings of the Internet.
F. Comments and About the Authors.

Advanced Information

G. Search Tool Descriptions. Describes the contents and use of preferred search tools.
H. Conducting Searches. A guide on use of operators and composing queries.
I. Home Page. Explains Home Page and Popular Site contents.
J. Glossary. Defines terms used in the search process.
We have excluded, as beyond the scope of this work, specialized search tools such as those for news, medicine, libraries, government and law, to name a few. Instead, we describe how to use the World Wide Web to obtain information on all subject matter.

The World Wide Web, also known as WWW and the Web, comprises a vast collection of documents stored in computers all over the world. These specialized computers are linked to form part of a worldwide communication system called the Internet. When you conduct a search, you direct your computer’s browser to go to Web sites where documents are stored and retrieve the requested information for display on your screen. The Internet is the communication system by which the information travels.

For those just starting to learn the search process, we recommend that you first scan through the Tutorial and become familiar with its contents. Follow with hands-on experience, using the Search Exercises at the end of Section A, to develop a rudimentary knowledge of the search process. The Tutorial will then be easier to understand. You will find this exposition works best as a companion to your searches, especially with use of the glossary to explain unfamiliar terms.

Netscape Navigator was the WWW browser used during the development of this Tutorial. Its teachings also apply to Microsoft Internet Explorer, though some terms used are different. For example, in MS Explorer Bookmarks are called Favorite Places and links are called shortcuts. Online Service Providers, such as AOL and CompuServe, offer their own browsers, also with some differences in terms. However, all the browsers work essentially the same.

A. Search Tools and Methods

A search tool is a computer program that performs searches. A search method is the way a search tool requests and retrieves information from its Web site. A search begins at a selected search tool’s Web site, reached by means of its address or URL.
Each tool’s Web site comprises a store of information called a database. This database has links to other databases at other Web sites, and the other Web sites have links to still other Web sites, and so on. Thus, each search tool has extended search capabilities by means of a worldwide system of links.

Types of Search Tools

There are essentially four types of search tools, each of which has its own search method. The following describes these search tools and then suggests exercises for achieving familiarity with their use.

1. A directory search tool searches for information by subject matter. It is a hierarchical search that starts with a general subject heading and follows with a succession of increasingly more specific sub-headings. The search method it employs is known as a subject search.

• Tip: Choose a subject search when you want general information on a subject or topic. Often, you can find links in the references provided that will lead to the specific information you want.
• Advantage: It is easy to use. Also, information placed in its database is reviewed and indexed first by skilled persons to ensure its value.
• Disadvantage: Because directory reviews and indexing are so time-consuming, the number of reviews is limited. Thus, directory databases are comparatively small and their updating frequency is relatively low. Also, descriptive information about each site is limited and general.

2. A search engine tool searches for information through use of keywords and responds with a list of references or hits. The search method it employs is known as a keyword search.

• Tip: Choose a keyword search to obtain specific information, since its extensive database is likely to contain the information sought.
• Advantage: Its information content or database is substantially larger and more current than that of a directory search tool.
• Disadvantage: It is not very exacting in the way it indexes and retrieves information in its database, which makes finding relevant documents more difficult.

Keyword searches require far more explanation than subject searches, because of their broader scope and greater complexity.

3. A directory with search engine uses both the subject and keyword search methods interactively as described above. In the directory search part, the search follows the directory path through increasingly more specific subject matter. At each stop along the path, a search engine option is provided to enable the searcher to convert to a keyword search. The subject and keyword searches are thus said to be coordinated. The further down the path the keyword search is made, the narrower the search field and the fewer and more relevant the hits.

• Tip: Use when you are uncertain whether a subject or keyword search will provide the best results.
• Advantage: Ability to narrow the search field to obtain better results.
• Disadvantage: This search method may not succeed for difficult searches.

Some search tools use search engine and directory searches independently. They are said to be non-coordinated.

4. A multi-engine search tool (sometimes called a meta-search) utilizes a number of search engines in parallel. The search is conducted via keywords employing commonly used operators or plain language. It then lists the hits either by search engine employed or by integrating the results into a single listing. The search method it employs is known as a meta search.

• Tip: Use to speed up the search process and to avoid redundant hits.
• Advantage: Tolerant of imprecise search questions and provides fewer hits of likely greater relevance.
• Disadvantage: Not as effective as a search engine for difficult searches.

Search Tools

A search tool employs a computer program to access Web sites and retrieve information.
Each search tool is owned by a single entity, such as a person, company or organization, which operates it from a master computer. When you use a search tool, your request travels to the tool’s Web site. There, it conducts a search of its database and directs the response back to your computer.

Of the hundreds of search tools available, we have selected 15 that we believe are best, both singly for their performance and as a group for the diversity they provide. Table I lists these as Preferred Search Tools by the primary search method each uses. In practice, most subject search tools provide an auxiliary keyword search, and correspondingly, keyword search tools usually provide subject searches.

Table I. Preferred Search Tools

Directory [Subject Search]: Encyclopedia Britannica, LookSmart, Yahoo*
Search Engine [Keyword Search]: AltaVista, Google, Excite, Hotbot, Infoseek, Northern Light, Snap, Fast
Multi-Engine [Meta Search]: Dogpile, Mamma, Metacrawler, OneKey, SavvySearch

*Provides coordinated searches

Search Exercises

For those just starting to learn the search process, this segment is recommended to help you understand how the process works. The following is the general procedure:

• Connect to the Internet via your browser [e.g. Netscape or MS Explorer].
• In the browser’s location box, type the address [i.e. URL] of your search tool choice. Press Enter. The Home Page of the search tool appears on your screen.
• Type your query in the address box at the top of the screen. Press Enter.
• Your search request travels via phone lines and the electronic backbone of the Internet to the search tool’s Web site. There, your query terms are matched against the index terms in the site’s database.
• The matching references are returned to your computer by the reverse process and displayed on your screen. The references returned are called "hits" and are ranked according to how well they match your query.
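The matching and ranking of hits described in the procedure above can be illustrated with a short Python sketch. It is only an illustration under simple assumptions: the miniature INDEX, the page addresses in it and the rank_hits function are hypothetical, and a hit is scored simply by the number of query terms it matches. Actual search tools apply their own, more elaborate criteria.

```python
# Hypothetical miniature "database": each index term maps to the pages containing it.
INDEX = {
    "search": {"example.org/a", "example.org/b", "example.org/c"},
    "tutorial": {"example.org/a", "example.org/c"},
    "beginners": {"example.org/a"},
}


def rank_hits(query):
    """Return (page, matched_term_count) pairs, best match first."""
    scores = {}
    for term in query.lower().split():
        # Every page listed under this index term earns one point.
        for page in INDEX.get(term, set()):
            scores[page] = scores.get(page, 0) + 1
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for page, score in rank_hits("search tutorial beginners"):
        print(score, page)
```

A query whose terms appear nowhere in the index returns an empty list, which corresponds to a search that produces no hits.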
Now, conduct the following searches to become familiar with each of the four types of search tools described above:

1. Directory [Subject Search]
Type http://www.yahoo.com in the location box of your Internet Browser [e.g. Netscape Navigator or MS Explorer]. Press Enter. The Yahoo! Home Page is displayed. From the subject list provided, choose and click a category of interest to follow. Choose titles that are increasingly more specific until no more options of interest are offered. Scroll through the references or hits, and click a hit that interests you to get an abstract or title of the reference.

2. Search Engine [Keyword Search]
Type http://www.infoseek.com in the location box of your Internet Browser and press Enter to access the Home Page. Using keywords, type your question or query into the location box. Click Find. Examine the hits of interest and click one to access the reference.

3. Directory with Search Engine [Subject with Keyword Search]
Follow the same procedure as in [1] above, except at one of the stops along the path switch to a keyword search. Type a simple query in the location box, and examine the hits of most interest.

4. Multi-Engine Search Tool [Keyword Search]
Type http://www.savvysearch.com in the location box of your Internet Browser and press Enter. Type the same keyword query as used in [2] above. Compare the hits with those obtained in [2].

Go back and review Section A again from the beginning to reinforce your understanding of the search methods.

B. Keyword Search Operators

Operators are the rules or specific instructions used for composing a query in a keyword search. A well-defined query greatly improves the chances of finding the information you are looking for. While each search engine has its own operators, some operators are used in common by a number of search engines. The following are among the most used operators.

1. Boolean
Employs AND, OR, NEAR and NOT to connect words and phrases [i.e.
terms] in the query where:

• AND requires that both terms are present somewhere within the document being sought.
• NEAR requires that one term must be found within a certain number of words of the other term.
• OR requires that at least one of the terms is present.
• NOT excludes any document containing the term.

When using these operators, remember to capitalize them as shown above.

Query Example: search AND tutorial

2. Plus / Minus

• Employs [+] before a term to retrieve only the documents containing that term. It is similar to the Boolean AND.
• Employs [-] before a term to exclude that term from the search. It is similar to the Boolean NOT.
• Do not leave a space between the operator and the term that follows.

Query Example: +search +tutorial -course

3. Phrases

• Words enclosed within double quote marks denote an exact phrase, or reasonably close to it. It is sometimes similar to the Boolean NEAR. More often, it is treated like a single term.

Query Example: "tutorial for beginners"

4. Stemming [Truncation]

• The use of the stem or the main part of a word to search for variations of the word [e.g. the stem "sing" searches sings, singer, singing and singalong]. Stemming can be automatic, or it may require use of a wild card, symbolized by an asterisk [*], to initiate.

Query Example: sing*

5. Case Sensitive

• Use lower case for query terms except for proper names.
• Treat adjacent capitalized words as a single proper name, e.g. George Washington.
• Separate proper names from each other with a comma.

Query Example: George Washington, Thomas Jefferson

Operators may at first seem complex to the beginner, but become understandable with use. For more explicit information on the use of operators, go to Conducting Searches in Section H of the Appendix.

C. Preferred Search Tools

Because most search engines developed their systems of search independently, there is little consistency among them in terminology, database content or retrieval criteria.
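To make the Plus/Minus, phrase and stemming operators of Section B concrete, here is a minimal Python sketch that tests whether a piece of text satisfies a query. It is an illustration under stated assumptions, not the method of any particular search engine: the matches function is hypothetical, a bare term is treated like a [+] term, the wild card simply matches any word beginning with the stem, and the Boolean and case-sensitive rules are left out.

```python
import re
import shlex


def matches(query, text):
    """Return True if text satisfies every operator in the query.

    Supported (simplified): +term, -term, "exact phrase", stem*.
    A bare term is treated like +term (an assumption of this sketch).
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    normalized = " ".join(words)

    def present(term):
        term = term.lower()
        if term.endswith("*"):
            # Wild card: any word that begins with the stem counts.
            return any(word.startswith(term[:-1]) for word in words)
        if " " in term:
            # Quoted phrase: the words must appear together, in order.
            phrase = " ".join(re.findall(r"[a-z0-9]+", term))
            return f" {phrase} " in f" {normalized} "
        return term in words

    # shlex.split keeps a double-quoted phrase together as one token.
    for token in shlex.split(query):
        if token.startswith("-"):
            if present(token[1:]):
                return False
        elif not present(token.lstrip("+")):
            return False
    return True


if __name__ == "__main__":
    sample = "A search tutorial for beginners: singing along."
    print(matches("+search +tutorial -course", sample))
    print(matches('"tutorial for beginners"', sample))
    print(matches("sing*", sample))
```

Because quoting is handled by shlex, a query containing an unmatched quote mark or an apostrophe raises an error in this sketch; a fuller version would need its own tokenizer.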
Thus, you will find that although keyword searches are easy to use, they require learning to use well. Table II is organized to help identify frequently used operators for our list of preferred search tools. These were selected from over 100 search tools for the size and quality of their databases and the effectiveness of their retrieval systems.

Table II. Preferred Keyword Search Tools and Their Operators

Columns: Search Tool, Boolean, Plus/Minus, Quote Marks, Stemming, Case Sensitive

AltaVista x x Dogpile x * x x o Encyc. Brit. x x Excite x x x o o Fast x x x x o Google o x x o o HotBot x x x x x Infoseek o x x x x x x LookSmart MetaFind x Mamma x x x x x x x o x x x o SavvySearch o x x o Snap x x Yahoo x x MetaCrawler NorthernLight OneKey x x x * x

Table Symbols: [x] means supports, [o] means excludes, [*] means a wild card capability.

In addition to the operators shown in the table, most search tools also have operators of their own. Searches benefit from careful adherence to a search tool’s operators, particularly for more difficult searches. Links to Help Addresses can be found under Search Tool Descriptions in Section G.

To the beginner, even 15 search tools from which to choose will seem too many. However, together they provide a diversity of database content, indexing criteria and retrieval methods that considerably enlarges the information available to you. We suggest that you start with Yahoo for subject searches, Infoseek for keyword searches and Savvy for meta searches. As you gain experience, expand the number until you find the ones that best meet your particular needs. Preferred Search Tools are described under Search Tool Descriptions in Section G.

As can be seen in Table II, some operators are common to a number of search engines. We designated a selected set of these as Common Operators. This aspect provides a useful search technique that is illustrated in the following section.

D. Planning and Conducting a Search

Your search for a specific item in a world of information can be difficult, especially if the search is done without any planning. This section recommends ways of conducting a search in an orderly and informed way. For those just beginning to learn the search process, use the following guides:

• Develop a general understanding of the search tools, process and language. However, it is not necessary to know everything at the beginning. Start with the information that you need to search subjects of interest. You will find that your understanding will build as your experience broadens.
• Avoid searching for obscure information not likely to be found without use of sophisticated search methods. Once you become familiar with the use of operators, you can move toward the more complex searches.
• In keyword searches, start by working with no more than two or three search tools until you gain some mastery over them.
• A search tool’s Help section usually describes its current keyword search practices. From these, learn how best to compose a query and focus the search.

The enormous amount of information on the Internet, the many search tools, and the complexities of their use may intimidate you. This need not be; just focus on the subject you wish to pursue, develop a search plan and go about finding the desired information.

Searching By Keyword

There are various levels of complexity in conducting a keyword search. Begin with the easier searches and work your way toward those that are more complex.

1. Natural Language

Use natural language to compose your query, since it does not require the use of operators or special rules.

• Yahoo is particularly effective as a start, because it has a keyword option along with its subject search. With the largest subject directory, its sub-categories frequently become detailed enough to provide choice documents.
If necessary, use the keyword search option at the appropriate place on the search path and employ natural language for the query.
• Go to AltaVista and compose your query in direct question form. Be sure to use the question mark at the end of the sentence to ensure a suitable search.

2. Moderately Complex Searches

For a convenient way to conduct a moderately complex search, employ Common Operators to compose your query. These were selected from Table II and are applied as follows:

• Use phrase-forming quote marks [" "] around search words that belong together.
• Use the [+] sign before a query term to require that returned references contain that term.
• Use the [-] sign to require that returned references omit the term.
• Use lower case except with proper names. Proper names are capitalized and separated by commas.

For a quick search, begin with a meta search tool, using the above Common Operators to compose your query. Meta searches are normally more tolerant of inexact use of operators, and their hit list is more likely to be shorter and of higher relevance. If a meta search tool does not provide the desired results, use search engines singly to obtain a more in-depth search.

At times, you will need to try many search engines to look for an obscure or difficult-to-find document. Use the following procedure to facilitate your search:

• Insert all of the Preferred Search Tools into "Bookmarks" or "Favorite Places". This is facilitated by use of their URL links found under "Search Tool Descriptions". Thus, the search tool listing you create becomes a convenient place to access a search tool.
• Compose your query using Common Operators and copy it to Clipboard for later use.
• Access a keyword search engine, and via Clipboard, paste the query into its Location Box.
• Click on "Search" and evaluate the hits.
• Go to the next search tool and repeat the search procedure. Repeat for each search tool.

Once set up, the procedure works rapidly.
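The step of composing a query with Common Operators can be sketched as a small Python helper. This is a minimal illustration only: the compose_query function and its parameter names are hypothetical, and it simply applies the four rules listed above (quote marks for phrases, [+] and [-] with no space before the term, lower case for ordinary terms, and proper names capitalized and separated by commas).

```python
def compose_query(required=(), excluded=(), phrases=(), proper_names=()):
    """Build a query string from the Common Operators.

    All parameter names are hypothetical, chosen for this sketch.
    """
    parts = []
    # Quote marks keep words that belong together as one phrase.
    parts += [f'"{phrase.lower()}"' for phrase in phrases]
    # No space between the operator and the term that follows.
    parts += [f"+{term.lower()}" for term in required]
    parts += [f"-{term.lower()}" for term in excluded]
    query = " ".join(parts)
    if proper_names:
        # Proper names keep their capitals and are separated by commas.
        query = f"{query} {', '.join(proper_names)}".strip()
    return query


if __name__ == "__main__":
    print(compose_query(required=["search", "tutorial"], excluded=["course"]))
    print(compose_query(phrases=["Tutorial for Beginners"],
                        proper_names=["George Washington", "Thomas Jefferson"]))
```

The resulting string is what you would copy to Clipboard and paste into each search tool in turn.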
The slow part then is evaluating the hits. You will find that there will be few duplicate references among the search tool results. Many of the hits will be unique, among which may be the reference of value to you.

3. Highly Complex Searches

A search for obscure information benefits from the use of search engines having a large database and advanced keyword search capabilities, such as AltaVista, HotBot, Fast and NorthernLight. Study Section H on Conducting Searches and compose your query employing appropriate operators. The use of search engines has a trade-off; it often produces an extraordinary number of hits. But the first 20 to 30 are the most likely to contain the useful references, because hits are normally ranked according to their relevance.

Searching by Subject

In comparison to keyword searches, subject searches are rather simple. Subject searches begin with broad subject categories and proceed to subject matter that is increasingly more specific. To use a subject search, follow the search path and at each stop, examine the hits that are provided.

The main advantage of directory searches is that their hits are of significantly higher quality and relevance than those found through a search engine. This is because subject experts review all documents submitted before they are accepted. Because of this time-consuming effort, directory databases are much smaller than those of search engines.

With some exceptions, directories can take weeks, and sometimes months, to update their database contents. In marked contrast, search engines collect and update web sites automatically, often within one or two days. This is of particular value when being current is important.

Summary

It can be disconcerting to the beginner to find that the number of hits obtained can range from none to over a million, and their relevance or usefulness can vary from negligible to considerable. There are, however, guides that can greatly help improve your search results.
• If your subject is broad, start with a subject search such as Yahoo, LookSmart or Encyclopedia Britannica.
• If your subject is narrow or specific, use a keyword search such as Infoseek, Excite or Snap.
• If you are not sure, try Yahoo and take advantage of its keyword option if needed. This option narrows the search to the last subject title, but in a smaller field.
• Try a meta engine such as Savvy or Dogpile. Meta engines produce fewer hits, usually of higher relevancy.
• Try a number of different search engines using the Clipboard-Bookmark procedure. You will find that the hits produced by the search engines are significantly different, and therefore the chances of your getting the document you want are much improved.
• Should your search involve an obscure or difficult-to-find topic, use a search engine having a large database such as AltaVista, HotBot or NorthernLight.

Too many irrelevant hits are often due to too broad a query, because of an inadequate number of defining terms. Too few hits are often caused by too restrictive a query. However, there are many reasons for poor results. For more detailed information on improving your searches, see "Conducting Searches" under Section H in the Appendix.

E. Hints and Information

1. To speed searches, create shortcuts to your most-used search tools utilizing Bookmarks or Favorite Places. Also, add to your shortcuts during your search, so that you can later find your way back to useful Web sites. This technique also eliminates typing errors in addresses or URLs.

2. There are times when a search tool will not connect to a Web site for one of several reasons:

• You may have misspelled a word or erred in an address, not uncommon errors. A careful check will detect the mistake.
• You may have difficulty in accessing the site, because of the high activity there.
In such instances, avoid the time period when all three US Time Zones coincide in their peak use period, usually from late afternoon through early evening.
• At times, the search tool itself may be disabled or undergoing changes, and you will need to wait until it is operating again.
• The site has been discontinued, but not removed as a link.

3. Because search tools are constantly trying to improve their performance, they are apt to make frequent changes in their database content, indexing and retrieval criteria. Thus, you will likely get a different response and ranking to the same query over time. This happens more frequently in keyword than in subject searches.

4. During a search, you will sometimes find long articles that you prefer not to read or print at that moment. You can defer action by selecting the text, copying it onto Clipboard and then pasting it in a word processing window. Later you can read the articles and decide which parts, if any, you wish to keep for future reference. There is one drawback to this technique; tables do not replicate well.

5. Some Web sites may give you the option of eliminating graphics. For those with computers that are slow to download, you will speed up your search by using search tools that have minimal graphics. You can assess this factor by noting how long it takes to download a search tool’s home page. Alternatively, some browsers give you the option of deleting graphics entirely.

6. Each search engine has its own way of assigning relevance. Higher weighting is normally given to query terms in the title and the first few words in a document. For some search engines, the proximity and frequency of query terms are also factors. It is unusual for the best reference to rank first, unless your query happens to precisely match the search tool’s indexing.

7. Knowledge of how information is indexed can be helpful in selecting an appropriate search engine for a query.
There are three methods used in the indexing of a Web site database.

• Full text index: A database index that is said to include all terms and URLs. In practice, each search tool uses a filter to remove words it considers unnecessary or impractical to search.
• Keyword Index: A database index that is based on the location and frequency of words and phrases. If a name or term is mentioned only once or twice in the Web site, it may not be included in its index. Keyword indexing is the most used and fastest growing indexing method.
• Person [Human] Index: This index is created by individuals who review Web sites and select the most appropriate words and phrases to describe their content. It provides a directory that is high in relevance and is based on similar cataloging methods used by libraries. Unlike the above two indexing methods, which employ robots, it has the value of being reviewed.

8. There are many ways of finding information on the Internet other than by the use of the WWW. These include WAIS, Archie, Veronica, Gopher and ftp, all of which preceded the WWW but have been greatly overshadowed by it. For the beginner, it is better to master the Web first, so as not to dilute your efforts.

9. There continues to be a huge proliferation of Web sites, because the Internet provides a simple and essentially cost-free way to publish and attain worldwide exposure. Because search engines spider their input without review, the searcher needs to be careful about the validity, accuracy and authority of their references. Directories, which are reviewed, have some advantage in this respect. In any case, wherever you can, consider the reputation of the author, source of the information and date of publication.

F. Comments

Learning to search the Web is an incremental process that builds with experience. You will find that your search skills will improve as you gain greater understanding of search terminology, search tool use and the way information is stored and retrieved.
Some searches yield the desired information quickly, while with others you may just have to plod your way through. The learning process is laborious; but the reward is a world of information that becomes readily available to you.

Finally, to those who have sent comments, thank you very much. We would appreciate any thoughts or suggestions that could help make the next edition more useful. Send these to [email protected].

This tutorial is copyrighted. However, no permission is required to use it for educational, non-commercial purposes. A simple E-mail indicating where or how it will be used would be much appreciated. For permission to utilize any of this tutorial’s contents for commercial purposes, go to http://members.home.net/davehab/fee.htm.

About The Authors

David Habib conceived and composed the tutorial. His main qualification is that he is a recent beginner in conducting Internet searches and thus more aware of beginners’ problems. Further, he has experience in researching and presenting complex technical subjects.

Robert Balliot is the Director of the Middletown Public Library in Rhode Island with broad experience in conducting computer searches. He served as an expert resource, ensured the accuracy of the tutorial’s contents and produced the Web page.

Copyrighted 1998, 1999 David P. Habib, Robert L. Balliot

Advanced Information

G. Search Tool Descriptions

This section describes the Preferred Search Tools listed in Table II and provides links to their home and help page addresses. The following explains terms used in this section and supplies some helpful hints.

• Addresses: Home Page and Help addresses of the Preferred Search Tools are shown in the listing that follows. Save these addresses for easy access under Bookmarks or Favorite Places.
• Automatic Document Scanning: A means of indexing or cataloging Web sites. It employs robots or spiders to scan all registered web sites, to add new sites and update older ones.
• Common Operators: A selected set of the most-used keyword search operators. [See Searching by Keyword in Section D for a description of their use].
• Frame-based Information: Information that resides within a box on a Web page. Some search engines will not search within frames and therefore the information there is not indexed and retrievable.
• Full Text: Indicates that all words in a Web site text are indexed. However, search engines with large databases tend to ignore commonly used words in queries, because they overload the search process.
• Home and Help Pages: Periodically, visit the Web sites of search tools that interest you. This will keep you current on changes in their search procedures. FAQ [Frequently Asked Questions] usually contains help as well as other useful information.

Search Tools

Of the more than one hundred search tools available, we selected the following for their capabilities and advantages. Because database contents of search tools tend to complement each other, they enlarge the area of available information. This makes it possible to find and retrieve even obscure information. Because competition among search tools is keen, they continuously strive to widen their scope and improve their performance. Therefore, you can expect the information that follows to undergo periodic revisions.

AltaVista
Home Page Address: http://www.altavista.digital.com/
Help Page Addresses:
Simple Query: http://www.altavista.digital.com/cgi-bin/query?pg=h
Advanced Query: http://www.altavista.digital.com/cgi-bin/query?pg=ah
Search Method: Primarily keyword, with a subject option that draws on LookSmart subject directories. Also provides Popular Sites on its Home Page under "Specialty Searches".
Database: Full text with one of the largest and most inclusive directory indices.
Operators: Employs Advanced Search that uses both simple and advanced operators. The latter are comprehensive and sophisticated.
Features: Provides ways of narrowing a search.
Can limit search by date and retrieve references by last date modified. Translates text into a number of languages. Also employs "Ask Jeeves", which accepts queries in simple question form. Can also be configured to filter objectionable material from searches.
Comments: A leading search engine. Has one of the largest databases and most effective search systems. If not used properly, can produce an extraordinary number of irrelevant hits. Serves as the default search engine for LookSmart and Britannica Internet Guide.

Britannica Internet Guide
Home Page Address: http://www.ebig.com/
Help Page Address: http://www.britannica.com/help/h_index.html
Search Method: Primarily subject with a keyword option.
Database: Subject sites reviewed by Britannica experts.
Operators: Provides its own version of an advanced keyword search.
Features: Employs an easy-to-use magazine format.
Comments: Draws on Encyclopedia Britannica’s extensive publishing experience and comprehensive information base. Although moderate in size, provides a full, high quality description of its subjects. It has especially good coverage in science, arts, history and geography.

DOGPILE
Home Page Address: http://www.dogpile.com/
Help Page Address: http://www.dogpile.com/notes.html
Search Method: Meta search. Also provides long lists of Popular Sites.
Database: That of the search engines employed.
Operators: "Simple Search" operators are suitable.
Features: Searches 13 search tools in specified order. Also furnishes subject listing. Uses MetaFind as an auxiliary search tool, which provides an effective way to conduct single word searches. Has a large list of Popular Sites.
Comments: Offers customized search, and provides means of comparing results.

EXCITE
Home Page Address: http://www.excite.com/
Help Page Address: http://www.excite.com/Info/searching.html?a-n-t
Search Method: Primarily keyword with subject option. Provides long lists of Popular Sites under several headings.
Database: Full-text search of about 75 million documents.
Operators: Supports simple and advanced searches.
Features: Offers keyword searches for literal or concept queries, but does better with concept searches. Concept search is the default. [A concept search looks for ideas related to a literal query.] Use of Boolean Operators turns off concept searching. Its channel sites are approved by editors and sometimes have reviews.
Comments: It is easy to use, its headings and links are well organized and the instructions for its use are clearly presented. Includes current news related items with the search results. Excite runs Webcrawler as an independent meta search tool.

Fast
Home Page Address: http://www.ussc.alltheweb.com/
Help Page Address: http://www.fast.no/fast.php3?d=support_faqs&c=websrch&h=3
Advanced Query: http://www.ussc.alltheweb.com/cgi-bin/advsearch
Search Method: Primarily keyword. Advanced query allows complicated Boolean searches with filtering.
Database: Very large; over three hundred million pages are indexed. In 2000 over a billion pages will be included.
Operators: "Simple Search" operators are suitable. The plus operator [+] is automatic.
Features: Provides a very fast search. The default search is filtered for inappropriate content.
Comments: Has a very sophisticated database, yet it provides a very uncluttered and easy-to-use format. Can limit searches to particular categories and provides specialized searches in MP3 and multimedia via its partner Lycos.

Google
Home Page Address: http://www.google.com/
Help Page Address: http://www.google.com/help.html
Search Method: Primarily keyword. Selecting ‘I'm feeling lucky’ as an option may limit the search to the most relevant site.
Database: Very large.
Operators: "Simple Search" operators are suitable. The plus operator [+] is automatic.
Features: Returns only pages that match all the terms in the query. Also, tries to return results where the terms are in close proximity.
Ranks hits based on their use popularity.
Comments: Has a very sophisticated database, yet it provides a very uncluttered and easy-to-use format. Can limit searches to particular categories, such as Government and Linux related sites. Has joined with Yahoo! to become the default search engine for that directory.

HotBot
Home Page Address: http://www.hotbot.com/
Help Page Address: http://www.hotbot.com/help/
Search Method: Primarily keyword, with subject option that draws on LookSmart subject directories. Also provides an extensive list of Popular Sites under three categories. Uses full-text search.
Database: Full text search.
Operators: Supports simple and expert [advanced] searches. Provides detailed instructions on use of operators under Help.
Features: Provides pull-down menus and buttons for refining and focusing search. Can use simple language in composing queries rather than traditional operators [e.g. can search by person, word, phrase and URL]. The first ten hits are the current most-sought results, and thereafter the best matches are listed.
Comments: It begins to rival AltaVista in database size, but can be easier to use.

INFOSEEK
Home Page Address: http://www.infoseek.com/
Help Page Address: http://www.go.com/Help/
Search Method: Primarily keyword with subject search option found under topics. Also provides an extensive list of Popular Sites accessible from the Home Page.
Database: Full text. Searches over 50 million Web pages.
Operators: Provides detailed instructions on composing a query.
Features: New site submissions are added immediately. Removes dead links and duplicate pages from database. Links to other search tools, e.g. Archie, Wais, Jughead, Veronica and Libraries.
Comments: One of the fastest and most accurate search tools on the Internet. Caters to the needs of both beginners and advanced users. Does not read frames or support stemming.

LOOKSMART
Home Page Address: http://www.looksmart.com/
Help Page Address: http://www.looksmart.com/h/info/helpmain.html
Search Method: Primarily subject. Also provides a keyword search using its own database of reviewed sites. Automatically extends its search through use of AltaVista if needed.
Database: Reviews the keyword sites in its own database.
Operators: Not necessary.
Features: Has a large, very well organized database that is updated daily. Utilizes over 250,000 links, organized into 12,500 categories.
Comments: Employs an easy-to-use magazine format.

MAMMA
Home Page Address: http://www.mamma.com/
Help Page Address: http://www.mamma.com/tips.html
Search Method: Meta search. Provides advanced search. Also provides long lists of Popular Sites.
Database: That of search engines employed.
Operators: "Simple Search" operators are applicable.
Features: Accommodates most syntax and is tolerant of incorrect operators. Has a large listing of magazines that can be found under various subject headings. Also has a large newspaper listing by continent and country.
Comments: Conducts parallel searches of 7 major search engines.

METACRAWLER
Home Page Address: http://www.metacrawler.com/
Help Page Address: http://www.go2net.com/help/faq/
Search Method: Meta search. Provides long lists of Popular Sites.
Database: That of search engines employed.
Operators: "Simple Search" operators are applicable.
Features: Removes duplicate and invalid URLs.
Comments: Conducts parallel searches of 7 major search engines.

NORTHERN LIGHT
Home Page Address: http://www.northernlight.com/
Help Page Address: http://www.northernlight.com/docs/search_help_optimize.html
Search Method: Primarily subject with auxiliary keyword.
Database: Among the largest and frequently updated.
Operators: Supports "Simple Search" operators.
Features: Provides a Special Collection listing by subject derived from 1800 journals, reviews, books, magazines and news wires.
These documents are not readily accessible to other search engine robots. Search is free, and cost to utilize its database is comparatively modest.
Comments: A well-organized search tool. Searches the WWW and its Special Collection separately.

ONEKEY
Home Page Address: http://www.onekey.com/
Help Page Address: http://www.onekey.com/live/smart.htm#smartdef
Search Method: Primarily keyword and now with advanced search. Uses humans to examine every link.
Database: Recently reached 175 million WWW pages. Contains 25,000 reviewed sites.
Operators: Provides simple options of either "all terms" or "any [of the] terms".
Features: Has a Best of Net search category that provides an extensive list of general interest topics. It is family friendly with appropriate controls for use by children. Offers subjects of particular interest to children.
Comments: Has an extensive topic listing. Claims it is the largest kid-safe search engine on the Web.

SAVVYSEARCH
Home Page Address: http://www.savvysearch.com/
Help Page Address: Not Available
FAQ: http://www.savvysearch.com/faq.html
Search Method: Meta search via keyword.
Database: That of search engines employed.
Operators: "Simple Search" operators are applicable.
Features: Searches many special databases such as Usenet, software, academics, and commercial.
Comments: Utilizes nine web directories. Links to DejaNews, which provides access to discussion groups. DejaNews is found under Usenet News on the Home Page.

SNAP
Home Page Address: http://home.snap.com/
Help Page Address: http://home.snap.com/main/help/item/0,11,home-6736,00.html
Search Method: Primarily keyword, with an extensive directory listing.
Database: One of the largest and most inclusive directory indices.
Operators: Employs "Simple Search" operators in addition to pull-down options for constructing a Boolean search. The latter are comprehensive and sophisticated.
Features: Provides ways of narrowing a search.
Can limit search by date and retrieve references by last date modified. A ‘Power Search’ option, available at http://home.snap.com/search/power/form/0,179,-0,00.html, can limit by language, time, and location.
Comments: A leading search engine. Has a very effective search system.

YAHOO!
Home Page Address: http://www.yahoo.com/
Help Page Address: http://www.yahoo.com/docs/info/help.html
FAQ: http://www.yahoo.com/docs/info/faq.html
Search Method: Primarily subject with coordinated keyword option. In keyword searches, selects only sites that contain all search words. If no exact match is found, switches automatically to Google.
Database: Reviews its own keyword database. Has at least 1 million subject sites listed.
Operators: When a search defaults to the Inktomi search tool, both "Simple Search" and "Advanced Search" are applicable.
Features: Can search by title [t] and URL [u]. Lists Popular Sites.
Comments: It has the largest subject database on the Web. Its headings and links are well organized and easy to use. Yahoo is a great place for beginners to originate a search.

H. Conducting Searches
The skillful use of operators helps define a query accurately, thus greatly improving the chances of a successful search. To facilitate learning, we have divided instructions into "simple" and "advanced" searches. Before starting this segment, you may want to review "Search Operators" in Section B for a presentation on the basics.

Simple Searches
For both beginners and non-experts, the operators employed in simple searches are sufficient for composing almost all queries. Use them in whatever combination provides the best definition. The examples shown are illustrative and are not necessarily ideal queries.
1. Plus and Minus
Use a [+] before a query term to require its presence in the Web document sought.
Example: +search +www +tutorial +beginners +non-experts
This query gives an enormous number of hits, because each term can be anywhere in the document and is not necessarily related to any other. Nonetheless, because the hits are ranked, the highest-ranking ones should contain all the terms and are therefore likely to be relevant documents.
Use [-] similarly to prohibit the use of a term. This technique is particularly useful when you wish to exclude irrelevant subject matter. The following query example searches for apples and excludes documents on Apple computers such as the Macintosh.
Example: +apple -computer -macintosh
2. Stemming
To include variations of a keyword, use the wild card symbol [*] after the stem of the word. This broadens a search to retrieve documents that otherwise would be missed.
Example: col*
This search includes words such as color, colors, colour, coloring and colorant. Do not use stemming if it introduces too many irrelevant terms.
3. Phrases
A phrase is a sequence of words that has a particular meaning and is formed by enclosure within double quotes. A phrase is treated as a single term and is usually searched as such.
Examples:
"American customs"
+"Man of the Year" +"Time Magazine"
If a query asks for American customs rather than "American customs", the responses will be for the words American and customs separately, in addition to the coupled words. This increases the number of irrelevant hits enormously. Use phrases whenever you can appropriately; they are one of the most effective means of sharpening meaning and narrowing a search.
Example: +"search the www" +"tutorial for beginners and non-experts"
This example is a much more definitive query than the following example:
+search +www +tutorial +beginners +non-experts
4. Case Sensitive
Capitalization rules apply to proper names as taught under basics. However, it is more definitive to treat a multiple-word name as a phrase, by enclosing it within double quotes.
Example: "Gone With The Wind"

Advanced Searches
Each search tool tends to devise and organize its operators differently. Our advanced search includes both simple and advanced operators, much like that of AltaVista.
1. Boolean
Although Boolean operators are somewhat complex, most professionals prefer them, because they can compose more precise queries that way.
• AND does not promise any association between terms and thus broadens a search. When unrestricted, it can produce an enormous number of hits. [There can be a complication when query terms have no operators between them. Some search engines assume AND as the default between the terms, while others assume NEAR. Therefore, it is more exact to use a [+] before each term rather than leave a space.]
• NEAR generally indicates that the query terms it connects are within about two to twenty-five words of each other, depending on the search engine. This makes it more likely that there is an association between the terms, thereby helping to narrow the search.
• OR broadens a search and is best used in a phrase to designate synonyms.
Example: "house OR home OR dwelling"
Synonyms significantly improve the odds of finding documents that you want. The more synonyms you use, the more you weight their importance. When needed, use a dictionary or thesaurus to find useful synonyms.
• NOT excludes even a single use of the term in the document. It is most suitably employed to reduce a large number of irrelevant hits when other measures have failed.
Example: "canine NOT dog*"
2. Parentheses
Enclose phrases within parentheses [nesting] to further narrow a search, especially when unlike operators are used in the query.
Example: search +["tutorial OR guide"] +["beginners AND non-experts"]
In the search process, phrases are searched before the other terms in the query, which narrows the search area for the non-phrase terms.
3. Fields
There are many fields, but the two you are most likely to find useful are Title and URL.
When you think a term is likely to be in a particular field, restrict the term to that field in your query. The field symbol that precedes the query term may differ among search engines. For example, it can be title or t, and url or u.
Examples:
title:"search www tutorial"
url:generalelectric.com
Field choices are usually found in the vicinity of the query box or reached by clicking an appropriate link.
4. Refining Results
Most search engines that use Advanced Searches will offer options for refining your query to improve the results. The options vary among the search engines and are straightforward to use.
Start using advanced operators when you can do so comfortably. Some, you will find, are easy to apply and can be very helpful in improving results when searching for obscure information.

Query Composition Guides
Despite the differences in the way search engines select, index and retrieve documents, there are common guides that you can use to help compose your query.
• Be as specific and complete as you can in selecting your keywords; they are critical to the success of your search.
• When possible, employ uncommon or unique terms, for they are less likely to be ignored or filtered.
• Avoid adjectives and adverbs unless they are part of a phrase; alone they do not convey much meaning.
• Arrange your terms in a series from the most general to the most specific; it makes for a more effective search.
• When using the same query for multiple searches, employ suitable, advanced operators. Most search engines will support their use to advantage, although some may ignore them or apply them differently.
• When you have located a good site about your topic, see whether it has links to other sites. Sometimes an important document is found this way. The site may also contain keywords that can improve your query.
• Use the Refine drop-down lists when offered. They are a simple way to narrow your search.
• Avoid misspellings, redundant terms and complicated query structure.

Search Problems and Remedies
Even when your query is well defined, there are times when a search engine will return totally irrelevant responses. The following explains some of the causes and suggests remedies you can try.
1. Your query terms do not have a counterpart in the search engine’s index.
Cause: You may have insufficient understanding of the search engine’s composing criteria.
Remedy: Study the engine’s help section and recompose your query accordingly.
2. The search engine has failed to index significant keywords while spidering the Internet.
Cause: The search engine employs abbreviated rather than full-word spidering in creating and maintaining its database. Therefore, it can miss important keywords due to their infrequent use or unfavorable location in the document.
Remedy: Use a search engine that uses "full word" spidering, such as AltaVista, HotBot, Excite and Infoseek.
3. The search engine filters out or ignores important keywords used in your query. This corrupts the meaning of the query, resulting in totally irrelevant results.
Cause: Search engines with large databases ignore or filter quite a few commonly used words, because of the enormous amount of processing they require. The problem arises when a query keyword you use is also one of the search engine’s designated common words [e.g. Internet, computer and www].
Remedies: Use a search engine with a moderate-sized database such as Infoseek, Excite or Snap. For example, in searching for this tutorial, the query words are logically < +search +Internet +tutorial +beginners>. In a test, the hits that HotBot and Northern Light gave for this query were totally irrelevant, and those from AltaVista were meager. Using the same query, Infoseek with its smaller and better-indexed database returned this Tutorial as its first hit. Another approach is to use a subject search tool having a large database.
Yahoo is recommended, because of its very large subject index and the keyword search option it provides.
At times, despite all the skills you can apply, you may still not be able to find the document you want. Although the information indexed in the WWW is enormous, it is not necessarily complete, up-to-date or reasonably accessible. Search tools are addressing the problem, including that created by the recent unprecedented growth of Web pages. But despite its less than perfect performance, the Internet remains a remarkable source of information.

I. Home Page Contents
A search begins on the Home Page of the Search Tool, which is accessible by its address or URL. Home pages vary greatly in their content, layout and looks; they can range from a tasteful, simple listing to a garish and complex array in various formats, graphics, colors and motions. The trend had been toward more easily read contents, but increased advertising is now reversing this trend for many of the search tools.
The beginner can utilize a search tool more effectively by first knowing what to expect and then charting a suitable course to identify and locate the information that is sought. The following lists and briefly describes the basic components of search tool Home Pages in the usual order of their use.
1. The Location or Address Box is used for the query in a keyword search. It is normally found at or close to the top of the Home Page.
2. Options or preferences are used for narrowing the keyword search and for reporting the results. Options are normally found under the location box. At times, however, they are under a heading that links to a listing on another Web page.
3. Subject Listing is where you originate a directory search. The process takes you through a series of sub-subjects along a search path. For coordinated searches, there are keyword search options offered at stops along the way.
4.
Popular Sites is our designation for frequently used subjects and services that are situated on the Home Page. This search category has grown enormously in the past several years and for an increasing number of search tools dominates the Home Page.
5. Help and FAQs furnish links to instructions and information. Help usually provides guidelines for composing a query, and FAQs run the gamut from help to general information.
6. Advertising, Promotion and/or News are found to varying degrees, and in widely differing styles, formats and colors.
Your search approach will depend on the search method you choose to use, namely subject, keyword or popular sites. These categories have been covered in the body of the Tutorial, except for Popular Sites, which is described in the following segment.

Popular Sites
We designated this search category to encompass the many subjects and services directly accessible from the Home Page of a browser, search tool or Internet Service Provider. Popular Sites serve as links or shortcuts to often-sought information and services. Their use by search tools varies greatly, ranging from none to substantial. Each search tool that employs Popular Sites has its own listing, sometimes designated as Channels. The Site titles are usually found as a listing on the left-hand side of the Home Page, as opposed to a center listing for a directory, but they may appear anywhere. We have organized the more sought after Sites into the three categories shown in Table 3.

Table 3 Popular Sites
Personal Use / General Interest / Services
Classifieds Finance E-Mail Directions and Maps Health Purchases [*] Employment News Groups [o] Yellow Pages People and Organizations [x] Sports Weather Stocks
Table Symbols: [x] Home and Business Addresses, Telephone Numbers, E-Mail Addresses; [o] Usenet, Chat Rooms; [*] Books, Cars, Travel

At present, most providers of Popular Sites use professional services to furnish their sites, either wholly or in part.
Thus, more than one search tool can employ the same service. While less convenient, Popular Sites also can be found conventionally by conducting a keyword search. The following are our recommendations for some of the current best Popular Sites:

Table 4 Picks and Choices
Popular Site / URL
1. Cars
CarPoint http://carpoint.msn.com
Edmunds http://www.edmunds.com
2. People
Switchboard http://www.switchboard.com
3. Maps
MapQuest http://www.mapquest.com
Yahoo! Maps http://maps.yahoo.com
4. Stocks
Daily Stocks http://www.dailystocks.com/
5. Tollfree Calls
AT&T http://www.tollfree.att.net/index.html
Internet 800 http://inter800.com/search.htm
6. Travels
MS Expedia http://expedia.msn.com/daily/home/default.hts
7. Weather
Accuweather http://www.accuweather.com
Washington Post Weatherpost http://www.weatherpost.com
8. Yellow Pages
WhoWhere http://www.whowhere.com

Best Sites on the Web
There are many useful though less traveled sites on the Internet. HotBot and Lycos list some of the best, and are worth exploring. They organize their listings by category with subject titles under each category. USA Today Cybertimes section also reviews web sites. For a strong subject specific approach, the Scout Report reviews many educational sites. Their URLs are:
• HotBot at http://www.100hot.com/ Provides 10 categories from which 100 of the sites can be accessed.
• Lycos at http://point.lycos.com/categories Lists 15 categories containing up to 25 subjects each.
• USA Today Cybertimes http://www.usatoday.com/life/cyber/ch.htm
• Scout Report http://scout.cs.wisc.edu/report/sr/current/index.html

Portals
There is a class of Web sites referred to as portals, because they serve as entryways to the Internet. The early portals were mainly browsers and Internet Service Providers. Recently, some search tools have promoted their use as portals, mostly for the advertising revenues they bring. And to enhance their Web sites, they have added a wide selection of Popular Sites.
Portals are characterized by the attractive and convenient features that have made America Online so immensely popular and successful. Yahoo, Excite, Lycos and Infoseek are among the more prominent search tool portals. Not surprisingly, there is keen competition among all Internet entryways serving as portals.
There is one sad note about a recent trend among Search Tools, particularly those that have sought to serve as portals. Some have become garish and cluttered, which detracts from their use for searches. Among the poorer now are AltaVista, Excite and Infoseek. Among the best for ease of use are HotBot, Northern Light and Yahoo. Hopefully, this situation will improve when sanity returns, as usually happens in such situations.

J. Glossary of Search Terms
This glossary contains terms used both in this work and other articles applicable to searching the WWW and use of the Internet. For ease of use by the beginner, the definitions are brief and in simple language.
Bookmark - A page on the Netscape Browser that lists URLs or Web addresses. Bookmarks serve as links for easy access to Web addresses. MS Explorer’s equivalent is called Favorite Places. To bookmark a Web page on your screen, click Bookmark on the bar, and when it is displayed, click Add Bookmark. The link is then added to the bottom of the Bookmark Listing. Favorite Places works similarly.
Boolean Search - A keyword search that uses Boolean Operators for obtaining a precise definition of a query. [See Operators Used In Keyword Searches in Sections B and H.]
Browsing - In the WWW, browsing refers to a directory search. In popular use, browsing, or surfing, is casually looking for information on the Internet.
Browser - A computer program used to connect to Web sites on the World Wide Web and access information.
Concept Search - A search that utilizes a term’s implied or broader meaning, rather than its literal one.
Data - Information such as text, numbers, images and sound contained in a form that can be processed on a computer.
Database - Stored information at a Search Tool’s Web site. For search engines, a robot is used to keep the database current by an automated procedure called spidering. For directories, the database is kept current through reviews conducted by qualified people.
Directory Search - A hierarchical search that starts with a general heading and proceeds through increasingly more specific headings or subjects. It provides a means of focusing more closely on the object of the search. It is also referred to as subject search, directory guide or directory tree.
False Drops - Documents that are retrieved but are not relevant to the user’s interest.
Fields - Components of a Web page such as a title, URL, domain, host, link, text and images that are used by some search engines to help narrow a search.
Full-Text Indexing - A database index or catalogue that includes all terms and URLs. In practice, each search tool uses a filter to remove words it considers unnecessary.
Hierarchical - A ranking of subjects or things from the most general to the most specific.
Hits - A list of links or references to documents that are returned in response to a query, also called matches or matching queries.
Home Page - The first page that appears on your screen when you access a Web site.
Hypertext Link - A highlighted word or image [shown in color] on a Web page that when clicked connects or links to another location with related information. [Links provide an easy way to move about the Internet.]
Index or catalog - A file that designates the location of specific data in a search engine’s database.
Internet - The Internet, with a large I, refers to a worldwide system of linked computer networks that serve as a communication system. With a small i, the term means a group of interconnected local networks.
Keyword - A term that a computer can recognize and use as the basis for executing a search.
Keyword Search - A search that utilizes meaningful terms to define a user’s interest.
Link - More accurately hypertext link. It is a connection between two Web pages or sites that have related information. For example, highlighted data such as text and graphics at one Web site when clicked provide related information residing at another Web site.
Location Box, also Address Box - A designated place within a browser for an address [URL]. It is the starting point for accessing a Web site.
Multi-Engine Search or Meta Search - A search that uses a number of search engines in parallel to provide a response to a query.
Operator - A rule or a specific instruction used in composing a query.
Phrase Search - A search that uses a string of adjacent, related words enclosed in quote marks as the query.
Popular Items - A search category created to cover frequently sought subjects and services. Search tools list Popular Items on their Home Page.
Precision - A standard measure of information retrieval, defined as the number of relevant documents obtained divided by the total number of documents retrieved.
Proximity - Proximity is how close query terms are to each other within a document. In this context, adjacency or phrase usually means that words must appear exactly in the order specified with no intervening words.
Query - A search request. A combination of words and symbols that defines the information that the user is seeking. Queries are used to direct search tools to appropriate Web sites to obtain information.
Query By Example - Use of an example to solicit more information like it.
Ranking - A means of listing hits in the order of their relevancy. It is usually determined by a selection of the number, location and frequency of the term in the document being searched.
Relevance - The usefulness of a response to a query.
Most search engines rank their hits from the best match to the query to the poorest.
Robot - The software for indexing and updating Web sites. It operates by scanning documents on the Internet via a network of links. A robot is also known as a spider, crawler and indexer.
Search Box - A place within a search engine’s Web site to enter a query. It is also called a location box and address box.
Search Engine - A computer program that locates information in its database. A search engine functions as a service that searches for information on the Internet. It responds by matching your query terms to the search engine’s index terms in its database, ranking the matches and returning the hits to you.
Search Tool - A computer program that conducts searches on the World Wide Web.
Site - The location of a Web page on the Internet. In WWW, it is called a Web site and identified by its URL.
Spider - The software that scans documents on the Internet and adds them to the search engine’s database. A spider is the same as a robot. To spider is the process of scanning Web sites to add new pages and to update existing ones.
Stemming - The use of a stem [i.e. root] of a word to search words that are derived from it. For example, "child" would retrieve information on child, children, childhood, childless and so on.
Term - A single word or an association of words used in a query.
Truncation - See Stemming.
Uniform Resource Locator [URL] - Uniform Resource Locator is the Internet designation for a Web address.
Web Page - The address of a Web site. It can also refer to a page within a Web site. When Web pages are part of the same document, they are also collectively known as a Web site.
Web Site - In search use, it is a specific address or URL on the WWW. In function, it is a computer system that is set up to distribute documents stored in its database.
Web sites range in size from as little as one page to a vast number of pages, such as those of a search engine’s database or a full textbook.
Wild Card - In a query, a symbol that replaces a portion of a word to indicate that other word constructions are applicable.
World Wide Web [WWW] or the Web - A global computer communication system that uses the Internet to transmit data [i.e. text, numbers, images and sound].
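
For readers who know a little programming, the simple operators of Section H ([+], [-], quoted phrases and the [*] wild card) and the glossary definition of Precision can be illustrated with a short Python sketch. It is a toy written for illustration only and does not show how any of the search tools listed above actually works. The function names are hypothetical, matching ignores capitalization, and a term with no sign in front of it is treated as required, which is only one of the defaults a real search engine might choose.

```python
import re


def parse_query(query):
    """Split a query into (sign, term) pairs, keeping quoted phrases whole."""
    tokens = re.findall(r'([+-]?)("[^"]+"|\S+)', query)
    # Assumption: a term with no sign is treated as required, like [+].
    return [(sign or "+", term.strip('"')) for sign, term in tokens]


def term_pattern(term):
    """Build a pattern for one term; a trailing * acts as a wild card."""
    parts = []
    for word in term.split():
        if word.endswith("*"):
            parts.append(re.escape(word[:-1]) + r"\w*")  # stem plus any ending
        else:
            parts.append(re.escape(word))
    # Words of a phrase must be adjacent and in the order given.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def matches(query, text):
    """True if text holds every [+] term and none of the [-] terms."""
    for sign, term in parse_query(query):
        found = term_pattern(term).search(text) is not None
        if sign == "+" and not found:
            return False
        if sign == "-" and found:
            return False
    return True


def precision(retrieved, relevant):
    """Relevant documents retrieved divided by all documents retrieved."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)


if __name__ == "__main__":
    print(matches("+apple -computer -macintosh", "How to grow an apple tree"))  # True
    print(matches("+apple -computer -macintosh", "Apple Macintosh computer review"))  # False
    print(matches("col*", "Choosing a colour for the kitchen"))  # True
    print(precision(["a", "b", "c", "d"], ["a", "c", "x"]))  # 0.5
```

The sketch reads a document word by word each time it is asked, whereas a search engine consults an index built in advance by its robot. The matching rules, however, follow the descriptions given in this Tutorial.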