Internet Search Engines


Introduction

There are three main types of Internet search service: specialty services that concentrate on a particular field (e.g. the various about.com sites), categorised services (such as Yahoo), and generic search engines (such as Google.com).  Increasingly, the categorised and generic search engines are blending together in a competitive effort to give you the best of both worlds.

While we all probably develop a favorite search site, if you're really looking for specific, hard-to-find information, it is not enough to use only one search engine.  A recent study (Aug 00) suggests that the internet comprises about 550 billion pages of information (no-one knows exactly, of course!).  By comparison, Google currently has "only" 1 billion pages indexed in its search engine, and AltaVista a "mere" 250 million.  This means that no search engine is likely to give you a complete and comprehensive set of results, because none of them "knows" about even 1% of the complete internet.  The good and bad news is that the 15,000 matches that you get back from your search request - while an overwhelming number to review - are still perhaps only 0.2% of the total number of matches that are potentially out there.
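The percentages above follow directly from the figures quoted.  Here is a quick sketch of the arithmetic, using only the numbers given in the paragraph above (the variable and function names are just for illustration):

```python
# Figures quoted in the text (Aug 2000 estimates).
TOTAL_PAGES = 550e9          # estimated size of the internet, in pages
INDEXED = {
    "Google": 1e9,           # pages indexed by Google
    "AltaVista": 250e6,      # pages indexed by AltaVista
}

def coverage_percent(indexed_pages, total_pages=TOTAL_PAGES):
    """Share of the estimated web that an index covers, as a percentage."""
    return 100.0 * indexed_pages / total_pages

for engine, pages in INDEXED.items():
    print(f"{engine}: {coverage_percent(pages):.3f}% of the estimated web")
```

Both engines come out well under 1%, and Google's share is where the rough "0.2%" figure comes from.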

A Worked Example

The main difference between search engines is the way in which they decide which sites to show you first.  There are several ways they can decide which site is most likely to be useful to you, and whole books have been written on the increasingly sophisticated techniques used both by websites, to encourage search engines to give them a prominent listing, and by search engines, to work past the "trickery" on some websites and make an honest, accurate estimate of how helpful each page may be.  With billions of web pages out there, the days of having a real live person check every page and make a subjective decision are long gone!

Here is an example that clearly illustrates the differences between three of the better search engines.  We entered the phrase "russian stamps" into Google, AltaVista, and Excite.  The tables below show the top five sites returned by each search engine as its recommended best choices.  I give a subjective rating for the appropriateness of each suggestion, on a scale of 0 (= bad choice) to 5 (= excellent choice).

Google

41,600 matches.  Our site is #2 :)  **

| Rank | Relevance Score (0-5) | URL | Comment |
|---|---|---|---|
| 1 | 5 | www.zarealye.ru/english/stamps.html | A good site with lots of Russian philatelic content, also number one on Excite |
| 2 | 5 | www.rossia.com/stamps/stamps.htm | Excellent choice! |
| 3 | 1 | members.xoom.com/russtamps/ | Just a blank page! |
| 4 | 3 | members.xoom.com/russtamps/new_issue.htm | A sub part of the previous site - a waste of listing space, but at least it isn't another blank page and it enables one to get into their site |
| 5 | 3 | members.aol.com/~salem72/russia.htm | A very limited content site offering overpriced stamps for sale and nothing else |

** :  Since writing this, Google has reindexed itself.  It has grown to 52,300 matches, and our site is now, ahem, Number One!!!  :)

AltaVista

1,527 matches.  Our site is #82

| Rank | Relevance Score (0-5) | URL | Comment |
|---|---|---|---|
| 1 | 1 | www.gg-s.com/russ/russia1996set.htm | A nonspecialist philatelic site with limited Russian content and basically only stamps for sale; this is a subpage within their site, not the main page |
| 2 | 0 | www.westworld.com/~ben-levy/stamps.html | A single page showing three dog themed stamps, nothing else. |
| 3 | 0 | www.sovietski.com/ | A mailorder company that sells a wide range of ex-Soviet Union material.  While a fascinating site, there is nothing to do with stamps on their home page at all, and while I have seen them sometimes sell some very common stamps at very high prices, I couldn't find anything on their website today! |
| 4 | 1 | www.postage.dk/ | A nonspecialist philatelic site that on a different page had three (only three!) Russian stamp sets for sale |
| 5 | 1 | msnhomepages.talkcity.com:6010/DownsizeDr/ssemyon/ | An almost blank page that claims to be about Russian stamps but which has no content and no links. |

Excite

Unknown number of matches, but more than 500.  Our site is not listed in the first 500 sites :(

| Rank | Relevance Score (0-5) | URL | Comment |
|---|---|---|---|
| 1 | 5 | http://www.zarealye.ru/english/stamps.html | A good site with lots of Russian philatelic content, also number one on Google |
| 2 | 2 | http://www.gg-s.com/ | A nonspecialist site that has a limited amount of Russian stamps for sale (and some way overpriced!) |
| 3 | 1 | http://www.gg-s.com/russ1300/russ1300m.html | Same site as above; this is one of the Russian related pages and doesn't really deserve to be in the top five site list |
| 4 | 5 | http://www.russianstamps.com/ | A Russian specialised dealer |
| 5 | 4 | http://www.raster.it/stefano/a/hyciywsaatsthopo.htm | An out of date link, but it still accesses a fascinating site; if you dig into it you'll find a lot of "mystery" Russian stamps |

Analysis of Above Example

Using my subjective analysis, Google scores 17/25, AltaVista gets a miserable 3/25 and Excite also earns 17/25.  If one were also to add an "overall look and feel" score and extend the analysis to results beyond the first five, the tie would be broken: Google keeps returning excellent sites for some time, whereas Excite doesn't provide as much information and rapidly starts offering rubber stamp sites and all sorts of other worse-than-useless suggestions.  Of course, the fact that Excite doesn't seem to list this site hardly helps, either!
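For anyone who wants to check the totals, here is a small sketch of the tally.  The scores are copied from the three result listings above; the dictionary layout and function name are simply my own choice for illustration.

```python
# Subjective relevance scores (0-5) for each engine's top five results,
# copied from the result listings above.
SCORES = {
    "Google": [5, 5, 1, 3, 3],
    "AltaVista": [1, 0, 0, 1, 1],
    "Excite": [5, 2, 1, 5, 4],
}
MAX_PER_RESULT = 5  # best possible score for a single result

def total_score(engine_scores):
    """Add up the per-result scores for one engine."""
    return sum(engine_scores)

for engine, engine_scores in SCORES.items():
    best_possible = MAX_PER_RESULT * len(engine_scores)
    print(f"{engine}: {total_score(engine_scores)}/{best_possible}")
```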

Google is my clear favorite by a wide margin.

As another indicator, the Rossica Society web site did not appear in Google's top 100 sites (but they do have it listed - if you do a search specifically for "Rossica" it comes up as number 3 on a list of 1290 sites to do with the word Rossica).  It comes in at number 31 on AltaVista.  An obsolete link to their old web site appeared as number 33 on Excite.

The above examples clearly show that different search engines have very different opinions on what are useful sites.  Not one of the suggested sites appeared in the top five of all three search engines.

While you can rely on Google for most basic searching, if you really want to do a comprehensive search for hard-to-find information, then there are three things you need to do - refine your search words, use multiple search engines, and surf out from sites.

How to Improve Your Searching Success

1.  Refine Your Search Words

When I am searching, I will try a mix of both broad searches and also tightly focused searches.

Now for a very important point - different search engines work differently when you put in multiple words.  Some default to what is called an "and" basis, where the search engine looks for pages that contain ALL of the search words you enter - with these search engines, the more words you type in, the fewer pages you get back.  Others default to what is called an "or" basis, where the search engine looks for pages that contain ANY of the search words you enter - with these search engines, the more words you type in, the more pages you get back!

Google uses a default "and" basis, but AltaVista uses a default "or" basis.  I think that Excite also uses an "and" basis.  Make sure you know how your favorite search engine works (the differences in search engines are one reason why it is a good idea to get a "favorite" and then use it as your first choice all the time).
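The difference between the "and" and "or" defaults can be sketched in a few lines.  This is only a toy illustration under my own assumptions: the three "pages" are made up, the `search` function is hypothetical, and real search engines are of course far more elaborate.

```python
# Three tiny made-up "pages" (hypothetical text, for illustration only).
PAGES = {
    "page_a": "russian stamps and philately of the zemstvos",
    "page_b": "rubber stamps for craft projects",
    "page_c": "russian history and culture",
}

def search(query, pages, mode="and"):
    """Return the sorted names of the pages that match the query words.

    mode="and": a page must contain ALL of the words.
    mode="or":  a page must contain ANY of the words.
    """
    words = query.lower().split()
    combine = all if mode == "and" else any
    return sorted(
        name
        for name, text in pages.items()
        if combine(word in text.lower().split() for word in words)
    )

# Adding a second word shrinks the "and" result but grows the "or" result.
for mode in ("and", "or"):
    for query in ("russian", "russian stamps"):
        print(mode, repr(query), search(query, PAGES, mode))
```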

You can override the settings in a search engine and improve the helpfulness and accuracy of the pages it suggests to you in two easy ways.  First, you can specify which words absolutely must be in the pages you want, and you can also specify which words you do not want included (for example, you might want to exclude the word "rubber" so as not to get web pages about rubber stamps instead of postage stamps).

The second technique is to specify not only words but also phrases that must be included (or excluded) from the pages you want to find.

To require that a word appears on the returned pages, add a plus symbol before it.  To exclude all pages that contain a word, add a minus symbol before it.  To specify a phrase, put quotes around it (note that some of the search engines are clever enough to detect phrases automatically!).

For example, +"russian stamps" -rubber will find all pages that have the phrase "russian stamps" and will exclude all pages that have the word rubber.
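To make the plus, minus and quote rules concrete, here is a minimal sketch of how such a query might be taken apart and applied.  The function names (`parse_query`, `matches`) and the regular expression are my own assumptions for illustration, not a description of how any particular search engine actually does it.

```python
import re

# An optional + or - sign, followed by either a "quoted phrase" or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Split a query into (required, excluded, optional) lists of terms."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if not term:
            continue  # ignore an empty pair of quotes
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def matches(page_text, required, excluded):
    """True if the page has every required term and none of the excluded ones.

    Uses a plain substring test, which is cruder than real word matching.
    """
    text = page_text.lower()
    return (all(term in text for term in required)
            and not any(term in text for term in excluded))

required, excluded, optional = parse_query('+"russian stamps" -rubber')
print(required, excluded, optional)
```

With the example query, a page mentioning "Russian stamps" passes, unless it also mentions rubber.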

An example of a broad search is when you enter just some very generic words or phrases (like the "russian stamps" example above, returning up to 41,600 results).  If the broad search doesn't give me what I'm looking for in the first ten or twenty suggested pages, and if it returns a huge number of possible pages, then I'll try a more focused search.  Here are some examples of groups of search words and the number of hits they return:

| Search Words | Google Hits | Comment |
|---|---|---|
| russian stamps | 41,600 | A vast number that needs to be reduced |
| russian philately | 3,740 | A great improvement - probably we've now eliminated all the "rubber stamp" pages! |
| russian stamps philately | 1,410 | Still a lot but starting to become a more reasonable sized listing |
| russian stamps philately zemstvos | 108 | Even better still - a manageable quantity and plainly web pages that have all these words on them will be very relevant - clearly the word zemstvo is a key limiter on the number of pages we get back, getting only the more specialised and comprehensive web sites |
| russian stamps philately zemstvos cancel | 11 | Adding the word cancel brings us down to only 11 pages - maybe we've become too specific now. |

The trick is to try and think of words or phrases that will be used on the pages that you most want, and which are not very common on other pages.  But, be careful - as you can see in the last line of the table above, you can get too specific and if you do that, then you start eliminating not only low value pages but also high value pages too.

2.  Use Multiple Search Engines

As the above example clearly showed, if you were using Excite to find websites about Russian Philately, you'd never find our site here, and we modestly believe this to be one of the better sites that there are on the subject!  And if you used our favorite (Google) with a simple search, you'd probably never stumble across the Rossica web site unless you went really deep into searching.  Rather than searching through the top twenty sites from one search engine, you are probably better advised to search through the top ten sites from each of two different search engines.
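The "top ten from each of two engines" approach amounts to interleaving two result lists and skipping anything you've already seen.  Here is a rough sketch of that idea; the function names are hypothetical, and the URLs in the usage lines are taken from the example listings earlier in this article.

```python
from itertools import zip_longest

def normalise(url):
    """Strip a leading http:// and any trailing slash so the same page compares equal."""
    if url.startswith("http://"):
        url = url[len("http://"):]
    return url.rstrip("/")

def merge_results(first, second, per_engine=10):
    """Interleave the top results of two engines, dropping duplicate pages."""
    merged, seen = [], set()
    for pair in zip_longest(first[:per_engine], second[:per_engine]):
        for url in pair:
            if url is None:
                continue  # one list was shorter than the other
            key = normalise(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

google = ["www.zarealye.ru/english/stamps.html", "www.rossia.com/stamps/stamps.htm"]
excite = ["http://www.zarealye.ru/english/stamps.html", "http://www.gg-s.com/"]
print(merge_results(google, excite))
```

The site that both engines rank first appears only once in the merged list, so you don't waste time visiting it twice.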

You also might like to try both the categorised search sites (e.g. Yahoo) as well as the search engine sites.  Note also that Yahoo uses Google as its search engine provider, so if you do a "search" on Yahoo, it uses the Google data.

3.  Surf out from Sites

The third of the "big three" techniques is to go to the link pages on the sites recommended by the search engines.  Once you have found a website that is sort of on the topic that you are interested in, go to its links page and see what other sites the real live humans that have created this web site are suggesting for you to continue your surfing.  Many times you'll come across "hidden treasures" after you have done some surfing from site to site looking for related and similar sites.
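Surfing out from a site is really just following links, one hop at a time.  As a loose sketch of the idea, the snippet below walks a made-up link graph breadth-first.  The site names, the `LINKS` table and the `surf_out` function are all hypothetical; in real life the "graph" is whatever each site's links page happens to list.

```python
from collections import deque

# Hypothetical link graph: each site maps to the sites listed on its links page.
LINKS = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-d"],
    "site-c": ["site-a", "site-d"],
    "site-d": [],
}

def surf_out(start, links, max_hops=2):
    """Breadth-first walk: list the sites reachable within max_hops link clicks."""
    found = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        site, hops = queue.popleft()
        if hops == max_hops:
            continue  # don't follow links any further from here
        for neighbour in links.get(site, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, hops + 1))
    return found

print(surf_out("site-a", LINKS, max_hops=2))
```

Keeping track of the sites already seen is what stops you going round in circles when two sites link to each other.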

A Trick for Faster Easier Searching

Think of searching like a tree with many branches and sub-branches all leading from a central trunk.  Now, for the trick.  When you are moving from a "branch" to a "sub-branch", open a new browser window to go down that sub-branch.  That way, when you have exhausted your searching down that sub-branch, you can simply close the browser window and you are then back to your earlier browser window at the point you were before you detoured down the sub-branch.

And, if you find an interesting site, leave it open in a window, and open a new window to continue searching on from there.  Makes it easy to find the site again an hour later!

(To open a link in a new window using either Internet Explorer or Netscape Navigator, don't click on it as you normally would, but instead click using the right side button, then select the option "Open in New Window".)

Even Faster Searching

Okay, I'm going to state the obvious here, but it is worth repeating.  If you're still connecting to the internet via a 28.8 or even a 56k modem, then you're stuck in the slow lane of life.  The internet "comes alive" and becomes amazingly more responsive and interactive once you speed up to 128k or faster, using a cable modem or DSL.

For complicated reasons, most modems do not connect at their full rated speed, and for more complicated reasons, even those speeds aren't as fast as similar speeds are with cable or DSL, so you really get a tremendous difference in performance.

Note though that there is a sort of a "natural speed limit" - for most people, there is little point in getting a connection much faster than perhaps 256k, because generally the Internet structure as a whole can't often feed data to you much faster than that.

The cost of one of these fast connection speeds isn't all that much, and if you spend a lot of time on the internet, it really is a small "luxury" that you'll appreciate again and again and again.  Go ahead and treat yourself!

Suggested General Search Sites

Google - The current champion search engine, very fast, and almost magically accurate at bringing up sensible suggestions

Yahoo - Although the logic of how they create their indexed/catalogued links is sometimes hard to follow, this remains the best cataloging site, even though the number of sites they have catalogued is not very high.

Excite -  Did well in the simple example above, but lacks features that Google offers.

AltaVista - Did astonishingly poorly in the simple example above.  Has very powerful advanced searching features for expert searchers, however.

Hotbot - Associated somehow with the Lycos site.  Another alternative perhaps worth trying.