Péter's Digital Reference Shelf

March 2007


Title: Science.gov
Publisher: Deep Web Technologies
URL: http://www.science.gov
Cost: free
Tested: March 12-20, 2007

The Context

The U.S. government has produced many valuable open access full-text and abstracting/indexing databases and developed excellent hosting systems. Beyond the very well-known and widely used databases of the National Institutes of Health hosted on the excellent Entrez system, these include many other databases and services: the ERIC database and service (with more than 100,000 full-text documents); the useful Energy Citations and even more useful Information Bridge full-text databases of the DOE; the excellent Transportation Research Information Services, TRIS Online; the outstanding NCJRS (National Criminal Justice Reference Service) with ever-growing full-text coverage and very smart software; and the not so outstanding but important NTIS database with a subset of its indexing/abstracting records. There are many other open access government databases which include full-text scientific documents and/or indexing/abstracting records.

Many of these documents are also available through the USASearch.gov site. On the pro side it offers search result clustering, but it uses the Microsoft Live search engine, whose search options are not as good as those of Explorit, the software that powers Science.gov. The same software limitation applies to Google's Uncle Sam site, which does not even offer clustering as USASearch.gov does.

The Content

Science.gov claims to cover 50 million Web pages, which is a reasonable number, as PubMed alone has more than 16 million records, and some of the other sources covered are also in the mega category, such as the US Patent & Trademark Office, the NASA Astrophysics Data System (ADS), or the Energy Citations Database.

In addition to the databases of the government agencies, Science.gov also searches Web sites which are identified as relevant and important sources by the 12 agencies, also known as the partners of the Science.gov Alliance.

A quick search on the word tsunami anywhere in the records brings up information about 975 scientific or technical documents. Restricting the search to the title field yields 811 hits. Without Science.gov it would take searching more than 20 databases to gather that same result set. The user may not even know in which database to start the search, unless there is a pre-determined specific angle in her mind, such as the health consequences of tsunami.

More importantly, chances are good that a user with a science angle, having already found 57 hits in the USGS database, would not go on to the USGS Publications Warehouse (17), STINet (62), NASA/ADS (200+), Energy Citations (22), DefenseLink (72), and NTIS (15), to name just a few of the sources from which items were retrieved through Science.gov.

Even the user with an educational angle is unlikely to run a search in the National Science Digital Library, which has 93 high-quality teaching materials with tsunami in the title, after searching the obvious ERIC database, which brings up 18 records. Even if the user is willing to do such a hunt-and-gather series of searches, it would be cumbersome because most of the databases have different interfaces and query styles. For that reason, the importance of Science.gov cannot be over-emphasized.

At the same time, it must be mentioned that Science.gov limits the number of retrieved records to 200 per database, and this limit was exceeded in three databases even for this rather restrictive title-only search. (Oddly, this 200-item limit does not apply all the time, as I discuss below.)

Because of this often (but not always) imposed limit, the total number of hits reported by Science.gov for a query may be significantly less than the sum of the hits that searches in the individual databases would produce.

This is especially true when the search is not limited to the title field. For example, the native search in the Energy Citations database alone has 2,845 hits when searching for the exact phrase "oil sands" anywhere in the record, and the singular form "oil sand" yields 1,376 hits. But Science.gov brings back from the Energy Citations database only 332 items for the plural form "oil sands" (here not imposing the 200-item limit), and finds only 169 hits for the singular form "oil sand", i.e. not even reaching the 200-item limit.
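The arithmetic behind these figures can be laid out in a few lines of Python. This is only an illustrative sketch: the hit counts are the ones quoted above, while the names `CAP`, `NATIVE_HITS`, `SCIENCE_GOV_HITS` and `compare` are made up for the example, and the "expected if capped" column assumes a simple model in which Science.gov would return every native hit up to the 200-item limit.

```python
CAP = 200  # per-database limit described in the review

# Hit counts quoted in the review for the Energy Citations database.
NATIVE_HITS = {"oil sands": 2845, "oil sand": 1376}
SCIENCE_GOV_HITS = {"oil sands": 332, "oil sand": 169}


def compare(query):
    """Contrast a strictly applied cap with what Science.gov actually returned."""
    native = NATIVE_HITS[query]
    retrieved = SCIENCE_GOV_HITS[query]
    return {
        "query": query,
        # Assumed model: all native hits are passed on, up to the cap.
        "expected_if_capped": min(native, CAP),
        "retrieved": retrieved,
        # Fraction of the native result set that reached the user.
        "share_of_native": round(retrieved / native, 3),
    }


rows = [compare(q) for q in NATIVE_HITS]
for row in rows:
    print(row)
```

Under that simple model both queries should return exactly 200 items. The observed counts fall on either side of it (332 and 169), and in both cases only about 12 percent of the native result set comes through.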

In spite of this oddity and shortcoming, Science.gov is still a much better tool for searching science and technology-related documents than Google's special U.S. Government search or the Microsoft-powered USASearch.gov.

Before you applaud Google's special U.S. Government search engine for reporting (making you believe would be a better term) 12,200 hits for the "oil sand" query, you must realize that the hit numbers reported by any special edition of Google do not have much to do with what was really found.

When asked to show the money, Google can come up with only 305 unique hits out of a much-reduced new total of 409, which is a very far cry from the original estimate of 12,200.

The first reported hit count is always grossly inflated in any Google service, way beyond what duplicates (many instances of the same item at different levels of the source site's hierarchy) could explain. When you encourage Google to show whatever it found, it stops at the 1,000th item. Google could not get away with this in its financial reports if, when asked to show evidence, it could do so for only a fraction of the claimed expenses. In addition, Google searches all government sites, and does not limit itself to publications related to research and development, so its real or quasi-real hit numbers are not really comparable with those reported by Science.gov.

To a lesser extent, the same can be said about Microsoft Live Search in general, and also specifically as the search engine of USASearch.gov. For example, it reports 336 hits for the "oil sand" query, and delivers 86. Its clusters show merely 6 hits from the Department of Energy (DOE). The DOE is the producer of the Energy Citations and the Information Bridge databases, which have 1,376 and 146 hits, respectively, for the "oil sand" query using their native search mode. Science.gov retrieves 200 items from the Energy Citations database for the same query (the 200-item limit kicks in here), and 159 full-text items from Information Bridge. In fairness, Energy Citations has the bibliographic records for the full-text items in Information Bridge.

It must be noted also that Science.gov often grossly under-reports the number of hits. The best way to know the total number of hits is to choose the Source option when viewing the items retrieved.

The Software

Explorit version 4.0 has the usual Boolean operators and, beyond searching the complete records, offers limiting the search to the title and author fields. The assumed Boolean operator for a space is AND. Exact phrase searching is triggered by using the usual double quote pair, such as "oil sands". The search can also be restricted to a publication year or range of years, and to one or more of the specific agencies and/or databases (when the agency has more than one in Science.gov) from an expandable database menu. It claims to search 30 databases, but I could find only 28.

You can use single and multiple character truncation symbols, which is a rarity in its league. This can come in handy, but sometimes I found strange results. For example, the query stem cell in the title finds 1,058 hits. Using stem cell* (for unlimited truncation) produces fewer hits, 501, although it should match everything the untruncated query matches. There is no automatic pluralization, as stem cells retrieves 1,192 hits.
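Why the truncated query should never return fewer hits can be shown with a small Python sketch. It is not Explorit's implementation, which is not documented here. It merely assumes the behavior described above (an implicit AND between terms and a trailing * for unlimited truncation), and the function names and the four titles are hypothetical.

```python
import re


def term_to_regex(term):
    """Whole-word pattern for one term; a trailing * allows any word ending."""
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def title_search(query, titles):
    """Return the titles that contain every query term (implicit AND)."""
    patterns = [term_to_regex(term) for term in query.split()]
    return [t for t in titles if all(p.search(t) for p in patterns)]


# Hypothetical titles, invented for this illustration only.
TITLES = [
    "Stem cell therapy for cardiac repair",
    "Embryonic stem cells in culture",
    "Cellular aging in plant stem tissue",
    "Fuel cell stack design",
]

exact = title_search("stem cell", TITLES)
plural = title_search("stem cells", TITLES)
truncated = title_search("stem cell*", TITLES)

# Every hit of the singular and plural queries is also a truncation hit.
print(len(exact), len(plural), len(truncated))
```

With these semantics the result of stem cell* contains the results of both stem cell and stem cells, so a lower count for the truncated query, as reported above, points to a glitch rather than to a feature.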

It is troubling that when you repeat the same query a minute later you don't always get the same results. It is not because the database was updated in that minute with records that had your query term in the title. On the contrary, sometimes the result list had fewer hits a minute later. The search is pretty fast even though it is broadcast to the databases at the time of the query. Searching the 28 (or 30) databases does not take more than a minute, and you see a progress bar and an indication of how many databases were searched and how many hits were retrieved. For some unknown reason the search is done in two phases, and you need to click a Yes button to add the hits retrieved in the second round.

Results are automatically sorted by rank, but they can also be sorted by author, date and source, which is a very useful feature. Items can be marked for retention (until the session ends) in order to create a subset of the result list. This is a common feature of subscription-based systems but is not offered by Google, Yahoo, MSN, or Ask. It would help users to see at least part of the abstract (say, the first 30 words) in the quick result list, which would make marking the most pertinent records more efficient. This is done for some of the databases but not across the board.

The glitches and oddities in the search features that I mentioned above must be fixed. I also would like to see the addition of more government databases, such as NCJRS from the Department of Justice, which has many scientific documents, especially in fields related to forensic science, psychiatry and pharmacology, because of the drug abuse pandemic. The Department of Transportation is just one of the contributors to the TRIS Online database, but TRIS Online, together with some other databases, certainly deserves to be included in Science.gov to broaden the scope of this worthy service.

— Péter Jacsó
