
Title: SpringerLink
Publisher: Springer Verlag
URL: http://www.springerlink.com
Cost: Bibliographic information, abstracts and citedness score from CrossRef journals are free for anyone
Tested: July-August, 2007
Most publishers of academic journals have engaged in digitizing their collections. Some do it on their own; others rely on digital facilitators like HighWire Press, Metapress, Atypon, Ingenta, or, for newspapers and professional journals, ProQuest (through its PQ Archive service).
Those who do the digitization on their own rarely have the expertise to do it right, let alone very well, even if they are big players with ample financial resources. Take Wiley InterScience, which has become (after the acquisition of Blackwell earlier this year) the third largest publisher of scientific, technical and medical (STM) journals and books. Wiley’s digital collection has serious deficiencies in both design and implementation. Its retrospective coverage is very incomplete. Adding insult to injury, it still has only four of the 41 volumes of the top-ranking Annual Review of Information Science and Technology.
As I discussed and illustrated a few years ago, the once second largest STM publisher, Kluwer, had a deeply disappointing implementation of the top-end Verity software, and minimal content coverage (considering its assets). Luckily, after the acquisition Springer included the Kluwer journals in its large-scale retrospective digitization project. Springer used the same software to implement its very good digital library in-house before switching over to the Metapress platform.
It is clear that the problem is not with the software but with the systems analysts and programmers. It is a sellers’ market, and apparently even some of the most profitable publishers can afford only mediocre computer specialists who may have gotten their license or bachelor’s degree in one of those diploma-mill distance education courses where convenience is of prime importance rather than doing actual programming projects.
There are STM publishers, small and large, that do a very good or excellent job digitizing their assets on their own or through digital facilitators. The former include the largest STM publisher, Elsevier (ScienceDirect, which corrected the weak point I criticized a year ago in this column, and now includes citedness scores for free from Scopus), Blackwell (Synergy), and the Association for Computing Machinery (ACM Digital Library), one of the very few that show the cited references and some other information (beyond the bibliographic data and abstracts) for free. The latter include Oxford University Press and several other clients of HighWire Press, who are willing to pay HighWire Press to bring the best out of their digital collections. This benefits every user (for example, through the display of the citedness score from Web of Science for articles published in some of the journals hosted by HighWire Press), even if only subscribers can actually see the lists of citing references. The digital collection of Annual Reviews, Inc., which I also wrote about a year ago, has an excellent implementation by Atypon.
The smaller publishers who go on their own often do poorly with digitization. Just look at the painfully inconvenient and irritating design of the digital archive of Haworth Press which publishes quite a number of library and information science journals, but obviously hasn’t learned the best practices from the good articles.
As I write this column, there is lively discussion of, and discontent about, the rather arrogant and absurd pricing of the digital versions of some of the journals by SLACK, Inc. For example, it charges $319 for the print edition of the Journal of Nursing Education, and ten times as much for the digital version for a college library such as that of the University of North Carolina at Charlotte. Pricing is based on the number of FTE students, so your quote may be lower or higher.
Note that this journal is at the bottom of the ladder in the Nursing category of the most recent edition of Journal Citation Reports and, just as importantly, it is also at the bottom in terms of retrospective digitization. It offers digital access to about 600 papers published in this journal, covering merely the current and previous five years of the 46 volumes of the Journal of Nursing Education. It uses very simple-minded software for searching the paltry collection. It may not be the smartest idea to spend your budget on such a modest digital collection. Now enter SpringerLink as a good model.
SpringerLink has close to 3.5 million fully searchable digital documents. Full searchability is a very important criterion that distinguishes it from the many other huge publisher collections which offer searching only in the bibliographic elements, not in the full text of documents, such as the digital collection of the American Institute of Physics.
SpringerLink’s database is only about half the size of Elsevier’s ScienceDirect, but more than twice as large as third-largest publisher Wiley-Blackwell’s two digital collections (Synergy and InterScience) combined.
The second place in terms of size among commercial publishers is understandable, as Springer has far fewer serials and monographic publications than Elsevier. The opening screen of SpringerLink gives an immediate and always current bird’s-eye view of the digital collection, although it may be somewhat confusing because the numbers don’t exactly add up. Actually, the reference works are counted twice: both on their own and as part of the book counts.
The topical profile is informative, clearly indicating how many documents there are for the 13 broad subject categories, ranging from Architecture and Design (515), to Medicine (607,413) to Physics and Astronomy (408,539). Apparently, either not all the categories are listed, or 10-12% of the documents have not been assigned a broad category code. Even with this deficiency, it is enlightening to realize that SpringerLink has 142,500 documents in the field of Humanities, Social Sciences and the Law, and 58,000 documents in Behavioral Sciences. Clearly, the emphasis is on Biomedical and Life Sciences (21%), Medicine (17.5%), Chemistry and Materials Science (13.6%), and Physics and Astronomy (11.8%).
My test searches indicate that for about 70% of the items there are free abstracts. This is quite remarkable as the time span of SpringerLink goes back to the late 19th century, and its pre-1966 coverage (when abstracts were not typical) is quite substantial, close to half a million items.
The vast majority of documents are journal articles, but there are 17,000 digital books and more than half a million book chapters (including conference papers of proceedings, such as the voluminous Lecture Notes in Computer Science series, which alone contributes more than 180,000 documents).
English-language documents make up 95% of the collection, but there is a significant German-language subset of more than 170,000 documents. This is understandable as Springer has been the largest German publisher of STM journals and books. It is less understandable why there are 600 documents without a language designation. A cursory browse shows that most of them are in English; the exceptions are mostly book series in German. The geographic coverage is obviously important, as right on the home page Springer announces the Russian Library of Science subset of 450,000 documents, and the much smaller (22,000 items) but surely growing Chinese Library of Science subset.
The Online First subset of 30,000 items is very remarkable, as it not only alerts users to the research papers that will appear in print in the next few months, but also makes their bibliographic details and abstracts available free of charge even to non-subscribers.
What is the best software feature? Arguably, it is the automatic and instant topical profiling that I mentioned in the discussion of the content, which is applied and enhanced throughout the search whenever query results are shown. Although it is not yet perfect, it has great potential for several reasons.
One is that it immediately alerts searchers that their search term may have very different meanings across disciplines, and that a more specific search term may be needed. Searching for the term stress in the title (to strongly focus your search) will find 21,217 records. The top-ranked hit is a materials science paper; the second-ranked one is from the field of life sciences.
The right hand part of the screen shows that the articles are from the various hard and applied science fields (and from the special Russian collection, which should not be listed under the subject categories as it covers a variety of sciences, including 419 papers from the biomedical subcategory which does not appear among the top ten broad categories listed.)
Actually, there are thousands of papers about human stress in the result list, such as the #4 ranked item, but they form a relatively small subset of the result set retrieved for the single term: stress.
It would be useful to have a See-More button to check the distribution of the set by topic beyond the top ten categories, and then simply limit the search to the Behavioral Sciences category, and even further down to one of its many subcategories, such as clinical psychology. The deeper subcategories can be seen when the search term “stress disorder” is entered, which automatically limits the search to the human behavioral aspect.
Beyond showing the topical angles of any search result, the software also creates an instant profile of the results by the age, type and language of the documents in the result set. If you scroll down further you can see the ten journals and ten authors that contributed the largest number of hits to the query.
There is no hit count next to the authors, which is probably an oversight. As you can see in the digital collection of Akadémiai Kiadó (which has been publishing journals —among others Scientometrics and Acta Mathematica Hungarica— in partnership with Elsevier, Kluwer and now Springer), the frequency count can be displayed next to the author names.
I find this dynamic profiling very effective in refining the search results as the strategy evolves. Most users don’t feel comfortable limiting the search by publication year or journal as an a priori commitment, before they have seen any results. This solution promotes interaction, which could be further improved by adding small check-boxes to the items in the profile to allow users to choose, for example, two or more journals at once as a filter.
The other improvement could be to let users customize the data elements. For example, if they are not particularly interested in, say, the distribution of the documents by the date they were added to the database, which is listed in the most prominent position, they should be able to remove that cluster.
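The profiling and the two suggested improvements (multi-select filters and user-chosen clusters) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SpringerLink's actual implementation: the record fields, the journal names and the function names `profile` and `apply_filters` are all hypothetical.

```python
from collections import Counter

def profile(results, fields=("subject", "language", "journal"), top=10):
    """Count the values of each chosen field in a result set.

    Dropping a name from `fields` removes that cluster from the profile,
    which is the user customization suggested in the text.
    """
    return {
        field: Counter(r[field] for r in results if r.get(field)).most_common(top)
        for field in fields
    }

def apply_filters(results, selected):
    """Keep records matching every ticked check-box group.

    `selected` maps a field name to the set of ticked values:
    any ticked value may match within a field, all fields must match.
    """
    return [
        r for r in results
        if all(r.get(field) in values for field, values in selected.items() if values)
    ]

# Hypothetical result set for the title search "stress".
results = [
    {"subject": "Materials Science", "language": "English", "journal": "J. Hypothetical Materials"},
    {"subject": "Materials Science", "language": "English", "journal": "J. Hypothetical Materials"},
    {"subject": "Behavioral Science", "language": "German", "journal": "Z. Hypothetische Psychologie"},
    {"subject": "Behavioral Science", "language": "English", "journal": "J. Hypothetical Psychiatry"},
    {"subject": "Materials Science", "language": "English", "journal": "J. Hypothetical Ceramics"},
]

clusters = profile(results, fields=("subject", "language"))
two_journals = apply_filters(
    results,
    {"journal": {"Z. Hypothetische Psychologie", "J. Hypothetical Psychiatry"}},
)
```

Here `clusters["subject"]` lists Materials Science (3) ahead of Behavioral Science (2), and `two_journals` holds the two psychology records picked with a single multi-journal filter.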
To see the dynamism of the process, just click here, then filter your results by clicking on a subject category, language, or year range. If you enjoyed this, come back to read my comments on other features of the software.
There are several publisher collections where you can find information about citing papers —but they are typically limited to the ones in journals of the publisher, or in the group of journals which are in the stable of the digital facilitator which hosts the journals of several publishers— as is the case with HighWire Press.
I am particularly pleased with the fact that SpringerLink makes good use of the ever-growing CrossRef database to find documents which cite the ones you are looking at. Here is an example for papers published in journals of Springer as well as other CrossRef members which cite an article. Once again, the list of citing references (not just the citedness score) is available not only for subscribers, but also for plain vanilla users, which deserves my awe.
As I am writing this, CrossRef has 28.3 million Digital Object Identifier (DOI) links, which have great potential for yielding trustworthy citedness data instead of the often senseless, much deflated citedness scores dispensed by Google Scholar, as I illustrated earlier in a presentation, and in this paper about deflated, inflated and phantom citation counts.
The record used as a sample in the presentation was removed from Google Scholar in a harried Nixonian move, but the linked presentation of screen captures shows Google Scholar to have claimed in mid-2006 that a paper published just a few months earlier was cited about 1,200 times, mostly by papers published several years earlier.
The examples in the article are still there with different sets of purportedly citing articles, most of them still phantom citations. Apparently, beyond merely removing disturbing evidence from the database, Google Scholar is also trying to improve its often brutally false citation matching algorithm, which is why the citedness scores may have decreased and the lists of citing items changed since the time I made the screenshots.
It may be a little late, after so many papers have been published using the inflated and phantom hit counts and citedness counts, and Google Scholar still has a very long way to go before its claimed numbers can be taken seriously. I will illustrate in my upcoming, in-depth review of Google Scholar in this column some of the most obvious absurdities in Google Scholar’s citedness score calculations, which in turn produce equally absurd Hirsch-index lists for researchers, journals and institutions through many of the automatic h-index calculating utilities relying on Google Scholar’s numbers.
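To see why inflated citedness scores translate directly into inflated h-index values, here is a minimal sketch of the standard h-index calculation (the largest h such that h papers have at least h citations each). The two lists of citation counts are invented for illustration and are not taken from Google Scholar or any other database.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical counts for the same six papers of one author.
verified_counts = [12, 9, 4, 3, 1, 0]      # counts after checking the citing items
inflated_counts = [1200, 40, 25, 18, 9, 7]  # counts padded with phantom citations

verified_h = h_index(verified_counts)
inflated_h = h_index(inflated_counts)
```

With these made-up numbers the verified counts give an h-index of 3, while the inflated counts double it to 6, so any utility that takes the counts at face value inherits the error.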
Coming back to SpringerLink: there are some deficiencies in the software. For example, there are no sort options, no export options to RefWorks and other post-processing programs. The query form does not have a cell for journal name, but only for ISSN.
I understand this in the excellent federated search software of Serials Solutions because it must handle the endless variations in the abbreviations and punctuation used in journal names in the variety of databases it searches. But in SpringerLink there should be a single standardized name format for each journal defined by the publisher. In addition, looking up the ISSN of a journal is neither convenient nor obvious in SpringerLink. There should be a browsable journal name index with ISSN to facilitate the process.
Another disappointment is that a German publisher does not offer an easy way to enter accented characters. With substantial (180,000 item) non-English language materials and with many European authors (who keep the accented characters in their names in spite of the excessive problems it may cause in searching, sorting and citation matching), many articles will not be easy to find, or will not be found at all.
For example, the software does not accept Muller for Müller. Is this a big problem? It is, because this most common German name without the accent will find only 1,107 records, while those who can figure out how to enter the name with the accented character will be rewarded, in this case alone, with 15,599 documents by the author Müller.
True, some of the common accented letters, such as ö, ü, é and ä can be found, cut and pasted from the Springer alphabet list, but it is not easy to figure out and use, and the set should be extended to the many other accented and special characters in the digital collection of this very European publisher.
The easiest way to solve this problem is to create index entries both with and without the accent, or just to leave the accent in the display format but not in the index, so that muller would retrieve both formats. I don’t want to ruminate over this any more here; I hope you get the point why tiny points can be important.
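The second option (keep the accent in the display format, drop it in the index) can be sketched as follows. This is a minimal illustration assuming a simple in-memory author index; the function names and the sample records are hypothetical and say nothing about how SpringerLink's own index is built.

```python
import unicodedata
from collections import defaultdict

def fold(text):
    """Reduce a name to an accent-free, case-free index key.

    NFKD decomposition splits 'ü' into 'u' plus a combining diaeresis;
    the combining marks are then dropped and the rest is case-folded.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()

def build_author_index(records):
    """Map folded author names to record ids; display names stay untouched."""
    index = defaultdict(set)
    for record_id, author in records:
        index[fold(author)].add(record_id)
    return index

def search_author(index, query):
    """Fold the query the same way, so either spelling finds the same records."""
    return sorted(index.get(fold(query), set()))

# Hypothetical records: (record id, author name as displayed).
records = [(1, "Müller"), (2, "Muller"), (3, "Kovács"), (4, "MÜLLER")]
index = build_author_index(records)

hits_plain = search_author(index, "muller")
hits_accented = search_author(index, "Müller")
```

Both queries return records 1, 2 and 4. Letters that have no Unicode decomposition (such as ø or ł) and the German convention of writing Mueller for Müller are not covered by this sketch and would need an extra mapping table.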
SpringerLink is a very important scholarly resource by virtue of the clout of its journals and books and the large size of its freely and fully searchable digital collection of more than 3.4 million journal articles, books and book chapters. The display and printing of the documents are available only to subscribers, but the ability to search them fast is a great asset. For example, in Ingenta Connect this feature is still not available.
The free bibliographic information and the presence of abstracts for about 70% of the records also make it an important substitute for and/or complement to some subscription-based indexing/abstracting services. The icing on the cake is the citedness score of primary Springer documents from journals covered by the CrossRef database, along with an equally free list of the bibliographic data of citing references.