Science, 149(3683): 510-515, July 30, 1965

field. An average of about 15 references in each of these 7 new papers will therefore supply about 105 references back to the previous 100 papers, which will therefore be cited an average of a little more than once each during the year. Over the long run, and over the entire world literature, we should find that, on the average, every scientific paper ever published is cited about once a year (6).

Incidence of Citations

Now, although the total number of citations must exactly balance the total number of references, the distributions are very different. It seems that, in any given year, about 35 percent of all the existing papers are not cited at all, and another 49 percent are cited only once (n = 1) (see Fig. 2). This leaves about 16 percent of the papers to be cited an average of about 3.2 times each. About 9 percent are cited twice; 3 percent, three times; 2 percent, four times; 1 percent, five times; and a remaining 1 percent, six times or more. For large n, the number of papers cited appears to decrease as $n^{-2.5}$ or $n^{-3.0}$. This is rather more rapid than the decrease found for numbers of references in papers, and indeed the number of papers receiving many citations is smaller than the number carrying large bibliographies. Thus, only 1 percent of the cited papers are cited as many as six or more times each in a year (the average for this top 1 percent is 12 citations), and the maximum likely number of citations to a paper in a year is smaller by about an order of magnitude than the maximum likely number of references in the citing papers. There is, however, some parallelism in the findings that some 5 percent of all papers appear to be review papers, with many (25 or more) references, and some 4 percent of all papers appear to be “classics,” cited four or more times in a year.

Fig. 1. Percentages (relative to total number of papers published in 1961) of papers published in 1961 which contain various numbers (n) of bibliographic references. The data, which represent a large sample, are from Garfield’s 1961 Index (2).

Fig. 2. Percentages (relative to total number of cited papers) of papers cited various numbers (n) of times, for a single year (1961). The data are from Garfield’s 1961 Index (2), and the points represent four different samples conflated to show the consistency of the data. Because of the rapid decline in frequency of citation with increase in n, the percentages are plotted on a logarithmic scale.

What has been said of references is true from year to year; the findings for individual cited papers, however, appear to vary from year to year. A paper not cited in one year may well be cited in the next, and one cited often in one year may or may not be heavily cited subsequently. Heavy citation appears to occur in rather capricious bursts, but in spite of that I suspect a strong statistical regularity. I would conjecture that results to date could be explained by the hypotheses that every year about 10 percent of all papers “die,” not to be cited again, and that for the “live” papers the chance of being cited at least once in any year is about 60 percent. This would mean that the major work of a paper would be finished after 10 years. The process thus reaches a steady state, in which about 10 percent of all published papers have never been cited, about 10 percent have been cited once, about 9 percent twice, and so on, the percentages slowly decreasing, so that half of all papers will be cited eventually five times or more, and a quarter of all papers, ten times or more. More work is urgently needed on the problem of determining whether there is a probability that the more a paper is cited the more likely it is to be cited thereafter. It seems to me that further work in this area might well lead to the discovery that classic papers could be rapidly identified, and that perhaps even the “superclassics” would prove so distinctive that they could be picked automatically by means of citation-index-production procedures and published as a single U.S. (or World) Journal of Really Important Papers.

Unfortunately, we know little about any relationship between the number of times a paper is cited and the number of bibliographic references it contains.
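The single-year percentages quoted under Incidence of Citations can be checked against the claim that a paper is cited about once a year on average. The following is only a sketch: it assumes that every quoted percentage is a share of all existing papers and that the "six times or more" class averages the stated 12 citations. The function and variable names are illustrative and are not taken from the article.

```python
# Shares of existing papers by number of citations received in one year,
# as quoted in the text (assumed here to be shares of ALL existing papers).
SHARES = {0: 0.35, 1: 0.49, 2: 0.09, 3: 0.03, 4: 0.02, 5: 0.01}
TOP_SHARE = 0.01   # papers cited six or more times in the year
TOP_MEAN = 12      # stated average number of citations for that top class


def citation_summary():
    """Return (total share, citations per paper, multi-cited share, multi-cited mean)."""
    total_share = sum(SHARES.values()) + TOP_SHARE
    # Expected citations per existing paper per year.
    citations = sum(n * s for n, s in SHARES.items()) + TOP_MEAN * TOP_SHARE
    # Papers cited twice or more, and their average number of citations.
    multi_share = total_share - SHARES[0] - SHARES[1]
    multi_mean = (citations - 1 * SHARES[1]) / multi_share
    return total_share, citations, multi_share, multi_mean


if __name__ == "__main__":
    total, cites, multi_share, multi_mean = citation_summary()
    print(f"shares sum to {total:.2f}")
    print(f"citations per paper per year: {cites:.2f}")
    print(f"multi-cited share {multi_share:.2f}, average {multi_mean:.2f}")
```

Under these assumptions the shares sum to 1.00, the mean comes to about 1.01 citations per paper per year, and the 16 percent of papers cited more than once average 3.25 citations, in line with the "about 3.2" quoted in the text.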
Since rough preliminary tests indicate that, for much-cited papers, there is a fairly standard pattern of distribution of numbers of bibliographic references, I conjecture that the correlation, if one exists, is very small. Certainly, there is no strong tendency for review papers to be cited unusually often. If my conjecture is valid, it is worth noting that, since 10 percent of all papers contain no bibliographic references and another, presumably almost independent, 10 percent of all papers are never cited, it follows that there is a lower bound of 1 percent of all papers on the number of papers that are totally disconnected in a pure citation network and could be found only by topical indexing or similar methods; this is a very small class, and probably a most unimportant one.

The balance of references and citations in a single year indicates one very important attribute of the network (see Fig. 3). Although most papers produced in the year contain a near-average number of bibliographic references, half of these are references to about half of all the papers that have been published in previous years. The other half of the references tie these new papers to a quite small group of earlier ones, and generate a rather tight pattern of multiple relationships. Thus each group of new papers is “knitted” to a small, select part of the existing scientific literature but connected rather weakly and randomly to a much greater part. Since only a small part of the earlier literature is knitted together by the new year’s crop of papers, we may look upon this small part as a sort of growing tip or epidermal layer, an active research front.
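The bookkeeping behind the idealized field of Fig. 3 can be tallied in a few lines. This is a sketch using the figure's round numbers (7 new papers with about 13 references each, 100 old papers, of which 50 are cited once and 40 not at all, and about ten references going outside the field). The constant and function names are illustrative assumptions, not part of the article.

```python
# Round numbers from the idealized "almost closed" field of Fig. 3.
NEW_PAPERS = 7
REFS_PER_NEW_PAPER = 13
OLD_PAPERS = 100
OUTSIDE_REFS = 10   # "about 11 percent" of the references leave the field
CITED_ONCE = 50     # old papers linked to the new ones by a single citation
NOT_CITED = 40      # old papers not cited at all during the year


def fig3_balance():
    """Tally references and cited papers for the idealized field."""
    total_refs = NEW_PAPERS * REFS_PER_NEW_PAPER
    in_field_refs = total_refs - OUTSIDE_REFS
    # Old papers left over after removing the singly cited and uncited ones.
    multiply_cited = OLD_PAPERS - CITED_ONCE - NOT_CITED
    # References not used up by the single citations go to that small group.
    refs_to_multiply_cited = in_field_refs - CITED_ONCE
    return {
        "total_refs": total_refs,
        "in_field_refs": in_field_refs,
        "multiply_cited": multiply_cited,
        "refs_to_multiply_cited": refs_to_multiply_cited,
        "mean_citations_in_tight_group": refs_to_multiply_cited / multiply_cited,
    }


if __name__ == "__main__":
    for name, value in fig3_balance().items():
        print(name, value)
```

With these inputs the seven new papers supply 91 references, 81 of them inside the field. Fifty are single citations, so the remaining 31 fall on only ten old papers, about three citations each. That small, multiply cited group is the tightly knitted part of the network.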
I believe it is the existence of a research front, in this sense, that distinguishes the sciences from the rest of scholarship, and, because of it, I propose that one of the major tasks of statistical analysis is to determine the mechanism that enables science to cumulate so much faster than nonscience that it produces a literature crisis.

Fig. 3. Idealized representation of the balance of papers and citations for a given “almost closed” field in a single year. It is assumed that the field consists of 100 papers whose numbers have been growing exponentially at the normal rate. If we assume that each of the seven new papers contains about 13 references to journal papers and that about 11 percent of these 91 cited papers (or ten papers) are outside the field, we find that 50 of the old papers are connected by one citation each to the new papers (these links are not shown) and that 40 of the old papers are not cited at all during the year. The seven new papers, then, are linked to ten of the old ones by the complex network shown here.

An analysis of the distribution of publication dates of all papers cited in a single year (Fig. 4) sheds further light on the existence of such a research front. Taking [from Garfield (2)] data for 1961, the most numerous count available, I find that papers published in 1961 cite earlier papers at a rate that falls off by a factor of 2 for every 13.5-year interval measured backward from 1961; this rate of decrease must be approximately equal to the exponential growth of numbers of papers published in that interval. Thus, the chance of being cited by a 1961 paper was almost the same for all papers published more than about 15 years before 1961, the rate of citation presumably being the previously computed average rate of one citation per paper per year. It should be noted that, as time goes on, there are more and more papers available to cite each one previously published. Therefore, the chance that any one paper will be cited by any other, later paper decreases exponentially by about a factor of 2 every 13.5 years.

For papers less than 15 years old, the rate of citation is considerably greater than this standard value of one citation per paper per year. The rate increases steadily, from less than twice this value for papers 15 years old to 4 times for those 5 years old; it reaches a maximum of about 6 times the standard value for papers 2½ years old, and of course declines again for papers so recent that they have not had time to be noticed.

Incidentally, this curve enables one to see and dissect out the effect of the wartime declines in production of papers.
It provides an excellent indication, in agreement with manpower indexes and other literature indexes, that production of papers began to drop from expected levels at the beginning of World Wars I and II, declining to a trough of about half the normal production in 1918 and mid-1944, respectively, and then recovering in a manner strikingly symmetrical with the decline, attaining the normal rate again by 1926 and 1950, respectively. Because of this decline, we must not take dates in the intervals 1914-25 and 1939-50 for comparison with normal years in determining growth indexes.

The “immediacy factor,” or more frequent citation of recent papers relative to earlier ones, is, of course, responsible for the well-known phenomenon of papers being considered obsolescent after a decade. A numerical measure of this factor can be derived and is particularly useful. Calculation shows that about 70 percent of all cited papers would account for the normal growth curve, which shows a doubling every 13.5 years, and that about 30 percent would account for the hump of the immediacy curve. Hence, we may say that the 70 percent represents a random distribution of citations of all the scientific papers that have ever been published, regardless of date, and that the 30 percent are highly selective references to recent literature; the distribution of citations of the recent papers is defined by the shape of the curve, half of the 30 percent being papers between 1 and 6 years old.

Fig. 4. Percentages (relative to total number of papers cited in 1961) of all papers cited in 1961 and published in each of the years 1862 through 1961 [data are from Garfield’s 1961 Index (2)]. The curve for the data (solid line) shows dips during World Wars I and II. These dips are analyzed separately at the top of the figure and show remarkably similar reductions to about 50 percent of normal citation in the two cases. For papers published before World War I, the curve is a straight line on this logarithmic plot, corresponding to a doubling of numbers of citations for every 13.5-year interval. If we assume that this represents the rate of growth of the entire literature over the century covered, it follows that the more recent papers have been cited disproportionately often relative to their number. The deviation of the curve from a straight line is shown at the bottom of the figure and gives some measure of the “immediacy effect.” If, for old papers, we assume a unit rate of citation, then we find that the recent papers are cited at first about six times as much, this factor of 6 declining to 3 in about 7 years, and to 2 after about 10 years. Since it is probable that some of the rise of the original curve above the straight line may be due to an increase in the pace of growth of the literature since World War I, it may be that the curve of the actual “immediacy effect” would be somewhat smaller and sharper than the curve shown here. It is probable, however, that the straight dashed line on the main plot gives approximately the slope of the initial falloff, which must therefore be a halving in the number of citations for every 6 years one goes backward from the date of the citing paper.

Fig. 5 (top left). Ratios of numbers of 1961 citations to numbers of individual cited papers published in each of the years 1860 through 1960 [data are from Garfield’s 1961 Index (2)]. This ratio gives a measure of the multiplicity of citation and shows that there is a sharp falloff in this multiplicity with time. One would expect the measure of multiplicity to be also a measure of the proportion of available papers actually cited. Thus, recent papers cited must constitute a much larger fraction of the total available population than old papers cited.

Fig. 6. Matrix showing the bibliographical references to each other in 200 papers that constitute the entire field from beginning to end of a peculiarly isolated subject group. The subject investigated was the spurious phenomenon of N-rays, about 1904. The papers are arranged chronologically, and each column of dots represents the references given in the paper of the indicated number rank in the series, these references being necessarily to previous papers in the series. The strong vertical lines therefore correspond to review papers. The dashed line indicates the boundary of a “research front” extending backward in the series about 50 papers behind the citing paper. With the exception of this research front and the review papers, little background noise is indicated in the figure. The tight linkage indicated by the high density of dots for the first dozen papers is typical of the beginning of a new field.

I am surprised at the extent of this immediacy phenomenon and want to indicate its significance. If all papers followed a standard pattern with respect to the proportions of early and recent papers they cite, then it would follow that 30 percent of all references in all papers would be to the recent research front. If, instead, the papers cited by, say, half of all papers were evenly distributed through the literature with respect to publication date, then it must follow that 60 percent of the papers cited by the other half would be recent papers.
I suggest, as a rough guess, that the truth lies somewhere between, and that we have here an indication that about half the bibliographic references in papers represent tight links with rather recent papers, the other half representing a uniform and less tight linkage to all that has been published before.

That this is so is demonstrated by the time distribution: much-cited papers are much more recent than less-cited ones. Thus, only 7 percent of the papers listed in Garfield’s 1961 Index (2) as having been cited four or more times in 1961 were published before 1953, as compared with 21 percent of all papers cited in 1961. This tendency for the most-cited papers to be also the most recent may also be seen in Fig. 5 (based on Garfield’s data), where the number of citations per paper is shown as a function of the age of the cited paper.

It has come to my attention that R. E. Burton and R. W. Kebler (7) have already conjectured, though on somewhat tenuous evidence, that the periodical literature may be composed of two distinct types of literature with very different half-lives, the classic and the ephemeral parts. This conjecture is now confirmed by the present evidence. It is obviously desirable to explore further the other tentative finding of Burton and Kebler that the half-lives, and therefore the relative proportions of classic and ephemeral literature, vary considerably from field to field: mathematics, geology, and botany being strongly classic; chemical, mechanical, and metallurgical engineering and physics strongly ephemeral; and chemistry and physiology a much more even mixture.
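Two pieces of arithmetic in the immediacy argument can be sketched in code: the falloff implied by a literature that doubles every 13.5 years, and the step from "30 percent of all references are selective" to "60 percent in the other half." The sketch assumes smooth exponential growth, which is an idealization, since the text itself notes wartime dips. The function names are illustrative and do not come from the article.

```python
DOUBLING_YEARS = 13.5  # doubling period of the literature quoted in the text


def output_relative_to_now(age_years, doubling=DOUBLING_YEARS):
    """Annual output of papers `age_years` ago as a fraction of current output.

    Assumes smooth exponential growth. If every paper is cited at the same
    standard rate, citations to each past year fall off by this same factor.
    """
    return 2.0 ** (-age_years / doubling)


def recent_share_in_selective_half(overall_recent=0.30, uniform_fraction=0.50):
    """Share of selective 'recent' references needed in the remaining papers.

    If `uniform_fraction` of all papers cite with no preference for recent
    work, the rest must carry the whole overall share `overall_recent`.
    """
    if not 0.0 <= uniform_fraction < 1.0:
        raise ValueError("uniform_fraction must lie in [0, 1)")
    return overall_recent / (1.0 - uniform_fraction)


if __name__ == "__main__":
    for age in (0.0, 13.5, 27.0):
        print(age, output_relative_to_now(age))
    print(recent_share_in_selective_half())
```

One doubling period back the output is halved, and two periods back it is quartered. With half of all papers citing evenly, the other half must direct about 60 percent of their references to recent work, which is the figure given in the text.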

Historical Examples

A striking confirmation of the proposed existence of this research front has been obtained from a series of historical examples, for which we have been able to set up a matrix (Fig. 6). The dots represent references within a set of chronologically arranged papers which constitute the entire literature in a particular field (the field happens to be very tight and closed over the interval under discussion). In such a matrix there is high probability of citation in a strip near the diagonal and extending over the 30 or 40 papers immediately preceding each paper in turn. Over the rest of the triangular matrix there is much less chance of citation; this remaining part provides, therefore, a sort of background noise. Thus, in the special circumstance of being able to isolate a “tight” subject field, we find that half the references are to a research front of recent papers and that the other half are to papers scattered uniformly through the literature. It also appears that after every 30 or 40 papers there is need of a review paper to replace those earlier papers that have been lost from sight behind the research front. Curiously enough, it appears that classical papers, distinguished by full rows rather than columns, are all cited with about the same frequency, making a rather symmetrical pattern that may have some theoretical significance.

Two Bibliographic Needs

From these two different types of connections it appears that the citation network shows the existence of two different literature practices and of two different needs on the part of the scientist. (i) The research front builds on recent work, and the network becomes very tight. To cope with this, the scientist (particularly, I presume, in physics and molecular biology) needs an alerting service that will keep him posted, probably by citation indexing, on the work of his peers and colleagues. (ii) The random scattering of Fig. 6 corresponds to a drawing upon the totality of previous work.
In a sense, this is the portion of the network that treats each published item as if it were truly part of the eternal record of human knowledge. In subject fields that have been dominated by this second attitude, the traditional procedure has been to systematize the added knowledge from time to time in book form, topic by topic, or to make use of a system of classification optimistically considered more or less eternal, as in taxonomy and chemistry. If such classification holds over reasonably long periods, one may have an objective means of reducing the world total of knowledge to fairly small parcels in which the items are found to be in one-to-one correspondence with some natural order.

It seems clear that in any classification into research-front subjects and taxonomic subjects there will remain a large body of literature which is not completely the one or the other. The present discussion suggests that most papers, through citations, are knit together rather tightly. The total research front of science has never, however, been a single row of knitting. It is, instead, divided by dropped stitches into quite small segments and strips. From a study of the citations of journals by journals I come to the conclusion that most of these strips correspond to the work of, at most, a few hundred men at any one time. Such strips represent objectively defined subjects whose description may vary materially from year to year but which remain otherwise an intellectual whole. If one would work out the nature of such strips, it might lead to a method for delineating the topography of current scientific literature. With such a topography established, one could perhaps indicate the overlap and relative importance of journals and, indeed, of countries, authors, or individual papers by the place they occupied within the map, and by their degree of strategic centralness within a given strip.

Journal citations provide the most readily available data for a test of such methods.
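The two kinds of linkage described under Historical Examples, a dense strip near the diagonal of the matrix and scattered background references, suggest a simple measurement: the share of references that fall within a fixed number of papers behind the citing paper. The sketch below assumes the papers are numbered chronologically and the references are given as (citing, cited) pairs. The function name, the window parameter, and the toy data are hypothetical and are not taken from the N-ray matrix of Fig. 6.

```python
def front_share(references, window):
    """Fraction of references that reach back at most `window` papers.

    `references` is an iterable of (citing, cited) rank pairs for papers
    numbered in chronological order, so every cited rank must be smaller
    than the rank of the paper citing it.
    """
    refs = list(references)
    if not refs:
        return 0.0
    for citing, cited in refs:
        if cited >= citing:
            raise ValueError("a paper can only cite earlier papers in the series")
    near = sum(1 for citing, cited in refs if citing - cited <= window)
    return near / len(refs)


# Hypothetical toy data: three references close to the diagonal and
# three scattered far back in the series.
TOY_REFS = [(10, 9), (10, 8), (10, 1), (50, 46), (50, 3), (50, 20)]

if __name__ == "__main__":
    print(front_share(TOY_REFS, window=5))
```

On real data one would set the window to the 30 to 50 papers the article associates with the research front and check whether roughly half of the references fall inside it.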
From a preliminary and very rough analysis of these journal-citation data I am tempted to conclude that a very large fraction of the alleged 35,000 journals now current must be reckoned as merely a distant background noise, and as very far from central or strategic in any of the knitted strips from which the cloth of science is woven.

References and Notes

1. E. Garfield and I. H. Sher, “New factors in the evaluation of scientific literature through citation indexing,” Am. Doc. 14, 191 (1963); ---, Genetics Citation Index (Institute for Scientific Information, Philadelphia, 1963).
2. For many of the results discussed in this article I have used statistical information drawn from E. Garfield and I. H. Sher, Science Citation Index (Institute for Scientific Information, Philadelphia, 1963), pp. ix, xvii-xviii. I wish to thank Dr. Eugene Garfield for making available to me several machine printouts of original data used in the preparation of the 1961 Index but not published in their entirety in the preamble to the index.
3. I am grateful to Dr. M. M. Kessler, Massachusetts Institute of Technology, for data for seven research reports of the following titles and dates: “An Experimental Study of Bibliographic Coupling between Technical Papers” (November 1961); “Bibliographic Coupling Between Scientific Papers” (July 1962); “Analysis of Bibliographic Sources in the Physical Review (vol. 77, 1950, to vol. 112, 1958)” (July 1962); “Analysis of Bibliographic Sources in a Group of Physics-Related Journals” (August 1962); “Bibliographic Coupling Extended in Time: Ten Case Histories” (August 1962); “Concerning the Probability that a Given Paper will be Cited” (November 1962); “Comparison of the Results of Bibliographic Coupling and Analytic Subject Indexing” (January 1963).
4. J. W. Tukey, “Keeping research in contact with the literature: Citation indices and beyond,” J. Chem. Doc. 2, 34 (1962).
5. C. E. Osgood and L. V. Xhignesse, Characteristics of Bibliographical Coverage in Psychological Journals Published in 1950 and 1960 (Institute of Communications Research, Univ. of Illinois, Urbana, 1963).
6. D. J. de Solla Price, Little Science, Big Science (Columbia Univ. Press, New York, 1963).
7. R. E. Burton and R. W. Kebler, “The ‘half-life’ of some scientific and technical literatures,” Am. Doc. 11, 18 (1960).
