LogEc collects access statistics from several services which use the RePEc data set, the largest online collection of Economics Working Papers, Journal Articles and Software Components. LogEc provides a convenient way of tracking trends in the profession (new work can be found in RePEc before it hits the journals) and the impact of your own work.
Please contact if you have any questions about this site or the statistics.
Contributing and publicizing your work
LogEc currently collects access statistics from the following sites and services providing access to the RePEc data set
The statistics are updated monthly around the third of each month when the server logs from the participating sites are collected and merged.
Producing meaningful statistics for accesses to web servers is a difficult task, especially so since we are merging data from several different sites. Rather than just counting the number of times a page or file is accessed (by a human or a piece of software indexing the web) the goal is to get as close as possible to a measure of the number of people showing an interest in a paper by reading the abstract page or downloading the full text file.
To accomplish this we
While not perfect, the net effect of these steps is to produce statistics that closely approximate the number of actual persons viewing the abstract of a paper or downloading a full text file.
Robots and spiders (the software that indexes the web) account for some 60% of the hits to these sites. The statistics would be completely misleading if we did not remove the accesses made by robots.
Robots are primarily identified by checking if a host has requested the /robots.txt file. Robots adhering to the Robot Exclusion Standard check this file to see which parts of a web site they shouldn't index.
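The robots.txt check described above can be sketched as follows. This is only an illustration, not the code LogEc runs: the Access record, the function names and the sample log entries are all assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple, Set


class Access(NamedTuple):
    """One log entry (hypothetical structure, for illustration only)."""
    host: str  # originating host or IP-number
    path: str  # requested path


def find_robot_hosts(accesses: Iterable[Access]) -> Set[str]:
    """Return every host that requested /robots.txt at least once."""
    return {a.host for a in accesses if a.path == "/robots.txt"}


def remove_robot_accesses(accesses: Iterable[Access]) -> List[Access]:
    """Drop all accesses made by hosts that requested /robots.txt."""
    accesses = list(accesses)
    robots = find_robot_hosts(accesses)
    return [a for a in accesses if a.host not in robots]


# Made-up sample log: the first host reveals itself as a robot.
log = [
    Access("192.0.2.1", "/robots.txt"),
    Access("192.0.2.1", "/paper/example.htm"),
    Access("192.0.2.2", "/paper/example.htm"),
]
kept = remove_robot_accesses(log)
```

Note that all accesses from an identified host are removed, not just the request for /robots.txt itself.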
In addition, an effort is made to identify robots that don't request /robots.txt and hosts that display robot-like behavior. This feature was introduced with the access statistics for November 2002; at the same time the historical data were revised to make them comparable. These "robot-like" hosts are identified by looking for hosts or groups of hosts that have an excessive number of accesses. A host is declared a robot, and its accesses excluded from the statistics, if either
In July 2010 this identified about 600 robots with 791,000 abstract views and 5,600 downloads.
This is in addition to the 4,300 robots that identified themselves by accessing robots.txt and accounted for over 24 million abstract views and about 600,000 downloads.
Clearly search engine robots account for a significant portion of the traffic at the RePEc sites and on the Internet in general. This is obviously driven by the desire to have as fresh an index and as broad a coverage as possible. Taking a look at how many requests there have been from different search engines gives an indication of how good they are in this respect.
Double counting occurs when a person views an abstract page more than once or, perhaps being impatient, clicks on a download link more than once. In each case it would be misleading to count this as more than one abstract view or file download. To avoid this double counting we keep track of the originating IP-number of each access and count only one access to a specific resource for each IP-number.
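A minimal sketch of this rule, assuming each access has already been reduced to an (IP-number, resource) pair. The function name, the resource labels and the sample data are hypothetical, and the sketch applies no time window since none is stated here.

```python
from typing import Dict, Iterable, Tuple


def count_unique_accesses(accesses: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count at most one access per (IP-number, resource) pair.

    Returns a mapping from resource to the number of distinct
    IP-numbers that accessed it.
    """
    seen = set()
    counts: Dict[str, int] = {}
    for ip, resource in accesses:
        if (ip, resource) in seen:
            continue  # repeat view or double click: not counted again
        seen.add((ip, resource))
        counts[resource] = counts.get(resource, 0) + 1
    return counts


# Made-up sample: the first IP-number views the same abstract twice.
sample = [
    ("192.0.2.1", "abstract:paper1"),
    ("192.0.2.1", "abstract:paper1"),
    ("192.0.2.1", "download:paper1"),
    ("192.0.2.2", "abstract:paper1"),
]
counts = count_unique_accesses(sample)
```

In this sketch an abstract view and a download of the same paper are treated as different resources, so one IP-number can contribute one of each.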
The strategy for avoiding double counting introduces a slight undercount when, for example, several computers behind a firewall share the same external IP-number. By comparing with statistics obtained by identifying users with cookies rather than their IP-numbers we can estimate this undercount at about 2% for abstract views and 1% for downloads.
Over time it has become clear that the simple filtering for robots and removal of double clicks discussed above is not enough. Many new practices have developed on the web, some for a good purpose, some for a more questionable purpose. There are spam-bots, referer spamming (a stupid idea if there ever was one), anti-malware software that checks links on a web page and warns users about dangerous links, and much, much more that should not be counted. And, yes, there appears to be the occasional attempt to manipulate the statistics.
Starting from July 2010 we apply an additional set of heuristics to filter out these accesses. In conjunction with this we have also recalculated the statistics going back to January 2008. The overall effect is relatively small but there are substantial reductions in the number of accesses for a small number of papers.
We are continually working on improving the statistics and will add new filters over the coming months.
Oddities in the statistics
There are sometimes more downloads than abstract views registered for a paper. There are two main reasons for this. First, new papers are announced in the New Economics Papers service, which regularly sends out e-mail with information about new papers. A reader can download a paper by clicking a link in the e-mail, in which case no abstract view is registered for that paper. Second, Google Scholar sometimes links directly to the download link at one of the RePEc services rather than to the abstract page. This can lead to more downloads than abstract views being registered for older papers as well.
Programmatic access to the data
Occasionally we get requests for data for research purposes. While we do not have the resources to run special queries to create custom data sets, all the results we present on the web are available in machine readable form as well as in the standard HTML presentation.
Simply append the argument 'format=csv' and the data will be returned as a tab separated file. For example http://logec.repec.org/scripts/itemstat.pf?topnum=50;type=redif-paper;sortby=td;format=csv will return the 50 working papers with the highest total downloads. This is available for top working papers, journal articles, books, chapters, software, authors, working paper series, journals and rankings within working paper series and journals.
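As a rough sketch, the example URL above can be assembled and the tab separated result parsed as shown here. The helper names are hypothetical, and the column names and values in the sample text are made up for illustration; the actual layout of the returned file is not described on this page.

```python
import csv
from typing import Dict, List

BASE_URL = "http://logec.repec.org/scripts/"


def build_csv_url(script: str, params: Dict[str, object]) -> str:
    """Join the arguments with ';' (as in the example URL) and add format=csv."""
    parts = [f"{key}={value}" for key, value in params.items()]
    parts.append("format=csv")
    return f"{BASE_URL}{script}?{';'.join(parts)}"


def parse_tab_separated(text: str) -> List[List[str]]:
    """Split tab separated text into rows of fields."""
    return list(csv.reader(text.splitlines(), delimiter="\t"))


url = build_csv_url(
    "itemstat.pf",
    {"topnum": 50, "type": "redif-paper", "sortby": "td"},
)

# Made-up response text; the real columns may differ.
sample_text = "item\tdownloads\nexample-item-1\t120\n"
rows = parse_tab_separated(sample_text)
```

The text to parse would come from fetching the URL, for example with urllib.request.urlopen from the Python standard library.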
In addition there is a facility to obtain a list of the works claimed by an author. This is the authorworks.pf script. It takes one argument, id - the RePEc short-id of the author - and returns a text file with the handles of the works claimed by the author. For example, http://logec.repec.org/scripts/authorworks.pf?id=pka1. Detailed download statistics for each work can then be obtained with the paperstat.pf script.
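A small sketch of working with authorworks.pf, assuming the returned text file holds one handle per line. That layout is an assumption, as are the helper names and the sample handles.

```python
from typing import List


def authorworks_url(short_id: str) -> str:
    """Build the authorworks.pf URL for a RePEc short-id."""
    return f"http://logec.repec.org/scripts/authorworks.pf?id={short_id}"


def parse_handles(text: str) -> List[str]:
    """Return the non-empty lines of the response, assumed to be one handle each."""
    return [line.strip() for line in text.splitlines() if line.strip()]


works_url = authorworks_url("pka1")

# Made-up response text with hypothetical handles.
sample_response = "RePEc:aaa:wpaper:001\n\nRePEc:aaa:journl:002\n"
handles = parse_handles(sample_response)
```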
Constructing a query
Use the web interface to construct the desired query, then add the format=csv parameter to get a downloadable file. In addition, some of the scripts take arguments that are not directly available through the web interface:
LogEc wouldn't be possible without the assistance and support of the maintainers of the participating services, Thomas Krichel, Christian Zimmermann, Sergei Parinov and José Manuel Barrueco.
The whole exercise would of course be pointless without the work of all the RePEc archive maintainers who provide the data about the working papers, articles and software items. And RePEc itself wouldn't be what it is without the continued effort of the RePEc Team.
Thanks also to Olaf Storbeck of the German business daily Handelsblatt for many useful suggestions on how to improve the statistics. The Handelsblatt runs a weekly "Economics" page which frequently features rankings of economists or economics papers based on LogEc data.
Page updated 2011-06-14