16

I am trying to find a way to search for all the citations contained in an article. What I want to do is create a workflow for my research where:

  1. I have a pool of articles that I have already read.
  2. I want to find a new article to read, based on this pool of articles, so I find all the citations of all the read articles and calculate the most cited article, which is the next article I should read (obviously skipping articles that I have already read).

But I have been unable to find a way to automatically download all of the citations of a specific article. Is there no way to do it? Even paid?
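For concreteness, the selection step in the workflow above can be sketched roughly as follows (article IDs and the `next_article` helper are placeholders, not an existing tool):

```python
from collections import Counter

def next_article(citations_by_read):
    """Pick the most-cited article not yet in the read pool.

    citations_by_read: dict mapping each read article's ID to the
    list of article IDs it cites (IDs here are placeholders).
    """
    read = set(citations_by_read)
    counts = Counter(cited
                     for refs in citations_by_read.values()
                     for cited in refs
                     if cited not in read)  # skip already-read articles
    return counts.most_common(1)[0][0] if counts else None

pool = {"A": ["X", "Y", "B"], "B": ["X", "Z"], "C": ["Y", "X"]}
print(next_article(pool))  # → "X" (cited by all three read articles)
```

The hard part, of course, is obtaining `citations_by_read` automatically, which is exactly the question.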

vainolo
  • I know of no tool, free or paid, to do this, but if you are willing to exchange money for this service, perhaps hiring someone else...? – Ben Norris Sep 02 '12 at 23:04
  • @BenNorris, you mean manually? – vainolo Sep 03 '12 at 09:55
  • @vainolo: it is an interesting problem the one that you try to solve. There are publishers who give URL links for (some of) the references. I am thinking of mining the source html page of a paper's online url (for example this one, selecting the "reference" tabs, identifying the URLs pointing to other papers, and then downloading them in the same way. – ElCid Sep 03 '12 at 11:00
  • @ElCid Yes, but I'm not sure all sites provide the links (I know ACM and IEEE do), and more than that, it means that if they change the site's design all the mining code goes in the trash :-( – vainolo Sep 03 '12 at 13:36
  • @vainolo What is your field of work? – Noble P. Abraham Sep 03 '12 at 14:17
  • @NobleP.Abraham Software Engineering – vainolo Sep 03 '12 at 14:57
  • Thought adding your field would help in getting a better answer, relevant to your area. – Noble P. Abraham Sep 03 '12 at 15:59
  • @vainolo Yes, I mean manually. As Noble's excellent answer below states, automated scraping of citation databases tends to be against the EULA and/or TOU. – Ben Norris Sep 03 '12 at 18:26
  • Readcube might have the tools to get halfway there. Worth seeing if that fits your pipeline. – bobthejoe Sep 07 '12 at 06:11
  • @bobthejoe Thanks for the link. Nice app, but it is still dependent on external web sites to gather the information. And from my first use, it couldn't identify about 75% of my PDFs. – vainolo Sep 09 '12 at 09:25
  • @vainolo. Readcube is certainly... imperfect. I personally use Mendeley since Readcube can't figure out my library. – bobthejoe Sep 10 '12 at 02:57
  • Cross-link to related question: https://academia.stackexchange.com/questions/46847/how-to-download-references-to-database-that-a-paper-includes-and-all-articles-th – aplaice Sep 24 '17 at 01:50

3 Answers

10

There seems to be no service or tool that lets you download articles in bulk. This is probably because most journals have zero tolerance for bulk/automated downloads: automated downloading is sometimes (perhaps most of the time) an attempt to infringe copyright.

For example, the ACM Digital Library ToS includes this restriction:

Under no circumstances are the following actions permitted:

  • Using scripts or spiders to automatically download articles or harvest metadata from the ACM Digital Library. This activity is a serious violation of ACM’s DL usage policy and will result in the temporary or permanent termination of download rights for the subscribing institution.

That being said, irrespective of the field of research, here are some services (free and paid) that offer a list of citations / Cited By articles (other than the journal's webpage for the article).

While ACM provides a free Cited By list, IEEE provides this only for subscribers/members.

Also, please note this from the NASA ADS FAQ:

In addition, references may be incomplete due to our inability to match them with 100% accuracy (e.g. in press, private communications, author errors, some conference series, etc.). Anyone using the citations for analysis of publishing records should keep this in mind.

which is true for any citation / Cited By list.

Noble P. Abraham
  • I find such an ACM policy really arrogant and silly: how could downloading the list of references from a single paper possibly be against that policy? So if I automatically extract the links and then manually fetch them, that would be fine, I suppose? I understand that the policy is against the mass downloading of all the papers in all the conferences, but that clause is really silly... – ElCid Sep 03 '12 at 22:01
2

It seems that crossref.org is beginning to roll out reference lists (the works that a given work cites):

https://www.crossref.org/blog/distributing-references-via-crossref/

[See the aptly named section "OMG! OMG! OMG! Does this mean I can get references from api.crossref.org?"]

Using the example DOI from the above link (doi:10.7554/eLife.10288), you could obtain the list of citations in that work at: https://api.crossref.org/v1/works/10.7554/eLife.10288.xml

Alternatively, with content negotiation, you could just use:

curl -L -H "Accept: application/vnd.crossref.unixsd+xml" \
 https://doi.org/10.7554/eLife.10288 > data.xml

The citations are listed in the <citation_list> element.

Warning: According to the link above, the citation data is only available in the XML representation, not the JSON one. Also, the service is not yet available for all works.
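As a minimal sketch of pulling the reference list out of the downloaded XML, here is a parser run against an illustrative snippet that mirrors the `<citation_list>` structure (the element names are taken on the assumption that they match Crossref's schema; a real response is namespaced and much larger, which is why the tag names are matched ignoring namespaces):

```python
import xml.etree.ElementTree as ET

# Illustrative snippet only, mimicking the shape of a Crossref
# <citation_list>; fetch the real XML with the curl command above.
SAMPLE = """<citation_list>
  <citation key="ref1">
    <doi>10.1000/example.1</doi>
    <article_title>First cited work</article_title>
  </citation>
  <citation key="ref2">
    <unstructured_citation>Smith 2010, some book</unstructured_citation>
  </citation>
</citation_list>"""

def extract_citations(xml_text):
    """Return a list of (doi, title) pairs; either field may be None."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in root.iter():
        # Strip any namespace prefix before comparing tag names.
        if elem.tag.split('}')[-1] != 'citation':
            continue
        fields = {child.tag.split('}')[-1]: (child.text or '').strip()
                  for child in elem}
        results.append((fields.get('doi'),
                        fields.get('article_title')
                        or fields.get('unstructured_citation')))
    return results

print(extract_citations(SAMPLE))
```

Running this on the sample prints one `(doi, title)` pair per `<citation>` entry; entries that Crossref only holds as unstructured text come back with a `None` DOI.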

aplaice
0

PubMed Central gives you full text, and its API may be able to do it.

But getting all the article PMIDs cited by paper X is the necessary step for that, and this is hard: I don't think there is a service for it.

Unless you parse the full text yourself using your own code.
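To sketch the API route mentioned above: NCBI's E-utilities ELink endpoint can, for some records, map a PubMed ID to the articles it cites. The `pubmed_pubmed_refs` link name used below is an assumption to verify against the E-utilities documentation, and it only returns anything when the paper's reference list has been deposited:

```python
from urllib.parse import urlencode

def elink_refs_url(pmid):
    """Build an E-utilities ELink URL asking for the works a given
    PubMed record cites (linkname assumed: pubmed_pubmed_refs)."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
    params = {"dbfrom": "pubmed",
              "id": pmid,
              "linkname": "pubmed_pubmed_refs"}
    return base + "?" + urlencode(params)

# The PMID here is a placeholder; fetch the URL with any HTTP client
# and parse the <Link><Id> entries out of the XML response.
print(elink_refs_url("20210808"))
```

Even then, ELink only covers what is in PubMed, so for many papers the parse-it-yourself route remains the fallback.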

userJT