How to get data from OpenAlex in R?

IMFM

https://docs.openalex.org/quickstart-tutorial ; https://www.dataquest.io/blog/r-api-tutorial/

We will need the R packages httr and jsonlite.

> wdir <- "C:/Users/vlado/work/OpenAlex/API"
> setwd(wdir)
> # install.packages(c("httr", "jsonlite"))
> library(httr)
> library(jsonlite)
> res <- GET("https://api.openalex.org/institutions?search=imfm")
> str(res)
> res$date
[1] "2024-03-07 22:51:24 GMT"
> cont <- fromJSON(rawToChar(res$content))
> names(cont)
[1] "meta"     "results"  "group_by"
> str(cont)

An alternative form of the GET call passes the query parameters as a named list:

> res <- GET("https://api.openalex.org/institutions",
+     query = list(search="imfm"))
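The GET and fromJSON steps can be wrapped into a small function. The sketch below makes some assumptions. The names openalex_get and with_mailto are hypothetical and are not part of httr or jsonlite. The function uses httr's stop_for_status() to stop on HTTP errors. It also uses content(as = "text", encoding = "UTF-8") in place of rawToChar(), so that non-ASCII names are decoded as UTF-8. The optional mailto parameter is the one OpenAlex documents for its "polite pool".

```r
# Sketch: hypothetical helpers, not functions from httr or jsonlite.

# Add the optional `mailto` query parameter that OpenAlex uses to route
# requests to its "polite pool"; leave the query unchanged otherwise.
with_mailto <- function(query = list(), mailto = NULL) {
  if (!is.null(mailto)) query$mailto <- mailto
  query
}

# GET an OpenAlex endpoint and parse the JSON reply into R objects.
# Requires the httr and jsonlite packages and network access.
openalex_get <- function(url, query = list(), mailto = NULL) {
  res <- httr::GET(url, query = with_mailto(query, mailto))
  httr::stop_for_status(res)  # error on 4xx/5xx instead of parsing an error page
  txt <- httr::content(res, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(txt)
}

# Example (needs network; the e-mail address is a placeholder):
# cont <- openalex_get("https://api.openalex.org/institutions",
#                      query = list(search = "imfm"), mailto = "you@example.com")
# names(cont)
```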

Papers with the largest number of authors

Let's try to find the papers with the largest number of authors.

> wrk <- GET("https://api.openalex.org/works",
+    query = list(filter="authors_count:>4000"))
> cont <- fromJSON(rawToChar(wrk$content))
> names(cont$results)
> head(cont$results$id)
> N <- cont$results$authorships[[1]]$author$id
> length(N)
[1] 100
> W <- cont$results$id
> length(W)
[1] 25
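In the list response, cont$results$authorships is a list that holds one data frame per work. The short helper below counts how many authorship rows each work actually contains. The name listed_authorships is hypothetical. For works with more than 4000 authors, we would expect every count to be capped at 100 rather than to show the true number of authors.

```r
# Hypothetical helper: number of authorship rows present for each work
# in a /works list response. NROW() returns 0 for an empty list().
listed_authorships <- function(authorships) {
  vapply(authorships, NROW, integer(1))
}

# Example (needs the `cont` object from the request above):
# table(listed_authorships(cont$results$authorships))
```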

At https://docs.openalex.org/api-entities/authors/limitations we learn that “When retrieving a list of works in the API, the authorships list within each work will be cut off at 100 authorships objects in order to keep things running well.” … “To see the full list of authors, go to the individual record for the work, which is never truncated.”

Let's try it on the first work:

> i <- 1
> w <- strsplit(W[i],"/")[[1]][4]
> cat(i,w,"\n")
> wd <- GET(paste("https://api.openalex.org/works/",w,sep=""))
> names(wd)
 [1] "url"         "status_code" "headers"     "all_headers" "cookies"     "content"     "date"       
 [8] "times"       "request"     "handle" 
> cont <- fromJSON(rawToChar(wd$content))
> names(cont)
> wN <- cont$authorships
> names(wN)
[1] "author_position"         "author"                  "institutions"            "countries"              
[5] "is_corresponding"        "raw_author_name"         "raw_affiliation_string"  "raw_affiliation_strings"
> aN <- cont$authorships$author
> dim(aN)
[1] 6349    3
> names(aN)
[1] "id"           "display_name" "orcid"       
> numA <- nrow(aN)
> numA
[1] 6349

It works!
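The steps just performed can be collected into functions. This is a minimal sketch that assumes the record structure seen above, where authorships is returned as a data frame. The names work_api_url and count_authors are hypothetical. basename() extracts the short ID just as strsplit(W[i],"/")[[1]][4] does, but it also works on a whole vector of IDs at once.

```r
# Hypothetical helper: single-record API URL for full OpenAlex IDs such as
# "https://openalex.org/W4206449744". basename() keeps the part after the
# last "/", so the function is vectorised over W.
work_api_url <- function(openalex_id) {
  paste0("https://api.openalex.org/works/", basename(openalex_id))
}

# Hypothetical helper: number of authorships in the full (untruncated)
# record of ONE work. Requires httr, jsonlite and network access.
count_authors <- function(openalex_id) {
  res <- httr::GET(work_api_url(openalex_id))
  httr::stop_for_status(res)
  rec <- jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
  NROW(rec$authorships)  # one row per authorship
}

# Example (needs network): count_authors(W[1]) should agree with numA above.
```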

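Now we can write a program that lists the 25 works returned by the query above, together with the full number of authors of each. Note that 25 is the API's default page size. These are therefore the works on the first page of results. The total number of works matching the filter is reported in cont$meta$count.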

> cont <- fromJSON(rawToChar(wrk$content))
> W <- cont$results$id
> for(i in 1:length(W)){
+   w <- strsplit(W[i],"/")[[1]][4]
+   wd <- GET(paste("https://api.openalex.org/works/",w,sep="")) 
+   aN <- fromJSON(rawToChar(wd$content))$authorships$author       
+   cat(i,w,nrow(aN),"\n")
+ }
1 W4206449744 6349 
2 W1694905212 5103 
3 W3135829537 16162 
4 W4225826670 6303 
5 W4226146026 6215 
6 W3202989501 5203 
7 W4288459457 7942 
8 W3194033501 16162 
9 W4283721846 5086 
10 W4365449195 8107 
11 W4390806927 5290 
12 W4307909401 4503 
13 W3179655344 4869 
14 W4200548116 5548 
15 W4391092522 7373 
16 W2534506121 5399 
17 W3081505077 5246 
18 W4385551082 5244 
19 W4387749209 7992 
20 W2536981003 4486 
21 W2754534007 5098 
22 W3036909185 5107 
23 W2314279866 8967 
24 W2316440068 9036 
25 W4387473560 5865   
> W[3]
[1] "https://openalex.org/W3135829537"
> W[8]
[1] "https://openalex.org/W3194033501"
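The /works endpoint returns 25 results per page by default, so the loop above only sees the first page. The sketch below collects the IDs from all pages. It assumes the page and per-page parameters described in the OpenAlex paging documentation, where per-page ranges from 1 to 200. The names n_pages and all_work_ids are hypothetical. Basic paging is documented to reach only the first 10,000 results. Beyond that limit OpenAlex offers cursor paging, which this sketch does not implement.

```r
# Hypothetical helper: pages needed for `count` results at `per_page` per page.
n_pages <- function(count, per_page = 25) {
  ceiling(count / per_page)
}

# Hypothetical sketch: ids of all works matching `filter`, fetched page by
# page with basic paging. Requires httr, jsonlite and network access.
all_work_ids <- function(filter, per_page = 200) {
  ids <- character(0)
  page <- 1
  repeat {
    res <- httr::GET("https://api.openalex.org/works",
                     query = list(filter = filter, `per-page` = per_page,
                                  page = page))
    httr::stop_for_status(res)
    cont <- jsonlite::fromJSON(httr::content(res, as = "text",
                                             encoding = "UTF-8"))
    ids <- c(ids, cont$results$id)
    # meta$count is the total number of matching works
    if (page >= n_pages(cont$meta$count, per_page)) break
    page <- page + 1
  }
  ids
}

# Example (needs network):
# ids <- all_work_ids("authors_count:>4000"); length(ids)
```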

There are two papers with 16162 co-authors: W[3] and W[8] (W3135829537 and W3194033501). From their OpenAlex pages we get:

See also: Guinness. (2021). Guinness world record 653537: The most authors on a single peer-reviewed academic paper.

List of works

https://api.openalex.org/works?filter=openalex:W4206449744|W1694905212|W3135829537|W3194033501&select=id,title
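The same URL can be assembled in R from a vector of IDs. The IDs are joined with "|", which OpenAlex filters interpret as OR. The select parameter limits the fields that are returned. This is a sketch, and the name works_list_url is hypothetical.

```r
# Hypothetical helper: one request URL for several works. Full IDs such as
# "https://openalex.org/W3135829537" are shortened with basename().
works_list_url <- function(ids, fields = c("id", "title")) {
  paste0("https://api.openalex.org/works?filter=openalex:",
         paste(basename(ids), collapse = "|"),   # "|" means OR
         "&select=", paste(fields, collapse = ","))
}

# works_list_url(W[c(3, 8)]) gives
# "https://api.openalex.org/works?filter=openalex:W3135829537|W3194033501&select=id,title"
```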







vlado/work/bib/alex/r1.txt · Last modified: 2002/01/01 09:28 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported