For Python, see https://www.dataquest.io/blog/python-api-tutorial/ .
For R, see https://www.dataquest.io/blog/r-api-tutorial/ ; for the OpenAlex API itself, see https://docs.openalex.org/quickstart-tutorial .
We will need the R libraries httr and jsonlite.
> wdir <- "C:/Users/vlado/work/OpenAlex/API"
> setwd(wdir)
> # install.packages(c("httr", "jsonlite"))
> library(httr)
> library(jsonlite)
> res <- GET("https://api.openalex.org/institutions?search=imfm")
> str(res)
> res$date
[1] "2024-03-07 22:51:24 GMT"
> cont <- fromJSON(rawToChar(res$content))
> names(cont)
[1] "meta" "results" "group_by"
> str(cont)
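The GET-then-parse steps above can be wrapped in a small helper that also stops on HTTP errors. This is only a sketch: the names parse_json_body and oa_get are hypothetical, not part of httr, jsonlite, or OpenAlex. It assumes the response body is JSON, as in the session above.

```r
# Hypothetical helpers (names are illustrative, not from any package).

# Turn a raw response body into an R list, exactly as done by hand above.
parse_json_body <- function(raw) {
  jsonlite::fromJSON(rawToChar(raw))
}

# GET a URL, stop with an error on a 4xx/5xx status, and parse the body.
oa_get <- function(url, ...) {
  res <- httr::GET(url, ...)
  httr::stop_for_status(res)
  parse_json_body(res$content)
}

# Usage (needs network access):
# cont <- oa_get("https://api.openalex.org/institutions?search=imfm")
# names(cont)
```

Checking the status before parsing avoids trying to read an error page as if it were a result list.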
An alternative form of GET passes the query parameters as a named list:
> res <- GET("https://api.openalex.org/institutions",
+   query = list(search="imfm"))
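To see that the list form describes the same request as the literal URL, the URL can be built without sending anything. A minimal sketch using httr's modify_url; the variable name u is just for illustration.

```r
library(httr)

# Build the request URL from a base URL and a named list of query parameters,
# without contacting the server.
u <- modify_url("https://api.openalex.org/institutions",
                query = list(search = "imfm"))
print(u)
```

The list form becomes more convenient when a parameter value contains characters that need URL encoding, since httr handles the escaping.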
Let's try to find the papers with the largest number of authors.
> wrk <- GET("https://api.openalex.org/works",
+   query = list(filter="authors_count:>4000"))
> cont <- fromJSON(rawToChar(wrk$content))
> names(cont$results)
> names(cont$results$id)
> N <- cont$results$authorships[[1]]$author$id
> length(N)
[1] 100
> W <- cont$results$id
> length(W)
[1] 25
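The OpenAlex paging documentation gives 25 results per page as the default (up to 200 with the per-page parameter), so a list of length 25 may be only the first page. The total number of matches is reported in cont$meta$count. A hedged sketch for checking this; pages_needed and fetch_page are hypothetical helper names.

```r
# Number of pages needed to retrieve `count` results (hypothetical helper).
pages_needed <- function(count, per_page = 25) {
  as.integer(ceiling(count / per_page))
}

# Fetch one page of works for a filter; "per-page" and "page" are the
# OpenAlex paging parameters (assumption: basic paging, not cursor paging).
fetch_page <- function(filter, page = 1, per_page = 200) {
  res <- httr::GET("https://api.openalex.org/works",
                   query = list(filter = filter, page = page,
                                "per-page" = per_page))
  httr::stop_for_status(res)
  jsonlite::fromJSON(rawToChar(res$content))
}

# Usage (needs network access):
# cont <- fetch_page("authors_count:>4000")
# cont$meta$count                      # total number of matching works
# pages_needed(cont$meta$count, 200)   # pages required at 200 per page
```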
At https://docs.openalex.org/api-entities/authors/limitations we learn that “When retrieving a list of works in the API, the authorships list within each work will be cut off at 100 authorships objects in order to keep things running well.” … “To see the full list of authors, go to the individual record for the work, which is never truncated.”
Let's try this for the first work in the list:
> i <- 1
> w <- strsplit(W[i],"/")[[1]][4]
> cat(i,w,"\n")
> wd <- GET(paste("https://api.openalex.org/works/",w,sep=""))
> names(wd)
[1] "url" "status_code" "headers" "all_headers" "cookies" "content" "date"
[8] "times" "request" "handle"
> cont <- fromJSON(rawToChar(wd$content))
> names(cont)
> wN <- cont$authorships
> names(wN)
[1] "author_position" "author" "institutions" "countries"
[5] "is_corresponding" "raw_author_name" "raw_affiliation_string" "raw_affiliation_strings"
> aN <- cont$authorships$author
> dim(aN)
[1] 6349 3
> names(aN)
[1] "id" "display_name" "orcid"
> numA <- nrow(aN)
> numA
[1] 6349
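The strsplit(...)[[1]][4] step relies on the ID having exactly the form https://openalex.org/W..., and it handles one ID at a time. A possible alternative is to strip everything up to the last slash, which works on a whole vector of IDs. The name work_key is hypothetical.

```r
# Extract the short key (e.g. "W4206449744") from full OpenAlex IDs by
# removing everything up to and including the last "/". Vectorized.
work_key <- function(ids) {
  sub(".*/", "", ids)
}

work_key(c("https://openalex.org/W4206449744",
           "https://openalex.org/W1694905212"))
```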
It works!
Now we can write a program that lists the number of authors for each of the 25 works with more than 4000 authors returned above:
> cont <- fromJSON(rawToChar(wrk$content))
> W <- cont$results$id
> for(i in 1:length(W)){
+   w <- strsplit(W[i],"/")[[1]][4]
+   wd <- GET(paste("https://api.openalex.org/works/",w,sep=""))
+   aN <- fromJSON(rawToChar(wd$content))$authorships$author
+   cat(i,w,nrow(aN),"\n")
+ }
1 W4206449744 6349
2 W1694905212 5103
3 W3135829537 16162
4 W4225826670 6303
5 W4226146026 6215
6 W3202989501 5203
7 W4288459457 7942
8 W3194033501 16162
9 W4283721846 5086
10 W4365449195 8107
11 W4390806927 5290
12 W4307909401 4503
13 W3179655344 4869
14 W4200548116 5548
15 W4391092522 7373
16 W2534506121 5399
17 W3081505077 5246
18 W4385551082 5244
19 W4387749209 7992
20 W2536981003 4486
21 W2754534007 5098
22 W3036909185 5107
23 W2314279866 8967
24 W2316440068 9036
25 W4387473560 5865
> W[3]
[1] "https://openalex.org/W3135829537"
> W[8]
[1] "https://openalex.org/W3194033501"
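The same loop can also return a data frame instead of printing, which makes it easy to sort or to find ties such as the two works with 16162 authors. This is a sketch under assumptions: fetch_work and count_authors are hypothetical names, the pause between requests is an arbitrary courtesy delay, and a work without authorships is counted as 0.

```r
# Fetch the full (untruncated) record of a single work by its short key.
fetch_work <- function(key) {
  res <- httr::GET(paste0("https://api.openalex.org/works/", key))
  httr::stop_for_status(res)
  jsonlite::fromJSON(rawToChar(res$content))
}

# Count the authors of each work in W (a vector of full OpenAlex IDs).
# `fetch` can be replaced by another function, e.g. for offline testing.
count_authors <- function(W, fetch = fetch_work, pause = 0.2) {
  keys <- sub(".*/", "", W)            # "https://openalex.org/W1" -> "W1"
  n <- vapply(keys, function(k) {
    Sys.sleep(pause)                   # small delay between requests
    a <- fetch(k)$authorships$author
    if (is.null(a)) 0L else nrow(a)
  }, integer(1), USE.NAMES = FALSE)
  data.frame(work = keys, authors = n, stringsAsFactors = FALSE)
}

# Usage (needs network access):
# tab <- count_authors(W)
# tab[order(-tab$authors), ]
```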
There are two papers with 16162 co-authors. From their OpenAlex pages we get
See also: Guinness. (2021). Guinness world record 653537: The most authors on a single peer-reviewed academic paper.
https://api.openalex.org/works?filter=openalex:W4206449744|W1694905212|W3135829537|W3194033501&select=id,title
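The URL above combines several work keys with "|" (OR) in a single filter and uses select to return only the id and title fields. A sketch of how the same request might be issued from R; or_filter and fetch_titles are hypothetical names.

```r
# Build an OR filter over several work keys, as in the URL above.
or_filter <- function(keys) {
  paste0("openalex:", paste(keys, collapse = "|"))
}

# Request only the id and title of the given works.
fetch_titles <- function(keys) {
  res <- httr::GET("https://api.openalex.org/works",
                   query = list(filter = or_filter(keys),
                                select = "id,title"))
  httr::stop_for_status(res)
  jsonlite::fromJSON(rawToChar(res$content))$results
}

keys <- c("W4206449744", "W1694905212", "W3135829537", "W3194033501")
or_filter(keys)

# Usage (needs network access):
# fetch_titles(keys)
```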