Project 2

The OpenAlex Dataset is a comprehensive, open-source bibliographic database offering extensive information on academic publications.

The data can be accessed through an API, either from a web browser or from user programs. How to use the API in R?

Select an institution

Select an institution with at least 25000 published works. To prevent duplication, send me its OpenAlex ID for confirmation.

For example, for https://api.openalex.org/institutions?search=HSE and https://api.openalex.org/institutions?search=Univerza v Ljubljani we get IDs I118501908 (works_count: 45987) and I153976015 (works_count: 64266). Both are OK.
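The lookup above can also be done from a program. The following is a minimal sketch in Python (the same steps carry over to R), assuming the search reply is JSON with a "results" list whose items carry "id", "display_name" and "works_count" fields. The function names are hypothetical, and the network call is kept separate so that the filtering can be tried on a saved reply.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.openalex.org"
MIN_WORKS = 25000  # threshold from the task statement


def institution_search_url(query):
    """Build the institution search URL, percent-encoding the query."""
    return f"{API_ROOT}/institutions?" + urllib.parse.urlencode({"search": query})


def eligible_institutions(response, min_works=MIN_WORKS):
    """Return (short ID, name, works_count) for each sufficiently large hit.

    Assumes the parsed reply has a "results" list whose items carry
    "id", "display_name" and "works_count" fields.
    """
    hits = []
    for inst in response.get("results", []):
        if inst.get("works_count", 0) >= min_works:
            short_id = inst["id"].rsplit("/", 1)[-1]  # drop the URL prefix
            hits.append((short_id, inst.get("display_name"), inst["works_count"]))
    return hits


def search_institutions(query):
    """Fetch and parse one page of search results (needs network access)."""
    with urllib.request.urlopen(institution_search_url(query)) as reply:
        return json.load(reply)
```

A call such as `eligible_institutions(search_institutions("HSE"))` would then list only the candidates that meet the 25000-works requirement.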

n  student            ID           institution
1  Ilia Kazakov       I126527374   Российский университет дружбы народов
2  Kayode Ahmed       I1343551460  The University of Texas MD Anderson Cancer Center
3  Enrique Nuñez      I28407311    University of Manchester
4  Артём Кузнецов     I79576946    University of Pennsylvania
5  Александр Матвеев  I2279609970  Université de Lille
6
7
8
9

Task 1

For the selected institution in the years 2011-2021 and for each of the following variables, list the names of the top 10 units (authors, ranked by total value), and draw the frequency distribution by year for the top unit and the joint distribution for all units:

  1. number of works of an author in a given year
  2. fractional contribution of an author in a given year
  3. number of works of an author in a given year written in collaboration with at least one author from another country
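As an illustration for the first variable, here is a minimal Python sketch of one possible reading of the task. It assumes the data has already been flattened into (author_id, year) pairs, one per authorship of a work; that layout and all function names are hypothetical. The "joint distribution" is interpreted here as the pooled frequencies over all (author, year) pairs, which is an assumption rather than the required definition.

```python
from collections import Counter

YEARS = range(2011, 2022)  # 2011-2021 inclusive


def works_per_author_year(records):
    """Count works for each (author_id, year) pair."""
    return Counter(records)


def top_units(counts, k=10):
    """Rank authors by their total over all years."""
    totals = Counter()
    for (author, _year), n in counts.items():
        totals[author] += n
    return totals.most_common(k)


def yearly_series(counts, author):
    """Values of one author for each year in YEARS (0 where absent)."""
    return {year: counts.get((author, year), 0) for year in YEARS}


def pooled_frequencies(counts):
    """How many (author, year) pairs take each value, over all authors."""
    return dict(sorted(Counter(counts.values()).items()))
```

The other two variables fit the same scheme if the per-pair count is replaced by a sum of fractional contributions or by a count restricted to works with a co-author from another country.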

The fractional contribution of an author to a given work is 1/(number of authors of that work). The fractional contribution of an author to a given set of works is the sum of the author's fractional contributions to the works in that set.

For example, assume that in the year 2015 author A wrote three works a, b, and c with 1, 5, and 2 authors, respectively. Then A's fractional contribution for the year 2015 is 1 + 1/5 + 1/2 = 1.7.
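The arithmetic of this example can be checked with a few lines of Python. This is only a sketch; the function name is hypothetical, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def fractional_contribution(author_counts):
    """Sum 1/n over works, where n is the number of authors of each work."""
    return sum(Fraction(1, n) for n in author_counts)


# Works a, b, c of author A in 2015 have 1, 5 and 2 authors.
total = fractional_contribution([1, 5, 2])
print(total, float(total))  # 17/10 1.7
```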

For the authors from the selected institution in the years 2011-2021, create a data frame (author's OpenAlex ID, author's name, total number of works, total fractional contribution, total number of works with foreign co-authors) and save it in CSV format.
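A possible way to build such a table is sketched below in Python. It assumes each work is a dict shaped like an OpenAlex work record, with an "authorships" list whose items hold an "author" ("id", "display_name") and a list of "institutions" ("id", "country_code"). The function names, the column names, and the rule that a co-author is "foreign" when any listed affiliation lies outside the home country are all assumptions made for illustration.

```python
import csv
from collections import defaultdict


def short_id(url):
    """Reduce 'https://openalex.org/I123' to 'I123'."""
    return url.rsplit("/", 1)[-1]


def summarize_authors(works, institution_id, home_country):
    """Aggregate per-author totals for authors affiliated with the institution."""
    rows = defaultdict(lambda: {"name": "", "works": 0, "fractional": 0.0, "foreign": 0})
    for work in works:
        authorships = work.get("authorships", [])
        if not authorships:
            continue
        share = 1 / len(authorships)  # fractional contribution per author
        # Authors with at least one affiliation outside the home country.
        foreign_authors = {
            a["author"]["id"]
            for a in authorships
            if any(i.get("country_code") not in (home_country, None)
                   for i in a.get("institutions", []))
        }
        # Authors of this work affiliated with the selected institution.
        local = {
            a["author"]["id"]: a["author"].get("display_name", "")
            for a in authorships
            if any(short_id(i.get("id", "")) == institution_id
                   for i in a.get("institutions", []))
        }
        for author_id, name in local.items():
            row = rows[author_id]
            row["name"] = name
            row["works"] += 1
            row["fractional"] += share
            if foreign_authors - {author_id}:
                row["foreign"] += 1
    return dict(rows)


def write_csv(rows, path):
    """Save the per-author totals as a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["author_id", "author_name", "works",
                         "fractional", "works_with_foreign"])
        for author_id, r in sorted(rows.items()):
            writer.writerow([author_id, r["name"], r["works"],
                             round(r["fractional"], 4), r["foreign"]])
```

The list of works is expected to be restricted to the years 2011-2021 before it is passed in.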

Task 2

For the selected institution, draw a picture of the following type:

red - proportion of single-author works, blue - proportion of works with 2 authors, etc.
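The proportions behind such a picture could be computed as in the Python sketch below. It assumes the picture shows one set of proportions per year, that the works are given as (year, number_of_authors) pairs with at least one author each, and that large teams are merged into a single "max_group or more" class. All three points, like the function name, are assumptions rather than part of the task statement.

```python
from collections import Counter, defaultdict


def author_count_shares(works, max_group=5):
    """Per year, the share of works having 1, 2, ..., max_group+ authors."""
    by_year = defaultdict(Counter)
    for year, n_authors in works:
        # Merge all team sizes >= max_group into the last class.
        by_year[year][min(n_authors, max_group)] += 1
    shares = {}
    for year, counter in sorted(by_year.items()):
        total = sum(counter.values())
        shares[year] = {k: counter[k] / total for k in range(1, max_group + 1)}
    return shares
```

Each year's shares sum to 1, so they can be passed directly to a stacked bar or stacked area chart with one colour per class.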


Interpret the obtained results.




ru/hse/eda24/stu/p2.txt · Last modified: 2024/04/09 04:29 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported