The Scopus bibliographic data are Unicode encoded and can contain non-ASCII characters, whereas in the data from WoS the author names are transliterated into ASCII. How can we make the two data sets compatible?
I first prepared a test file, examples.txt (UTF-8 with BOM):
    AU Garcia-Calvo, T
    RI García-Calvo, Tomas/AAN-6825-2021; OLIVA, DAVID SANCHEZ/L-1698-2014;
    AU Bazina, AM
       Pericic, TP
       Mihanovic, F
    RI Peričić, Tina Poklepović/G-8402-2017; Mihanović, Frane/E-3337-2017
    RI Chmura, Paweł/U-6645-2019; Struzik, Artur/AAC-2669-2021; Popowczak,
    AU - Marinović, M.
    AU - Kjær, M.
    AU - Renström, P.A.F.H.
    AU - Ibáñez, S.J.
    AU - Kristjánsdóttir, H.
    Cvetković, D., Doob, M., Sachs, H., (1995) Spectra of Graphs: Theory and Application, pp. 18-20. , Johann Ambrosius Barth, Heidelberg, 3rd edn;
    Заболотский Александр Викторович <azabolotskii@hse.ru>
    الدبلوم التنفيذي | مهارات الذكاء الاصطناعي وعلم البيانات
    Cui, Chunfang; Tong, Zhongliang
    干燥新技术及应用 /Gan zao xin ji shu ji ying yong [Di 1 ban. ed.]
After some searching on Google, I found the solution in "Transliterating non-ASCII characters with Python":
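    import os
    import sys

    from unidecode import unidecode  # third-party package: pip install Unidecode

    # Work in the directory that holds the test file.
    wdir = "C:/Users/vlado/work2/mark/ascii"
    sys.path.append(wdir)
    os.chdir(wdir)

    # "utf-8-sig" strips the BOM at the start of the file.
    with open("examples.txt", "r", encoding="utf-8-sig") as infile:
        data = infile.read()

    # Transliterate every non-ASCII character to an ASCII approximation.
    a = unidecode(data)
    print(data)
    print(a)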
We get the following transliteration:
    AU Garcia-Calvo, T
    RI Garcia-Calvo, Tomas/AAN-6825-2021; OLIVA, DAVID SANCHEZ/L-1698-2014;
    AU Bazina, AM
       Pericic, TP
       Mihanovic, F
    RI Pericic, Tina Poklepovic/G-8402-2017; Mihanovic, Frane/E-3337-2017
    RI Chmura, Pawel/U-6645-2019; Struzik, Artur/AAC-2669-2021; Popowczak,
    AU - Marinovic, M.
    AU - Kjaer, M.
    AU - Renstrom, P.A.F.H.
    AU - Ibanez, S.J.
    AU - Kristjansdottir, H.
    Cvetkovic, D., Doob, M., Sachs, H., (1995) Spectra of Graphs: Theory and Application, pp. 18-20. , Johann Ambrosius Barth, Heidelberg, 3rd edn;
    Zabolotskii Aleksandr Viktorovich <azabolotskii@hse.ru>
    ldblwm ltnfydhy | mhrt ldhk lSTn`y w`lm lbynt
    Cui, Chunfang; Tong, Zhongliang
    Gan Zao Xin Ji Zhu Ji Ying Yong /Gan zao xin ji shu ji ying yong [Di 1 ban. ed.]
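To use this for making the Scopus and WoS data compatible, one option is to compare author names only after transliteration. The following is a minimal sketch, not a tested matching procedure. The helper names ascii_name and same_author are hypothetical. The standard-library fallback is an assumption added so the example runs without unidecode; it only strips combining accents and is cruder than unidecode.

```python
import unicodedata

try:
    # Third-party package used in the script above (pip install Unidecode).
    from unidecode import unidecode
except ImportError:
    # Assumption: a cruder standard-library stand-in. It strips combining
    # accents only; letters such as "ł" or "æ" are dropped, not mapped.
    def unidecode(text):
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")


def ascii_name(name):
    """Return an ASCII-only, case-folded version of an author name."""
    return unidecode(name).strip().casefold()


def same_author(scopus_name, wos_name):
    """Hypothetical helper: do two names agree after transliteration?"""
    return ascii_name(scopus_name) == ascii_name(wos_name)


print(ascii_name("Peričić, Tina Poklepović"))                      # pericic, tina poklepovic
print(same_author("García-Calvo, Tomas", "Garcia-Calvo, Tomas"))   # True
print(same_author("Ibáñez, S.J.", "Marinovic, M."))                # False
```

This only removes the encoding difference. The two sources also write initials differently ("Ibáñez, S.J." in Scopus style against "Bazina, AM" in WoS style), and a real matching step would have to handle that separately.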