Transliterating non-ASCII characters

Scopus bibliographic data are Unicode encoded and can contain non-ASCII characters, whereas in WoS data the author names are already transliterated into ASCII. How can we make the two data sources compatible?

I first prepared a test file, examples.txt (UTF-8 with BOM):

AU Garcia-Calvo, T
RI García-Calvo, Tomas/AAN-6825-2021; OLIVA, DAVID SANCHEZ/L-1698-2014;

AU Bazina, AM
   Pericic, TP
   Mihanovic, F
RI Peričić, Tina Poklepović/G-8402-2017; Mihanović, Frane/E-3337-2017
RI Chmura, Paweł/U-6645-2019; Struzik, Artur/AAC-2669-2021; Popowczak,
AU  - Marinović, M.
AU  - Kjær, M.
AU  - Renström, P.A.F.H.
AU  - Ibáñez, S.J.
AU  - Kristjánsdóttir, H.
Cvetković, D., Doob, M., Sachs, H., (1995) Spectra of Graphs: Theory and Application, pp. 18-20. , Johann Ambrosius Barth, Heidelberg, 3rd edn; 
Заболотский Александр Викторович <azabolotskii@hse.ru>
الدبلوم التنفيذي | مهارات الذكاء الاصطناعي وعلم البيانات
Cui, Chunfang; Tong, Zhongliang 干燥新技术及应用 /Gan zao xin ji shu ji ying yong [Di 1 ban. ed.]

After some searching on Google I found a solution, based on the unidecode package, in "Transliterating non-ASCII characters with Python":

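import os
from unidecode import unidecode  # third-party package: pip install Unidecode

# working directory that contains examples.txt
wdir = "C:/Users/vlado/work2/mark/ascii"
os.chdir(wdir)

# "utf-8-sig" strips the BOM at the start of the file
with open("examples.txt", "r", encoding="utf-8-sig") as infile:
    data = infile.read()

# replace every non-ASCII character with an ASCII approximation
a = unidecode(data)

print(data)
print(a)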

We get the following transliteration:

AU Garcia-Calvo, T
RI Garcia-Calvo, Tomas/AAN-6825-2021; OLIVA, DAVID SANCHEZ/L-1698-2014;

AU Bazina, AM
   Pericic, TP
   Mihanovic, F
RI Pericic, Tina Poklepovic/G-8402-2017; Mihanovic, Frane/E-3337-2017
RI Chmura, Pawel/U-6645-2019; Struzik, Artur/AAC-2669-2021; Popowczak,
AU  - Marinovic, M.
AU  - Kjaer, M.
AU  - Renstrom, P.A.F.H.
AU  - Ibanez, S.J.
AU  - Kristjansdottir, H.
Cvetkovic, D., Doob, M., Sachs, H., (1995) Spectra of Graphs: Theory and Application, pp. 18-20. , Johann Ambrosius Barth, Heidelberg, 3rd edn; 
Zabolotskii Aleksandr Viktorovich <azabolotskii@hse.ru>
ldblwm ltnfydhy | mhrt ldhk lSTn`y w`lm lbynt
Cui, Chunfang; Tong, Zhongliang Gan Zao Xin Ji Zhu Ji Ying Yong  /Gan zao xin ji shu ji ying yong [Di 1 ban. ed.]
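
Coming back to the opening question, one possible way to make the two sources comparable is to reduce every author name to an ASCII matching key. The sketch below is only an illustration under several assumptions: the function name_key is hypothetical, names are assumed to have the form "Surname, Initials", and the WoS-style spelling "Renstrom, PAFH" is constructed for the example. If unidecode is not installed, the code falls back to a rough standard-library approximation that only removes diacritics from Latin letters; unlike unidecode, it leaves letters such as ł or æ and all non-Latin scripts unchanged.

```python
import unicodedata

try:
    from unidecode import unidecode as to_ascii
except ImportError:
    # Fallback (assumption: only accented Latin letters occur).
    # Decompose each character and drop the combining marks.
    def to_ascii(text):
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(c for c in decomposed if not unicodedata.combining(c))


def name_key(name):
    """Hypothetical helper: ASCII, lower-case key 'surname, initials'."""
    surname, _, initials = to_ascii(name).partition(",")
    # keep only the letters of the initials: " P.A.F.H." -> "PAFH"
    initials = "".join(ch for ch in initials if ch.isalpha())
    return f"{surname.strip().lower()}, {initials.lower()}"


# Scopus-style name versus a constructed WoS-style name
print(name_key("Renström, P.A.F.H."))                                # renstrom, pafh
print(name_key("Renström, P.A.F.H.") == name_key("Renstrom, PAFH"))  # True
```

Such a key is meant for matching only; the original Unicode spelling should be kept alongside it, because the transliteration cannot be reversed.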

URLs

vlado/notes/txt/er/nonascii.txt · Last modified: 2022/06/14 03:48 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported