====== Cyrillic and Unicode ======

===== Converting names in Cyrillic into ASCII =====

July 23, 2017

Daria had problems exporting the matrix representation of blockmodeling results (and clustering dendrograms) because EPS files do not support Unicode (Cyrillic). One solution would be to implement export to SVG in Pajek. A quicker solution is to transliterate the Russian names into the Latin alphabet. In R this service is provided by the library ''stringi''. See also [[https://www.r-bloggers.com/icu-unicode-text-transforms-in-the-r-package-stringi/|ICU Unicode text transforms in the R package stringi]].

<code>
> F <- readLines("BM.net")[3:108]   # lines 3-108 hold the 106 vertices
> Encoding(F) <- "UTF-8"
> L <- strsplit(F,'\"')             # split each vertex line at the quotes
> df <- data.frame(matrix(unlist(L),nrow=106,byrow=TRUE),stringsAsFactors=FALSE)
> N <- df$X2                        # the quoted names
> Encoding(N) <- "UTF-8"
> library(stringi)
> # Cyrillic -> Latin, decompose (NFD), drop diacritics, recompose (NFC)
> R <- stri_trans_general(N,"cyrillic-latin;nfd;[:nonspacing mark:] remove;nfc")
> write.csv(R,"BMlatin.nam",row.names=FALSE)
</code>

Now we manually copy the names from ''BMlatin.nam'' into ''BM.net''.

==== Check ====

<code>
> tail(N)
[1] "ГОМЗИН А" "НЕДУМОВ Я" "IVANOV I" "АСТРАХАНЦЕВ Н"
[5] "ТРИПУТИНА В" "МАКАГОНОВА Н"
> tail(R)
[1] "GOMZIN A" "NEDUMOV A" "IVANOV I" "ASTRAHANCEV N"
[5] "TRIPUTINA V" "MAKAGONOVA N"
</code>

===== Problems with the conversion of the character Ь =====

August 1, 2017

<code>
> N[44]
[1] "ЗОРЬКИНА К"
> R[44]
[1] "ZORʹKINA K"
> utf8ToInt(R[44])
[1] 90 79 82 697 75 73 78 65 32 75
</code>

The soft sign Ь is transliterated as the modifier letter prime ʹ (U+02B9, code 697), which is not an ASCII character. We replace it with an ordinary apostrophe (code 39):

<code>
> T <- sapply(R,function(w)gsub(intToUtf8(697),"'",w),USE.NAMES=FALSE)
> T[44]
[1] "ZOR'KINA K"
> utf8ToInt(T[44])
[1] 90 79 82 39 75 73 78 65 32 75
</code>
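Since ''gsub'' is already vectorized over its input, the ''sapply'' loop for the Ь fix is not strictly needed. Below is a minimal base R sketch. The vector ''R'' here is a hypothetical stand-in built from the code points printed above, and the result is stored in ''latin'' rather than ''T'' so that R's built-in shorthand ''T'' (for ''TRUE'') is not masked.

```r
# Hypothetical stand-in for the transliterated names: "ZORʹKINA K" rebuilt
# from the code points printed by utf8ToInt(R[44]), plus one ASCII name.
R <- c(intToUtf8(c(90, 79, 82, 697, 75, 73, 78, 65, 32, 75)), "GOMZIN A")

# U+02B9 (MODIFIER LETTER PRIME, code 697) is what cyrillic-latin produced for Ь.
# gsub() processes the whole vector at once; fixed = TRUE matches it literally.
latin <- gsub(intToUtf8(697), "'", R, fixed = TRUE)

print(latin)
print(utf8ToInt(latin[1]))   # codes 90 79 82 39 75 73 78 65 32 75

# Every remaining character should now be plain ASCII (code point < 128).
stopifnot(all(vapply(latin, function(s) all(utf8ToInt(s) < 128L), logical(1))))
```

The ASCII check at the end is optional, but it catches any other non-ASCII character the transform might leave behind before the names go into an EPS export.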
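The manual copying of names from ''BMlatin.nam'' into ''BM.net'' could probably be avoided by writing the transliterated labels straight into a copy of the network file. This is a sketch under stated assumptions, not a tested tool. ''latinize_net'' is a hypothetical helper, not part of Pajek or ''stringi''. It assumes a UTF-8 ''.net'' file whose vertex lines have the form ''number "label" ...'', with no other lines starting with a number followed by a quote. By default it applies the same ''stringi'' transform as in the session above, followed by the Ь fix.

```r
# Hypothetical helper (not part of Pajek or stringi): write a copy of a Pajek
# .net file in which the quoted vertex labels are transliterated to Latin.
# Assumptions: the file is UTF-8, and vertex lines look like
#   <number> "<label>" [anything else, e.g. coordinates]
latinize_net <- function(infile, outfile,
                         translit = function(x) stringi::stri_trans_general(
                           x, "cyrillic-latin;nfd;[:nonspacing mark:] remove;nfc")) {
  lines <- readLines(infile, encoding = "UTF-8")

  # Vertex lines: optional blanks, a vertex number, blanks, an opening quote
  is_vertex <- grepl('^[[:space:]]*[0-9]+[[:space:]]+"', lines)
  v <- lines[is_vertex]

  # Text before, inside and after the first quoted label of each vertex line
  before <- sub('".*$', "", v)
  label  <- sub('^[^"]*"([^"]*)".*$', "\\1", v)
  after  <- sub('^[^"]*"[^"]*"', "", v)

  # Transliterate, then replace U+02B9 (produced for Ь) with an apostrophe
  latin <- gsub(intToUtf8(697), "'", translit(label), fixed = TRUE)

  # iconv() returns NA for strings that cannot be represented in ASCII
  non_ascii <- is.na(iconv(latin, from = "UTF-8", to = "ASCII"))
  if (any(non_ascii)) {
    warning("non-ASCII labels remain: ", paste(latin[non_ascii], collapse = ", "))
  }

  # Reassemble the vertex lines and write the bytes unchanged
  lines[is_vertex] <- paste0(before, '"', latin, '"', after)
  writeLines(lines, outfile, useBytes = TRUE)
  invisible(latin)
}
```

Usage, with hypothetical file names: ''latinize_net("BM.net", "BMlatin.net")''. The ''translit'' argument is kept replaceable so the splitting logic can be tried without ''stringi'', for example with ''toupper''.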