====== Cyrillic and Unicode ======
===== Converting Cyrillic names into ASCII =====
July 23, 2017
Daria had problems exporting the matrix representation of blockmodeling results (and clustering dendrograms) because EPS files do not support Unicode (Cyrillic) characters. One solution would be to implement SVG export in Pajek. A quicker solution is to transliterate the Russian names into the Latin alphabet. In R this is provided by the package ''stringi''. See also [[https://www.r-bloggers.com/icu-unicode-text-transforms-in-the-r-package-stringi/|ICU Unicode text transforms in the R package stringi]].
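> # read the 106 vertex lines (lines 3 to 108) of the Pajek file
> F <- readLines("BM.net")[3:108]
> Encoding(F) <- "UTF-8"
> # split each line at the double quotes; the label is the second piece
> L <- strsplit(F,'\"')
> df <- data.frame(matrix(unlist(L),nrow=106,byrow=TRUE),stringsAsFactors=FALSE)
> N <- df$X2
> Encoding(N) <- "UTF-8"
> library(stringi)
> # transliterate Cyrillic to Latin, then strip the diacritics
> R <- stri_trans_general(N,"cyrillic-latin;nfd;[:nonspacing mark:] remove;nfc")
> # save the Latin names, one per line
> write.csv(R,"BMlatin.nam",row.names=FALSE)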
Now we manually copy the names from ''BMlatin.nam'' into ''BM.net''.
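The manual copying step could also be scripted. The following is only a sketch, assuming each vertex line contains exactly one label in double quotes; the helper ''relabel_vertices'' and the two sample lines are made up for illustration and are not part of the session above.

```r
# Hypothetical helper: replace the first quoted label on each vertex line.
relabel_vertices <- function(lines, new_names) {
  stopifnot(length(lines) == length(new_names))
  m <- regexpr('"[^"]*"', lines)            # position of the quoted label
  stopifnot(all(m > 0))                     # every line must contain a label
  start <- as.integer(m)
  len <- attr(m, "match.length")
  paste0(substr(lines, 1, start - 1),       # text before the label
         '"', new_names, '"',               # new label, quoted again
         substr(lines, start + len, nchar(lines)))  # text after the label
}

# Made-up vertex lines in the style of a Pajek *Vertices section.
lines <- c('1 "ГОМЗИН А" 0.1 0.2 0.5', '2 "IVANOV I" 0.3 0.4 0.5')
out <- relabel_vertices(lines, c("GOMZIN A", "IVANOV I"))
out
```

With the objects from the session, the call would be ''relabel_vertices(F, R)''; the result still has to be written back into lines 3 to 108 of ''BM.net''.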
==== Check: ====
> tail(N)
[1] "ГОМЗИН А" "НЕДУМОВ Я" "IVANOV I" "АСТРАХАНЦЕВ Н"
[5] "ТРИПУТИНА В" "МАКАГОНОВА Н"
> tail(R)
[1] "GOMZIN A" "NEDUMOV A" "IVANOV I" "ASTRAHANCEV N"
[5] "TRIPUTINA V" "MAKAGONOVA N"
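Looking at ''tail'' checks only the last six names. A possible fuller check, sketched here in base R, is to test every name for characters outside ASCII. The helper name ''is_ascii'' is an assumption made for this example.

```r
# Hypothetical helper: TRUE for strings whose code points are all below 128.
is_ascii <- function(x) {
  vapply(x, function(s) all(utf8ToInt(s) < 128L), logical(1),
         USE.NAMES = FALSE)
}

# Two labels from this page; the second still contains code point 697.
labels <- c("GOMZIN A", paste0("ZOR", intToUtf8(697), "KINA K"))
ok <- is_ascii(labels)
which(!ok)   # positions of labels that still need attention
```

Applied to the session data, ''which(!is_ascii(R))'' would list the names that are not yet plain ASCII.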
===== Problems with conversion of character Ь =====
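August 1, 2017
The soft sign Ь is transliterated into the modifier letter prime ʹ (U+02B9, code point 697), which is not an ASCII character. We replace it with the ASCII apostrophe (code point 39).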
> N[44]
[1] "ЗОРЬКИНА К"
> R[44]
[1] "ZORʹKINA K"
> utf8ToInt(R[44])
[1] 90 79 82 697 75 73 78 65 32 75
> T <- gsub(intToUtf8(697),"'",R,fixed=TRUE)
> T[44]
[1] "ZOR'KINA K"
> utf8ToInt(T[44])
[1] 90 79 82 39 75 73 78 65 32 75
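The same replacement can be wrapped in a small function that handles more than one leftover mark. This is a sketch under assumptions: code point 697 is the one observed above, while 698 (U+02BA, which the hard sign Ъ is expected to produce) is an assumption that should be verified with ''utf8ToInt'' on real data. The function name ''to_ascii_marks'' is hypothetical.

```r
# Hypothetical helper: replace leftover transliteration marks by an apostrophe.
to_ascii_marks <- function(x) {
  x <- gsub(intToUtf8(697), "'", x, fixed = TRUE)  # observed for the soft sign
  x <- gsub(intToUtf8(698), "'", x, fixed = TRUE)  # assumed for the hard sign
  x
}

# Rebuild the problematic label from its code points and clean it.
bad <- intToUtf8(c(90, 79, 82, 697, 75, 73, 78, 65, 32, 75))
good <- to_ascii_marks(bad)
utf8ToInt(good)
```

Both marks are mapped to the apostrophe rather than to a double quote, because the double quote delimits the labels in ''BM.net''.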