Word clouds: monthly tf-idfs

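library(wordcloud2)   # wordcloud2()
library(htmlwidgets)  # saveWidget()
library(webshot)      # webshot(); needs PhantomJS (webshot::install_phantomjs())
# assumed inputs: TF term frequencies (columns 2..13 = January..December),
# S column totals of TF, P number of months in which each term occurs, N term names
for(j in 1:12){
  # tf-idf of each term in month j: relative frequency * log(12/P)
  di <- ifelse(TF[,j+1]==0,0,TF[,j+1]/S[j+1]*log(12/P))
  or <- order(di,decreasing=TRUE)   # terms by decreasing tf-idf
  ro <- order(or)                   # inverse permutation = rank of each term (as invPerm(or))
  names(di) <- N
  cat("\n",j,":\n")
  print(di[or][1:10])               # top 10 terms of month j
  ff <- round(200000/sqrt(1+ro))    # word size depends on rank, not on the raw score
  df <- data.frame(N=N[or],f=ff[or])
  wc <- wordcloud2(df,size=0.8,minRotation=0,maxRotation=0)   # horizontal words only
  tmp  <- paste0("tmp",j,".html")
  fpdf <- paste0("fig",j,".pdf")
  fpng <- paste0("fig",j,".png")
  saveWidget(wc,tmp,selfcontained=FALSE)
  webshot(tmp,fpdf,delay=5,vwidth=800,vheight=600)   # wait 5 s so the cloud is rendered
  webshot(tmp,fpng,delay=5,vwidth=800,vheight=600)
}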
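To see what the weighting does, here is a small base-R sketch on synthetic counts (not the corpus data). It assumes, as the indexing in the loop suggests but the page does not state, that TF has a yearly total in column 1 and one column per month after it, that S holds column totals and that P counts the months in which a term occurs. It shows that a term present in every month gets tf-idf 0 (log(12/12) = 0), and that the word-cloud weight is computed from the rank only.

```r
# Synthetic toy data (NOT the corpus data): 4 terms x 12 months.
# Assumed layout, mirroring the loop: column 1 = yearly total,
# columns 2..13 = January..December.
N <- c("covid", "lockdown", "vaccine", "school")
M <- rbind(
  c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10),  # occurs every month
  c( 0,  1,  8,  9,  6,  3,  2,  2,  3,  5,  6,  4),
  c( 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  7, 12),  # only November, December
  c( 2,  2,  2,  1,  0,  0,  0,  4,  5,  3,  2,  2)
)
TF <- cbind(rowSums(M), M)
S  <- colSums(TF)          # column totals (assumed meaning of S)
P  <- rowSums(M > 0)       # number of months in which each term occurs

j  <- 12                   # December
di <- ifelse(TF[, j + 1] == 0, 0, TF[, j + 1] / S[j + 1] * log(12 / P))
names(di) <- N
or <- order(di, decreasing = TRUE)   # terms by decreasing tf-idf
ro <- order(or)                      # inverse permutation: rank of each term
ff <- round(200000 / sqrt(1 + ro))   # word-cloud weight from rank only
print(round(di[or], 4))
print(data.frame(N = N[or], f = ff[or]))
```

Because the sizes come from ranks, a month whose top term dominates by a large margin still produces a cloud with the same size profile as any other month.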

January, February

March, April

May, June

July, August

September, October

November, December

Some observations

  1. synonyms: covid-19, covid19, sars-cov-2, cov2, ncov, 2019-ncov; U.S., USA; Italy, italian
  2. “permanent” keywords: Italy, lockdown, distance, symptom, food, school, pandemia, virtual, multisystem, face, USA, ethnic, telemedicine, tocilizumab, medium
  3. stopwords from German, French, Spanish and Portuguese: en, le, du, der, und, um, uma, por, il, une, da, mi, aux, con, mit, bei, com
  4. numbers: keep or remove numeric tokens?
  5. dashes: -, –, —
  6. Unicode, LaTeX
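The observations point to a term-normalization pass before the tf-idfs are computed. A minimal base-R sketch follows; the helper names canonical_term() and merge_terms() are hypothetical, and the synonym table and stopword list are built only from items 1 and 3 (choosing covid-19, usa and italy as canonical forms is arbitrary). Dashes are unified, synonyms mapped to one form, the listed foreign stopwords and purely numeric tokens dropped (one possible answer to item 4), and the rows of merged terms summed with base R rowsum(). Item 6 is not handled.

```r
# Hypothetical synonym table from observation 1; canonical forms are arbitrary
syn <- c("covid19" = "covid-19", "sars-cov-2" = "covid-19", "cov2" = "covid-19",
         "ncov" = "covid-19", "2019-ncov" = "covid-19",
         "u.s." = "usa", "italian" = "italy")
# Foreign-language stopwords from observation 3
foreign_stop <- c("en", "le", "du", "der", "und", "um", "uma", "por", "il", "une",
                  "da", "mi", "aux", "con", "mit", "bei", "com")

# Map each term to a canonical form; NA marks a term to be dropped
canonical_term <- function(terms) {
  x <- tolower(terms)
  x <- gsub("\u2013", "-", x, fixed = TRUE)   # en dash -> hyphen
  x <- gsub("\u2014", "-", x, fixed = TRUE)   # em dash -> hyphen
  hit <- x %in% names(syn)
  x[hit] <- unname(syn[x[hit]])                # synonyms -> one form
  x[x %in% foreign_stop | grepl("^[0-9]+$", x)] <- NA  # stopwords, pure numbers
  x
}

# Sum the rows of terms that map to the same canonical form
merge_terms <- function(TF, N) {
  g <- canonical_term(N)
  keep <- !is.na(g)
  rowsum(TF[keep, , drop = FALSE], g[keep])
}

N  <- c("COVID19", "2019\u2013ncov", "U.S.", "USA", "und", "2020", "lockdown")
TF <- matrix(1:14, nrow = 7)   # synthetic counts, 2 columns
print(merge_terms(TF, N))
```

After merging, P (and S, if it is derived from TF) would have to be recomputed from the merged matrix before running the word-cloud loop.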
notes/imfm/corona/ana/wca.txt · Last modified: 2021/01/05 23:20 by vlado
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported