====== Stemming and lemmatization ======

July 4, 2017

  * Introduction to Information Retrieval: [[http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html|Stemming and lemmatization]]
  * [[http://stackoverflow.com/questions/28214148/how-to-perform-lemmatization-in-r|Lemmatization in R]]
  * [[http://meta-guide.com/software-meta-guide/100-best-github-lemmatization|100 best GitHub lemmatization]]
  * IJS [[http://lemmatise.ijs.si|Lemmatisation Portal]]; [[http://lemmatise.ijs.si/Software|Software]]
  * [[https://semanticanalyzer.info/blog/2014/03/lemmatizer-stemmer-for-russian-how-to-use-in-your-code/|Lemmatizer / Stemmer for Russian: how to use in your code]]
  * http://reganmian.net/blog/2013/10/17/playing-with-word-stemming-and-frequencies-in-russian/
  * https://www.slideshare.net/dmitrykan/linguistic-component-lemmatizer-for-the-russian-language
  * [[https://pypi.python.org/pypi/pymystem3/0.1.0|pymystem3 0.1.0]]: Python wrapper for Yandex Mystem 3.0, a morphological analyzer for Russian
  * [[https://github.com/kmike/pymorphy2|pymorphy2]], [[https://arxiv.org/pdf/1503.07283.pdf|doc]]
  * [[http://corpus.leeds.ac.uk/mocky/|TreeTagger]]
  * [[https://cst.dk/online/lemmatiser/uk/|CST's Lemmatiser]]
  * [[https://github.com/grachev/node-lemmer|English and Russian lemmatizer for Node.js]]
  * [[http://text-processing.com/demo/stem/|Stemming and Lemmatization with Python NLTK]], [[http://textminingonline.com/dive-into-nltk-part-iv-stemming-and-lemmatization|Dive into NLTK, part IV: stemming and lemmatization]]
  * https://nlp.stanford.edu/IR-book/html/htmledition/determining-the-vocabulary-of-terms-1.html
  * http://rupostagger.sourceforge.net/
  * http://www.solarix.ru/grammatical-dictionary-api-en.shtml
  * http://www.dialog-21.ru/media/3384/berdi%C4%8Devskisaetal.pdf
  * http://wiki.cs.hse.ru/Lecture_2._Tokenization_and_word_counts
  * [[http://wiki.cs.hse.ru/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0|HSE wiki: courses of the Faculty of Computer Science]]

===== Russian lemmatizer =====

  * https://github.com/nlpub/pymystem3 (Russian lemmatizer)
  * https://yandex.ru/dev/mystem/
  * https://nlpub.ru/Mystem
  * https://repology.org/project/python:pymystem3/packages
  * https://github.com/Koziev/rulemma
  * https://stackoverflow.com/questions/28214148/how-to-perform-lemmatization-in-r
  * https://pymorphy2.readthedocs.io/en/latest/
  * https://github.com/buriy/python-readability
  * https://github.com/topics/lemmatization?o=desc&s=stars
  * https://github.com/trinker/textstem
  * https://github.com/Hyperparticle/LemmaTag
  * https://github.com/reynoldsnlp/udar
  * https://github.com/writecrow/lemmatizer
  * https://github.com/Nourshosharah/introduction-to-natural-language-processing-in-python
  * https://github.com/Nourshosharah/introduction-to-natural-language-processing-in-python/blob/master/named%20entity%20rec/nlp%20chapter3.pdf
  * https://github.com/aditeyabaral/HashMap-Lemmatizer
  * https://www.datacamp.com/community/tutorials/stemming-lemmatization-python
  * https://github.com/zushicat/text-topics
  * https://stanfordnlp.github.io/stanza/ ; https://github.com/stanfordnlp/stanza/ ; https://stanfordnlp.github.io/stanza/available_models.html ; https://stanfordnlp.github.io/CoreNLP/lemma.html

===== Stopwords =====

  * https://cran.r-project.org/web/packages/stopwords/stopwords.pdf
  * https://github.com/stopwords-iso/stopwords-ru
  * https://github.com/stopwords-iso/stopwords-iso
  * https://raw.githubusercontent.com/stopwords-iso/stopwords-iso/master/stopwords-iso.json
  * https://countwordsfree.com/stopwords/russian
  * https://www.ranks.nl/stopwords/russian
  * https://www.rdocumentation.org/packages/tm/versions/0.7-7/topics/stopwords
  * https://www.kaggle.com/alxmamaev/how-to-easy-preprocess-russian-text
  * https://stats.stackexchange.com/questions/9674/how-to-remove-stopwords-with-russian-documents

===== Named-entity recognition =====

  * https://en.wikipedia.org/wiki/Named-entity_recognition
  * https://www.researchgate.net/publication/262203599_Introducing_Baselines_for_Russian_Named_Entity_Recognition
  * https://github.com/chambliss/Multilingual_NER
  * https://github.com/deepmipt/DeepPavlov
  * https://towardsdatascience.com/19-entities-for-104-languages-a-new-era-of-ner-with-the-deeppavlov-multilingual-bert-1bfa6d413ea6
  * http://www.dialog-21.ru/media/1234/brykinamm.pdf ; http://www.dialog-21.ru/media/4914/paulsae-berezinsa.pdf
  * https://gab41.lab41.org/how-to-fine-tune-bert-for-named-entity-recognition-2257b5e5ce7e
  * https://www.springerprofessional.de/en/named-entity-recognition-in-russian-with-word-representation-lea/16157560
  * Damien Nouvel, Maud Ehrmann, Sophie Rosset: Named entities for computational linguistics. Focus series, Wiley 2016
  * https://www.clarin.eu/resource-families/tools-named-entity-recognition
  * https://cloud.gate.ac.uk/shopfront/displayItem/russian-ner-with-inflexional-gazetteer-and-orthomatcher
  * https://www.academia.edu/Documents/in/Named_Entity_Recognition
  * https://www.netowl.com/entity-extraction
  * https://primer.ai/blog/a-new-state-of-the-art-for-named-entity-recognition/
  * http://eurekaengine.ru/en/description/
  * https://www.coursera.org/lecture/text-mining-analytics/3-2-explanations-of-named-entity-recognition-uQ7KH
  * https://tatianashavrina.github.io/2018/08/30/datasets/
  * https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=general
  * https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking?tabs=version-3-preview
  * https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/language-support?tabs=sentiment-analysis