Academic interests
Computational linguistics and natural language processing, distributional semantics, diachronic word embedding models, machine learning, translation studies, learner corpora.
You may want to have a look at WebVectors, the web service we created for exploring neural distributional models of English and Norwegian.
Courses taught
Background
I received my Master's degree in Computational Linguistics from the National Research University Higher School of Economics (Moscow) in 2014, with the thesis "Semantic clustering of Russian web search results: possibilities and problems".
Full CV
Below is a list of selected recent publications. The full list, with texts, can be found on my Academia page.
Tags: Machine Learning, Natural Language Processing, Computational Linguistics, Corpus Linguistics, Word Embeddings
Publications
-
Fomin, Vadim; Bakshandaeva, Daria; Rodina, Julia & Kutuzov, Andrei (2019). Tracing Cultural Diachronic Semantic Shifts in Russian Using Word Embeddings: Test Sets and Baselines. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii, 18, pp. 203–218. ISSN 2221-7932.
The paper introduces manually annotated test sets for the task of tracing diachronic (temporal) semantic shifts in Russian. The two test sets are complementary in that the first one covers comparatively strong semantic changes occurring to nouns and adjectives from pre-Soviet to Soviet times, while the second one covers comparatively subtle socially and culturally determined shifts occurring in the years from 2000 to 2014. Additionally, the second test set offers a more granular classification of shift degree, but is limited to adjectives only. The introduction of the test sets allowed us to evaluate several well-established algorithms for semantic shift detection (posing this as a classification problem), most of which had never been tested on Russian material. All of these algorithms use distributional word embedding models trained on the corresponding in-domain corpora. The resulting scores provide solid comparison baselines for future studies tackling similar tasks. We publish the datasets, code and the trained models in order to facilitate further research in automatically detecting temporal semantic shifts for Russian words, with time periods of different granularities.
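A common baseline in this line of work scores a word's shift as the cosine distance between its vectors in two time-specific embedding models. A minimal sketch, assuming the models are already aligned to a common space; the words and vectors below are invented for illustration, not taken from the test sets:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def shift_score(word, model_t1, model_t2):
    """Semantic shift as cosine distance between a word's vectors
    from two time-specific embedding models (assumed aligned)."""
    return 1.0 - cosine(model_t1[word], model_t2[word])

# Toy aligned models; the vectors are made up for illustration.
pre_soviet = {"tovarishch": [0.9, 0.1, 0.0], "zemlya": [0.2, 0.7, 0.1]}
soviet = {"tovarishch": [0.1, 0.2, 0.9], "zemlya": [0.3, 0.6, 0.1]}

for w in pre_soviet:
    print(w, round(shift_score(w, pre_soviet, soviet), 3))
```

Casting shift detection as a classification problem then reduces to thresholding or ranking such scores.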
-
Kutuzov, Andrei; Dorgham, Mohammad; Oliynyk, Oleksiy; Biemann, Chris & Panchenko, Alexander (2019). Learning Graph Embeddings from WordNet-based Similarity Measures. In Rada Mihalcea; Ekaterina Shutova; Lun-Wei Ku; Kilian Evang & Soujanya Poria (eds.), Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), pp. 125–135. Association for Computational Linguistics. ISBN 978-1-948087-93-3.
We present path2vec, a new approach for learning graph embeddings that relies on structural measures of pairwise node similarities. The model learns representations for nodes in a dense space that approximate a given user-defined graph distance measure, such as the shortest path distance or distance measures that take information beyond the graph structure into account. Evaluation of the proposed model on semantic similarity and word sense disambiguation tasks, using various WordNet-based similarity measures, shows that our approach yields competitive results, outperforming strong graph embedding baselines. The model is computationally efficient, being orders of magnitude faster than the direct computation of graph-based distances.
-
Kutuzov, Andrei; Dorgham, Mohammad; Oliynyk, Oleksiy; Biemann, Chris & Panchenko, Alexander (2019). Making Fast Graph-based Algorithms with Graph Metric Embeddings. In Anna Korhonen; David Traum & Lluís Màrquez (eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3349–3355. Association for Computational Linguistics. ISBN 978-1-950737-48-2.
Graph measures, such as node distances, are expensive to compute. We explore dense vector representations as an effective way to approximate the same information. We introduce a simple yet efficient and effective approach for learning graph embeddings. Instead of directly operating on the graph structure, our method takes structural measures of pairwise node similarities into account and learns dense node representations reflecting user-defined graph distance measures, such as the shortest path distance or distance measures that take information beyond the graph structure into account. We demonstrate a speed-up of several orders of magnitude when predicting word similarity by vector operations on our embeddings, as opposed to directly computing the respective path-based measures, while outperforming various other graph embeddings on semantic similarity and word sense disambiguation tasks.
-
Kutuzov, Andrei & Kuzmenko, Elizaveta (2019). To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation. In Joakim Nivre; Leon Derczynski; Filip Ginter; Bjørn Lindi; Stephan Oepen; Anders Søgaard & Jörg Tiedemann (eds.), Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing, pp. 22–28. Linköping University Electronic Press. ISBN 978-91-7929-999-6.
In this paper, we critically evaluate the widespread assumption that deep learning NLP models do not require lemmatized input. To test this, we trained versions of contextualised ELMo word embedding models on raw tokenized corpora and on the same corpora with word tokens replaced by their lemmas. These models were then evaluated on the word sense disambiguation task. This was done for both English and Russian. The experiments showed that while lemmatization is indeed not necessary for English, the situation is different for Russian. It seems that for languages with rich morphology, using lemmatized training and testing data yields small but consistent improvements, at least for word sense disambiguation. This means that decisions about text pre-processing before training ELMo should consider the linguistic nature of the language in question.
-
Kutuzov, Andrei; Velldal, Erik & Øvrelid, Lilja (2019). One-to-X Analogical Reasoning on Word Embeddings: a Case for Diachronic Armed Conflict Prediction from News Texts. In Nina Tahmasebi; Lars Borin; Adam Jatowt & Yang Xu (eds.), Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pp. 196–201. Association for Computational Linguistics. ISBN 978-1-950737-31-4.
We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, where no correct answer exists. The task is cast as a relation discovery problem and applied to historical armed conflicts datasets, attempting to predict new relations of type ‘location:armed-group’ based on data about past events. As the source of semantic information, we use diachronic word embedding models trained on English news texts. A simple technique to improve diachronic performance in such tasks is demonstrated, using a threshold based on a function of cosine distance to decrease the number of false positives; this approach is shown to be beneficial on two different corpora. Finally, we publish a ready-to-use test set for one-to-X analogy evaluation on historical armed conflicts data.
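The one-to-X formulation can be sketched as returning every candidate whose cosine similarity to the projected query vector clears a threshold, which naturally covers the one-to-none case. A toy illustration; the candidate names and vectors are hypothetical, not from the paper's data:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def one_to_x(query_vec, candidates, threshold):
    """Return all candidates similar enough to the query vector,
    sorted by similarity; an empty list means 'one-to-none'."""
    hits = [(w, cosine(query_vec, v)) for w, v in candidates.items()]
    return sorted([h for h in hits if h[1] >= threshold], key=lambda p: -p[1])

# Hypothetical embedding vectors for candidate armed groups.
candidates = {"group_a": [1.0, 0.0], "group_b": [0.7, 0.7], "group_c": [0.0, 1.0]}

print(one_to_x([1.0, 0.1], candidates, 0.8))   # one candidate clears the bar
print(one_to_x([-1.0, 0.0], candidates, 0.5))  # no correct answer exists
```

Raising the threshold trades recall for fewer false positives, which is the effect exploited in the paper.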
-
Rodina, Julia; Bakshandaeva, Daria; Fomin, Vadim; Kutuzov, Andrei; Touileb, Samia & Velldal, Erik (2019). Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian. In Nina Tahmasebi; Lars Borin; Adam Jatowt & Yang Xu (eds.), Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pp. 202–209. Association for Computational Linguistics. ISBN 978-1-950737-31-4.
We measure the intensity of diachronic semantic shifts in adjectives in English, Norwegian and Russian across 5 decades. This is done in order to test the hypothesis that evaluative adjectives are more prone to temporal semantic change. To this end, 6 different methods of quantifying semantic change are used. Frequency-controlled experimental results show that, depending on the particular method, evaluative adjectives either do not differ from other types of adjectives in terms of semantic change or appear to actually be less prone to shifting (particularly, to ‘jitter’-type shifting). Thus, in spite of many well-known examples of semantically changing evaluative adjectives (like ‘terrific’ or ‘incredible’), it seems that such cases are not specific to this particular type of words.
-
Bakarov, Amir; Kutuzov, Andrei & Nikishina, Irina (2018). Russian computational linguistics: Topical structure in 2007–2017 conference papers. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii, 17. ISSN 2221-7932.
The Russian NLP community has existed for at least several decades, yet academic works analyzing it are scarce. The present paper fills this gap by applying topic modeling to the proceedings of three major Russian NLP conferences (Dialogue, AIST and AINL) for the years 2007 to 2017. The resulting corpus consists of about 500 academic papers. We focus on the analysis of developing research trends manifested in topical drift over time. As a result, we show statistically how the interests of the Russian NLP community are moving towards machine learning, and how Dialogue (as the largest venue) influences the whole computational linguistics landscape.
-
Kutuzov, Andrei (2018). Russian Word Sense Induction by Clustering Averaged Word Embeddings. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii, 17, pp. 391–403. ISSN 2221-7932. Full text in Research Archive.
The paper reports our participation in the shared task on word sense induction and disambiguation for the Russian language (RUSSE’2018). Our team was ranked 2nd for the wiki-wiki dataset (containing mostly homonyms) and 5th for the bts-rnc and active-dict datasets (containing mostly polysemous words) among all 19 participants. The method we employed was extremely naive: we represented contexts of ambiguous words as averaged word embedding vectors, using off-the-shelf pre-trained distributional models. These vector representations were then clustered with mainstream clustering techniques, producing groups corresponding to the ambiguous words' senses. As a side result, we show that word embedding models trained on small but balanced corpora can be superior to those trained on large but noisy data, not only in intrinsic evaluation, but also in downstream tasks like word sense induction.
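The core of such a method is to average the context word vectors and then cluster the resulting context representations. A minimal sketch with invented two-dimensional embeddings (the words, vectors, and the naive nearest-seed assignment are illustrative stand-ins for real pre-trained models and clustering algorithms):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def avg_vector(words, emb):
    """Represent a context as the mean of its known word vectors
    (assumes at least one context word is in the vocabulary)."""
    vecs = [emb[w] for w in words if w in emb]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

# Hypothetical 2-d embeddings for context words of an ambiguous noun.
emb = {"river": [1.0, 0.0], "water": [0.9, 0.1],
       "money": [0.0, 1.0], "loan": [0.1, 0.9]}

contexts = [["river", "water"], ["money", "loan"], ["water", "money"]]
reps = [avg_vector(c, emb) for c in contexts]

# Naive clustering stand-in: assign each context to the nearest of
# two seed contexts (a real pipeline would use k-means or similar).
seeds = [reps[0], reps[1]]
labels = [max(range(2), key=lambda k: cosine(r, seeds[k])) for r in reps]
print(labels)  # the mixed third context joins the 'money' sense cluster
```

Each resulting cluster is then read as one induced sense of the ambiguous word.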
-
Kutuzov, Andrei & Kunilovskaya, Maria (2018). Size vs. Structure in Training Corpora for Word Embedding Models: Araneum Russicum Maximum and Russian National Corpus. Lecture Notes in Computer Science, 10716, pp. 47–58. ISSN 0302-9743. doi: 10.1007/978-3-319-73013-4_5
In this paper, we present a distributional word embedding model trained on one of the largest available Russian corpora: Araneum Russicum Maximum (over 10 billion words crawled from the web). We compare this model to a model trained on the Russian National Corpus (RNC). The two corpora differ considerably in size and compilation procedures. We test these differences by evaluating the trained models against the Russian part of the Multilingual SimLex999 semantic similarity dataset. We detect and describe numerous issues in this dataset and publish a new, corrected version. Aside from the already known fact that the RNC is generally a better training corpus than web corpora, we enumerate and explain fine-grained differences in how the models handle the semantic similarity task, which parts of the evaluation set are difficult for particular models, and why. Additionally, the learning curves for both models are described, showing that the RNC is generally more robust as training material for this task.
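Evaluating against a SimLex-style dataset typically means computing Spearman's rank correlation between gold similarity judgements and the model's cosine similarities over the same word pairs. A self-contained sketch (the scores below are invented, not drawn from the actual dataset, and no tie handling is included):

```python
def ranks(xs):
    """Ranks of the values in xs (no tie correction)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman's rho computed as Pearson correlation on ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

gold = [9.2, 7.5, 3.1, 0.4]       # hypothetical human judgements
model = [0.81, 0.55, 0.32, 0.10]  # hypothetical model cosine similarities
print(spearman(gold, model))      # perfectly monotonic, so 1.0
```

A higher rho means the model's similarity ranking agrees more closely with human judgements.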
-
Kutuzov, Andrei; Øvrelid, Lilja; Szymanski, Terrence & Velldal, Erik (2018). Diachronic word embeddings and semantic shifts: a survey. In Proceedings of the 27th International Conference on Computational Linguistics, pp. 1384–1397. Association for Computational Linguistics. ISBN 978-1-948087-50-6. Full text in Research Archive.
Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shift detection. We start by discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges facing this emerging subfield of NLP, as well as prospects and possible applications.
-
Nikishina, Irina; Bakarov, Amir & Kutuzov, Andrei (2018). RusNLP: Semantic search engine for Russian NLP conference papers. Lecture Notes in Computer Science, 11179, pp. 111–120. ISSN 0302-9743. doi: 10.1007/978-3-030-11027-7_11
-
Sadov, Mikhail A. & Kutuzov, Andrei (2018). Use of morphology in distributional word embedding models: Russian language case. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii, 17, pp. 1–12. ISSN 2221-7932.
-
Ustalov, Dmitry; Panchenko, Alexander; Kutuzov, Andrei; Biemann, Chris & Ponzetto, Simone (2018). Unsupervised Semantic Frame Induction using Triclustering. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 55–62. Association for Computational Linguistics. ISBN 978-1-948087-34-6. Full text in Research Archive.
We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction. We cast the frame induction problem as a triclustering problem, a generalization of clustering for triadic data. Our replicable benchmarks demonstrate that the proposed graph-based approach, Triframes, shows state-of-the-art results on this task on a FrameNet-derived dataset and performs on par with competitive methods on a verb class clustering task.
-
Kunilovskaya, Maria & Kutuzov, Andrei (2017). Testing target text fluency: A machine learning approach to detecting syntactic translationese in English-Russian translation. In New perspectives on cohesion and coherence: Implications for translation, Chapter 5, pp. 75–103. Language Science Press. ISBN 978-3-946234-72-2.
This research is aimed at the semi-automatic detection of divergences in sentence structure between Russian translated texts and non-translations. We focus on atypical syntactic features of translations, because they have a greater negative impact on overall textual quality than lexical translationese. Inadequate syntactic structures bring about various issues with target text fluency, which reduces readability and the reader's ability to grasp the text's message. From a procedural viewpoint, faulty syntax implies more post-editing effort. In the framework of this research, we reveal cases of syntactic translationese as dissimilarities between patterns of selected morphosyntactic and syntactic features (such as part of speech and sentence length) in the context of sentence boundaries, observed in comparable monolingual corpora of learner-translated and non-translated texts in Russian. To establish these syntactic differences we resort to a machine learning approach, as opposed to the usual statistical significance analyses. To this end we employ models that predict unnatural sentence boundaries in translations and highlight factors that are responsible for their `foreignness'. In the first stage of the experiment, we train a decision tree model to describe the contextual features of sentence boundaries in the reference corpus of Russian texts. In the second stage, we use the results of the first multifactorial analysis as indicators of learner translators' choices that run counter to the regularities of the standard language variety. The predictors and their combinations are evaluated as to their efficiency for this task. As a result we are able to extract translated sentences whose structure is atypical relative to Russian texts produced without the constraints of the translation process and which, therefore, can tentatively be considered less fluent. These sentences represent cases of translationese.
-
Kunilovskaya, Maria & Kutuzov, Andrei (2017). Universal Dependencies-based syntactic features in detecting human translation varieties. In Jan Hajič (ed.), Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, pp. 27–36. Association for Computational Linguistics. ISBN 978-80-88132-04-2.
In this paper, syntactic annotation is used to reveal linguistic properties of translations. We employ the Universal Dependencies framework to represent learner and professional translations of English mass-media texts into Russian (along with non-translated Russian texts of the same genre) with the aim of discovering and describing the syntactic specificity of translations produced at different levels of competence. The search for differences between varieties of translation and the native texts is augmented with the results obtained from a series of machine learning classification experiments. We show that syntactic structures have considerable predictive power in translationese detection, on par with the known low-level lexical features.
-
Kutuzov, Andrei (2017). Arbitrariness of Linguistic Sign Questioned: Correlation between Word Form and Meaning in Russian. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii, 1(16), pp. 109–120. ISSN 2221-7932. Full text in Research Archive.
In this paper, we present the results of preliminary experiments on finding the link between the surface forms of Russian nouns (as represented by their graphic forms) and their meanings (as represented by vectors in a distributional model trained on the Russian National Corpus). We show that there is a strongly significant correlation between these two sides of a linguistic sign (in our case, a word). This correlation coefficient is equal to 0.03 as calculated on a set of 1,729 monosyllabic nouns, and in some subsets of words starting with particular two-letter sequences the correlation rises as high as 0.57. The overall correlation value is higher than the one reported in similar experiments for English (0.016). Additionally, we report correlation values for the noun subsets related to different phonaesthemes, supposedly represented by the initial characters of these nouns.
-
Kutuzov, Andrei; Fares, Murhaf; Oepen, Stephan & Velldal, Erik (2017). Word vectors, reuse, and replicability: Towards a community repository of large-text resources. In Jörg Tiedemann (ed.), Proceedings of the 21st Nordic Conference on Computational Linguistics (NoDaLiDa), pp. 271–276. Linköping University Electronic Press. ISBN 978-91-7685-601-7. Full text in Research Archive.
This paper describes an emerging shared repository of large-text resources for creating word vectors, including pre-processed corpora and pre-trained vectors for a range of frameworks and configurations. This will facilitate reuse, rapid experimentation, and replicability of results.
-
Kutuzov, Andrei & Kuzmenko, Elizaveta (2017). Two centuries in two thousand words: Neural embedding models in detecting diachronic lexical changes. In Quantitative Approaches to the Russian Language. Routledge. ISBN 9781138097155.
In this paper, we show how Continuous Bag-of-Words (Mikolov et al., 2013) models trained on time-separated sub-corpora of the Russian National Corpus can be used to automatically detect words that may have undergone semantic changes. Our central assumption is that online training of such models with new textual data results in a “drift” of word vectors in the semantic space. Given that vectors represent the “meaning” of entities, this drift can be taken to reflect semantic shifts in the words experiencing it. As a result, we were able to closely replicate manually compiled lists of semantically changed Russian words from the existing body of research and substantially extend them in a largely unsupervised way. This idea is one of the reasons for the title of this paper, which in a way serves as a complement to the “20 words” in (Daniel & Dobrushina, 2016).
-
Kutuzov, Andrei & Kuzmenko, Elizaveta (2017). WebVectors: A toolkit for building web interfaces for vector semantic models. Communications in Computer and Information Science, 661, pp. 155–161. ISSN 1865-0929. doi: 10.1007/978-3-319-52920-2_15
The paper presents a free and open-source toolkit whose aim is to quickly deploy web services handling distributional vector models of semantics. It fills the gap between training such models (many tools are already available for this) and disseminating the results to the general public. Our toolkit, WebVectors, provides all the necessary routines for organizing online access to querying trained models via a modern web interface. We also describe two demo installations of the toolkit, featuring several efficient models for English, Russian and Norwegian.
-
Kutuzov, Andrei; Kuzmenko, Elizaveta & Pivovarova, Lidia (2017). Clustering of Russian Adjective-Noun Constructions using Word Embeddings. In Lidia Pivovarova; Jakub Piskorski & Tomaž Erjavec (eds.), Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, pp. 3–13. Association for Computational Linguistics. ISBN 978-1-945626-45-6. Full text in Research Archive.
This paper presents a method of automatic construction extraction from a large corpus of Russian. The term `construction' here means a multi-word expression in which a variable can be replaced with another word from the same semantic class, for example, 'a glass of [water/juice/milk]'. We deal with constructions that consist of a noun and its adjective modifier. We propose a method of grouping such constructions into semantic classes via two-step clustering of word vectors in distributional models. We compare it with other clustering techniques and evaluate it against the Russian-English Collocational Dictionary of the Human Body, which contains manually annotated groups of constructions with nouns denoting human body parts. The best performing method is used to cluster all adjective-noun bigrams in the Russian National Corpus. The results of this procedure are publicly available and can be used to build a Russian construction dictionary, accelerate theoretical studies of constructions, and facilitate teaching Russian as a foreign language.
-
Kutuzov, Andrei; Velldal, Erik & Øvrelid, Lilja (2017). Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1825–1830. Association for Computational Linguistics. ISBN 978-1-945626-83-8. Full text in Research Archive.
This paper deals with using word embedding models to trace the temporal dynamics of semantic relations between pairs of words. The set-up is similar to the well-known analogies task, but expanded with a time dimension. To this end, we apply incremental updating of the models with new training texts, including incremental vocabulary expansion, coupled with learned transformation matrices that let us map between members of the relation. The proposed approach is evaluated on the task of predicting insurgent armed groups based on geographical locations. The gold standard data for the time span 1994–2010 is extracted from the UCDP Armed Conflicts dataset. The results show that the method is feasible and outperforms the baselines, but also that important work still remains to be done.
-
Kutuzov, Andrei; Velldal, Erik & Øvrelid, Lilja (2017). Tracing armed conflicts with diachronic word embedding models. In Tommaso Caselli (ed.), Proceedings of the Events and Stories in the News Workshop, pp. 31–36. Association for Computational Linguistics. ISBN 978-1-945626-63-0. Full text in Research Archive.
Recent studies have shown that word embedding models can be used to trace time-related (diachronic) semantic shifts for particular words. In this paper, we evaluate some of these approaches on the new task of predicting the dynamics of global armed conflicts on a year-to-year basis, using a dataset from the field of conflict research as the gold standard and the Gigaword news corpus as the training data. The results show that much work still remains in extracting `cultural' semantic shifts from diachronic word embedding models. At the same time, we present a new task complete with an evaluation set and introduce the `anchor words' method which outperforms previous approaches on this data.
-
Lison, Pierre & Kutuzov, Andrei (2017). Redefining Context Windows for Word Embedding Models: An Experimental Study. In Jörg Tiedemann (ed.), Proceedings of the 21st Nordic Conference on Computational Linguistics (NoDaLiDa), pp. 284–288. Linköping University Electronic Press. ISBN 978-91-7685-601-7. Full text in Research Archive.
Distributional semantic models learn vector representations of words through the contexts they occur in. Although the choice of context (which often takes the form of a sliding window) has a direct influence on the resulting embeddings, the exact role of this model component is still not fully understood. This paper presents a systematic analysis of context windows based on a set of four distinct hyperparameters. We train Continuous Skip-Gram models on two English-language corpora for various combinations of these hyperparameters, and evaluate them on both lexical similarity and analogy tasks. Notable experimental results are the positive impact of cross-sentential contexts and the surprisingly good performance of right-context windows.
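The role of the window hyperparameters becomes concrete when one looks at how (target, context) training pairs are extracted. A minimal sketch of a symmetric sliding window with an optional cross-sentential mode; the function name and details are illustrative, not from the paper's code:

```python
def window_pairs(sentences, win, cross_sentential=False):
    """Extract (target, context) pairs with a symmetric window of
    size `win`; if cross_sentential, the window ignores sentence
    boundaries by running over the flattened token stream."""
    units = ([[t for s in sentences for t in s]]
             if cross_sentential else sentences)
    pairs = []
    for toks in units:
        for i, t in enumerate(toks):
            lo, hi = max(0, i - win), min(len(toks), i + win + 1)
            pairs.extend((t, toks[j]) for j in range(lo, hi) if j != i)
    return pairs

sents = [["a", "b"], ["c"]]
print(len(window_pairs(sents, 1)))        # pairs within sentences only
print(len(window_pairs(sents, 1, True)))  # extra pairs across the boundary
```

Allowing cross-sentential pairs enlarges the effective training signal, which is one plausible reading of their positive impact.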
-
Smirnov, Ivan V.; Kuznetsova, Rita; Kopotev, Mikhail; Khazov, Andrey; Lyashevskaya, Olga; Ivanova, L. & Kutuzov, Andrei (2017). Evaluation tracks on plagiarism detection algorithms for the Russian language. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii, 1(16), pp. 271–283. ISSN 2221-7932.
-
Koslowa, Olga & Kutuzov, Andrei (2016). Improving Distributional Semantic Models Using Anaphora Resolution during Linguistic Preprocessing. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii, 15, pp. 288–299. ISSN 2221-7932.
-
Kutuzov, Andrei; Kopotev, Mikhail; Sviridenko, Tatyana & Ivanova, Lyubov (2016). Clustering Comparable Corpora of Russian and Ukrainian Academic Texts: Word Embeddings and Semantic Fingerprints. In Proceedings of the Ninth Workshop on Building and Using Comparable Corpora, held at LREC-2016, pp. 3–10. European Language Resources Association. ISBN 978-2-9517408-9-1.
-
Kutuzov, Andrei & Kuzmenko, Elizaveta (2016). Cross-lingual Trends Detection for Named Entities in News Texts with Dynamic Neural Embedding Models. In Proceedings of the First International Workshop on Recent Trends in News Information Retrieval, co-located with the 38th European Conference on Information Retrieval (ECIR 2016), pp. 27–32. Technical University of Aachen. ISBN 978-3-319-30671-1.
-
Kutuzov, Andrei & Kuzmenko, Elizaveta (2016). Neural Embedding Language Models in Semantic Clustering of Web Search Results. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 3044–3048. European Language Resources Association. ISBN 978-2-9517408-9-1.
-
Kutuzov, Andrei; Kuzmenko, Elizaveta & Marakasova, Anna (2016). Exploration of register-dependent lexical semantics using word embeddings. In Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), pp. 26–34. Association for Computational Linguistics. ISBN 978-4-87974-708-2.
-
Kutuzov, Andrei; Velldal, Erik & Øvrelid, Lilja (2016). Redefining part-of-speech classes with distributional semantic models. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL), pp. 115–125. Association for Computational Linguistics. ISBN 978-1-945626-19-7. Full text in Research Archive.
-
Kutuzov, Andrei (2015). Semantic Clustering of Russian Web Search Results: Possibilities and Problems. In Information Retrieval, pp. 320–331. Springer Publishing Company. ISBN 978-3-319-25485-2.
-
Kutuzov, Andrei & Kuzmenko, Elizaveta (2015). Comparing Neural Lexical Models of a Classic National Corpus and a Web Corpus: The Case for Russian. In Alexander Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, pp. 47–58. Springer Publishing Company. ISBN 978-3-319-18111-0.
-
Kutuzov, Andrei & Kuzmenko, Elizaveta (2015). Semi-automated typical error annotation for learner English essays: integrating frameworks. In Proceedings of the 4th workshop on NLP for Computer Assisted Language Learning at NODALIDA 2015, pp. 35–41. Linköping University Electronic Press. ISBN 978-91-7519-036-5.
View all works in Cristin
Published Oct. 14, 2015 6:11 PM. Last modified Nov. 15, 2019 4:02 PM.