Andrei Kutuzov

Mobile phone +4740648218
Room 4423
Visiting address Gaustadalléen 23B Ole-Johan Dahls hus 0373 Oslo
Postal address Postboks 1080 Blindern 0316 Oslo
Other affiliations Institutt for pedagogikk (Student)

I am an associate professor in the Language Technology Group, University of Oslo. In addition, I currently serve as the Norwegian on-site manager of the High-Performance Language Technology (HPLT) project.

I prefer my first name to be spelled as "Andrey". Unfortunately, my current passport disagrees.

Academic interests

Computational linguistics and natural language processing; semantic change detection and diachronically aware language models; distributional semantics, machine learning, large-scale language models.

Among other things, I participated in designing and training the NorBERT and NorELMo models and the very large-scale NORA.LLM generative models.

In 2022, I received the Norwegian Artificial Intelligence Research Consortium (NORA) award as a Distinguished Early Career Researcher.

You may also want to have a look at WebVectors, the web service we created to play with static and contextualized word embeddings for English and Norwegian.

Courses taught

Background


On November 13, 2020, I defended my PhD thesis "Distributional word embeddings in modeling diachronic semantic change". The thesis is available here.

I received my Master's degree in Computational Linguistics from the National Research University Higher School of Economics (Moscow) in 2014, with the thesis "Semantic clustering of Russian web search results: possibilities and problems".

Below is the list of my recent publications.

Tags: Machine Learning, Natural Language Processing, Computational Linguistics, Corpus Linguistics, Word Embeddings, Distributional Semantics, Diachronic Word Embeddings, Semantic Shifts, Semantic Change Detection, language models, NorBERT, NorELMO, HPLT

Publications

  • Chen, Pinzhen; Ji, Shaoxiong; Bogoychev, Nikolay; Kutuzov, Andrei; Haddow, Barry & Heafield, Kenneth (2024). Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca, Findings of the Association for Computational Linguistics: EACL 2024. Association for Computational Linguistics. ISSN 979-8-89176-093-6. p. 1347–1356.
  • Giulianelli, Mario; Luden, Iris; Fernandez, Raquel & Kutuzov, Andrei (2023). Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. ISSN 978-1-959429-72-2. p. 3130–3148. Full text in Research Archive
  • Samuel, David; Kutuzov, Andrei; Touileb, Samia; Velldal, Erik; Øvrelid, Lilja & Rønningstad, Egil [Show all 8 contributors for this article] (2023). NorBench – A Benchmark for Norwegian Language Models. In Alumäe, Tanel & Fishel, Mark (Ed.), Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa). University of Tartu. ISSN 978-99-1621-999-7. p. 618–633. Full text in Research Archive
  • Samuel, David; Kutuzov, Andrei; Øvrelid, Lilja & Velldal, Erik (2023). Trained on 100 million words and still in shape: BERT meets British National Corpus. In Vlachos, Andreas & Augenstein, Isabelle (Ed.), Findings of the Association for Computational Linguistics: EACL 2023. Association for Computational Linguistics. ISSN 978-1-959429-47-0. p. 1954–1974.
  • Aksenova, Anna; Gavrishina, Ekaterina; Rykov, Elisei & Kutuzov, Andrei (2022). RuDSI: Graph-based Word Sense Induction Dataset for Russian, Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing. Association for Computational Linguistics. ISSN 978-1-955917-22-3. p. 77–88.
  • Kutuzov, Andrei; Velldal, Erik & Øvrelid, Lilja (2022). Contextualized embeddings for semantic change detection: Lessons learned. Northern European Journal of Language Technology (NEJLT). ISSN 2000-1533. 8(1). doi: 10.3384/nejlt.2000-1533.2022.3478.
  • Barnes, Jeremy; Oberlaender, Laura; Troiano, Enrica; Kutuzov, Andrei; Buchmann, Jan & Agerri, Rodrigo [Show all 8 contributors for this article] (2022). SemEval 2022 Task 10: Structured Sentiment Analysis. In Emerson, Guy; Schluter, Natalie; Stanovsky, Gabriel; Kumar, Ritesh; Palmer, Alexis; Schneider, Nathan; Singh, Siddarth & Ratan, Shyam (Ed.), Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). Association for Computational Linguistics. ISSN 978-1-955917-80-3. p. 1280–1295.
  • Kutuzov, Andrei; Touileb, Samia; Mæhlum, Petter; Enstad, Tita & Witteman, Alexandra (2022). NorDiaChange: Diachronic Semantic Change Dataset for Norwegian, Proceedings of the Language Resources and Evaluation Conference. European Language Resources Association. ISSN 979-10-95546-72-6. p. 2563–2572. Full text in Research Archive
  • Giulianelli, Mario; Kutuzov, Andrei & Pivovarova, Lidia (2022). Do Not Fire the Linguist: Grammatical Profiles Help Language Models Detect Semantic Change, Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics. ISSN 978-1-955917-42-1. p. 54–67.
  • Kutuzov, Andrei; Giulianelli, Mario & Pivovarova, Lidia (2021). Grammatical Profiling for Semantic Change Detection, Proceedings of the 25th Conference on Computational Natural Language Learning. Association for Computational Linguistics. ISSN 978-1-955917-05-6. p. 423–434.
  • Iazykova, Tatyana; Kapelyushnik, Denis; Bystrova, Olga & Kutuzov, Andrei (2021). Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 20, p. 302–318. doi: 10.28995/2075-7182-2021-20-302-317.
  • Kutuzov, Andrei & Pivovarova, Lidia (2021). Three-part diachronic semantic change dataset for Russian, Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics. ISSN 978-1-954085-60-2. p. 7–13.
  • Ravishankar, Vinit; Kutuzov, Andrei; Øvrelid, Lilja & Velldal, Erik (2021). Multilingual ELMo and the Effects of Corpus Sampling. In Dobnik, Simon & Øvrelid, Lilja (Ed.), Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press. ISSN 978-91-7929-614-8. p. 378–383.
  • Kutuzov, Andrei; Barnes, Jeremy; Velldal, Erik; Øvrelid, Lilja & Oepen, Stephan (2021). Large-Scale Contextualised Language Modelling for Norwegian. In Dobnik, Simon & Øvrelid, Lilja (Ed.), Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press. ISSN 978-91-7929-614-8. p. 30–40.
  • Rodina, Julia; Trofimova, Yuliya; Kutuzov, Andrei & Artemova, Ekaterina (2021). ELMo and BERT in Semantic Change Detection for Russian, Proceedings of AIST 2020: Analysis of Images, Social Networks and Texts. Springer. ISSN 978-3-030-72610-2. p. 175–186.
  • Kutuzov, Andrei & Kuzmenko, Elizaveta (2021). Representing ELMo embeddings as two-dimensional text online, Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics. ISSN 978-1-954085-05-3. p. 143–148.
  • Kutuzov, Andrei; Fomin, V.; Mikhailov, V. Nikola & Rodina, Julia (2020). Shiftry: Web service for diachronic analysis of Russian news. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 2020-(19), p. 500–516. doi: 10.28995/2075-7182-2020-19-500-516.
  • Katricheva, Nadezda; Yaskevich, Alyaxey; Lisitsina, Anastasiya; Zhordaniya, Tamara; Kutuzov, Andrei & Kuzmenko, Elizaveta (2020). Vec2graph: A Python Library for Visualizing Word Embeddings as Graphs. Communications in Computer and Information Science (CCIS). ISSN 1865-0929. 1086, p. 190–198. doi: 10.1007/978-3-030-39575-9_20.
  • Kutuzov, Andrei & Giulianelli, Mario (2020). UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection, Proceedings of the Fourteenth Workshop on Semantic Evaluation. Association for Computational Linguistics. ISSN 978-1-952148-31-6. p. 126–134.
  • Rodina, Julia & Kutuzov, Andrei (2020). RuSemShift: a dataset of historical lexical semantic change in Russian, Proceedings of the 28th International Conference on Computational Linguistics. Association for Computational Linguistics. ISSN 978-1-952148-27-9. p. 1037–1047.
  • Kunilovskaya, Maria; Kutuzov, Andrei & Plum, Alistair (2020). Taxonomy enrichment for Russian: synset classification outperforms linear hyponym-hypernym projections. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 19, p. 459–469. doi: 10.28995/2075-7182-2020-19-474-484.
  • Kutuzov, Andrei; Fomin, Vadim; Rodina, Julia & Mikhailov, Vladislav (2020). ShiftRy: web service for diachronic analysis of Russian news. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 19, p. 485–501.
  • Logacheva, Varvara; Teslenko, Denis; Shelmanov, Artem; Remus, S.; Ustalov, Dmitry & Kutuzov, Andrei [Show all 10 contributors for this article] (2020). Word Sense Disambiguation for 158 Languages using Word Embeddings Only. In Calzolari, Nicoletta; Béchet, Frédéric; Blache, Philippe; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Isahara, Hitoshi; Maegaard, Bente; Mariani, Joseph; Mazo, Hélène; Moreno, Asuncion; Odijk, Jan & Piperidis, Stelios (Ed.), Proceedings of The 12th Language Resources and Evaluation Conference. European Language Resources Association. ISSN 979-10-95546-34-4. p. 5945–5954.
  • Kutuzov, Andrei & Nikishina, Irina (2019). Double-Blind Peer-Reviewing and Inclusiveness in Russian NLP Conferences, Analysis of Images, Social Networks and Texts (Revised Selected Papers). Springer Publishing Company. ISSN 978-3-030-37334-4. p. 3–8. doi: 10.1007/978-3-030-37334-4_1.
  • Droganova, Kira; Kutuzov, Andrei; Mediankin, Nikita & Zeman, Daniel (2019). ÚFAL-Oslo at MRP 2019: Garage Sale Semantic Parsing. In Oepen, Stephan; Abend, Omri; Hajic, Jan; Hershcovich, Daniel; Kuhlmann, Marco; O’Gorman, Tim & Nianwen, Xue (Ed.), Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning. Association for Computational Linguistics. ISSN 978-1-950737-60-4. p. 158–165. doi: 10.18653/v1/K19-2015.
  • Kutuzov, Andrei & Kuzmenko, Elizaveta (2019). To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation. In Nivre, Joakim; Derczynski, Leon; Ginter, Filip; Lindi, Bjørn; Oepen, Stephan; Søgaard, Anders & Tidemann, Jorg (Ed.), Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing. Linköping University Electronic Press. ISSN 978-91-7929-999-6. p. 22–28.
  • Kutuzov, Andrei; Dorgham, Mohammad; Oliynyk, Oleksiy; Biemann, Chris & Panchenko, Alexander (2019). Making Fast Graph-based Algorithms with Graph Metric Embeddings. In Korhonen, Anna; Traum, David & Màrquez, Lluís (Ed.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. ISSN 978-1-950737-48-2. p. 3349–3355. doi: 10.18653/v1/P19-1325.
  • Rodina, Julia; Bakshandaeva, Daria; Fomin, Vadim; Kutuzov, Andrei; Touileb, Samia & Velldal, Erik (2019). Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian. In Tahmasebi, Nina; Borin, Lars; Jatowt, Adam & Xu, Yang (Ed.), Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics. ISSN 978-1-950737-31-4. p. 202–209. doi: 10.18653/v1/W19-4725. Full text in Research Archive
  • Kutuzov, Andrei; Velldal, Erik & Øvrelid, Lilja (2019). One-to-X Analogical Reasoning on Word Embeddings: a Case for Diachronic Armed Conflict Prediction from News Texts. In Tahmasebi, Nina; Borin, Lars; Jatowt, Adam & Xu, Yang (Ed.), Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics. ISSN 978-1-950737-31-4. p. 196–201. doi: 10.18653/v1/W19-4724. Full text in Research Archive
  • Kutuzov, Andrei; Dorgham, Mohammad; Oliynyk, Oleksiy; Biemann, Chris & Panchenko, Alexander (2019). Learning Graph Embeddings from WordNet-based Similarity Measures. In Mihalcea, Rada; Shutova, Ekaterina; Ku, Lun-Wei; Evang, Kilian & Poria, Soujanya (Ed.), Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019). Association for Computational Linguistics. ISSN 978-1-948087-93-3. p. 125–135. doi: 10.18653/v1/S19-1014.
  • Fomin, Vadim; Bakshandaeva, Daria; Rodina, Julia & Kutuzov, Andrei (2019). Tracing Cultural Diachronic Semantic Shifts in Russian Using Word Embeddings: Test Sets and Baselines. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 2019-May(18), p. 213–227.
  • Nikishina, Irina; Bakarov, Amir & Kutuzov, Andrei (2018). RusNLP: Semantic search engine for Russian NLP conference papers. Lecture Notes in Computer Science (LNCS). ISSN 0302-9743. 11179 LNCS, p. 111–120. doi: 10.1007/978-3-030-11027-7_11.
  • Bakarov, A; Kutuzov, Andrei & Nikishina, I (2018). Russian computational linguistics: Topical structure in 2007-2017 conference papers. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 2018-May(17).
  • Sadov, Mikhail A. & Kutuzov, Andrei (2018). Use of morphology in distributional word embedding models: Russian language case. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 2018-May(17), p. 1–12.
  • Kutuzov, Andrei; Øvrelid, Lilja; Szymanski, Terrence & Velldal, Erik (2018). Diachronic word embeddings and semantic shifts: a survey, Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics. ISSN 978-1-948087-50-6. p. 1384–1397. Full text in Research Archive
  • Ustalov, Dmitry; Panchenko, Alexander; Kutuzov, Andrei; Biemann, Chris & Ponzetto, Simone (2018). Unsupervised semantic frame induction using triclustering, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics. ISSN 978-1-948087-34-6. p. 55–62. doi: 10.18653/v1/p18-2010. Full text in Research Archive
  • Kutuzov, Andrei (2018). Russian Word Sense Induction by Clustering Averaged Word Embeddings. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 2018-May(17), p. 391–403. Full text in Research Archive
  • Kutuzov, Andrei & Kunilovskaya, Maria (2018). Size vs. Structure in Training Corpora for Word Embedding Models: Araneum Russicum Maximum and Russian National Corpus. Lecture Notes in Computer Science (LNCS). ISSN 0302-9743. 10716 LNCS, p. 47–58. doi: 10.1007/978-3-319-73013-4_5.
  • Kunilovskaya, Maria & Kutuzov, Andrei (2017). Universal Dependencies-based syntactic features in detecting human translation varieties. In Hajič, Jan (Eds.), Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories. Association for Computational Linguistics. ISSN 978-80-88132-04-2. p. 27–36.
  • Smirnov, Ivan V.; Kuznetsova, Rita; Kopotev, Mikhail; Khazov, Andrey; Lyashevskaya, Olga & Ivanova, L. [Show all 7 contributors for this article] (2017). Evaluation tracks on plagiarism detection algorithms for the Russian language. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 1(16), p. 271–283.
  • Kutuzov, Andrei & Kuzmenko, Elizaveta (2017). Two centuries in two thousand words: Neural embedding models in detecting diachronic lexical changes, Quantitative Approaches to the Russian Language. Routledge. ISSN 9781138097155. doi: 10.4324/9781315105048-5.
  • Kutuzov, Andrei; Velldal, Erik & Øvrelid, Lilja (2017). Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. ISSN 978-1-945626-83-8. p. 1825–1830. doi: 10.18653/v1/D17-1194. Full text in Research Archive
  • Kutuzov, Andrei; Velldal, Erik & Øvrelid, Lilja (2017). Tracing armed conflicts with diachronic word embedding models. In Caselli, Tommaso (Eds.), Proceedings of the Events and Stories in the News Workshop. Association for Computational Linguistics. ISSN 978-1-945626-63-0. p. 31–36. doi: 10.18653/v1/W17-2705. Full text in Research Archive
  • Kunilovskaya, Maria & Kutuzov, Andrei (2017). Testing target text fluency: A machine learning approach to detecting syntactic translationese in English-Russian translation, New perspectives on cohesion and coherence: Implications for translation. Language Science Press. ISSN 978-3-946234-72-2. p. 75–103. doi: 10.5281/zenodo.814452.
  • Kutuzov, Andrei & Kuzmenko, Elizaveta (2017). WebVectors: A toolkit for building web interfaces for vector semantic models. Communications in Computer and Information Science (CCIS). ISSN 1865-0929. 661, p. 155–161. doi: 10.1007/978-3-319-52920-2_15.
  • Kutuzov, Andrei (2017). Arbitrariness of Linguistic Sign Questioned: Correlation between Word Form and Meaning in Russian. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 1(16), p. 109–120. Full text in Research Archive
  • Lison, Pierre & Kutuzov, Andrei (2017). Redefining Context Windows for Word Embedding Models: An Experimental Study. In Tiedemann, Jörg (Eds.), Proceedings of the 21st Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press. ISSN 978-91-7685-601-7. p. 284–288. Full text in Research Archive
  • Kutuzov, Andrei; Fares, Murhaf; Oepen, Stephan & Velldal, Erik (2017). Word vectors, reuse, and replicability: Towards a community repository of large-text resources. In Tiedemann, Jörg (Eds.), Proceedings of the 21st Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press. ISSN 978-91-7685-601-7. p. 271–276. Full text in Research Archive
  • Kutuzov, Andrei; Kuzmenko, Elizaveta & Pivovarova, Lidia (2017). Clustering of Russian Adjective-Noun Constructions using Word Embeddings. In Pivovarova, Lidia; Piskorski, Jakub & Erjavec, Tomaž (Ed.), Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing. Association for Computational Linguistics. ISSN 978-1-945626-45-6. p. 3–13. doi: 10.18653/v1/W17-1402. Full text in Research Archive
  • Koslowa, Olga & Kutuzov, Andrei (2016). Improving Distributional Semantic Models Using Anaphora Resolution during Linguistic Preprocessing. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. ISSN 2221-7932. 15, p. 288–299.
  • Kutuzov, Andrei; Kuzmenko, Elizaveta & Marakasova, Anna (2016). Exploration of register-dependent lexical semantics using word embeddings, Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH). Association for Computational Linguistics. ISSN 978-4-87974-708-2. p. 26–34.
  • Kutuzov, Andrei; Velldal, Erik & Øvrelid, Lilja (2016). Redefining part-of-speech classes with distributional semantic models, Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics. ISSN 978-1-945626-19-7. p. 115–125. doi: 10.18653/v1/K16-1012. Full text in Research Archive
  • Kutuzov, Andrei; Kopotev, Mikhail; Sviridenko, Tatyana & Ivanova, Lyubov (2016). Clustering Comparable Corpora of Russian and Ukrainian Academic Texts: Word Embeddings and Semantic Fingerprints, Proceedings of the Ninth Workshop on Building and Using Comparable Corpora, held at LREC-2016. European Language Resources Association. ISSN 978-2-9517408-9-1. p. 3–10.
  • Kutuzov, Andrei & Kuzmenko, Elizaveta (2016). Neural Embedding Language Models in Semantic Clustering of Web Search Results, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association. ISSN 978-2-9517408-9-1. p. 3044–3048.
  • Kutuzov, Andrei & Kuzmenko, Elizaveta (2016). Cross-lingual Trends Detection for Named Entities in News Texts with Dynamic Neural Embedding Models, Proceedings of the First International Workshop on Recent Trends in News Information Retrieval co-located with 38th European Conference on Information Retrieval (ECIR 2016). Technical University of Aachen. ISSN 978-3-319-30671-1. p. 27–32.
  • Kutuzov, Andrei & Kuzmenko, Elizaveta (2015). Semi-automated typical error annotation for learner English essays: integrating frameworks, Proceedings of the 4th workshop on NLP for Computer Assisted Language Learning at NODALIDA 2015. Linköping University Electronic Press. ISSN 978-91-7519-036-5. p. 35–41.
  • Kutuzov, Andrei & Kuzmenko, Elizaveta (2015). Comparing Neural Lexical Models of a Classic National Corpus and a Web Corpus: The Case for Russian. In Gelbukh, Alexander (Eds.), Computational Linguistics and Intelligent Text Processing. Springer Publishing Company. ISSN 978-3-319-18111-0. p. 47–58. doi: 10.1007/978-3-319-18111-0_4.
  • Kutuzov, Andrei (2015). Semantic Clustering of Russian Web Search Results: Possibilities and Problems, Information retrieval. Springer Publishing Company. ISSN 978-3-319-25485-2. p. 320–331. doi: 10.1007/978-3-319-25485-2_12.

View all works in Cristin


Published Oct. 14, 2015 6:11 PM - Last modified Mar. 19, 2024 3:51 PM