Personal Websites

Alexander Panchenko

Associate Professor, Head of the Natural Language Processing Laboratory
Natural Language Processing Laboratory
Center for Artificial Intelligence Technology

Hello, I am Alexander, an associate professor working on Natural Language Processing (NLP). My main research interest is computational lexical semantics, including word sense embeddings, word sense induction, extraction of lexical resources, and other related topics. I am also interested in argument mining. More generally, I am interested in neural and statistical natural language processing, information retrieval, knowledge bases, machine learning, and the intersections and interactions of these fields. You can find the list of my publications below on this page and also on Google Scholar.

I have been with Skoltech since 2019. My background is almost a decade of exciting research and development in the field of NLP: I have worked on a range of problems and tasks, such as semantic relatedness, word sense disambiguation and induction, sentiment analysis, gender detection, taxonomy induction, etc. Before Skoltech, I was a postdoctoral researcher in the group of Chris Biemann at the University of Hamburg, Germany. Prior to the appointment in Hamburg, I held a postdoc position at TU Darmstadt. I received my PhD in Computational Linguistics from the Université catholique de Louvain, Belgium. During these years, I (co-)authored more than 40 peer-reviewed research publications, including papers in top-tier conference proceedings such as ACL, EMNLP, EACL, and ECIR, receiving (with co-authors) best paper awards at the “Representation Learning for NLP” (RepL4NLP) workshop at ACL 2016 and at the SemEval-2019 competition on “Unsupervised Frame Induction”. I co-organised two shared tasks on semantic relatedness and word sense induction evaluation for the Russian language (RUSSE’15 and RUSSE’18). I also served as a co-editor of the proceedings of the data science conference on Analysis of Images, Social Networks, and Texts (AIST), published in the Springer LNCS series.

Information for prospective students interested in doing a research project on a topic related to NLP.

Below is a list of selected research highlights, grouped by field of interest. The full list of references can be found on the Publications tab.

Lexical Semantics

Idea: To induce sense inventories for 158 languages and design a word sense disambiguation (WSD) algorithm for them using only word embeddings, with no training data. This way, we enable WSD for low-resource languages.
Paper: Word Sense Disambiguation for 158 Languages using Word Embeddings Only (LREC-2020)
Illustration: screenshot-2020-10-28-at-14-51-03
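As an illustration of this idea, here is a minimal sketch of the embeddings-only disambiguation step, assuming a sense inventory has already been induced; the toy vectors, the `senses` inventory, and all names below are illustrative assumptions, not the actual fastText vectors or inventories used in the paper:

```python
import math

# Toy word vectors (assumption: in practice these would be
# pre-trained embeddings for one of the 158 languages).
emb = {
    "money":   [1.0, 0.1, 0.0],
    "deposit": [0.9, 0.2, 0.0],
    "river":   [0.0, 0.1, 1.0],
    "water":   [0.1, 0.0, 0.9],
}

# A hypothetical induced sense inventory for the ambiguous word "bank":
# each sense is represented by a cluster of related words.
senses = {
    "bank#finance": ["money", "deposit"],
    "bank#geo":     ["river", "water"],
}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def disambiguate(context, inventory, vectors):
    """Pick the sense whose cluster words are, on average, most similar
    to the observed context words -- embeddings only, no labelled data."""
    def score(cluster):
        pairs = [(c, w) for c in cluster for w in context if w in vectors]
        return sum(cos(vectors[c], vectors[w]) for c, w in pairs) / max(len(pairs), 1)
    return max(inventory, key=lambda s: score(inventory[s]))

print(disambiguate(["deposit", "money"], senses, emb))  # → bank#finance
```

Given a context such as ["deposit", "money"], the sketch selects the finance sense of “bank”, since that cluster's words lie closest to the context in the embedding space.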
Ideas: (1) To perform lexical substitution in context, integrating information about the target word is important; (2) The majority of lexical substitutes are co-hyponyms and synonyms, but the distribution varies across parts of speech and models.
Paper:  Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution (COLING-2020)
Illustration: screenshot-2020-10-28-at-18-49-50
Idea: Induce from text interpretable representations of word senses, enriched with images, hypernyms, definitions, and so on, in a completely unsupervised fashion, and design a word sense disambiguation system on the basis of this sense inventory.
Reference: Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation (EMNLP-2017)
Illustration: screenshot-2020-10-28-at-19-11-41
Idea: Similarly to a social ego-network, where an individual has several circles of close, non-overlapping groups, related words cluster in a similar way, forming word senses. Ego-network-based graph clustering can be used to automatically identify word senses based on any pre-trained word embedding model.
Paper:  Making Sense of Word Embeddings (ACL-2016)
Illustration: screenshot-2020-10-28-at-19-16-53
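The ego-network idea can be sketched in a few lines; the toy vectors below are my own illustrative assumptions, and plain connected-components clustering stands in for the graph clustering algorithm applied in the paper:

```python
import math

# Toy embeddings for the neighbourhood of the ambiguous word "bank"
# (assumption: in practice these come from any pre-trained model,
# e.g. word2vec).
emb = {
    "bank":    [0.9, 0.9, 0.0, 0.0],
    "money":   [1.0, 0.1, 0.0, 0.0],
    "deposit": [0.9, 0.2, 0.1, 0.0],
    "river":   [0.0, 0.1, 1.0, 0.2],
    "shore":   [0.1, 0.0, 0.9, 0.3],
}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def induce_senses(target, vectors, k=10, threshold=0.7):
    """Build the ego-network of `target` (its nearest neighbours, with
    edges between similar neighbours), drop the ego itself, and read
    word senses off the resulting clusters."""
    neighbours = sorted(
        (w for w in vectors if w != target),
        key=lambda w: cos(vectors[target], vectors[w]),
        reverse=True)[:k]
    # Connected components of the thresholded similarity graph --
    # a simplified stand-in for the graph clustering used in the paper.
    senses, unassigned = [], set(neighbours)
    while unassigned:
        seed = unassigned.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            w = frontier.pop()
            linked = {u for u in unassigned if cos(vectors[w], vectors[u]) > threshold}
            unassigned -= linked
            cluster |= linked
            frontier.extend(linked)
        senses.append(sorted(cluster))
    return sorted(senses)

print(induce_senses("bank", emb))  # → [['deposit', 'money'], ['river', 'shore']]
```

Removing the ego “bank” disconnects its financial and geographic neighbourhoods, so each remaining cluster corresponds to one induced sense.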

Argument Mining

Idea: It is possible to mine the Web for comparative statements to help answer comparative questions (like “Is Python better than Matlab for deep learning?”) and to design a system that fulfills the information needs of users more efficiently than the usual web search.
Paper:  Answering Comparative Questions: Better than Ten-Blue-Links? (CHIIR-2019)
Illustration:  screenshot-2020-10-28-at-19-25-19
Idea: Argument mining can be cast as a tagging task; we make recent neural models readily available for integration into various NLP pipelines as well as for interactive analysis of user texts.
Paper:  TARGER: Neural Argument Mining at Your Fingertips (ACL-2019)
Illustration:  screenshot-2020-10-28-at-19-29-07

List of the Main Research Interests

A more complete list of my research interests:

  • Lexical semantics (especially word sense induction and disambiguation, frame induction and disambiguation, semantic similarity and relatedness, sense embeddings, automated construction and completion of lexical resources such as WordNet and FrameNet)
  • Argument mining (especially comparative argument mining, and argument retrieval)
  • Learning representations of linguistic symbolic structures (graphs) such as knowledge bases and lexical resources
  • NLP for a better society: recognition of fake news, hate speech, and related phenomena
  • Textual style transfer


Theses

  • Alexander Panchenko (2013): Similarity Measures for Semantic Relation Extraction, PhD Thesis, Université catholique de Louvain
  • Alexander Panchenko (2008): Automatic Thesaurus Construction System, Graduation Thesis, Moscow State Technical University (BMSTU)

Edited volumes

Journal articles

Conference proceedings

Workshop proceedings

Supervision of PhD Theses

  • Özge Sevgili Ergüven (10.2018-…) is co-supervised with Chris Biemann (Germany). Özge is supported by DAAD (Deutscher Akademischer Austauschdienst) and is based at the University of Hamburg. Her PhD research is related to neural entity linking and representation learning on graphs.
  • Saba Anwar (10.2018-…) is co-supervised with Chris Biemann (Germany). Saba is supported by DAAD (Deutscher Akademischer Austauschdienst) and the Higher Education Commission of Pakistan. Her PhD research is related to the use of neural language models for unsupervised frame induction.
  • Irina Nikishina (11.2019-…). Irina's research is related to the development of methods for the automatic construction of lexical resources and the application of multilinear algebra to NLP.
  • Daryna Dementieva (11.2019-…). Daryna's research is related to the detection of fake news and textual style transfer.
  • Victoriia Chekalina (11.2019-…). Victoriia's research is related to comparative argument mining and multilinear algebra for NLP.
  • Anton Razzhigaev (11.2020-…). Anton's research is related to the vectorization of knowledge graphs and question answering over knowledge bases (KBQA).

Supervision of Master and Bachelor Theses

I have supervised research-oriented Master's theses, usually also aiming to publish a conference paper based on the produced results.

  • Dmitry Puzyrev (2021, Skoltech): Policy-based strategies for active learning in a scalable setup. Co-supervised with Artem Shelmanov.
  • Anton Voronov (2021, Skoltech-MIPT): Automatic Dialogue Censor – Style Transfer for Texts. Now with Sberbank AIR institute.
  • Lyubov Kupriyanova (2021, Skoltech): Uncertainty Estimation for Active Learning and Misclassification Detection in NLP. Co-supervised with Artem Shelmanov.
  • Vitaly Protasov (2021, Skoltech-MIPT): Cross-lingual lexical substitution. Now with Sberbank AIR institute.
  • Denis Teslenko (2020, Ural Federal University): Multilingual Graph-based Word Sense Disambiguation with Word Embedding Only.
  • Heike Heller (2019, University of Hamburg): Comparative Query Suggestion. Co-supervised with Chris Biemann.
  • Dmitry Puzyrev (2019, Higher School of Economics, CS faculty): Supervised Approaches to Detection of Noun Compositionality. Co-supervised with Artem Shelmanov and Ekaterina Artemova.
  • Matthias Schildwächter (2019, University of Hamburg): An Open-Domain System for Retrieval and Visualization of Comparative Arguments from Text.
  • Alvin Rindra Fazrie (2019, University of Hamburg): Visual Information Management with Compound Graphs. Main supervisor: Steffen Remus.
  • Dahmash Ibrahim (2018, University of Hamburg): Question Answering using Dynamic Neural Networks. Main supervisor: Benjamin Milde.
  • Mirco Franzek (2018, University of Hamburg). Comparative Argument Mining. Co-supervised with Chris Biemann.
  • Marten Fide (2017, TU Darmstadt). Predicting hypernyms in contexts with JoBimText. Co-supervised with Chris Biemann. Now at TU Darmstadt.
  • Maria Pelevina (2016, TU Darmstadt). Unsupervised Word Sense Disambiguation with Sense Embeddings. Co-supervised with Chris Biemann. Now at Deutsche Bahn R&D.
  • Simon Dif (2015, TU Darmstadt). Statistical Models of Semantics with Structured Topics. Co-supervised with Chris Biemann. A dual degree Masters program with ENSIMAG, Grenoble, France. Now at Altran R&D.
  • Alexey Romanov (2012, Moscow State Technical University). Graph Algorithms in the Lexical Semantic Search Engine ‘Serelex’. Co-supervised with Andrew Philippovich. Now at the University of Massachusetts Lowell.

Internships & Visiting Researchers

I help write research proposals to funding organizations that allow researchers to visit our faculty and carry out interesting short-term research projects together.

  • Dmitry Puzyrev (2019): Using hyperbolic word embeddings for detection of noun compositionality. Partially funded by the University of Hamburg. Visit outcome: ACL publication.
  • Shantanu Acharya (2018): Taxonomy induction using word sense representations. Funded by DAAD. Visit outcome: ACL publication.
  • Andrey Kutuzov (2018): Learning graph embeddings via node similarities. Funded by the University of Oslo. Visit outcome: ACL and *SEM publications.
  • Artem Chernodub (2017): Recurrent Neural Networks for Argument Mining. Funded by DAAD. Visit outcome: ACL demo paper.
  • Dmitry Ustalov (2016): Graph Clustering for Word Sense Induction. Funded by DAAD. Visit outcome: ACL and EACL publications.
Teaching

  • Statistical Natural Language Processing (Spring, 2019). A course for Master students. Slides. The course is based on the classic textbook by Jurafsky & Martin and covers a set of (mostly) pre-neural NLP topics.
  • Neural Natural Language Processing (Winter, 2019). A course for Master students. Slides. This course focuses mostly on neural NLP models.
  • A guest lecture on “Word and document embeddings” at the Deep Learning for Natural Language Processing course at the Indian Institute of Technology (IIT) Patna, India (Winter, 2020).

Organization of Events

Presentations and Invited Talks

Area Chair for Conferences

  • European Chapter of the Association for Computational Linguistics (EACL-2021). Area chair in the field of lexical semantics.

Programme Committee for Conferences and Workshops

  • ICLR: International Conference on Learning Representations (2021)
  • NeurIPS: Conference on Neural Information Processing Systems (2021)
  • BSNLP: Workshop on Balto-Slavic Natural Language Processing (2021)
  • EACL: Conference of the European Chapter of the Association for Computational Linguistics (2019, 2021)
  • BIAS: International Workshop on Algorithmic Bias in Search and Recommendation, co-located with the 42nd European Conference on Information Retrieval, ECIR (2020)
  • IWCS: International Conference on Computational Semantics, ACL SIGSEM special interest group on semantics (2019)
  • CoNLL: The SIGNLL Conference on Computational Natural Language Learning (2018, 2019)
  • ACL: Annual Meeting of the Association for Computational Linguistics (2018, 2019, 2020)
  • *SEM: Joint Conference on Lexical and Computational Semantics (2018, 2019)
  • NAACL: North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2018, 2019)
  • SocInfo: International Conference on Social Informatics (2018)
  • CLL: 3rd Workshop on Computational Linguistics and Language Science (2018)
  • EMNLP: Conference on Empirical Methods in Natural Language Processing (2017, 2018, 2019, 2020)
  • ESWC: Extended Semantic Web Conference (2017, 2018)
  • ASSET: Workshop on Advanced Solutions for Semantic Extraction from Texts co-located with the ESWC conference (2017)
  • TextGraphs: Workshop on Graph-based Methods for Natural Language Processing, co-located with the ACL/EMNLP/NAACL conferences (2016, 2017, 2018, 2019)
  • RepL4NLP: Workshop on Representation Learning for NLP, co-located with the ACL conference (2017, 2018)
  • SMERP: International Workshop on Exploitation of Social Media for Emergency Relief and Preparedness, co-located with the 39th European Conference on Information Retrieval, ECIR (2017)
  • COLING: International Conference on Computational Linguistics (2016, 2018)
  • AINL: Conference on Artificial Intelligence and Natural Language (2015, 2016, 2017)
  • SEMANTiCS: International Conference on Semantic Systems (2016, 2017)
  • Dialogue: International Conference on Computational Linguistics and Intellectual Technologies (2015, 2016, 2017, 2018)
  • RECITAL: Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, co-located with TALN conference (2015, 2016, 2017)
  • NLDB: International Conference on Natural Language & Information Systems (2015, 2016, 2017, 2022)
  • WI: IEEE/WIC/ACM International Conference on Web Intelligence (2014, 2015)
  • RuSSIR: Young Scientists Conference at Russian Summer School in Information Retrieval (2014, 2015)
  • AIST: Conference on Analysis of Images, Social Networks, and Texts (2014, 2015, 2016, 2017, 2018)
  • RANLP: Conference on Recent Advances in Natural Language Processing (2013, 2015, 2019)
  • LTC: The Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (2011, 2013)
  • CANI: a workshop co-located with the International Joint Conference on Artificial Intelligence, IJCAI (2020)

Journal Reviewing

  • Machine Learning. Springer-Nature (2021)
  • Journal of Machine Learning Research (JMLR), MIT Press (2020)
  • Information Processing & Management, Elsevier (2018, 2019, 2020)
  • Language Resources & Evaluation, Springer (2018, 2019, 2020, 2021)
  • PLOS ONE (2018)
  • Natural Language Engineering, Cambridge University Press (2018)
  • Data & Knowledge Engineering (DATAK), Elsevier (2017, 2018)
  • International Journal of Artificial Intelligence and Soft Computing, Interscience (2016)
  • Internet Computing Journal, IEEE (2015)
  • International Journal of Child Abuse & Neglect, Elsevier (2014)

Reviewing of Ph.D. Dissertations

  • “Linguistic interpretation and evaluation of word vector models for the Russian language” by Tatiana Shavrina, Higher School of Economics
  • “Advancing Lexical Substitution via Automatically Built Resources and Generative Approaches” by Caterina Lacerra, Sapienza Università di Roma
  • “Specialisation of language models for the natural language processing tasks” by Yuri Kuratov, Moscow Institute of Physics and Technology
  • “Methods of Network Embeddings and their Applications” by Ilya Makarov, University of Ljubljana
  • “Combinatorial and neural graph vector representations” by Sergei Ivanov, Skolkovo Institute of Science and Technology
  • “Knowledge-based approaches to producing large-scale training data from scratch for Word Sense Disambiguation and Sense Distribution Learning” by Tommaso Pasini, Sapienza Università di Roma
  • “Methods for compression of neural networks for natural language processing”, Artem Grachev, Higher School of Economics

Reviewing of Master Dissertations

  • Grigory Arshinov (2021):  Knowledge graph based question answering dataset for Russian. Higher School of Economics (Faculty of Computational Linguistics)
  • Boris Sheludko (2021): Lexical substitution methods for NLP tasks. Moscow State University (Faculty of Computing Sciences and Mathematics)
  • Maxim Fedoseev (2021): Methods for building vector representations of word senses in context. Moscow State University (Faculty of Computing Sciences and Mathematics)
  • Adis Davletov (2021): Methods for extraction of information from definitions of words. Moscow State University (Faculty of Computing Sciences and Mathematics)