Enriching contextual word embeddings with character information
by
 
Polatbilek, Ozan, author.

Title
Enriching contextual word embeddings with character information

Author
Polatbilek, Ozan, author.

Author Added Entry
Polatbilek, Ozan, author.

Physical Description
x, 79 leaves: charts; + 1 computer laser optical disc.

Abstract
Natural Language Processing has grown increasingly popular with recent advances in Artificial Intelligence. Fundamental improvements have been introduced in word representations to store semantic and/or syntactic features. The recently published language model BERT generates contextual word vectors, but it does not process character-level information. For morphologically rich languages such as Turkish, the model's perception of syntax could therefore be improved. In this thesis, a new model called BERT-ELMo, a combination of BERT and ELMo, is proposed to enrich BERT with character-level information. It combines the character-level processing part of ELMo with the contextual word representation part of BERT. To show the effectiveness of the proposed model, both quantitative (question answering) and qualitative (word analogy, word contextualization, morphological meaning, out-of-vocabulary word capturing) analyses are performed, and the model is compared with BERT on Turkish. Thanks to the character-level addition, the proposed model can be trained in any language without any pre-analysis. To the best of our knowledge, this is the first study that uses morphological analysis to train the BERT model in Turkish, and the first model to integrate a character-level module into BERT.
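
The character-level processing mentioned in the abstract can be illustrated with a minimal sketch of a character CNN with max-over-time pooling, the general kind of word encoder that ELMo builds on. This is not the thesis's implementation: the sizes (`CHAR_DIM`, `FILTER_WIDTH`, `NUM_FILTERS`), the padding symbol, and the function names are hypothetical, and the weights are random placeholders rather than trained parameters. The sketch only shows that any string, including an unseen inflected form, maps to a fixed-size vector that a contextual encoder could then consume.

```python
import random

CHAR_DIM = 4      # hypothetical character embedding size
FILTER_WIDTH = 3  # hypothetical convolution width (characters per window)
NUM_FILTERS = 6   # hypothetical size of the resulting word vector
PAD = "\0"        # padding symbol so very short words still fill one window

rng = random.Random(0)  # fixed seed; weights are untrained placeholders
_char_table = {}        # character -> embedding, filled lazily


def char_embedding(ch):
    """Return a random (untrained) embedding for one character."""
    if ch not in _char_table:
        _char_table[ch] = [rng.uniform(-1, 1) for _ in range(CHAR_DIM)]
    return _char_table[ch]


# One weight per (filter, window position, embedding dimension).
FILTERS = [[[rng.uniform(-1, 1) for _ in range(CHAR_DIM)]
            for _ in range(FILTER_WIDTH)]
           for _ in range(NUM_FILTERS)]


def encode_word(word):
    """Map a word of any length to a vector of NUM_FILTERS floats."""
    chars = list(word) + [PAD] * max(0, FILTER_WIDTH - len(word))
    embedded = [char_embedding(c) for c in chars]
    vector = []
    for filt in FILTERS:
        responses = []
        # Slide the filter over every window of FILTER_WIDTH characters.
        for start in range(len(embedded) - FILTER_WIDTH + 1):
            window = embedded[start:start + FILTER_WIDTH]
            responses.append(sum(w * x
                                 for w_row, x_row in zip(filt, window)
                                 for w, x in zip(w_row, x_row)))
        # Max-over-time pooling keeps the strongest response per filter.
        vector.append(max(responses))
    return vector


if __name__ == "__main__":
    # Turkish forms of "ev" (house) with increasing suffixation.
    for w in ["ev", "evler", "evlerimizde"]:
        print(w, len(encode_word(w)))
```

Each of the three words yields a vector of the same length, regardless of how many suffixes are attached. In a full model the filters and character embeddings would be learned, and further layers would typically follow the pooling step.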

Subject Heading
Natural language processing (Computer science).
 
Machine learning.

Author Added Entry
Tekir, Selma,

Corporate Body Added Entry
İzmir Institute of Technology. Computer Engineering.

Uniform Title
Thesis (Master)--İzmir Institute of Technology: Computer Engineering.
 
İzmir Institute of Technology: Computer Engineering--Thesis (Master).

Electronic Access
Access to Electronic Version.


Library        Material Type   Inventory Number   Call Number
IYTE Library   Thesis          T002177            QA76.9.N38 P75 2020