Transformers using local attention mappings for text document classification
by
Haman, Bekir Ufuk, author.
Title
:
Transformers using local attention mappings for text document classification
Author
:
Haman, Bekir Ufuk, author.
Physical Description
:
viii, 45 leaves: illustrations, charts; 29 cm + 1 computer laser optical disc.
Abstract
:
Transformer models are powerful and flexible encoder-decoder structures that have proven successful in many fields, including natural language processing. Although they excel at working with textual input, classifying texts, answering questions, and generating text, they struggle to process long texts. Current leading transformer models such as BERT limit input length to 512 tokens. The most prominent reason for this limitation is that the self-attention operation, which forms the backbone of the transformer architecture, requires substantial processing power. This requirement grows quadratically with the input length, making it prohibitively expensive for transformers to process long texts. New transformer architectures that use various local attention mapping methods have therefore been proposed to overcome the text-length challenge. This study first proposes two alternative local attention mapping methods to make transformer models capable of processing long texts. In addition, it presents the "Refined Patents" dataset, consisting of 200,000 patent documents, prepared specifically for the long-text document classification task. The proposed attention mapping methods, based on Term Frequency - Inverse Document Frequency (TF-IDF) and Pointwise Mutual Information (PMI), create a sparse version of the self-attention matrix from the occurrence statistics of words and word pairs. These methods were implemented on top of the Longformer and Big Bird models and tested on the Refined Patents dataset. The results show that both proposed approaches are acceptable local attention mapping alternatives and can be used to enable long-text processing in transformers.
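To make the sparse-attention idea in the abstract concrete, the following is a minimal sketch, assuming a Longformer-style layout in which a sliding local window is combined with a handful of globally attending tokens, of how a TF-IDF score could pick the positions that receive global attention. The function name, the top-k heuristic, and the window size are illustrative assumptions, not the thesis's actual implementation.

```python
# Sketch only: a TF-IDF-driven sparse attention mask in the spirit of
# Longformer/Big Bird. Positions with high TF-IDF become "global" tokens
# (attend to and are attended by every position); all other positions keep
# only a small local sliding window. Heuristics here are assumptions.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_attention_mask(doc_tokens, corpus, window=2, num_global=2):
    """Return a boolean (seq_len, seq_len) mask; True = attention allowed."""
    seq_len = len(doc_tokens)
    vec = TfidfVectorizer()
    vec.fit(corpus)  # corpus-level document-frequency statistics
    idf = dict(zip(vec.get_feature_names_out(), vec.idf_))
    # Score each position: term frequency within this document times IDF.
    tf = {t: doc_tokens.count(t) / seq_len for t in set(doc_tokens)}
    scores = np.array([tf[t] * idf.get(t.lower(), 0.0) for t in doc_tokens])
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    # Local sliding window around every position.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True
    # The highest-TF-IDF positions get full row/column (global) attention.
    for g in np.argsort(scores)[-num_global:]:
        mask[g, :] = True
        mask[:, g] = True
    return mask

if __name__ == "__main__":
    corpus = ["a patent about neural attention", "a patent about pumps"]
    tokens = "a patent about neural attention models".split()
    print(tfidf_attention_mask(tokens, corpus, window=1, num_global=1).astype(int))
```

A PMI-based variant would instead score word pairs by how much more often they co-occur than chance, log p(x, y) / (p(x) p(y)), and unmask the corresponding position pairs directly rather than promoting single tokens to global attention.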
Subject Heading
:
Natural language processing (Computer science)
Added Author
:
Tekir, Selma.
Corporate Added Entry
:
İzmir Institute of Technology. Computer Engineering.
Uniform Title
:
Thesis (Master) -- İzmir Institute of Technology: Computer Engineering.
Electronic Access
:
Library | Material Type | Accession Number | Call Number | Status/Due Date
---|---|---|---|---
IYTE Library | Thesis | T002868 | QA76.9.N38 H19 2023 | Thesis Collection