Transformers using local attention mappings for text document classification
Haman, Bekir Ufuk, author.
Added Author:
Physical Description:
viii, 45 leaves: illustrations, charts; 29 cm + 1 computer laser optical disc.
Transformer models are powerful and flexible encoder-decoder structures that have proven successful in many fields, including natural language processing. Although they are especially successful at working with textual input, classifying texts, answering questions, and producing text, they have difficulty processing long texts. Current leading transformer models such as BERT limit input lengths to 512 tokens. The most prominent reason for this limitation is that the self-attention operation, which forms the backbone of the transformer structure, requires high processing power. This requirement grows quadratically with the input length, which makes it impractical for transformers to process long texts. To overcome the text length challenge, new transformer structures that use various local attention mapping methods have begun to be proposed. This study first proposes two alternative local attention mapping methods to make transformer models capable of processing long texts. In addition, it presents the "Refined Patents" dataset, which consists of 200,000 patent documents and was prepared specifically for the long text document classification task. The proposed attention mapping methods, Term Frequency - Inverse Document Frequency (TF-IDF) and Pointwise Mutual Information (PMI), create a sparse version of the self-attention matrix based on the occurrence statistics of words and word pairs. These methods were implemented on top of the Longformer and Big Bird models and tested on the Refined Patents dataset. Test results show that both proposed approaches are acceptable local attention mapping alternatives and can be used to enable long text processing in transformers.
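The abstract does not say how the occurrence statistics are turned into a sparse attention matrix, so the following Python sketch is only one plausible reading, not the thesis's implementation. It assumes document-level co-occurrence counts for PMI, a fixed local window around each token, and a PMI threshold for the remaining token pairs. The function names `pmi_table` and `sparse_attention_mask`, the `window` and `threshold` parameters, and the toy documents are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Document-level PMI for every word pair that co-occurs at least once.

    docs: list of token lists. Returns {(a, b): pmi} with a < b.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    # PMI(a, b) = log(P(a, b) / (P(a) * P(b))), with document frequencies
    # used as probability estimates (an assumption of this sketch).
    return {
        (a, b): math.log(n_ab * n_docs / (word_df[a] * word_df[b]))
        for (a, b), n_ab in pair_df.items()
    }


def sparse_attention_mask(tokens, pmi, window=1, threshold=0.0):
    """Boolean n x n mask: True where position i may attend to position j.

    A pair is kept if it lies inside the local window or if the PMI of
    the two words exceeds the threshold (hypothetical selection rule).
    """
    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= window:
                mask[i][j] = True
                continue
            key = tuple(sorted((tokens[i], tokens[j])))
            if pmi.get(key, float("-inf")) > threshold:
                mask[i][j] = True
    return mask


# Toy corpus, invented purely for illustration.
docs = [
    ["patent", "claim", "device"],
    ["patent", "claim", "method"],
    ["device", "sensor"],
    ["method", "step"],
]
pmi = pmi_table(docs)
tokens = ["patent", "device", "sensor", "method", "claim"]
mask = sparse_attention_mask(tokens, pmi, window=1, threshold=0.0)
kept = sum(map(sum, mask))
print(f"{kept} of {len(tokens) ** 2} attention entries kept")
```

The sketch builds the dense mask with a quadratic loop for readability only. An implementation aimed at long documents would store just the selected index pairs, since materializing the full matrix reintroduces the quadratic cost the sparse mapping is meant to avoid.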
Uniform Title:
Thesis (Master)-- İzmir Institute of Technology: Computer Engineering

İzmir Institute of Technology: Computer Engineering. (Master).
Electronic Access:
Access to Electronic Version.