Automatic quote detection from literary work
by
Altıntaş, Aybike Güzel, author.
Title
:
Automatic quote detection from literary work
Author
:
Altıntaş, Aybike Güzel, author.
Author Added Entry
:
Altıntaş, Aybike Güzel, author.
Physical Description
:
ix, 60 leaves: charts; + 1 computer laser optical disc.
Abstract
:
Literature inspires readers, who tend to share quotes from the works they read. A reader underlines quotes in a book and shares them on social media or on an online platform for book readers. A quote is defined here as a span of written text that many readers find interesting and can reuse in different contexts. This study proposes a novel task in the field of Natural Language Processing: the Quote Detection Task. An original dataset was also built from the Goodreads and Project Gutenberg websites using web scraping. The quotes come from Goodreads data sourced from Kaggle; only quotes that received votes from 10 or more users were selected, and these were validated against the books on the Project Gutenberg website. The final dataset consists of 4554 rows and contains quotes together with their book spans. The span of a quote consists of the 10 sentences preceding the quote, the quote itself, and the 10 sentences following it. Quote detection is a span detection task that can be modeled with sequence labeling solutions and with neural extractive summarization systems from the literature. Two baselines were therefore run: the statistics-based Conditional Random Field (CRF) for the sequence tagging formulation, and Extractive Summarization as Text Matching (MatchSum). These baselines obtained ROUGE-1 scores of 27.24% and 40.54%, respectively.
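The span construction and the evaluation metric described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names `build_span` and `rouge1_f1`, the sentence-level BIO labels, the whitespace tokenization, and the choice of the F1 variant of ROUGE-1 are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter
from typing import List, Tuple


def build_span(
    sentences: List[str], quote_start: int, quote_end: int, window: int = 10
) -> Tuple[List[str], List[str]]:
    """Return the context span around a quote plus one label per sentence.

    The quote is sentences[quote_start:quote_end]. Up to `window` sentences
    are kept on each side (fewer near the start or end of the book).
    Sentence-level BIO labels are an assumption of this sketch.
    """
    if not (0 <= quote_start < quote_end <= len(sentences)):
        raise ValueError("quote indices out of range")
    lo = max(0, quote_start - window)
    hi = min(len(sentences), quote_end + window)
    labels = []
    for i in range(lo, hi):
        if i == quote_start:
            labels.append("B-QUOTE")   # first sentence of the quote
        elif quote_start < i < quote_end:
            labels.append("I-QUOTE")   # remaining sentences of the quote
        else:
            labels.append("O")         # surrounding context
    return sentences[lo:hi], labels


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 with clipped counts (assumed ROUGE-1 variant)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    book = [f"Sentence {i}." for i in range(30)]
    span, labels = build_span(book, quote_start=12, quote_end=14)
    print(len(span), labels[10], labels[11])
    print(rouge1_f1("the cat sat", "the cat sat down"))
```

A sequence labeler such as a CRF would be trained to predict the label sequence from features of each sentence (or token), while a predicted quote can be scored against the gold quote with a unigram-overlap measure like the one above.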
Subject Heading
:
Natural language processing (Computer science)
Author Added Entry
:
Tekir, Selma,
Corporate Added Entry
:
İzmir Institute of Technology. Computer Engineering.
Uniform Title
:
Thesis (Master)--İzmir Institute of Technology:Computer Engineering.
İzmir Institute of Technology: Computer Engineering--Thesis (Master).
Electronic Access
:
Library | Material Type | Inventory Number | Call Number | Status/Due Date |
---|---|---|---|---|
IYTE Library | Thesis | T002673 | QA76.9.N38 A46 2022 | Thesis Collection |