Coreference : Annotation, Resolution and Evaluation in Polish.
Title:
Coreference : Annotation, Resolution and Evaluation in Polish.
Author:
Savary, Agata.
ISBN:
9781614518389
Added Author:
Physical Description:
1 online resource (334 pages)
Contents:
Coreference -- Title Page -- Copyright Page -- Table of Contents -- Preface -- Part I: Introduction -- 1 Reference, anaphora, coreference -- 1.1 The concept of reference -- 1.2 The typology of reference -- 1.3 The real world versus the mental world -- 1.4 Reference, coreference and anaphora -- 1.5 Coreference and identity -- 2 Polish coreference-related studies -- 2.1 Terminology and characteristics of the problem -- 2.1.1 Klemensiewicz -- 2.1.2 Topolińska -- 2.1.3 Pasek -- 2.1.4 Fall -- 2.2 Text coherence, cohesion and intra-document linking -- 2.2.1 Pisarkowa -- 2.2.2 Bellert -- 2.2.3 Wajszczuk -- 2.2.4 Marciszewski -- 2.2.5 Stroińska, Szkudlarek and Trofimiec -- 2.2.6 Pisarek -- 2.3 Reference and anaphora in text understanding -- 2.3.1 Grzegorczykowa -- 2.3.2 Gajda -- 2.4 Text genres and language stylistics -- 2.4.1 Szwedek and Duszak -- 2.4.2 Honowska -- 2.4.3 Fontański -- 2.4.4 Dobrzyńska -- 2.5 Anaphora and first-order logic -- 2.5.1 Dunin-Kęplicz -- 2.5.2 Studnicki, Polanowska, Fall and Puczyłowski -- 2.6 Application of formal binding theories to Polish -- 2.6.1 Reinders-Machowska -- 2.6.2 Kupść and Marciniak -- 2.6.3 Trawiński -- Part II: Coreference Annotation -- 3 Related work -- 3.1 Coreference annotation abroad -- 3.2 Polish anaphora and coreference annotation -- 3.2.1 LUNA-related work -- 3.2.2 Pronominal anaphora annotation for information extraction -- 3.2.3 Anaphora representation in the KPWr corpus -- 4 Annotation models -- 4.1 Annotation requirements -- 4.1.1 Requirement 1: Linguistic compatibility -- 4.1.2 Requirement 2: Offline mode -- 4.1.3 Requirement 3: Standard, open, stand-off annotation format -- 4.1.4 Requirement 4: Simple, user-centered design -- 4.1.5 Requirement 5: Reliability -- 4.1.6 Requirement 6: Extensibility, adaptability and open source availability -- 4.2 Annotation tools review -- 4.2.1 Available tools.

4.2.2 PALinkA -- 4.2.2.1 Usage -- 4.2.2.2 Data format -- 4.2.2.3 Other remarks -- 4.2.3 MMAX2 -- 4.2.3.1 Usage -- 4.2.3.2 Data format -- 4.2.3.3 Other remarks -- 4.2.4 Conclusion -- 5 Annotation guidelines -- 5.1 Types of coreference -- 5.1.1 Identity of reference, identity of sense -- 5.1.2 Identity, quasi-identity -- 5.1.3 Dominant expression -- 5.2 Scope of annotation and typology of coreferential constructions -- 5.2.1 Anaphoric expressions in grammatical and lexical form -- 5.2.1.1 Annotation of pronominalisation -- 5.2.1.2 Annotation full replicas of anaphorised groups -- 5.2.2 The borders of the nominal phrase -- 5.2.2.1 Semantic heads of syntactic groups -- 5.3 Particular annotation problems -- 5.3.1 Embedded phrases -- 5.3.2 Discontinuous phrases -- 5.3.3 Annotation of zero subject -- 5.3.4 Idioms -- 5.3.5 Indirect speech, direct speech and free direct speech -- 5.3.6 Gerunds -- 5.3.7 Definiteness and indefiniteness -- 5.3.8 Elective phrases -- 5.3.9 Extended nominal phrases -- 5.3.10 Enumerations -- 6 Annotation methodology -- 6.1 Initial annotation experiment -- 6.1.1 The SemEval-based format -- 6.1.2 Annotation data -- 6.1.3 Annotation procedure -- 6.1.4 First visualisation attempts -- 6.1.5 Findings from the process -- 6.2 Series versus parallel annotation -- 6.3 Annotation workflow -- 6.3.1 Design decisions -- 6.3.2 Corpus creation procedure -- 6.3.3 Text selection -- 6.3.4 Text preparation -- 6.3.5 Text distribution and acquisition -- 6.3.6 Manual text annotation -- 6.3.7 Corpus publication process -- 7 Annotation tools -- 7.1 Tools used in the CORE project -- 7.1.1 DistSys -- 7.1.1.1 Server -- 7.1.1.2 Client -- 7.1.1.3 Annotator's package -- 7.1.2 MMAX4CORE -- 7.1.2.1 Required changes -- 7.1.2.2 New features -- 7.1.2.3 Removed features -- 7.2 DistSys from annotator's perspective -- 7.2.1 Downloading texts.

7.2.2 Saving texts on server (optional operation) -- 7.2.3 Uploading finished texts -- 7.2.4 Checking the number of finished texts -- 7.2.5 Rejecting problematic texts -- 7.2.6 Working on more than one computer -- 7.2.6.1 Second option -- 7.3 MMAX4CORE from annotator's perspective -- 7.3.1 Starting the program -- 7.3.2 Operations on files -- 7.3.2.1 Opening a file -- 7.3.2.2 Saving a file -- 7.3.3 Operations on mentions -- 7.3.3.1 Selecting existing mention -- 7.3.3.2 Editing mention attributes -- 7.3.3.3 Creating new mention -- 7.3.3.4 Editing mention boundaries -- 7.3.3.5 Deleting mention -- 7.3.3.6 Merging two mentions -- 7.3.4 Operations on clusters -- 7.3.4.1 Creating a cluster -- 7.3.4.2 Adding a mention to a cluster -- 7.3.4.3 Removing a mention from a cluster -- 7.3.4.4 Choosing a cluster dominating expression -- 7.3.5 Operations on links -- 7.3.5.1 Adding link -- 7.3.5.2 Removing link -- 7.3.5.3 Changing link target -- 7.3.6 Browsers -- 7.3.6.1 Browsing mentions -- 7.3.6.2 Browsing clusters -- 7.3.6.3 Browsing links -- 7.3.7 Program settings -- 7.3.7.1 Settings -- 7.3.7.2 Display -- 7.3.7.3 Size and location of windows -- 7.3.7.4 Restoring default settings -- 7.3.8 Copying text fragments to clipboard -- 7.4 Adjudication of parallel annotations -- 7.4.1 Differences in DistSys -- 7.4.2 Differences in MMAX4CORE -- 7.4.2.1 Adjudicating layers -- 7.4.2.2 Manual edition -- 8 Polish Coreference Corpus -- 8.1 Corpus composition -- 8.1.1 Short texts -- 8.1.1.1 Text extraction -- 8.1.1.2 Statistics -- 8.1.2 Long texts -- 8.1.2.1 Text extraction -- 8.1.2.2 Statistics -- 8.2 Corpus representation and visualisation -- 8.2.1 TEI format -- 8.2.2 MMAX format -- 8.2.3 Brat format -- 8.2.4 Brat visualization -- 8.3 Corpus statistics -- 8.3.1 Mentions -- 8.3.1.1 Pronouns and zero subjects -- 8.3.1.2 Nested and coordinated mentions -- 8.3.2 Coreference clusters.

8.3.2.1 Agreement in clusters -- 8.3.2.2 Clusters with indefinite mentions -- 8.3.3 Cluster and mention count correlation -- Part III: Coreference Resolution -- 9 Resolution approaches -- 9.1 Resolution methodologies -- 9.1.1 Resolution models -- 9.1.1.1 Mention-pair model -- 9.1.1.2 Entity-based model and ranking -- 9.1.2 Resolution strategies -- 9.1.3 Learning features -- 9.1.4 Resolution algorithms -- 9.2 Foreign state-of-the-art resolution tools -- 9.2.1 BART -- 9.2.1.1 Annotation process -- 9.2.1.2 Summary -- 9.2.2 Reconcile -- 9.2.2.1 Annotation process -- 9.2.2.2 Summary -- 9.2.3 Stanford Deterministic Coreference Resolution System -- 9.2.3.1 Annotation process -- 9.2.3.2 Summary -- 9.2.4 Berkeley Coreference Resolution System -- 9.2.4.1 Annotation process -- 9.2.4.2 Summary -- 9.3 Polish coreference resolution attempts -- 9.3.1 Knowledge-poor pronoun resolution -- 9.3.2 The analysis of anaphoric relations in Polsyn parser -- 9.3.3 Coreferencing for geo-tagging of Polish data -- 9.3.4 Pronominal anaphora resolution module for GATE -- 9.3.5 IKAR and anaphora representation in KPWr -- 9.3.5.1 PN-PN algorithm -- 9.3.5.2 PN-AgP algorithm -- 9.3.5.3 PN-Pron algorithm -- 9.3.6 English-Polish projection-based approach -- 10 Mention detection -- 10.1 Simple nouns and pronouns -- 10.2 Nominal groups -- 10.3 Nested mentions -- 10.3.1 Reorganisation of the grammar -- 10.3.2 Rule modification -- 10.3.3 Problematic cases -- 10.3.3.1 NP-NP groups -- 10.3.3.2 PP-NP groups -- 10.3.3.3 Dates, addresses and numbers -- 10.3.4 Reorganization results -- 10.4 Zero subjects -- 10.4.1 Null subject detection difficulties -- 10.4.2 Development and evaluation data -- 10.4.2.1 Inter-annotator agreement -- 10.4.2.2 Results of full dependency parsing -- 10.4.3 Development of the solution -- 10.4.3.1 First approximation - high recall study.

10.4.3.2 Second approximation - high precision study -- 10.4.3.3 The final setting -- 10.4.3.4 Accuracy on the development corpus -- 10.4.4 Evaluation -- 10.4.4.1 Accuracy on the evaluation corpus -- 10.4.4.2 Learning curve -- 10.4.5 Results -- 10.5 Named entities -- 10.6 Mention detection chain -- 11 Rule-based approach -- 11.1 Resolution process -- 11.2 Data sets and evaluation -- 11.3 Results -- 12 Statistical approach -- 12.1 First adaptation of BART for Polish -- 12.2 Second adaptation of BART for Polish -- 12.2.1 Feature categories -- 12.2.1.1 Surface features -- 12.2.1.2 Syntactic features -- 12.2.1.3 Discourse structure and salience features -- 12.2.1.4 Anaphoricity and antecedenthood features -- 12.2.1.5 BART features -- 12.2.1.6 Other features -- 12.2.2 The final configuration -- 12.2.3 Summary -- 12.3 Third adaptation of BART for Polish -- 12.3.1 Bartek features -- 12.3.1.1 WordNet features -- 12.3.1.2 Wikipedia features -- 12.3.1.3 The final configuration -- Part IV: Evaluation -- 13 Manual annotation evaluation -- 13.1 Annotation agreement of mentions -- 13.2 Annotation agreement of heads -- 13.3 Annotation agreement of quasi-identity links -- 13.4 Annotation agreement of dominant expressions -- 13.5 Annotation agreement of coreference -- 13.5.1 Existing agreement scores -- 13.5.1.1 First Passonneau's scores -- 13.5.1.2 Method à la MUC -- 13.5.1.3 Weighted Krippendorff's α -- 13.5.1.4 Recasens' agreement study -- 13.5.2 Results for PCC -- 13.5.2.1 Agreement à la Passonneau -- 13.5.2.2 Agreement à la Recasens -- 13.5.2.3 Agreement à la BLANC (our contribution) -- 14 Evaluation approaches -- 14.1 Evaluation exercises -- 14.1.1 Anaphora Resolution Exercise 2007 -- 14.1.1.1 Data -- 14.1.1.2 Tasks -- 14.1.1.3 Evaluation metrics -- 14.1.1.4 Summary -- 14.1.2 SemEval 2010 -- 14.1.2.1 Data -- 14.1.2.2 Tasks -- 14.1.3 Evaluation metrics.

14.1.3.1 Summary.
Abstract:
The book presents work on the understanding, annotation and resolution of coreference in a Slavic language, which can be applied to natural language processing software for English and other languages. By presenting the steps of building a coreference-related component of an NLP toolset, the volume serves as a reference book on state-of-the-art methods in coreference projects for new languages and as a tutorial for NLP practitioners.
Notes:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.