Scaling Apache Solr.
Title:
Scaling Apache Solr.
Author:
Karambelkar, Hrishikesh Vijay.
ISBN:
9781783981755
Physical Description:
1 online resource (358 pages)
Contents:
Scaling Apache Solr -- Table of Contents -- Scaling Apache Solr -- Credits -- About the Author -- About the Reviewers -- www.PacktPub.com -- Support files, eBooks, discount offers, and more -- Why subscribe? -- Free access for Packt account holders -- Preface -- What this book covers -- What you need for this book -- Who this book is for -- Conventions -- Reader feedback -- Customer support -- Downloading the example code -- Errata -- Piracy -- Questions
1. Understanding Apache Solr -- Challenges in enterprise search -- Apache Solr - an overview -- Features of Apache Solr -- Solr for end users -- Powerful full text search -- Search through rich information -- Results ranking, pagination, and sorting -- Facets for better browsing experience -- Advanced search capabilities -- Administration -- Apache Solr architecture -- Storage -- Solr application -- Integration -- Client APIs and SolrJ client -- Other interfaces -- Practical use cases for Apache Solr -- Enterprise search for a job search agency -- Problem statement -- Approach -- Enterprise search for energy industry -- Problem statement -- Approach -- Summary
2. Getting Started with Apache Solr -- Setting up Apache Solr -- Prerequisites -- Running Solr on Jetty -- Running Solr on Tomcat -- Solr administration -- What's next? -- Common problems and solution -- Understanding the Solr structure -- The Solr home directory structure -- Solr navigation -- Configuring the Apache Solr for enterprise -- Defining a Solr schema -- Solr fields -- Dynamic Fields in Solr -- Copying the fields -- Field types -- Other important elements in the Solr schema -- Configuring Solr parameters -- solr.xml and Solr core -- solrconfig.xml -- The Solr plugin -- Other configurations -- Understanding SolrJ -- Summary
3. Analyzing Data with Apache Solr -- Understanding enterprise data -- Categorizing by characteristics -- Categorizing by access pattern -- Categorizing by data formats -- Loading data using native handlers -- Quick and simple data loading - post tool -- Working with JSON, XML, and CSV -- Handling JSON data -- Working with CSV data -- Working with XML data -- Working with rich documents -- Understanding Apache Tika -- Using Solr Cell (ExtractingRequestHandler) -- Adding metadata to your rich documents -- Importing structured data from the database -- Configuring the data source -- Importing data in Solr -- Full import -- Delta import -- Loading RDBMS tables in Solr -- Advanced topics with Solr -- Deduplication -- Extracting information from scanned documents -- Searching through images using LIRE -- Summary
4. Designing Enterprise Search -- Designing aspects for enterprise search -- Identifying requirements -- Matching user expectations through relevance -- Access to searched entities and user interface -- Improving search performance and ensuring instance scalability -- Working with applications through federated search -- Other differentiators - mobiles, linguistic search, and security -- Enterprise search data-processing patterns -- Standalone search engine server -- Distributed enterprise search pattern -- The replicated enterprise search pattern -- Distributed and replicated -- Data integrating pattern for search -- Data import by enterprise search -- Applications pushing data -- Middleware-based integration -- Case study - designing an enterprise knowledge repository search for software IT services -- Gathering requirements -- Designing the solution -- Designing the schema -- Integrating subsystems with Apache Solr -- Working on end user interface -- Summary
5. Integrating Apache Solr -- Empowering the Java Enterprise application with Solr search -- Embedding Apache Solr as a module (web application) in an enterprise application -- How to do it? -- Apache Solr in your web application -- How to do it? -- Integration with client technologies -- Integrating Apache Solr with PHP for web portals -- Interacting directly with Solr -- Using the Solr PHP client -- How to do it? -- Advanced integration with Solarium -- How to do it? -- Integrating Apache Solr with JavaScript -- Using simple XMLHTTPRequest -- Integrating Apache Solr using AJAX Solr -- Parsing Solr XML with the help of XSLT -- Case study - Apache Solr and Drupal -- How to do it? -- Summary
6. Distributed Search Using Apache Solr -- Need for distributed search -- Distributed search architecture -- Apache Solr and distributed search -- Understanding SolrCloud -- Why Zookeeper? -- SolrCloud architecture -- Building enterprise distributed search using SolrCloud -- Setting up a SolrCloud for development -- Setting up a SolrCloud for production -- Adding a document to SolrCloud -- Creating shards, collections, and replicas in SolrCloud -- Common problems and resolutions -- Case study - distributed enterprise search server for the software industry -- Summary
7. Scaling Solr through Sharding, Fault Tolerance, and Integration -- Enabling search result clustering with Carrot2 -- Why Carrot2? -- Enabling Carrot2-based document clustering -- Understanding Carrot2 result clustering -- Viewing Solr results in the Carrot2 workbench -- FAQs and problems -- Sharding and fault tolerance -- Document routing and sharding -- Shard splitting -- Load balancing and fault tolerance in SolrCloud -- Searching Solr documents in near real time -- Strategies for near real-time search in Apache Solr -- Explicit call to commit from a client -- solrconfig.xml - autocommit -- CommitWithin - delegating the responsibility to Solr -- Real-time search in Apache Solr -- Solr with MongoDB -- Understanding MongoDB -- Installing MongoDB -- Creating Solr indexes from MongoDB -- Scaling Solr through Storm -- Getting along with Apache Storm -- Solr and Apache Storm -- Summary
8. Scaling Solr through High Performance -- Monitoring performance of Apache Solr -- What should be monitored? -- Hardware and operating system -- Java virtual machine -- Apache Solr search runtime -- Apache Solr indexing time -- SolrCloud -- Tools for monitoring Solr performance -- Solr administration user interface -- JConsole -- SolrMeter -- Tuning Solr JVM and container -- Deciding heap size -- How can we optimize JVM? -- Optimizing JVM container -- Optimizing Solr schema and indexing -- Stored fields -- Indexed fields and field lengths -- Copy fields and dynamic fields -- Fields for range queries -- Index field updates -- Synonyms, stemming, and stopwords -- Tuning DataImportHandler -- Speeding up index generation -- Committing the change -- Limiting indexing buffer size -- SolrJ implementation classes -- Speeding Solr through Solr caching -- The filter cache -- The query result cache -- The document cache -- The field value cache -- The warming up cache -- Improving runtime search for Solr -- Pagination -- Reducing Solr response footprint -- Using filter queries -- Search query and the parsers -- Lazy field loading -- Optimizing SolrCloud -- Summary
9. Solr and Cloud Computing -- Enterprise search on Cloud -- Models of engagement -- Enterprise search Cloud deployment models -- Solr on Cloud strategies -- Scaling Solr with a dedicated application -- Advantages -- Disadvantages -- Scaling Solr horizontal as multiple applications -- Advantages -- Disadvantages -- Scaling horizontally through the Solr multicore -- Scaling horizontally with replication -- Scaling horizontally with Zookeeper -- Advantages -- Disadvantages -- Running Solr on Cloud (IaaS and PaaS) -- Running Solr with Amazon Cloud -- Running Solr on Windows Azure -- Running Solr on Cloud (SaaS) and enterprise search as a service -- Running Solr with OpenSolr Cloud -- Running Solr with SolrHQ Cloud -- Running Solr with Bitnami -- Working with Amazon CloudSearch -- Drupal-Solr SaaS with Acquia -- Summary
10. Scaling Solr Capabilities with Big Data -- Apache Solr and HDFS -- Big Data search on Katta -- How Katta works? -- Setting up Katta cluster -- Creating Katta indexes -- Using the Solr 1045 patch - map-side indexing -- Using the Solr 1301 patch - reduce-side indexing -- Apache Solr and Cassandra -- Working with Cassandra and Solr -- Single node configuration -- Integrating with multinode Cassandra -- Advanced analytics with Solr -- Integrating Solr and R -- Summary
A. Sample Configuration for Apache Solr -- schema.xml -- solrconfig.xml -- spellings.txt -- synonyms.txt -- protwords.txt -- stopwords.txt -- Index.
Abstract:
This book is a step-by-step guide for readers who would like to learn how to build complete enterprise search solutions, with ample real-world examples and case studies. If you are a developer, designer, or architect who would like to build enterprise search solutions for your customers or organization, but have no prior knowledge of Apache Solr/Lucene technologies, this is the book for you.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.