Introduction to Information Retrieval


Author: Christopher D. Manning,Prabhakar Raghavan,Hinrich Schütze
Publisher: Cambridge University Press
ISBN: 1139472100
Category: Computers
Page: N.A

Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.

Introduction to Modern Information Retrieval


Author: Gobinda G. Chowdhury
Publisher: Facet Publishing
ISBN: 185604694X
Category: Information organization
Page: 508

An information retrieval (IR) system is designed to analyse, process and store sources of information and retrieve those that match a particular user's requirements. A bewildering range of techniques is now available to the information professional attempting to successfully retrieve information. It is recognized that today's information professionals need to concentrate their efforts on learning the techniques of computerized IR. However, it is this book's contention that it also benefits them to learn the theory, techniques and tools that constitute the traditional approaches to the organization and processing of information. In fact much of this knowledge may still be applicable in the storage and retrieval of electronic information in digital library environments. The fully revised third edition of this highly regarded textbook has been thoroughly updated to incorporate major changes in this rapidly expanding field since the second edition in 2004, and a complete new chapter on citation indexing has been added. Unique in its scope, the book covers the whole spectrum of information storage and retrieval, including: users of IR and IR options; database technology; bibliographic formats; cataloguing and metadata; subject analysis and representation; automatic indexing and file organization; vocabulary control; abstracts and indexing; searching and retrieval; user-centred models of IR and user interfaces; evaluation of IR systems and evaluation experiments; online and CD-ROM IR; multimedia IR; hypertext and mark-up languages; web IR; intelligent IR; natural language processing and its applications in IR; citation analysis and IR; IR in digital libraries; and trends in IR research. Illustrated with many examples and comprehensively referenced for an international audience, this is an indispensable textbook for students of library and information studies. 
It is also an invaluable aid for information practitioners wishing to brush up on their skills and keep up to date with the latest techniques.

Text Data Management and Analysis

A Practical Introduction to Information Retrieval and Text Mining
Author: ChengXiang Zhai,Sean Massung
Publisher: Morgan & Claypool
ISBN: 1970001178
Category: Computers
Page: 530

Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. This has led to an increasing demand for powerful software tools to help people analyze and manage vast amounts of text data effectively and efficiently. Unlike data generated by a computer system or sensors, text data are usually generated directly by humans, and are accompanied by semantically rich content. As such, text data are especially valuable for discovering knowledge about human opinions and preferences, in addition to many other kinds of knowledge that we encode in text. In contrast to structured data, which conform to well-defined schemas (thus are relatively easy for computers to handle), text has less explicit structure, requiring computer processing toward understanding of the content encoded in text. The current technology of natural language processing has not yet reached a point to enable a computer to precisely understand natural language text, but a wide range of statistical and heuristic approaches to analysis and management of text data have been developed over the past few decades. They are usually very robust and can be applied to analyze and manage text data in any natural language, and about any topic. This book provides a systematic introduction to all these approaches, with an emphasis on covering the most useful knowledge and skills required to build a variety of practically useful text information systems. The focus is on text mining applications that can help users analyze patterns in text data to extract and reveal useful knowledge. Information retrieval systems, including search engines and recommender systems, are also covered as supporting technology for text mining applications. 
The book covers the major concepts, techniques, and ideas in text data mining and information retrieval from a practical viewpoint, and includes many hands-on exercises designed with a companion software toolkit (i.e., MeTA) to help readers learn how to apply techniques of text mining and information retrieval to real-world text data and how to experiment with and improve some of the algorithms for interesting application tasks. The book can be used as a textbook for a computer science undergraduate course or a reference book for practitioners working on relevant problems in analyzing and managing text data.

Information Retrieval

Implementing and Evaluating Search Engines
Author: Stefan Büttcher,Charles L. A. Clarke,Gordon V. Cormack
Publisher: MIT Press
ISBN: 0262528878
Category: Computers
Page: 632

An introduction to information retrieval, the foundation for modern search engines, that emphasizes implementation and experimentation.

Classification Made Simple

An Introduction to Knowledge Organisation and Information Retrieval
Author: Eric J. Hunter
Publisher: Routledge
ISBN: 1351951041
Category: Language Arts & Disciplines
Page: 176

This established textbook introduces the essentials of classification as used for information processing. The third edition takes account of developments that have taken place since the second edition was published in 2002. Classification Made Simple provides a useful gateway to more advanced works and the study of specific schemes. As an introductory text, it will be invaluable to students of information work and to anyone inside or outside the information profession who needs to understand the manner in which classification can be utilized to facilitate and enhance organisation and retrieval.

Introduction to Information Retrieval and Quantum Mechanics


Author: Massimo Melucci
Publisher: Springer
ISBN: 3662483130
Category: Computers
Page: 232

This book introduces the quantum mechanical framework to information retrieval scientists seeking a new perspective on foundational problems. As such, it concentrates on the main notions of the quantum mechanical framework and describes an innovative range of concepts and tools for modeling information representation and retrieval processes. The book is divided into four chapters. Chapter 1 illustrates the main modeling concepts for information retrieval (including Boolean logic, vector spaces, probabilistic models, and machine-learning based approaches), which will be examined further in subsequent chapters. Next, chapter 2 briefly explains the main concepts of the quantum mechanical framework, focusing on approaches linked to information retrieval such as interference, superposition and entanglement. Chapter 3 then reviews the research conducted at the intersection between information retrieval and the quantum mechanical framework. The chapter is subdivided into a number of topics, and each description ends with a section suggesting the most important reference resources. Lastly, chapter 4 offers suggestions for future research, briefly outlining the most essential and promising research directions to fully leverage the quantum mechanical framework for effective and efficient information retrieval systems. This book is especially intended for researchers working in information retrieval, database systems and machine learning who want to acquire a clear picture of the potential offered by the quantum mechanical framework in their own research area. Above all, the book offers clear guidance on whether, why and when to effectively use the mathematical formalism and the concepts of the quantum mechanical framework to address various foundational issues in information retrieval.

Modern Information Retrieval

The Concepts and Technology Behind Search
Author: Ricardo Baeza-Yates,Berthier Ribeiro-Neto
Publisher: Addison-Wesley Professional
ISBN: 9780321416919
Category: Computers
Page: 913

This is a rigorous and complete textbook for a first course on information retrieval from the computer science perspective. It provides an up-to-date, student-oriented treatment of information retrieval, including extensive coverage of new topics such as web retrieval, web crawling, open source search engines and user interfaces. From parsing to indexing, clustering to classification, retrieval to ranking, and user feedback to retrieval evaluation, all of the most important concepts are carefully introduced and exemplified. The contents and structure of the book have been carefully designed by the two main authors, with individual contributions coming from leading international authorities in the field, including Yoelle Maarek, Senior Director of Yahoo! Research Israel; Dulce Ponceleón, IBM Research; and Malcolm Slaney, Yahoo Research USA. This completely reorganized, revised and enlarged second edition of Modern Information Retrieval contains many new chapters and double the number of pages and bibliographic references of the first edition, and has a companion website, www.mir2ed.org, with teaching material. It will prove invaluable to students, professors, researchers, practitioners, and scholars of this fascinating field of information retrieval.

Foundations of Statistical Natural Language Processing


Author: Christopher D. Manning,Hinrich Schütze
Publisher: MIT Press
ISBN: 9780262133609
Category: Language Arts & Disciplines
Page: 680

An introduction to statistical natural language processing (NLP). The text contains the theory and algorithms needed for building NLP tools. Topics covered include mathematical and linguistic foundations, statistical methods, collocation finding, word sense disambiguation, and probabilistic parsing.

Information Retrieval

Algorithms and Heuristics
Author: David A. Grossman,Ophir Frieder
Publisher: Springer Science & Business Media
ISBN: 9781402030048
Category: Computers
Page: 332

Interested in how an efficient search engine works? Want to know what algorithms are used to rank resulting documents in response to user requests? The authors answer these and other key information retrieval design and implementation questions. This book is not yet another high level text. Instead, algorithms are thoroughly described, making this book ideally suited for both computer science students and practitioners who work on search-related applications. As stated in the foreword, this book provides a current, broad, and detailed overview of the field and is the only one that does so. Examples are used throughout to illustrate the algorithms. The authors explain how a query is ranked against a document collection using either a single or a combination of retrieval strategies, and how an assortment of utilities are integrated into the query processing scheme to improve these rankings. Methods for building and compressing text indexes, querying and retrieving documents in multiple languages, and using parallel or distributed processing to expedite the search are likewise described. This edition is a major expansion of the one published in 1998. Besides updating the entire book with current techniques, it includes new sections on language models, cross-language information retrieval, peer-to-peer processing, XML search, mediators, and duplicate document detection.

Learning to Rank for Information Retrieval


Author: Tie-Yan Liu
Publisher: Springer Science & Business Media
ISBN: 9783642142673
Category: Computers
Page: 285

Due to the fast growth of the Web and the difficulties in finding desired information, efficient and effective information retrieval systems have become more important than ever, and the search engine has become an essential tool for many people. The ranker, a central component in every search engine, is responsible for the matching between processed queries and indexed documents. Because of its central role, great attention has been paid to the research and development of ranking technologies. In addition, ranking is also pivotal for many other information retrieval applications, such as collaborative filtering, definition ranking, question answering, multimedia retrieval, text summarization, and online advertisement. Leveraging machine learning technologies in the ranking process has led to innovative and more effective ranking models, and eventually to a completely new research area called “learning to rank”. Liu first gives a comprehensive review of the major approaches to learning to rank. For each approach he presents the basic framework, with example algorithms, and he discusses its advantages and disadvantages. He continues with some recent advances in learning to rank that cannot be simply categorized into the three major approaches – these include relational ranking, query-dependent ranking, transfer ranking, and semisupervised ranking. His presentation is completed by several examples that apply these technologies to solve real information retrieval problems, and by theoretical discussions on guarantees for ranking performance. This book is written for researchers and graduate students in both information retrieval and machine learning. They will find here the only comprehensive description of the state of the art in a field that has driven the recent advances in search engine development.

Search Engines

Information Retrieval in Practice
Author: Bruce Croft,Donald Metzler,Trevor Strohman
Publisher: Pearson Higher Ed
ISBN: 0133001598
Category: Computers
Page: 552

This is the eBook of the printed book and may not include any media, website access codes, or print supplements that may come packaged with the bound book. Search Engines: Information Retrieval in Practice is ideal for introductory information retrieval courses at the undergraduate and graduate level in computer science, information science and computer engineering departments. It is also a valuable tool for search engine and information retrieval professionals. Written by a leader in the field of information retrieval, Search Engines: Information Retrieval in Practice is designed to give undergraduate students the understanding and tools they need to evaluate, compare and modify search engines. Coverage of the underlying IR and mathematical models reinforces key concepts. The book's numerous programming exercises make extensive use of Galago, a Java-based open source search engine.

Music Similarity and Retrieval

An Introduction to Audio- and Web-based Strategies
Author: Peter Knees,Markus Schedl
Publisher: Springer
ISBN: 3662497220
Category: Computers
Page: 299

This book provides a summary of the manifold audio- and web-based approaches to music information retrieval (MIR) research. In contrast to other books dealing solely with music signal processing, it addresses additional cultural and listener-centric aspects and thus provides a more holistic view. Consequently, the text includes methods operating on features extracted directly from the audio signal, as well as methods operating on features extracted from contextual information, either the cultural context of music as represented on the web or the user and usage context of music. Following the prevalent document-centered paradigm of information retrieval, the book addresses models of music similarity that extract computational features to describe an entity that represents music on any level (e.g., song, album, or artist), and methods to calculate the similarity between them. While this perspective and the representations discussed cannot describe all musical dimensions, they enable us to effectively find music of similar qualities by providing abstract summarizations of musical artifacts from different modalities. The text at hand provides a comprehensive and accessible introduction to the topics of music search, retrieval, and recommendation from an academic perspective. It will not only allow those new to the field to quickly access MIR from an information retrieval point of view but also raise awareness of the developments of the music domain within the greater IR community. In this regard, Part I deals with content-based MIR, in particular the extraction of features from the music signal and similarity calculation for content-based retrieval. Part II subsequently addresses MIR methods that make use of the digitally accessible cultural context of music. Part III addresses methods of collaborative filtering and user-aware and multi-modal retrieval, while Part IV explores current and future applications of music retrieval and recommendation.

Visual Information Retrieval Using Java and LIRE


Author: Mathias Lux,Oge Marques
Publisher: Morgan & Claypool Publishers
ISBN: 1608459187
Category: Computers
Page: 96

Visual information retrieval (VIR) is an active and vibrant research area, which aims to provide means for organizing, indexing, annotating, and retrieving visual information (images and videos) from large, unstructured repositories. The goal of VIR is to retrieve matches ranked by their relevance to a given query, which is often expressed as an example image and/or a series of keywords. During its early years (1995-2000), the research efforts were dominated by content-based approaches contributed primarily by the image and video processing community. During the past decade, it was widely recognized that the challenges imposed by the lack of coincidence between an image's visual contents and its semantic interpretation, also known as the semantic gap, required a clever use of textual metadata (in addition to information extracted from the image's pixel contents) to make image and video retrieval solutions efficient and effective. The need to bridge (or at least narrow) the semantic gap has been one of the driving forces behind current VIR research. Additionally, other related research problems and market opportunities have started to emerge, offering a broad range of exciting problems for computer scientists and engineers to work on. In this introductory book, we focus on a subset of VIR problems where the media consists of images, and the indexing and retrieval methods are based on the pixel contents of those images -- an approach known as content-based image retrieval (CBIR). We present an implementation-oriented overview of CBIR concepts, techniques, algorithms, and figures of merit. Most chapters are supported by examples written in Java, using Lucene (an open-source Java-based indexing and search implementation) and LIRE (Lucene Image REtrieval), an open-source Java-based library for CBIR.

Web Information Retrieval


Author: Stefano Ceri,Alessandro Bozzon,Marco Brambilla,Emanuele Della Valle,Piero Fraternali,Silvia Quarteroni
Publisher: Springer Science & Business Media
ISBN: 3642393144
Category: Computers
Page: 284

With the proliferation of huge amounts of (heterogeneous) data on the Web, the importance of information retrieval (IR) has grown considerably over the last few years. Big players in the computer industry, such as Google, Microsoft and Yahoo!, are the primary contributors of technology for fast access to Web-based information; and searching capabilities are now integrated into most information systems, ranging from business management software and customer relationship systems to social networks and mobile phone applications. Ceri and his co-authors aim to take their readers from the foundations of modern information retrieval to the most advanced challenges of Web IR. To this end, their book is divided into three parts. The first part addresses the principles of IR and provides a systematic and compact description of basic information retrieval techniques (including binary, vector space and probabilistic models as well as natural language search processing) before focusing on its application to the Web. Part two addresses the foundational aspects of Web IR by discussing the general architecture of search engines (with a focus on the crawling and indexing processes), describing link analysis methods (specifically PageRank and HITS), addressing recommendation and diversification, and finally presenting advertising in search (the main source of revenue for search engines). The third and final part describes advanced aspects of Web search, each chapter providing a self-contained, up-to-date survey on current Web research directions. Topics in this part include meta-search and multi-domain search, semantic search, search in the context of multimedia data, and crowd search. The book is ideally suited to courses on information retrieval, as it covers all Web-independent foundational aspects. Its presentation is self-contained and does not require prior background knowledge. 
It can also be used in the context of classic courses on data management, allowing the instructor to cover both structured and unstructured data in various formats. Its classroom use is facilitated by a set of slides, which can be downloaded from www.search-computing.org.

Teaching and Learning in Information Retrieval


Author: Efthimis Efthimiadis,Juan M. Fernández-Luna,Juan F. Huete,Andrew MacFarlane
Publisher: Springer Science & Business Media
ISBN: 9783642225116
Category: Computers
Page: 213

Information Retrieval has become a very active research field in the 21st century. Many from academia and industry present their innovations in the field in a wide variety of conferences and journals. Companies transfer this new knowledge directly to the general public via services such as web search engines in order to improve their information seeking experience. In parallel, teaching IR is turning into an important aspect of IR generally, not only because it is necessary to impart effective search techniques to make the most of the IR tools available, but also because we must provide a good foundation for those students who will become the driving force of future IR technologies. There are very few resources for teaching and learning in IR, the major problem which this book is designed to solve. The objective is to provide ideas and practical experience of teaching and learning IR, for those whose job requires them to teach in one form or another, and where delivering IR courses is a major part of their working lives. In this context of providing a higher profile for teaching and learning as applied to IR, the co-editor of this book, Efthimis Efthimiadis, had maintained a leading role in teaching and learning within the domain of IR for a number of years. This book represents a posthumous example of his efforts in the area, as he passed away in April 2011. This book, his book, is dedicated to his memory.

Information Storage and Retrieval Systems

Theory and Implementation
Author: Gerald J. Kowalski,Mark T. Maybury
Publisher: Springer Science & Business Media
ISBN: 0306470314
Category: Computers
Page: 318

Chapter 1 places into perspective a total Information Storage and Retrieval System. This perspective introduces new challenges to the problems that need to be theoretically addressed and commercially implemented. Ten years ago commercial implementation of the algorithms being developed was not realistic, allowing theoreticians to limit their focus to very specific areas. Bounding a problem is still essential in deriving theoretical results. But the commercialization and insertion of this technology into systems like the Internet that are widely being used changes the way problems are bounded. From a theoretical perspective, efficient scalability of algorithms to systems with gigabytes and terabytes of data, operating with minimal user search statement information, and making maximum use of all functional aspects of an information system need to be considered. The dissemination systems using persistent indexes or mail files to modify ranking algorithms and combining the search of structured information fields and free text into a consolidated weighted output are examples of potential new areas of investigation. The best way for the theoretician or the commercial developer to understand the importance of problems to be solved is to place them in the context of a total vision of a complete system. Understanding the differences between Digital Libraries and Information Retrieval Systems will add an additional dimension to the potential future development of systems. The collaborative aspects of digital libraries can be viewed as a new source of information that dynamically could interact with information retrieval techniques.

Understanding Information Retrieval Systems

Management, Types, and Standards
Author: Marcia J. Bates
Publisher: CRC Press
ISBN: 1466551356
Category: Business & Economics
Page: 752

In order to be effective for their users, information retrieval (IR) systems should be adapted to the specific needs of particular environments. The huge and growing array of types of information retrieval systems in use today is on display in Understanding Information Retrieval Systems: Management, Types, and Standards, which addresses over 20 types of IR systems. These various system types, in turn, present both technical and management challenges, which are also addressed in this volume. In order to be interoperable in a networked environment, IR systems must be able to use various types of technical standards, a number of which are described in this book—often by their original developers. The book covers the full context of operational IR systems, addressing not only the systems themselves but also human user search behaviors, user-centered design, and management and policy issues. In addition to theory and practice of IR system design, the book covers Web standards and protocols, the Semantic Web, XML information retrieval, Web social mining, search engine optimization, specialized museum and library online access, records compliance and risk management, information storage technology, geographic information systems, and data transmission protocols. Emphasis is given to information systems that operate on relatively unstructured data, such as text, images, and music. 
The book is organized into four parts: Part I supplies a broad-level introduction to information systems and information retrieval systems; Part II examines key management issues and elaborates on the decision process around likely information system solutions; Part III illustrates the range of information retrieval systems in use today, discussing the technical, operational, and administrative issues for each type; and Part IV discusses the most important organizational and technical standards needed for successful information retrieval. This volume brings together authoritative articles on the different types of information systems and how to manage real-world demands such as digital asset management, network management, digital content licensing, data quality, and information system failures. It explains how to design systems to address human characteristics and considers key policy and ethical issues such as piracy and preservation. Focusing on web-based systems, the chapters in this book provide an excellent starting point for developing and managing your own IR systems.