Search Engines: Information Retrieval in Practice

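Author: Croft (US)

Publisher: China Machine Press (机械工业出版社)

Publication date: 2009-10

Edition: 1

ISBN: 9787111282471

List price: 45.00

Binding: Paperback

Format: Large 32mo

Paper: Offset paper

Pages: 520

Text language: English

Original title: English edition

Series: Classic Original Edition Library (经典原版书库)

Category: Computers and Internet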


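Description:
Search Engines: Information Retrieval in Practice (English edition) introduces the key issues in information retrieval (IR) and shows how they affect the design and implementation of search engines, reinforcing the important concepts with mathematical models. On the important topic of web search engines, the book mainly covers the search techniques that are widely used on the web.
The book is suitable for undergraduate and graduate students in computer science or computer engineering, and it is also an ideal introductory text for professionals.

About the authors:
W. Bruce Croft is a Distinguished Professor of Computer Science at the University of Massachusetts Amherst and an ACM Fellow. He founded the Center for Intelligent Information Retrieval, has published more than 200 papers, and has received numerous awards, including the Gerard Salton Award from ACM SIGIR in 2003.
Donald Metzler holds a PhD from the University of Massachusetts Amherst and is a research scientist in the Search and Computational Advertising group at Yahoo! Research in Santa Clara, California.
Trevor Strohman holds a PhD from the University of Massachusetts Amherst and is a software engineer in the search quality group at Google. He developed the Galago search engine and is also a principal developer of the Indri search engine.

Contents:
1 Search Engines and Information Retrieval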
1.1 What Is Information Retrieval?
1.2 The Big Issues
1.3 Search Engines
1.4 Search Engineers

2 Architecture of a Search Engine
2.1 What Is an Architecture
2.2 Basic Building Blocks
2.3 Breaking It Down
2.3.1 Text Acquisition
2.3.2 Text Transformation
2.3.3 Index Creation
2.3.4 User Interaction
2.3.5 Ranking
2.3.6 Evaluation
2.4 How Does It Really Work?

3 Crawls and Feeds
3.1 Deciding What to Search
3.2 Crawling the Web
3.2.1 Retrieving Web Pages
3.2.2 The Web Crawler
3.2.3 Freshness
3.2.4 Focused Crawling
3.2.5 Deep Web
3.2.6 Sitemaps
3.2.7 Distributed Crawling
3.3 Crawling Documents and Email
3.4 Document Feeds
3.5 The Conversion Problem
3.5.1 Character Encodings
3.6 Storing the Documents
3.6.1 Using a Database System
3.6.2 Random Access
3.6.3 Compression and Large Files
3.6.4 Update
3.6.5 BigTable
3.7 Detecting Duplicates
3.8 Removing Noise

4 Processing Text
4.1 From Words to Terms
4.2 Text Statistics
4.2.1 Vocabulary Growth
4.2.2 Estimating Collection and Result Set Sizes
4.3 Document Parsing
4.3.1 Overview
4.3.2 Tokenizing
4.3.3 Stopping
4.3.4 Stemming
4.3.5 Phrases and N-grams
4.4 Document Structure and Markup
4.5 Link Analysis
4.5.1 Anchor Text
4.5.2 PageRank
4.5.3 Link Quality
4.6 Information Extraction
4.6.1 Hidden Markov Models for Extraction
4.7 Internationalization

5 Ranking with Indexes
5.1 Overview
5.2 Abstract Model of Ranking
5.3 Inverted Indexes
5.3.1 Documents
5.3.2 Counts
5.3.3 Positions
5.3.4 Fields and Extents
5.3.5 Scores
5.3.6 Ordering
5.4 Compression
5.4.1 Entropy and Ambiguity
5.4.2 Delta Encoding
5.4.3 Bit-Aligned Codes
5.4.4 Byte-Aligned Codes
5.4.5 Compression in Practice
5.4.6 Looking Ahead
5.4.7 Skipping and Skip Pointers
5.5 Auxiliary Structures
5.6 Index Construction
5.6.1 Simple Construction
5.6.2 Merging
5.6.3 Parallelism and Distribution
5.6.4 Update
5.7 Query Processing
5.7.1 Document-at-a-time Evaluation
5.7.2 Term-at-a-time Evaluation
5.7.3 Optimization Techniques
5.7.4 Structured Queries
5.7.5 Distributed Evaluation
5.7.6 Caching

6 Queries and Interfaces
6.1 Information Needs and Queries
6.2 Query Transformation and Refinement
6.2.1 Stopping and Stemming Revisited
6.2.2 Spell Checking and Suggestions
6.2.3 Query Expansion
6.2.4 Relevance Feedback
6.2.5 Context and Personalization
6.3 Showing the Results
6.3.1 Result Pages and Snippets
6.3.2 Advertising and Search
6.3.3 Clustering the Results
6.4 Cross-Language Search

7 Retrieval Models
7.1 Overview of Retrieval Models
7.1.1 Boolean Retrieval
7.1.2 The Vector Space Model
7.2 Probabilistic Models
7.2.1 Information Retrieval as Classification
7.2.2 The BM25 Ranking Algorithm
7.3 Ranking Based on Language Models
7.3.1 Query Likelihood Ranking
7.3.2 Relevance Models and Pseudo-Relevance Feedback
7.4 Complex Queries and Combining Evidence
7.4.1 The Inference Network Model
7.4.2 The Galago Query Language
7.5 Web Search
7.6 Machine Learning and Information Retrieval
7.6.1 Learning to Rank
7.6.2 Topic Models and Vocabulary Mismatch
7.7 Application-Based Models

8 Evaluating Search Engines
8.1 Why Evaluate?
8.2 The Evaluation Corpus
8.3 Logging
8.4 Effectiveness Metrics
8.4.1 Recall and Precision
8.4.2 Averaging and Interpolation
8.4.3 Focusing on the Top Documents
8.4.4 Using Preferences
……
9 Classification and Clustering
10 Social Search
11 Beyond Bag of Words
References
Index