Introduction to Data Mining (English Edition)

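Authors: [USA] Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Published: 2006-01
Edition: 1
ISBN: 9787115141446
List price: ¥59.00
Binding: Paperback
Format: 16-kai (16开)
Paper: Offset paper
Pages: 516
Word count: 713,000 characters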
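  • Description:
      This book provides a comprehensive introduction to data mining and aims to give readers the knowledge needed to apply data mining to real-world problems. It covers five topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each topic is covered in two chapters. The first chapter presents basic concepts, representative algorithms, and evaluation techniques, and the second discusses advanced concepts and algorithms in greater depth. The goal is to give readers a thorough understanding of the foundations of data mining while also introducing important advanced topics. The book also includes numerous examples, figures, and exercises.
      It is suitable as a textbook for data mining courses taken by upper-level undergraduates and graduate students in related fields. It also serves as a reference for practitioners working in data mining research and application development.
  • About the Authors:
      Pang-Ning Tan is an assistant professor in the Department of Computer Science and Engineering at Michigan State University, where he teaches courses on data mining, database systems, and related subjects. He was previously a research associate at the Army High Performance Computing Research Center at the University of Minnesota (2002-2003).
      Michael Steinbach is a researcher in the Department of Computer Science and Engineering at the University of Minnesota, where he is also a Ph.D. candidate.
      Vipin Kumar is head of the Department of Computer Science and Engineering at the University of Minnesota and formerly served as director of the Army High Performance Computing Research Center. He holds a Ph.D. from the University of Maryland, is an internationally recognized authority on data mining and high-performance computing, and is an IEEE Fellow.
  • Table of Contents: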
    1 Introduction 1
    1.1 What Is Data Mining? 2
    1.2 Motivating Challenges 3
    1.3 The Origins of Data Mining 4
    1.4 Data Mining Tasks 5
    1.5 Scope and Organization of the Book 8
    1.6 Bibliographic Notes 9
    1.7 Exercises 12

    2 Data 13
    2.1 Types of Data 15
    2.1.1 Attributes and Measurement 15
    2.1.2 Types of Data Sets 20
    2.2 Data Quality 25
    2.2.1 Measurement and Data Collection Issues 26
    2.2.2 Issues Related to Applications 31
    2.3 Data Preprocessing 32
    2.3.1 Aggregation 32
    2.3.2 Sampling 34
    2.3.3 Dimensionality Reduction 36
    2.3.4 Feature Subset Selection 37
    2.3.5 Feature Creation 39
    2.3.6 Discretization and Binarization 41
    2.3.7 Variable Transformation 45
    2.4 Measures of Similarity and Dissimilarity 47
    2.4.1 Basics 47
    2.4.2 Similarity and Dissimilarity between Simple Attributes 49
    2.4.3 Dissimilarities between Data Objects 50
    2.4.4 Similarities between Data Objects 52
    2.4.5 Examples of Proximity Measures 53
    2.4.6 Issues in Proximity Calculation 58
    2.4.7 Selecting the Right Proximity Measure 60
    2.5 Bibliographic Notes 61
    2.6 Exercises 64

    3 Exploring Data 71
    3.1 The Iris Data Set 71
    3.2 Summary Statistics 72
    3.2.1 Frequencies and the Mode 72
    3.2.2 Percentiles 73
    3.2.3 Measures of Location: Mean and Median 73
    3.2.4 Measures of Spread: Range and Variance 75
    3.2.5 Multivariate Summary Statistics 76
    3.2.6 Other Ways to Summarize the Data 77
    3.3 Visualization 77
    3.3.1 Motivations for Visualization 77
    3.3.2 General Concepts 78
    3.3.3 Techniques 81
    3.3.4 Visualizing Higher-Dimensional Data 90
    3.3.5 Do's and Don'ts 94
    3.4 OLAP and Multidimensional Data Analysis 95
    3.4.1 Representing Iris Data as a Multidimensional Array 95
    3.4.2 Multidimensional Data: The General Case 97
    3.4.3 Analyzing Multidimensional Data 98
    3.4.4 Final Comments on Multidimensional Data Analysis 101
    3.5 Bibliographic Notes 102
    3.6 Exercises 103

    4 Classification: Basic Concepts, Decision Trees, and Model Evaluation 105
    4.1 Preliminaries 105
    4.2 General Approach to Solving a Classification Problem 107
    4.3 Decision Tree Induction 108
    4.3.1 How a Decision Tree Works 108
    4.3.2 How to Build a Decision Tree 110
    4.3.3 Methods for Expressing Attribute Test Conditions 112
    4.3.4 Measures for Selecting the Best Split 114
    4.3.5 Algorithm for Decision Tree Induction 119
    4.3.6 An Example: Web Robot Detection 120
    4.3.7 Characteristics of Decision Tree Induction 122
    4.4 Model Overfitting 125
    4.4.1 Overfitting Due to Presence of Noise 127
    4.4.2 Overfitting Due to Lack of Representative Samples 129
    4.4.3 Overfitting and the Multiple Comparison Procedure 129
    4.4.4 Estimation of Generalization Errors 131
    4.4.5 Handling Overfitting in Decision Tree Induction 134
    4.5 Evaluating the Performance of a Classifier 135
    4.5.1 Holdout Method 136
    4.5.2 Random Subsampling 136
    4.5.3 Cross-Validation 136
    4.5.4 Bootstrap 137
    4.6 Methods for Comparing Classifiers 137
    4.6.1 Estimating a Confidence Interval for Accuracy 138
    4.6.2 Comparing the Performance of Two Models 139
    4.6.3 Comparing the Performance of Two Classifiers 140
    4.7 Bibliographic Notes 141
    4.8 Exercises 144

    5 Classification: Alternative Techniques 151
    5.1 Rule-Based Classifier 151
    5.1.1 How a Rule-Based Classifier Works 153
    5.1.2 Rule-Ordering Schemes 154
    5.1.3 How to Build a Rule-Based Classifier 155
    5.1.4 Direct Methods for Rule Extraction 155
    5.1.5 Indirect Methods for Rule Extraction 161
    5.1.6 Characteristics of Rule-Based Classifiers 163
    5.2 Nearest-Neighbor Classifiers 163
    5.2.1 Algorithm 165
    5.2.2 Characteristics of Nearest-Neighbor Classifiers 165
    5.3 Bayesian Classifiers 166
    5.3.1 Bayes Theorem 166
    5.3.2 Using the Bayes Theorem for Classification 168
    5.3.3 Naïve Bayes Classifier 169
    5.3.4 Bayes Error Rate 175
    5.3.5 Bayesian Belief Networks 176
    5.4 Artificial Neural Network (ANN) 181
    5.4.1 Perceptron 181
    5.4.2 Multilayer Artificial Neural Network 184
    5.4.3 Characteristics of ANN 187
    5.5 Support Vector Machine (SVM) 188
    5.5.1 Maximum Margin Hyperplanes 188
    5.5.2 Linear SVM: Separable Case 190
    5.5.3 Linear SVM: Nonseparable Case 195
    5.5.4 Nonlinear SVM 198
    5.5.5 Characteristics of SVM 203
    5.6 Ensemble Methods 203
    5.6.1 Rationale for Ensemble Method 203
    5.6.2 Methods for Constructing an Ensemble Classifier 204
    5.6.3 Bias-Variance Decomposition 206
    5.6.4 Bagging 209
    5.6.5 Boosting 211
    5.6.6 Random Forests 215
    5.6.7 Empirical Comparison among Ensemble Methods 216
    5.7 Class Imbalance Problem 217
    5.7.1 Alternative Metrics 218
    5.7.2 The Receiver Operating Characteristic Curve 220
    5.7.3 Cost-Sensitive Learning 223
    5.7.4 Sampling-Based Approaches 225
    5.8 Multiclass Problem 226
    5.9 Bibliographic Notes 228
    5.10 Exercises 233

    6 Association Analysis: Basic Concepts and Algorithms 241
    6.1 Problem Definition 242
    6.2 Frequent Itemset Generation 244
    6.2.1 The Apriori Principle 246
    6.2.2 Frequent Itemset Generation in the Apriori Algorithm 247
    6.2.3 Candidate Generation and Pruning 249
    6.2.4 Support Counting 252
    6.2.5 Computational Complexity 255
    6.3 Rule Generation 257
    6.3.1 Confidence-Based Pruning 258
    6.3.2 Rule Generation in Apriori Algorithm 258
    6.3.3 An Example: Congressional Voting Records 259
    6.4 Compact Representation of Frequent Itemsets 260
    6.4.1 Maximal Frequent Itemsets 260
    6.4.2 Closed Frequent Itemsets 262
    6.5 Alternative Methods for Generating Frequent Itemsets 264
    6.6 FP-Growth Algorithm 268
    6.6.1 FP-Tree Representation 268
    6.6.2 Frequent Itemset Generation in FP-Growth Algorithm 270
    6.7 Evaluation of Association Patterns 273
    6.7.1 Objective Measures of Interestingness 274
    6.7.2 Measures beyond Pairs of Binary Variables 282
    6.7.3 Simpson's Paradox 283
    6.8 Effect of Skewed Support Distribution 285
    6.9 Bibliographic Notes 288
    6.10 Exercises 298

    7 Association Analysis: Advanced Concepts 307
    7.1 Handling Categorical Attributes 307
    7.2 Handling Continuous Attributes 309
    7.2.1 Discretization-Based Methods 310
    7.2.2 Statistics-Based Methods 312
    7.2.3 Non-discretization Methods 314
    7.3 Handling a Concept Hierarchy 316
    7.4 Sequential Patterns 318
    7.4.1 Problem Formulation 318
    7.4.2 Sequential Pattern Discovery 320
    7.4.3 Timing Constraints 323
    7.4.4 Alternative Counting Schemes 327
    7.5 Subgraph Patterns 328
    7.5.1 Graphs and Subgraphs 329
    7.5.2 Frequent Subgraph Mining 330
    7.5.3 Apriori-like Method 332
    7.5.4 Candidate Generation 333
    7.5.5 Candidate Pruning 338
    7.5.6 Support Counting 340
    7.6 Infrequent Patterns 340
    7.6.1 Negative Patterns 341
    7.6.2 Negatively Correlated Patterns 342
    7.6.3 Comparisons among Infrequent Patterns, Negative Patterns, and Negatively Correlated Patterns 343
    7.6.4 Techniques for Mining Interesting Infrequent Patterns 344
    7.6.5 Techniques Based on Mining Negative Patterns 345
    7.6.6 Techniques Based on Support Expectation 347
    7.7 Bibliographic Notes 350
    7.8 Exercises 353

    8 Cluster Analysis: Basic Concepts and Algorithms 363
    8.1 Overview 365
    8.1.1 What Is Cluster Analysis? 365
    8.1.2 Different Types of Clusterings 366
    8.1.3 Different Types of Clusters 368
    8.2 K-means 370
    8.2.1 The Basic K-means Algorithm 371
    8.2.2 K-means: Additional Issues 378
    8.2.3 Bisecting K-means 380
    8.2.4 K-means and Different Types of Clusters 381
    8.2.5 Strengths and Weaknesses 383
    8.2.6 K-means as an Optimization Problem 383
    8.3 Agglomerative Hierarchical Clustering 385
    8.3.1 Basic Agglomerative Hierarchical Clustering Algorithm 385
    8.3.2 Specific Techniques 387
    8.3.3 The Lance-Williams Formula for Cluster Proximity 391
    8.3.4 Key Issues in Hierarchical Clustering 391
    8.3.5 Strengths and Weaknesses 393
    8.4 DBSCAN 393
    8.4.1 Traditional Density: Center-Based Approach 393
    8.4.2 The DBSCAN Algorithm 394
    8.4.3 Strengths and Weaknesses 398
    8.5 Cluster Evaluation 398
    8.5.1 Overview 399
    8.5.2 Unsupervised Cluster Evaluation Using Cohesion and Separation 401
    8.5.3 Unsupervised Cluster Evaluation Using the Proximity Matrix 406
    8.5.4 Unsupervised Evaluation of Hierarchical Clustering 408
    8.5.5 Determining the Correct Number of Clusters 409
    8.5.6 Clustering Tendency 410
    8.5.7 Supervised Measures of Cluster Validity 411
    8.5.8 Assessing the Significance of Cluster Validity Measures 414
    8.6 Bibliographic Notes 416
    8.7 Exercises 419

    9 Cluster Analysis: Additional Issues and Algorithms 427
    9.1 Characteristics of Data, Clusters, and Clustering Algorithms 427
    9.1.1 Example: Comparing K-means and DBSCAN 428
    9.1.2 Data Characteristics 429
    9.1.3 Cluster Characteristics 430
    9.1.4 General Characteristics of Clustering Algorithms 431
    9.2 Prototype-Based Clustering 433
    9.2.1 Fuzzy Clustering 433
    9.2.2 Clustering Using Mixture Models 437
    9.2.3 Self-Organizing Maps (SOM) 446
    9.3 Density-Based Clustering 451
    9.3.1 Grid-Based Clustering 451
    9.3.2 Subspace Clustering 454
    9.3.3 DENCLUE: A Kernel-Based Scheme for Density-Based Clustering 457
    9.4 Graph-Based Clustering 460
    9.4.1 Sparsification 461
    9.4.2 Minimum Spanning Tree (MST) Clustering 462
    9.4.3 OPOSSUM: Optimal Partitioning of Sparse Similarities Using METIS 463
    9.4.4 Chameleon: Hierarchical Clustering with Dynamic Modeling 464
    9.4.5 Shared Nearest Neighbor Similarity 468
    9.4.6 The Jarvis-Patrick Clustering Algorithm 471
    9.4.7 SNN Density 472
    9.4.8 SNN Density-Based Clustering 473
    9.5 Scalable Clustering Algorithms 475
    9.5.1 Scalability: General Issues and Approaches 476
    9.5.2 BIRCH 477
    9.5.3 CURE 479
    9.6 Which Clustering Algorithm? 482
    9.7 Bibliographic Notes 484
    9.8 Exercises 488

    10 Anomaly Detection 491
    10.1 Preliminaries 492
    10.1.1 Causes of Anomalies 492
    10.1.2 Approaches to Anomaly Detection 493
    10.1.3 The Use of Class Labels 494
    10.1.4 Issues 495
    10.2 Statistical Approaches 496
    10.2.1 Detecting Outliers in a Univariate Normal Distribution 497
    10.2.2 Outliers in a Multivariate Normal Distribution 499
    10.2.3 A Mixture Model Approach for Anomaly Detection 500
    10.2.4 Strengths and Weaknesses 502
    10.3 Proximity-Based Outlier Detection 502
    10.3.1 Strengths and Weaknesses 503
    10.4 Density-Based Outlier Detection 504
    10.4.1 Detection of Outliers Using Relative Density 505
    10.4.2 Strengths and Weaknesses 506
    10.5 Clustering-Based Techniques 506
    10.5.1 Assessing the Extent to Which an Object Belongs to a Cluster 507
    10.5.2 Impact of Outliers on the Initial Clustering 509
    10.5.3 The Number of Clusters to Use 509
    10.5.4 Strengths and Weaknesses 509
    10.6 Bibliographic Notes 510
    10.7 Exercises 513
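Books in This Series
Algorithms (English Edition, 4th Edition)
[USA] Robert Sedgewick, [USA] Kevin Wayne
The Art of Computer Programming, Volume 2 (English Edition, 3rd Edition): Seminumerical Algorithms
[USA] Donald E. Knuth
The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1 (English Edition)
[USA] Donald E. Knuth
The Art of Computer Programming, Volume 3 (English Edition, 2nd Edition): Sorting and Searching
[USA] Donald E. Knuth
C++ Primer (English Edition, 4th Edition)
Lippman
Information Retrieval: Algorithms and Heuristics (English Edition, 2nd Edition)
[USA] Grossman, [USA] Frieder
Data Structures and Algorithm Analysis in C++ (English Edition, 3rd Edition)
[USA] Weiss
Advanced Programming in the UNIX Environment
Stevens, Rago
Text Mining
[Israel] Feldman, [USA] Sanger
Mining the Web: Discovering Knowledge from Hypertext Data
[India] Chakrabarti
Algorithms
[USA] Robert Sedgewick, [USA] Kevin Wayne
IPv6 Illustrated, Volume 1: Core Protocols Implementation (the "TCP/IP Illustrated" of the IPv6 era)
[USA] Qing Li, [Japan] Tatuya Jinmei, [Japan] Keiichi Shima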
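Related Books
Governance of Data as a Factor of Production
Lu Zhipeng, Meng Qingguo, Wang Yue
Data Economics (2nd Edition)
Tang Ke, Xiong Qiaoqin, Li Jinpu, Qu Yang
Data Middle Platform: Putting Data to Work (2nd Edition)
Fu Dengpo, Jiang Min, Zhao Donghui, et al.
Data Resource Management
Chen Yijin, Feng Guohe
Data Assets: The Underlying Logic of Enterprise Digital Transformation
Jiang Qilin, Guo Dan
Fundamentals of Data Engineering: Plan and Build Robust Data Systems
[USA] Joe Reis, [USA] Matt Housley
Practical Guide to Data Compliance
Zhu Xiaojuan (ed.)
Legal Practice Guide for Data Protection Officers (DPO)
Pan Yongjian
Frontiers of Data Law
Wu Changhai
Databases and Their Applications (2023 Edition)
National Guidance Committee for Higher Education Self-Study Examinations
Data Science Technology: Text Analysis and Knowledge Graphs
Su Haibo, Liu Yijing, Yi Xianwei, Su Meng
Digital Transformation Driven by Data Governance
Wang Jianfeng, Xin Hua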
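You May Also Be Interested In
Alexander's Mind Reading (Logic Lessons from a Master Mathematician)
[USA] Raymond M. Smullyan
Nabokov Selected Works, Series 5
[USA] Vladimir Nabokov
Nine Peaches on a Plate (selected poems of Marianne Moore, a major twentieth-century American poet, translated by the poet and feminist-poetics scholar Ni Zhijuan)
[USA] Marianne Moore
制造 (Making), Zhejiang Education Publishing House, ISBN 9787572276880
[USA] 理查德·戴维尼
A Cry of Blood and Tears
[USA] 希瑟·丘·麦克亚当
Fun Psychology for Primary School Students (3-volume set: 40 exercises for building executive skills, 46 exercises for developing empathy, 40 exercises for coping with anxiety)
[USA] Sharon Grand; translated by Wang Jiani
An Introduction to Art Criticism: Histories, Strategies, Voices (Contemporary Academic Prism Translation Series)
[USA] Kerr Houston
The Money Game (landmark expanded edition): an in-depth look at the rules and black boxes beneath the surface of financial games, a finance bestseller for 60 years
[USA] Adam Smith; translated by Liu Yinlong
Dawn at Mineral King Valley: Sierra Club v. Morton and the Transformation of American Environmental Law (hardcover collector's edition)
[USA] Daniel P. Selmi
Myths of the Asanas: Stories from the Yoga Tradition (2nd Edition; draws inspiration and strength from the ancient origins of 30 poses; illustrated)
[USA] Alanna Kaivalya, [Netherlands] Arjuna van der Kooij
Reading Lessons for Children from Nobel Laureates: Life Education (Grades 3-9; literary introductions from Mo Yan and Yu Hua to raise the starting level of reading and improve writing)
[USA] Hemingway et al.
Diary of a Worm (4-volume set, paperback)
[USA] Doreen Cronin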