Graduation Project (Thesis): Foreign Literature Translation
Major: School of Science
Student name: 李洪辉
Class: Computer Science 092
Student ID: 200901051
Advisor: 姚惠萍

1 English Original

Introduction to Data Mining

Abstract: Microsoft SQL Server 2005 provides an integrated environment for creating and working with data mining models. This tutorial uses four scenarios (targeted mailing, forecasting, market basket, and sequence clustering) to demonstrate how to use the mining model algorithms, mining model viewers, and data mining tools that are included in this release of SQL Server.

Introduction

The data mining tutorial is designed to walk you through the process of creating data mining models in
Microsoft SQL Server 2005. The data mining algorithms and tools in SQL Server 2005 make it easy to build a comprehensive solution for a variety of projects, including market basket analysis, forecasting analysis, and targeted mailing analysis. The scenarios for these solutions are explained in greater detail later in the tutorial.

The most visible components in SQL Server 2005 are the workspaces that you use to create and work with data mining models. The online analytical processing (OLAP) and data mining tools are consolidated into two working environments: Business Intelligence Development
Studio and SQL Server Management Studio. Using Business Intelligence Development Studio, you can develop an Analysis Services project disconnected from the server. When the project is ready, you can deploy it to the server. You can also work directly against the server. The main function of SQL Server Management Studio is to manage the server. Each environment is described in more detail later in this introduction. For more information on choosing between the two environments, see Choosing Between SQL Server Management Studio and Business Intelligence Development Studio in SQL Server Books Online.

All of the data mining tools exist in the data mining editor. Using the editor, you can manage mining models, create new models, view models, compare models, and create predictions based on existing models.

After you build a mining model, you will want to explore it, looking for interesting patterns and rules. Each mining model viewer in the editor is customized to explore models built with a specific algorithm. For more information about the viewers, see Viewing a Data Mining Model in SQL Server Books Online.

Often your project will contain several mining models, so before you can use a
model to create predictions, you need to determine which model is the most accurate. For this reason, the editor contains a model comparison tool called the Mining Accuracy Chart tab. Using this tool, you can compare the predictive accuracy of your models and determine the best model.

To create predictions, you will use the Data Mining Extensions (DMX) language. DMX extends SQL, containing commands to create, modify, and predict against mining models. For more information about DMX, see Data Mining Extensions (DMX) Reference in SQL Server Books Online. Because creating a prediction
can be complicated, the data mining editor contains a tool called Prediction Query Builder, which allows you to build queries using a graphical interface. You can also view the DMX code that is generated by the query builder.

Just as important as the tools that you use to work with and create data
mining models are the mechanics by which they are created. The key to creating a mining model is the data mining algorithm. The algorithm finds patterns in the data that you pass it and translates them into a mining model; it is the engine behind the process.

Some of the most important steps in creating a data mining solution are consolidating, cleaning, and preparing the data to be used to create the mining models. SQL Server 2005 includes the Data Transformation Services (DTS) working environment, which contains tools that you can use to clean, validate, and prepare your data. For more information on using DTS in conjunction with a data mining solution, see DTS Data Mining Tasks and Transformations in SQL Server Books Online.

In order to demonstrate the SQL Server data mining features, this tutorial uses a new sample database called AdventureWorksDW. The database is included with SQL Server 2005, and it supports OLAP and data mining functionality. To make the sample database available, you need to select it at installation time in the “Advanced” dialog for component selection.

Adventure Works

AdventureWorksDW is based on a fictional bicycle manufacturing company named Adventure Works Cycles. Adventure Works produces and distributes metal and composite bicycles to North American, European, and Asian commercial markets. The base of operations is located in Bothell, Washington, with 500 employees, and several regional sales teams are located throughout their market base.

Adventure Works sells products wholesale to specialty shops and to individuals through the Internet. For the data mining exercises, you will work with the AdventureWorksDW Internet sales tables, which contain realistic patterns that work well for data mining exercises. For
more information on Adventure Works Cycles, see Sample Databases and Business Scenarios in SQL Server Books Online.

Database Details

The Internet sales schema contains information about 9,242 customers. These customers live in six countries, which are combined into three regions:

North America (83%)
Europe (12%)
Australia (7%)