1、 本科毕业论文 外文文献 及 译文 文献、资料题目: Cluster Analysis Basic Concepts and Algorithms 文献、资料来源: http:/ 文献、资料发表(出版)日期: 院 (部): 土木工程学院 专 业: 土木工程 班 级: 姓 名: 学 号: 指导教师: 翻译日期: 山东建筑大学 毕业论文 外文文献 及译文 - 1 - 外文 文献 : Cluster Analysis Basic Concepts and Algorithms Cluster analysis divides data into groups (clusters) that are
2、meaningful, useful,or both. If meaningful groups are the goal, then the clusters should capture the natural structure of the data. In some cases, however, cluster analysis is only a useful starting point for other purposes, such as data summarization. Whether for understanding or utility, cluster an
3、alysis has long played an important role in a wide variety of elds: psychology and other social sciences, biology,statistics, pattern recognition, information retrieval, machine learning, and data mining. There have been many applications of cluster analysis to practical problems. We provide some sp
4、ecic examples, organized by whether the purpose of the clustering is understanding or utility. Clustering for Understanding Classes, or conceptually meaningful groups of objects that share common characteristics, play an important role in how people analyze and describe the world. Indeed, human bein
5、gs are skilled at dividing objects into groups (clustering) and assigning particular objects to these groups (classication). For example, even relatively young children can quickly label the objects in a photograph as buildings, vehicles, people, animals, plants, etc. In the context of understanding
6、 data, clusters are potential classes and cluster analysis is the study of techniques for automatically nding classes. The following are some examples: Biology. Biologists have spent many years creating a taxonomy (hierarchical classication) of all living things: kingdom, phylum, class,order, family
7、, genus, and species. Thus, it is perhaps not surprising that much of the early work in cluster analys is sought to create a discipline of mathematical taxonomy that could automatically nd such classication structures. More recently, biologists have applied clustering to analyze the large amounts of
8、 genetic information that are now available. For example, clustering has been used to nd groups of genes that have similar functions. Information Retrieval. The World Wide Web consists of billions of Web pages, and 山东建筑大学 毕业论文 外文文献 及译文 - 2 - the results of a query to a search engine can return thous
9、ands of pages. Clustering can be used to group these search results into a small number of clusters, each of which captures a particular aspect of the query. For instance, a query of movie might return Web pages grouped into categories such as reviews, trailers, stars, and theaters. Each category (c
10、luster) can be broken into subcategories (sub-clusters), producing a hierarchical structure that further assists a users exploration of the query results. Climate. Understanding the Earths climate requires nding patternsin the atmosphere and ocean. To that end, cluster analysis has been applied to n
11、d patterns in the atmospheric pressure of polar regions and areas of the ocean that have a signicant impact on land climate. Psychology and Medicine. An illness or condition frequently has a number of variations, and cluster analysis can be used to identify these different subcategories. For example
12、, clustering has been used to identify different types of depression. Cluster analysis can also be used to detect patterns in the spatial or temporal distribution of a disease. Business. Businesses collect large amounts of information on current and potential customers. Clustering can be used to seg
13、ment customers into a small number of groups for additional analysis and marketing activities. Clustering for Utility: Cluster analysis provides an abstraction from individual data objects to the clusters in which those data objects reside. Additionally, some clustering techniques characterize each
14、cluster in terms of a cluster prototype; i.e., a data object that is representative of the other objects in the cluster. These cluster prototypes can be used as the basis for a number of data analysis or data processing techniques. Therefore, in the context of utility, cluster analysis is the study
15、of techniques for nding the most representative cluster prototypes. Summarization. Many data analysis techniques, such as regression or PCA, have a time or space complexity of O(m2) or higher (where m is the number of objects), and thus, are not practical for large data sets. However, instead of app
16、lying the algorithm to the entire data set, it can be applied to a reduced data set consisting only of cluster prototypes. Depending on the type of analysis, the number of prototypes, and the accuracy with which the prototypes represent the data, the results can be comparable to those that would have