The Concepts and Design of Distributed DBMS

1. INTRODUCTION

A major motivation behind the development of database systems is the desire to integrate the operational data of an organization and to provide controlled access to the data. Although integration and controlled access may imply centralization, this
is not the intention. In fact, the development of computer networks promotes a decentralized mode of work. This decentralized approach mirrors the organizational structure of many companies, which are logically distributed into divisions, departments, projects, and so on, and physically distributed
into offices, plants, and factories, where each unit maintains its own operational data. The shareability of the data and the efficiency of data access should be improved by the development of a distributed database system that reflects this organizational structure, makes the data in all units accessible, and stores data proximate to the location where it is most frequently used.

Distributed DBMSs should help resolve the "islands of information" problem. Databases are sometimes regarded as electronic islands that are distinct and generally inaccessible places, like remote islands. This may be a result of geographical separation, incompatible computer architectures, incompatible communication protocols, and so on. Integrating the databases into a logical whole may prevent this way of thinking.

2. CONCEPTS

To start the discussion of distributed DBMSs, we first give a definition of a distributed
database.

Distributed database: a logically interrelated collection of shared data physically distributed over a computer network.

Following on from this we have the definition of a distributed DBMS.

Distributed DBMS: the software system that permits the management of the distributed database and makes the distribution transparent to users.

A distributed database management system (DDBMS) consists of a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers under the control of a separate DBMS, with the computers connected by a communications network. Each site is capable of independently processing user requests that require access to local data and is also capable of processing data stored on other computers in the network.

Users access the distributed database via applications. Applications are classified as those that do not require data from other sites (local applications) and those that do require data from other sites (global applications). We require a DDBMS to have at least one global application. A DDBMS therefore has the following characteristics:

A collection of logically related shared data;
The data is split into a number of fragments;
Fragments may be replicated;
Fragments/replicas are allocated to sites;
The sites are linked by a communications network;
The data at each site is under the control of a DBMS;
The DBMS at each site can handle local applications autonomously;
Each DBMS participates in at least one global application.

From the definition of the DDBMS, the system is expected to make the distribution transparent to the user. Thus, the fact that a distributed database is split into fragments that can be stored on different computers and perhaps replicated should be hidden from the user. The objective of transparency is to make the distributed system appear like a centralized system. This is sometimes referred to as the fundamental principle of distributed DBMSs.

Advantages and Disadvantages of DDBMSs

The distribution of data and applications has potential advantages over traditional centralized database systems. Unfortunately, there are also disadvantages. In this section, we review the advantages and disadvantages of the DDBMS.

Advantages

Reflects organizational structure

Many organizations are naturally distributed over several locations. For example, DreamHome has many offices in different cities. It is natural for databases used in such an application to be distributed over these locations. DreamHome may keep a database at each branch office containing details of such things as the staff who work at that location, the properties that are for rent, and the clients who own or wish to rent out these properties. The
staff at a branch office will make local inquiries of the database. The company headquarters may wish to make global inquiries involving the access of data at all or a number of branches.

Improved shareability and local autonomy

The geographical distribution of an organization can be reflected in the distribution of the data; users at one site can access data stored at other sites. Data can be placed at the site close to the users who normally use that data. In this way, users have local control of the data, and they can consequently establish and enforce local policies regarding the use of this data. A global database administrator (DBA) is responsible for the entire system. Generally, part of this responsibility is devolved to the local level, so that the local DBA can manage the local DBMS.

Improved availability

In a centralized DBMS, a computer failure terminates the operations of the DBMS. However, a failure at one site of a DDBMS, or a failure of a communication link making some sites inaccessible, does not make the entire system inoperable. Distributed DBMSs are designed to continue to function despite such failures. If a single node fails, the system may be able to reroute the
failed node's requests to another site.

Improved reliability

As data may be replicated so that it exists at more than one site, the failure of a node or a communication link does not necessarily make the data inaccessible.

Improved performance

As the data is located near the site of greatest demand,
and given the inherent parallelism of distributed DBMSs, speed of database access may be better than that achievable from a remote centralized database. Furthermore, since each site handles only a part of the entire database, there may not be the same contention for CPU and I/O services that characterizes a centralized DBMS.

Economics

In the 1960s, computing power was calculated according to the square of the cost of the equipment: three times the cost would provide nine times the power. This was known as Grosch's Law. However, it is now generally accepted that it costs much less to create a system of smaller computers with the equivalent power of a single large computer. This makes it more cost-effective for corporate divisions and departments to obtain separate computers. It is also much more cost-effective to add workstations to a network than to update a mainframe system. The second
potential cost saving occurs where databases are geographically remote and the applications require access to distributed data. In such cases, owing to the relative expense of data being transmitted across the network as opposed to the cost of local access, it may be much more economical to partition the application and perform the processing locally at each site.

Modular growth

In a distributed environment, it is much easier to handle expansion. New sites can be added to the network without affecting the operations of other sites. This flexibility allows an organization to expand relatively easily. Increasing database size can usually be handled by adding processing and storage power to the network. In a centralized DBMS, growth may