
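Research and Implementation of Query Optimization in Distributed Databases

Harbin Institute of Technology, Master of Engineering Thesis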

Abstract

Building on research into distributed database query optimization, this thesis proposes algorithms to speed up query processing in distributed database systems. It first explains the basic concepts of distributed database systems, including their definition, classification, schema structure, and architecture, as well as their advantages and disadvantages.

The third chapter covers query optimization in distributed database systems: its goals, the layered structure of distributed query processing, and the function of each layer. The thesis then introduces two commonly used kinds of query optimization algorithms: those based on full joins and those based on semi-joins. The main idea of the semi-join algorithm is to reduce the number of tuples in the relations involved in a join operation. The semi-join algorithm is suitable for wide-area networks.
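The semi-join idea mentioned above can be illustrated with a minimal sketch. The relation names, attribute names, and the two "sites" are hypothetical and not taken from the thesis; relations are modeled as in-memory lists of dictionaries, and "shipping" data between sites is only simulated by passing values between functions.

```python
from typing import Dict, List, Set

Row = Dict[str, object]

def project(rel: List[Row], attr: str) -> Set[object]:
    """Distinct values of the join attribute; this is all that gets shipped first."""
    return {row[attr] for row in rel}

def semi_join(rel: List[Row], attr: str, keys: Set[object]) -> List[Row]:
    """Keep only the tuples of rel whose join attribute appears in keys."""
    return [row for row in rel if row[attr] in keys]

def join(left: List[Row], right: List[Row], attr: str) -> List[Row]:
    """Plain hash join on a shared attribute name."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[attr], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[attr], [])]

# Hypothetical data: customers lives at site A, orders at site B.
customers = [
    {"cid": 1, "name": "Ann"},
    {"cid": 2, "name": "Bob"},
    {"cid": 3, "name": "Cy"},
    {"cid": 4, "name": "Di"},
]
orders = [
    {"oid": 10, "cid": 2},
    {"oid": 11, "cid": 2},
    {"oid": 12, "cid": 4},
]

# Step 1: site B sends only the join-column values to site A.
order_keys = project(orders, "cid")
# Step 2: site A reduces customers before shipping it to site B.
reduced_customers = semi_join(customers, "cid", order_keys)
# Step 3: site B joins the reduced relation with its local relation.
result = join(reduced_customers, orders, "cid")
```

In this toy case only two of the four customer tuples would need to cross the network, while the final join result is the same as joining the full relations directly.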

This thesis proposes an improved algorithm to address the problem that the semi-join operations in the SDD-1 algorithm cannot be executed in parallel. The improved algorithm first adds redundant join expressions to the query graph according to the join properties between the relations; the new query graph with redundant conditions is called QGq+. It then finds all segmentation points in QGq+ and uses breadth-first search, together with the segmentation-point information, to generate the query blocks. Next, Kruskal's algorithm generates a minimum spanning tree for each query block, and the SDD-1 algorithm is applied to reduce the number of tuples in the relations of each block's minimum spanning tree. The query blocks can be executed in parallel. This overcomes the shortcoming that, before the query graph is partitioned, the SDD-1 algorithm can only proceed one step at a time, which significantly reduces the cost of the distributed system and shortens query response time. Finally, test results show that the algorithm significantly reduces the amount of intermediate result data, effectively reduces the total network communication cost, and improves optimization efficiency.
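The graph-processing steps of this pipeline can be sketched roughly as follows. This is an illustration under several assumptions, not the thesis's implementation: the input is taken to be an already constructed, connected QGq+; "segmentation points" are interpreted as cut vertices (vertices whose removal disconnects the graph); a query block is taken to be the set of edges reachable by breadth-first search without passing through a cut vertex; and the edge weights are hypothetical join costs. The SDD-1 reduction itself is not shown.

```python
from collections import deque

def adjacency(edges):
    """Build an undirected adjacency map from (u, v, cost) edges."""
    adj = {}
    for u, v, _ in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def reachable(adj, start, blocked):
    """Breadth-first search from start that never enters a blocked vertex."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return seen

def cut_points(adj):
    """Vertices whose removal disconnects the (assumed connected) graph."""
    cuts = set()
    for v in adj:
        rest = [u for u in adj if u != v]
        if rest and len(reachable(adj, rest[0], {v})) < len(rest):
            cuts.add(v)
    return cuts

def query_blocks(edges, adj, cuts):
    """Group edges into blocks separated by cut points (assumed definition)."""
    blocks, assigned = [], set()
    for start in adj:
        if start in cuts or start in assigned:
            continue
        inner = reachable(adj, start, cuts)
        assigned |= inner
        blocks.append([e for e in edges if e[0] in inner or e[1] in inner])
    # An edge joining two cut points touches no inner vertex: its own block.
    blocks += [[e] for e in edges if e[0] in cuts and e[1] in cuts]
    return blocks

def kruskal(block):
    """Minimum spanning tree of one block using union-find."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    tree = []
    for u, v, w in sorted(block, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# Hypothetical QGq+ over relations R1..R6 with made-up join costs.
edges = [("R1", "R2", 3), ("R2", "R3", 1), ("R1", "R3", 2), ("R3", "R4", 4),
         ("R4", "R5", 1), ("R4", "R6", 2), ("R5", "R6", 5)]
adj = adjacency(edges)
cuts = cut_points(adj)
blocks = query_blocks(edges, adj, cuts)
trees = [kruskal(b) for b in blocks]
```

Each entry of `trees` could then be handed to a separate semi-join reduction task, which is where the parallelism described above would come from. The cut-vertex test here removes one vertex at a time and rechecks connectivity; it is simple to verify but not the most efficient choice for large query graphs.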

Keywords: distributed database system, SDD-1 algorithm, semi-join

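Abstract (Chinese)

Abstract (English)

Chapter 1 Introduction

1.1 Background, Purpose, and Significance of the Research

1.2 Related Research in China and Abroad

1.2.1 Development of Distributed Database Systems

1.2.2 Development of Distributed Query Optimization Techniques

1.3 Main Research Content and Thesis Structure

Chapter 2 Introduction to Distributed Database Systems

2.1 Definition of Distributed Database Systems

2.2 Classification of Distributed Database Systems

2.2.1 Classification by Data Management Model

2.2.2 Classification by Type of Global System Control

2.3 Structure of Distributed Database Systems

2.3.1 Physical and Logical Structure of a DDBS

2.3.2 Three-Level Schema Structure of a DDBS

2.4 Chapter Summary

Chapter 3 Distributed Query Optimization Techniques

3.1 Overview of Query Optimization

3.1.1 Query Optimization

3.1.2 Goals of Distributed Query Processing

3.2 Query Optimization Based on the Semi-Join Algorithm

3.3 Query Optimization Algorithms Based on Direct Join

3.3.1 Strategies for Direct Join Operations

3.3.2 Site-Dependent Algorithm

3.3.3 Fragment-and-Replicate Algorithm

3.3.4 Hash Partitioning Algorithm

3.4 Chapter Summary

Chapter 4 A Query Optimization Algorithm Based on the Query Graph and the SDD-1 Algorithm

4.1 Theoretical Basis of the QGSDD-1 Algorithm

4.1.1 The SDD-1 Algorithm

4.1.2 Definitions of the Query Graph QGq, Its Redundant Query Graph QGq+, and Spanning Trees

4.1.3 Kruskal's Heuristic Algorithm

4.2 Implementation of the Optimization Algorithm Based on the Query Graph and SDD-1

4.2.1 Core Idea of the QGSDD-1 Algorithm

4.2.2 Boundary Points and Partitioning of the Query Graph QGq+

4.2.3 Join Optimization of Query Blocks

4.2.4 Basic Steps of the QGSDD-1 Algorithm

4.2.5 Usage Example of the QGSDD-1 Algorithm

4.3 Algorithm Analysis

4.4 Chapter Summary

Chapter 5 Testing

5.1 Test System Environment

5.1.1 Test Machine Configuration

5.1.2 Introduction to TPC-H

5.2 Test Procedure and Result Analysis

5.2.1 Test Procedure

5.2.2 Analysis of Test Results

5.3 Chapter Summary

Conclusion

References

Harbin Institute of Technology Statement of Thesis Originality and Usage Authorization

Acknowledgements

Curriculum Vitae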