一起做个简单的数据库（十二）：扫描多级B树

栏目: IT技术 · 发布时间: 4年前

内容简介：用C语言从零开始实现SQLite clone系列：现在，我们支持构建多级B树，但在此过程中，我们破坏了select statements。这是一个插入15行并打印的测试案例：但当我们执行测试时，输出如下：

用 C语言从零开始实现SQLite clone系列：

现在，我们支持构建多级B树，但在此过程中，我们破坏了select statements。这是一个插入15行并打印的测试案例：

+  it 'prints all rows in a multi-level tree' do

+    script = []

+    (1..15).each do |i|

+      script << "insert #{i} user#{i} person#{i}@example.com"

+    end

+    script << "select"

+    script << ".exit"

+    result = run_script(script)

+

+    expect(result[15...result.length]).to match_array([

+      "db > (1, user1, person1@example.com)",

+      "(2, user2, person2@example.com)",

+      "(3, user3, person3@example.com)",

+      "(4, user4, person4@example.com)",

+      "(5, user5, person5@example.com)",

+      "(6, user6, person6@example.com)",

+      "(7, user7, person7@example.com)",

+      "(8, user8, person8@example.com)",

+      "(9, user9, person9@example.com)",

+      "(10, user10, person10@example.com)",

+      "(11, user11, person11@example.com)",

+      "(12, user12, person12@example.com)",

+      "(13, user13, person13@example.com)",

+      "(14, user14, person14@example.com)",

+      "(15, user15, person15@example.com)",

+      "Executed.", "db > ",

+    ])

+  end

但当我们执行测试时，输出如下：

db > select

(2, user1, person1@example.com)

Executed.

这有点奇怪，输出只打印了一行，而且输出也不对（用户名和ID对不上）。

问题源于execute_select()从表头开始执行，而我们的现在用的table_start()函数对根节点的返回值是cell0,但我们B树的根节点现在是内部节点，而且并没有存储任何行。节点为叶时，必须保留已打印的数据。 execute_select（）应该实际上返回最左边的叶节点的cell 0。

所以我们要摆脱掉旧的代码：

-Cursor* table_start(Table* table) {

-  Cursor* cursor = malloc(sizeof(Cursor));

-  cursor->table = table;

-  cursor->page_num = table->root_page_num;

-  cursor->cell_num = 0;

-

-  void* root_node = get_page(table->pager, table->root_page_num);

-  uint32_t num_cells = *leaf_node_num_cells(root_node);

-  cursor->end_of_table = (num_cells == 0);

-

-  return cursor;

-}

部署一段新代码来查找Key 0(最小的键)，即使key0在表中不存在，系统也会返回最低的ID（在最左侧叶子节点）

+Cursor* table_start(Table* table) {

+  Cursor* cursor =  table_find(table, 0);

+

+  void* node = get_page(table->pager, cursor->page_num);

+  uint32_t num_cells = *leaf_node_num_cells(node);

+  cursor->end_of_table = (num_cells == 0);

+

+  return cursor;

+}

有了这些变化，程序也只是打印出一个节点的行：

db > select

(1, user1, person1@example.com)

(2, user2, person2@example.com)

(3, user3, person3@example.com)

(4, user4, person4@example.com)

(5, user5, person5@example.com)

(6, user6, person6@example.com)

(7, user7, person7@example.com)

Executed.

db >

我们的B树有15个条目，由一个内部节点和两个叶子节点组成，看起来像这样：

一起做个简单的数据库（十二）：扫描多级B树

要扫描整个表，我们需要在到达第一个叶子节点的末尾之后跳到第二个叶子节点。为了实现这个功能，我们将在叶子节点标题中保存一个名为“ next_leaf”的新字段，该字段将用来保存叶子节点右侧同级别节点的页码。最右边的叶子节点的next_leaf值为0，表示没有同级节点（页面0始终为表的根节点保留）。

更新叶节点的头格式来添加新字段：

const uint32_t LEAF_NODE_NUM_CELLS_SIZE = sizeof(uint32_t);

const uint32_t LEAF_NODE_NUM_CELLS_OFFSET = COMMON_NODE_HEADER_SIZE;

-const uint32_t LEAF_NODE_HEADER_SIZE =

-    COMMON_NODE_HEADER_SIZE + LEAF_NODE_NUM_CELLS_SIZE;

+const uint32_t LEAF_NODE_NEXT_LEAF_SIZE = sizeof(uint32_t);

+const uint32_t LEAF_NODE_NEXT_LEAF_OFFSET =

+    LEAF_NODE_NUM_CELLS_OFFSET + LEAF_NODE_NUM_CELLS_SIZE;

+const uint32_t LEAF_NODE_HEADER_SIZE = COMMON_NODE_HEADER_SIZE +

+                                       LEAF_NODE_NUM_CELLS_SIZE +

+                                       LEAF_NODE_NEXT_LEAF_SIZE;

添加方法以访问新字段：

+uint32_t* leaf_node_next_leaf(void* node) {

+  return node + LEAF_NODE_NEXT_LEAF_OFFSET;

+}

当初始化新的叶子节点时，将next_leaf的值设置为0

@@ -322,6 +330,7 @@ void initialize_leaf_node(void* node) {

set_node_type(node, NODE_LEAF);

set_node_root(node, false);

*leaf_node_num_cells(node) = 0;

+  *leaf_node_next_leaf(node) = 0;  // 0 represents no sibling

}

当我们拆分叶节点时，都更新同级指针。旧叶子的同级节点变成新叶子，而新叶子的同级变成以前的旧叶子的同级。

@@ -659,6 +671,8 @@ void leaf_node_split_and_insert(Cursor* cursor, uint32_t key, Row* value) {

uint32_t new_page_num = get_unused_page_num(cursor->table->pager);

void* new_node = get_page(cursor->table->pager, new_page_num);

initialize_leaf_node(new_node);

+  *leaf_node_next_leaf(new_node) = *leaf_node_next_leaf(old_node);

+  *leaf_node_next_leaf(old_node) = new_page_num;

增加新的字段要相应改变一些常量:

it 'prints constants' do

 script = [

   ".constants",

@@ -199,9 +228,9 @@ describe 'database' do

   "db > Constants:",

   "ROW_SIZE: 293",

   "COMMON_NODE_HEADER_SIZE: 6",

-      "LEAF_NODE_HEADER_SIZE: 10",

+      "LEAF_NODE_HEADER_SIZE: 14",

   "LEAF_NODE_CELL_SIZE: 297",

-      "LEAF_NODE_SPACE_FOR_CELLS: 4086",

+      "LEAF_NODE_SPACE_FOR_CELLS: 4082",

   "LEAF_NODE_MAX_CELLS: 13",

   "db > ",

 ])

现在，无论何时只要我们预先将光标移到叶节点的末尾，就可以检查叶节点是否具有同级。如果有，就跳到这个节点。否则，我们就在表的末尾。

@@ -428,7 +432,15 @@ void cursor_advance(Cursor* cursor) {



cursor->cell_num += 1;

if (cursor->cell_num >= (*leaf_node_num_cells(node))) {

-    cursor->end_of_table = true;

+    /* Advance to next leaf node */

+    uint32_t next_page_num = *leaf_node_next_leaf(node);

+    if (next_page_num == 0) {

+      /* This was rightmost leaf */

+      cursor->end_of_table = true;

+    } else {

+      cursor->page_num = next_page_num;

+      cursor->cell_num = 0;

+    }

}

}

做了这些变更，程序可以打印15行了

db > select

(1, user1, person1@example.com )

(2, user2, person2@example.com )

(3, user3, person3@example.com )

(4, user4, person4@example.com )

(5, user5, person5@example.com )

(6, user6, person6@example.com )

(7, user7, person7@example.com )

(8, user8, person8@example.com )

(9, user9, person9@example.com )

(10, user10, person10@example.com )

(11, user11, person11@example.com )

(12, user12, person12@example.com )

(13, user13, person13@example.com )

(1919251317, 14, on14@example.com )

(15, user15, person15@example.com )

Executed.

db >

貌似有一行有问题：

(1919251317, 14, on14@example.com)

经过一段时间排错，我发现问题存在于我们拆分叶子节点：

@@ -676,7 +690,9 @@ void leaf_node_split_and_insert(Cursor* cursor, uint32_t key, Row* value) {

 void* destination = leaf_node_cell(destination_node, index_within_node);



 if (i == cursor->cell_num) {

-      serialize_row(value, destination);

+      serialize_row(value,

+                    leaf_node_value(destination_node, index_within_node));

+      *leaf_node_key(destination_node, index_within_node) = key;

 } else if (i > cursor->cell_num) {

   memcpy(destination, leaf_node_cell(old_node, i - 1), LEAF_NODE_CELL_SIZE);

 } else {

回忆一下，叶子节点的每个单元格包含先是键，然后跟着值

一起做个简单的数据库（十二）：扫描多级B树

我们之前把行（值）写入单元格的开始，那里也是键所在的位置。这意味着用户名的一部分侵入了ID的位置。解决这个问题以后，所有行都被如愿打印出来：

db > select

(1, user1, person1@example.com)

(2, user2, person2@example.com)

(3, user3, person3@example.com)

(4, user4, person4@example.com)

(5, user5, person5@example.com)

(6, user6, person6@example.com)

(7, user7, person7@example.com)

(8, user8, person8@example.com)

(9, user9, person9@example.com)

(10, user10, person10@example.com)

(11, user11, person11@example.com)

(12, user12, person12@example.com)

(13, user13, person13@example.com)

(14, user14, person14@example.com)

(15, user15, person15@example.com)

Executed.

db >

面对一个接一个的错误，我们正在努力向前。

原文链接： Part 12 - Scanning a Multi-Level B-Tree （翻译：吴世曦）

以上所述就是小编给大家介绍的《一起做个简单的数据库（十二）：扫描多级B树》，希望对大家有所帮助，如果大家有任何疑问请给我留言，小编会及时回复大家的。在此也非常感谢大家对码农网的支持！

查看所有标签

猜你喜欢:

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

Microformats

John Allsopp / friends of ED / March 26, 2007 / $34.99

In this book, noted web developer and long time WaSP member John Allsop teaches all you need to know about the technology: what Microformats are currently available and how to use them; the general pr......一起来看看《Microformats》这本书的介绍吧!

码农工具

一起做个简单的数据库（十二）：扫描多级B树

Microformats

HTML 压缩/解压工具

MD5 加密

XML 在线格式化