B+ Tree

A B+ tree is a balanced binary search tree that follows multi-level index format. The leaf nodes of binary tree denote actual data pointers.

  • Supports Equality and range searches
  • All of the data are stored at the leaves.
  • All of the keys appear at the leaves (in sorted order)
  • Leaf nodes are linked together (in order) to create a linked list

Please find below organization for a B+tree having order m

Node TypeChild TypeMin Number of ChildrenMax Number of ChildrenMin Number of ValuesMax Number of ValuesExample m=5
Root Node( Leaf node.It is the only node in tree)Records1m-11m-11-4 values in a node
Root Node(Not a leaf)Internal Node or leaf Node2m1m-12-5 children
Internal NodeInternal Node or leaf NodeCeil(m/2)mCeil(m/2)-1m-13-5 children
Leaf NodeRecordsCeil(m/2)-1m-1Ceil(m/2)-1m-12-4 values in a node.
Structure of internal nodes of a B+ tree

    • Each internal node is of the form:
      <P1,K1,P2,K2,……..,Pm-1,Km-1,Pm>
      Pi –> Tree Pointer and
      within each internal node K1 < K2 < Km-1
    • For all Search values X
      Ki-1 < X < Ki for 1 < i < m
      X <= Ki for i=1 and
      Ki-1 < X for i=m
  • Each internal node has at most m pointers.
  • Each internal node has at least ⌈m/2⌉ tree pointers.
  • Root node has at least 2 pointers if it is an internal node.
  • An internal node with k pointers, k<=m, has k-1 search field values.

Structure of leaf nodes of a B+ tree

  • Each Leaf node is of the form
    <<K1,R1>,<K2,R2>,…………,<Km-1,Rm-1>,Rnext> where
    Ri –> Data Pointer and
    Rnext points to next leaf node of the B+ tree.
  • within each leaf node , K1 < K2 < …… < Km-1
  • Each Ri is a data pointer that points to the records whose search field value is Ki or to a file block containing the record (or a block of record pointers that point to records whose search field value is Ki if the search field is not a key)
  • Each leaf node has at least ⌈m/2⌉ -1 values.
  • All leaf nodes are at the same level.

Insertion algorithm

    • If the node has an empty space, insert the key/reference pair into the node.
    • If the node is already full, split it into two nodes, distributing the keys evenly between the two nodes.
      • If node is a leaf node, copy the smallest value from the second node and insert it into parent node.
      • If node is internal node, remove the middle value while splitting and insert it into parent node.
      • Repeat the insertion algorithm in both the above cases.
    • For Example, insert 1,4,16,25,9,20,13,15,10,11,12

Insert 1,4,16

Insert 25,9

Insert 20
Insert 13
Insert 15

Insert 10
Insert 11
Insert 12
Deletion algorithm

  • B+ tree entries are deleted at the leaf nodes.
  • Delete X from leaf
    • If X is present in internal node then replace X with the new leftmost value in the internal node.
  • After deletion, Underflow is tested. If underflow occurs then
    • Shift the values from the siblings if siblings have values greater than the lower limit.
    • If siblings is already at lower limits then merge nodes
      • This removes a key from parent.
      • Merge parent with sibling and propagate the changes up to the root node.
  • Example
    • Consider the below tree (m=5) i.e. order is 5
    • Delete 70
      • 70 is removed from leaf node 60,65,70,75
      • No other change is required.
    • Delete 25
      • 25 is deleted from leaf node 25,28,30
      • 25 is present in the internal node. So 25 is replaced by new left most value(28) in the internal node.
  • Example 2: Consider the below tree of order 5
    • Delete 60
      • 60 is deleted from the leaf node 60,65
      • Replace 60 from the internal node with new left most key value(65)
      • Leaf is now underflow
        • Can’t shift from sibling as siblings are already at lower limit.
        • Merge with sibling 75,80. This will remove parent value 75.
        • Internal node is now below limit. So combine with sibling.
          • This will eliminate topmost root.
  • For a m order B+ tree with n key values
    • Find, Insert and delete all are O(logmn)
    • Range queries can be done in O(logmn+k) for a range of k.
  • For a key size k and a pointer size p
    •  A node holding n keys needs k n + p( n + 1) =(k + p )n + p bytes
    • Example Key size = pointer size= 8 bytes
      • A node holding n keys needs a minimum of 16n + 8 bytes

Leave a Reply

avatar
  Subscribe  
Notify of