Riak: compact e/leveldb tombstones and reclaim disk space

The Problem

When attempting to reclaim disk space, deleting data may seem like the obvious first step. However, in Riak this is not necessarily the best thing to do if the disk is nearly full.
This is because deleting objects in Riak is complicated. A delete goes through the following steps:

  • Riak writes a “tombstone” value for the key to the N vnodes that contain it
    (this is a new record)
  • By default, Riak waits 3 seconds to verify that all vnodes agree to the
    tombstone/delete
  • Riak issues an actual delete operation against the key to leveldb
  • leveldb creates its own tombstone
  • the leveldb tombstone “floats” through level-0 and level-1 as part of normal
    compactions
  • upon reaching level-2, leveldb will initiate immediate compaction and
    propagation of tombstones in .sst table files containing 1000 or more
    tombstones.

The consequence is that disk space is freed very slowly, if it is freed at all!

    Solution for e/leveldb

    In short, there is a C++ function in leveldb that is used to compact the underlying storage. The function is called CompactRange.

    In particular, deleted and overwritten versions are discarded, and the data is rearranged to reduce the cost of operations needed to access the data.
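
To make the effect of compaction concrete, here is a toy model in C++. It is not leveldb's actual implementation: it assumes a simple append-only log in which a delete is stored as a tombstone record, and the names `put`, `del`, and `compact` are made up for this illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model only: every write, including a delete, appends a record.
// std::nullopt stands in for a tombstone.
using Record = std::pair<std::string, std::optional<std::string>>;
using Log = std::vector<Record>;

void put(Log& log, const std::string& key, const std::string& value) {
    log.emplace_back(key, value);
}

void del(Log& log, const std::string& key) {
    log.emplace_back(key, std::nullopt);  // a delete is a new record
}

// Keep only the newest version of each key and drop keys whose newest
// version is a tombstone.
Log compact(const Log& log) {
    std::map<std::string, std::optional<std::string>> newest;
    for (const auto& [key, value] : log) {
        newest[key] = value;  // later records replace earlier ones
    }
    Log result;
    for (const auto& [key, value] : newest) {
        if (value.has_value()) {
            result.emplace_back(key, value);
        }
    }
    assert(result.size() <= log.size());  // compaction never grows the log
    return result;
}
```

In this model, writing one key twice, writing a second key, and then deleting the second key leaves four records in the log. Only `compact` shrinks it to the single live record, which is why deleting alone does not free space.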

    This function is not exposed in the Erlang code that uses this C++ library. This means that we needed to build a standalone tool that calls this library function on all leveldb files in Riak. The drawback is that your Riak server has to be offline while such a third-party tool is running.
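
A standalone tool along these lines could be sketched as follows. This is not the RiakToolsCxx source, only a minimal sketch that assumes each subdirectory of the data path is a separate leveldb database and that the Riak node is stopped. `listVnodeDirs`, `compactOne`, `compactAll`, and the `USE_LEVELDB` build flag are hypothetical names; only `leveldb::DB::Open` and `CompactRange` come from the leveldb C++ API. The leveldb part is guarded by the flag so that the directory walk can be compiled and tried without the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <vector>

#ifdef USE_LEVELDB  // hypothetical build flag for this sketch
#include <leveldb/db.h>
#endif

namespace fs = std::filesystem;

// Returns the direct subdirectories of `root`, sorted by path.
// Assumption: each one holds a separate leveldb database.
std::vector<fs::path> listVnodeDirs(const fs::path& root) {
    std::vector<fs::path> dirs;
    for (const auto& entry : fs::directory_iterator(root)) {
        if (entry.is_directory()) {
            dirs.push_back(entry.path());
        }
    }
    std::sort(dirs.begin(), dirs.end());
    return dirs;
}

#ifdef USE_LEVELDB
// Opens one database and compacts its entire key range.
bool compactOne(const fs::path& dir) {
    leveldb::Options options;
    options.create_if_missing = false;  // never create a database by accident
    leveldb::DB* db = nullptr;
    leveldb::Status status = leveldb::DB::Open(options, dir.string(), &db);
    if (!status.ok()) {
        return false;
    }
    assert(db != nullptr);
    db->CompactRange(nullptr, nullptr);  // null bounds mean "all keys"
    delete db;
    return true;
}
#endif

// Applies `compactFn` to every directory and returns how many calls succeeded.
std::size_t compactAll(const fs::path& root,
                       const std::function<bool(const fs::path&)>& compactFn) {
    std::size_t succeeded = 0;
    for (const auto& dir : listVnodeDirs(root)) {
        if (compactFn(dir)) {
            ++succeeded;
        }
    }
    return succeeded;
}
```

Built with `-DUSE_LEVELDB` and linked against leveldb, `compactAll(path, compactOne)` would compact every database below `path`. Passing the compaction step in as a function keeps the directory walk testable on its own.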

    Use case

    We built such a tool. You can check it out from GitHub (RiakToolsCxx.git) and build it with cmake. The external dependency leveldb-basho is pulled automatically by cmake.

    The checkout and build process could look like this:

    dwalter@knxwork:~/Projects$ git clone https://github.com/hw-dwalter/RiakToolsCxx.git RiakToolsCxx
    Cloning into 'RiakToolsCxx'...
    remote: Counting objects: 11, done.
    remote: Total 11 (delta 0), reused 0 (delta 0), pack-reused 11
    Unpacking objects: 100% (11/11), done.
    dwalter@knxwork:~/Projects$ cd RiakToolsCxx/
    dwalter@knxwork:~/Projects/RiakToolsCxx$ mkdir build
    dwalter@knxwork:~/Projects/RiakToolsCxx$ cd build/
    dwalter@knxwork:~/Projects/RiakToolsCxx/build$ cmake ..
    -- The CXX compiler identification is GNU 6.3.0
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Performing Test COMPILER_SUPPORTS_CXX11
    -- Performing Test COMPILER_SUPPORTS_CXX11 - Success
    -- Performing Test COMPILER_SUPPORTS_CXX0X
    -- Performing Test COMPILER_SUPPORTS_CXX0X - Success
    -- The C compiler identification is GNU 6.3.0
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Boost version: 1.62.0
    -- Found the following Boost libraries:
    --   filesystem
    --   system
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/dwalter/Projects/RiakToolsCxx/build
    dwalter@knxwork:~/Projects/RiakToolsCxx/build$ make
    Scanning dependencies of target leveldb-basho
    [ 10%] Creating directories for 'leveldb-basho'
    [ 20%] Performing download step (git clone) for 'leveldb-basho'
    Cloning into 'leveldb-basho'...
    Already on 'develop'
    Your branch is up-to-date with 'origin/develop'.
    [ 30%] No patch step for 'leveldb-basho'
    [ 40%] No update step for 'leveldb-basho'
    [ 50%] No configure step for 'leveldb-basho'
    [ 60%] Performing build step for 'leveldb-basho'
    ar: creating libleveldb.a
    [ 70%] No install step for 'leveldb-basho'
    [ 80%] Completed 'leveldb-basho'
    [ 80%] Built target leveldb-basho
    Scanning dependencies of target riakcompact
    [ 90%] Building CXX object src/CMakeFiles/riakcompact.dir/main.cpp.o
    [100%] Linking CXX executable riakcompact
    [100%] Built target riakcompact
    dwalter@knxwork:~/Projects/RiakToolsCxx/build$ ./src/riakcompact 
    usage:   ./src/riakcompact [path]
    
    Once the tool is built, you can use it like this. Keep in mind that your Riak node has to be turned off!

    root@knxwork:/home/dwalter/Projects/RiakToolsCxx/build# ./src/riakcompact /var/lib/riak/leveldb/
    "/var/lib/riak/leveldb/91343852333181432387730302044767688728495783936" [directory]
    compacting...
    done
    "/var/lib/riak/leveldb/685078892498860742907977265335757665463718379520" [directory]
    compacting...
    done
    

    Conclusion

    After deleting a complete bucket in our Riak (key by key), we were able to reduce the consumed disk space from 75GB to 25GB with this tool!

    Compaction freed more than 66% of the consumed disk space: (75GB - 25GB) / 75GB ≈ 66.7%.
