Riak: compact e/leveldb tombstones and reclaim disk space

Released: September 26, 2017

Categories: What moves us

Tags: SSD

The Problem

When attempting to reclaim disk space, deleting data may seem like the obvious first step. In Riak, however, this is not necessarily the best thing to do when the disk is nearly full, because deleting objects in Riak is complicated. As stated in the object-deletion section of the latest Riak documentation:

In single-server, non-clustered data storage systems, object deletion is a trivial process. In an eventually consistent, clustered system like Riak, however, object deletion is far less trivial because objects live on multiple nodes, which means that a deletion process must be chosen to determine when an object can be removed from the storage backend.

How deletion works

  1. Riak writes a “tombstone” value for the key to the N vnodes that contain it (this is a new record)
  2. By default, Riak waits 3 seconds to verify that all vnodes agree on the tombstone/delete
  3. Riak issues an actual delete operation against the key to leveldb
  4. leveldb creates its own tombstone
  5. The leveldb tombstone “floats” through level-0 and level-1 as part of normal compactions
  6. Upon reaching level-2, leveldb initiates immediate compaction and propagation of tombstones in .sst table files containing 1000 or more tombstones

The consequence is that disk space is freed very slowly, if it is freed at all.
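To illustrate why a delete can make things worse before it makes them better, here is a minimal toy model of an append-only store with tombstones. It is only a sketch: `ToyLogStore` and all of its members are made-up names, and it has no levels, no .sst files and none of leveldb's real compaction logic. It only shows that a delete adds a record, and that space is reclaimed only once a compaction runs.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One entry in the append-only log: either a value or a tombstone.
struct Record {
    std::string key;
    std::string value;
    bool tombstone;
};

// Toy model only (hypothetical, NOT leveldb): every write is appended,
// and nothing is ever removed until compact() is called.
class ToyLogStore {
public:
    void put(const std::string& key, const std::string& value) {
        log_.push_back(Record{key, value, false});
    }

    // A delete does not remove anything: it appends a tombstone record.
    void erase(const std::string& key) {
        log_.push_back(Record{key, "", true});
    }

    // The newest record for a key decides whether the key is visible.
    bool contains(const std::string& key) const {
        for (auto it = log_.rbegin(); it != log_.rend(); ++it) {
            if (it->key == key) return !it->tombstone;
        }
        return false;
    }

    // Number of stored records, standing in for "consumed disk space".
    std::size_t record_count() const { return log_.size(); }

    // Full compaction: keep only the newest record per key and drop
    // every key whose newest record is a tombstone.
    void compact() {
        std::map<std::string, Record> newest;
        for (const Record& r : log_) newest[r.key] = r;  // later record wins
        std::vector<Record> live;
        for (const auto& entry : newest) {
            if (!entry.second.tombstone) live.push_back(entry.second);
        }
        log_.swap(live);
    }

private:
    std::vector<Record> log_;
};
```

In this model, writing three records and then deleting both keys leaves five records in the store. Only `compact()` brings the count back down, which mirrors why deleting data on a nearly full disk does not help right away.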

Solution for e/leveldb

In short, leveldb has a C++ function called "CompactRange" that compacts the underlying storage. In particular, deleted and overwritten versions are discarded, and the data is rearranged to reduce the cost of operations needed to access the data.

This function is not available in the Erlang code that uses this C++ library. This means that we needed to build a standalone tool that calls this library function on all leveldb directories in Riak. The drawback is that your Riak node has to be offline while such a third-party tool is running.
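The core loop of such a tool can be sketched as follows. This is not the actual source of our tool, just a minimal illustration under stated assumptions: `compact_all` and `Opener` are hypothetical names, and the database handle is kept generic so the sketch compiles without leveldb. With the real library, the opener could wrap `leveldb::DB::Open(options, path, &db)` and return the resulting `leveldb::DB*`; passing null for both arguments of `CompactRange` asks leveldb to compact the entire key range.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Compacts every partition directory listed in `partition_dirs`.
//
// `Opener` is a hypothetical callable: given a directory path, it returns a
// pointer-like handle to an open database, or a null handle if the database
// could not be opened. The handle must provide
// CompactRange(begin, end), as leveldb::DB does.
//
// Returns the number of partitions that were compacted.
template <typename Opener>
std::size_t compact_all(const std::vector<std::string>& partition_dirs,
                        Opener open_db) {
    std::size_t compacted = 0;
    for (const std::string& dir : partition_dirs) {
        auto db = open_db(dir);
        if (!db) continue;  // could not open this partition, skip it
        // Null begin and null end mean "the whole database".
        db->CompactRange(nullptr, nullptr);
        ++compacted;
    }
    return compacted;
}
```

Keeping the opener as a parameter is only a choice made for this sketch, so that the loop can be tried with a stand-in database object. A real tool would also have to discover the partition directories and release each database handle after compaction.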

Use case

We built such a tool. You can check it out from GitHub (RiakToolsCxx.git) and build it with CMake. The external dependency leveldb-basho is pulled in automatically by CMake.

The checkout and build process could look like this:

dwalter@knxwork:~/Projects$ git clone https://github.com/hw-dwalter/RiakToolsCxx.git RiakToolsCxx
Cloning into 'RiakToolsCxx'...
remote: Counting objects: 11, done.
remote: Total 11 (delta 0), reused 0 (delta 0), pack-reused 11
Unpacking objects: 100% (11/11), done.
dwalter@knxwork:~/Projects$ cd RiakToolsCxx/
dwalter@knxwork:~/Projects/RiakToolsCxx$ mkdir build
dwalter@knxwork:~/Projects/RiakToolsCxx$ cd build/
dwalter@knxwork:~/Projects/RiakToolsCxx/build$ cmake ..
-- The CXX compiler identification is GNU 6.3.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test COMPILER_SUPPORTS_CXX11
-- Performing Test COMPILER_SUPPORTS_CXX11 - Success
-- Performing Test COMPILER_SUPPORTS_CXX0X
-- Performing Test COMPILER_SUPPORTS_CXX0X - Success
-- The C compiler identification is GNU 6.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Boost version: 1.62.0
-- Found the following Boost libraries:
--   filesystem
--   system
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dwalter/Projects/RiakToolsCxx/build
dwalter@knxwork:~/Projects/RiakToolsCxx/build$ make
Scanning dependencies of target leveldb-basho
[ 10%] Creating directories for 'leveldb-basho'
[ 20%] Performing download step (git clone) for 'leveldb-basho'
Cloning into 'leveldb-basho'...
Already on 'develop'
Your branch is up-to-date with 'origin/develop'.
[ 30%] No patch step for 'leveldb-basho'
[ 40%] No update step for 'leveldb-basho'
[ 50%] No configure step for 'leveldb-basho'
[ 60%] Performing build step for 'leveldb-basho'
ar: creating libleveldb.a
[ 70%] No install step for 'leveldb-basho'
[ 80%] Completed 'leveldb-basho'
[ 80%] Built target leveldb-basho
Scanning dependencies of target riakcompact
[ 90%] Building CXX object src/CMakeFiles/riakcompact.dir/main.cpp.o
[100%] Linking CXX executable riakcompact
[100%] Built target riakcompact
dwalter@knxwork:~/Projects/RiakToolsCxx/build$ ./src/riakcompact 
usage:   ./src/riakcompact [path]

Once the tool is built, you can use it like this. Keep in mind that your Riak node has to be turned off!

root@knxwork:/home/dwalter/Projects/RiakToolsCxx/build# ./src/riakcompact /var/lib/riak/leveldb/
"/var/lib/riak/leveldb/91343852333181432387730302044767688728495783936" [directory]
compacting...
done
"/var/lib/riak/leveldb/685078892498860742907977265335757665463718379520" [directory]
compacting...
done

Conclusion

After deleting a complete bucket in our Riak (key by key), we were able to reduce the consumed disk space from 75 GB to 25 GB with this tool!

Compaction freed more than 66% of the consumed disk space!
