
Last year I posted about static_any and I am back with a similar container that I implemented recently: inplace_string.

For the record, static_any should have been called inplace_any, but at the time I chose the prefix static because Boost.StaticVector was popular and it was clear to everybody what to expect from such a container.

inplace_string

The idea behind inplace_string is to provide a full replacement for C++17’s std::string, with in-place memory storage:

using Name = inplace_string<15>; // 16 bytes on stack, size included
Name name = "foo"; 

auto it = name.find("r");
assert(it == Name::npos);

name += "bar";
  
std::string str(name); // implicit string_view construction


What you get as benefits:

  • Faster string operations: string construction and destruction are much faster since no memory allocation takes place. Other operations are also slightly faster: as its buffer does not grow, inplace_string benefits from simpler code with fewer branches.
  • Cache-friendly: it stays close to your other class members, and your cache likes it.
  • Simplicity: it allows you to avoid dynamic allocations without implementing your own memory allocator.

As for the implementation, no surprises: the underlying container is std::array<CharT, N>, and it uses the famous trick, made popular by fbstring, of storing the remaining size within the last byte of the string. If you don’t know it yet, look here and think about how the value will change when the string reaches the maximum capacity.
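The last-byte trick described above can be sketched as follows. This is a minimal illustration, not the actual inplace_string implementation: the class name tiny_string, its restriction to char, the N + 1 byte buffer, and the choice of std::length_error are all assumptions made for the example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string_view>

// Hypothetical, simplified stand-in for inplace_string<N> (char only).
template <std::size_t N>
class tiny_string {
    // Keep the counter within the positive range of a (possibly signed) char.
    static_assert(N <= 127, "remaining capacity must fit in the last byte");

public:
    tiny_string() noexcept { buf_.fill('\0'); set_size(0); }

    std::size_t size() const noexcept { return N - remaining(); }
    const char* c_str() const noexcept { return buf_.data(); }

    // Raw value of the last byte: the number of characters that still fit.
    std::size_t remaining() const noexcept {
        return static_cast<unsigned char>(buf_[N]);
    }

    // Appends sv in place; throws instead of allocating when it does not fit.
    // (The exception type is an assumption of this sketch.)
    void append(std::string_view sv) {
        if (sv.size() > remaining())
            throw std::length_error("tiny_string: capacity exceeded");
        const std::size_t old_size = size();
        std::copy(sv.begin(), sv.end(), buf_.begin() + old_size);
        set_size(old_size + sv.size());
    }

private:
    void set_size(std::size_t n) noexcept {
        buf_[n] = '\0';                      // terminator right after the payload
        buf_[N] = static_cast<char>(N - n);  // remaining capacity in the last byte
    }

    std::array<char, N + 1> buf_;  // N characters plus one shared last byte
};
```

When the string is full, the remaining capacity is 0, so the last byte doubles as the null terminator. That is how a 15-character capacity can fit in 16 bytes without a separate size field.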

One of the few differences from std::string is the presence of a constructor from const CharT(&)[M], which produces a compile-time error if the input exceeds the maximum capacity:

inplace_string<5> too_small;
too_small = "foobar"; // error: static_assert failed "basic_inplace_string: size exceeds maximum capacity"
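A constructor of that shape could look roughly like the sketch below. The type bounded_name and its layout are hypothetical and only meant to show how the array extent M makes the check possible at compile time; the real basic_inplace_string may do it differently.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity name type; not the real basic_inplace_string.
template <std::size_t N>
struct bounded_name {
    // M is deduced from the array type and includes the terminating '\0',
    // so the number of characters in a string literal is M - 1.
    template <std::size_t M>
    bounded_name(const char (&literal)[M]) {
        static_assert(M - 1 <= N, "bounded_name: size exceeds maximum capacity");
        std::copy(literal, literal + M, data.begin());  // copies the '\0' too
    }

    std::array<char, N + 1> data{};
};

// bounded_name<5> too_small = "foobar";  // rejected at compile time: 6 > 5
```

Because the check relies on the array extent, it only applies to inputs whose size is known at compile time; a plain const char* would still need a run-time check.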


std::string’s SSO

At first glance, it might seem that inplace_string overlaps with std::string’s SSO (Short String Optimization), but it does not.

One reason is that SSO is implementation-dependent:

  • 15 bytes with gcc 5
  • 22 bytes with clang 5
  • 15 bytes with VS 2015 and 2017
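To see which value applies to a given toolchain, one option is to probe it at run time. The sketch below is an assumption-laden heuristic, not a portable guarantee: it treats a string as stored in place when its character buffer starts inside the std::string object itself. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// True when the character buffer lies inside the std::string object itself.
bool is_stored_inplace(const std::string& s) {
    const auto object = reinterpret_cast<std::uintptr_t>(&s);
    const auto buffer = reinterpret_cast<std::uintptr_t>(s.data());
    return buffer >= object && buffer < object + sizeof(std::string);
}

// Largest length L such that strings of length 1..L are all stored in place
// (0 when even a one-character string lives outside the object).
std::size_t measured_sso_capacity() {
    std::size_t length = 0;
    while (length < 256 && is_stored_inplace(std::string(length + 1, 'x')))
        ++length;
    return length;
}
```

Printing measured_sso_capacity() on each target platform is a quick way to check whether the standard library in use has SSO at all, and how many characters it covers.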

Another reason is that even though everybody thinks that using gcc 5 is enough to get SSO, that is wrong: as of this writing, most of the popular Linux distributions (CentOS, Debian, RedHat) patched the glibc/gcc to keep the previous std::string implementation without SSO (and with COW!). This was done to avoid breaking std::string’s ABI when C++11 was released, cf. this post on the RedHat developers’ blog.

More importantly, inplace_string guarantees that your code will never allocate. If, for any reason, your string grows more than you expected, it will throw… and you will know that you were wrong, as your application will likely crash! On the other hand, crossing fingers and relying on std::string’s SSO is something of a bet, and you will never know if, one day, strings get bigger and start allocating.

string_view

In the current C++ world, std::string has a monopoly and the following code is common practice:

void foo(const std::string& bar)
{
   ...
}


In a code base with std::string as the only string class, this code is reasonable. It is not when multiple string classes live together: having to do conversions all over the place defeats the purpose of not using std::string.

foo("foobar"); // compiles, but allocates

inplace_string<15> ss = "foobar";
foo(ss); // does not compile, std::string does not know inplace_string

foo(ss.c_str()); // compiles, but allocates and copies the string


To solve this issue, C++17 brings us std::string_view: std::string, inplace_string and friends implement an implicit conversion operator to std::string_view, and mixing all of them works nicely:

void foo(std::string_view bar)
{
   ...
}

foo("foobar"); // compiles, does not allocate

inplace_string<15> ss = "foobar";
foo(ss); // compiles, building string_view is cheap (taking pointer + size)

std::string s = "foobar";
foo(s); // compiles
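For a custom string class, the implicit conversion mentioned above boils down to a single member function. The following sketch uses a hypothetical fixed_name class and a hypothetical count_char function; it only illustrates the mechanism and says nothing about how inplace_string itself is written.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical in-place string, reduced to what the example needs.
class fixed_name {
public:
    // Truncates silently to keep the sketch short; a real container would not.
    explicit fixed_name(std::string_view text)
        : size_(std::min(text.size(), buffer_.size())) {
        std::copy_n(text.begin(), size_, buffer_.begin());
    }

    // Implicit conversion: just a pointer and a size, no copy, no allocation.
    operator std::string_view() const noexcept {
        return std::string_view(buffer_.data(), size_);
    }

private:
    std::array<char, 15> buffer_{};
    std::size_t size_;
};

// A consumer written once against string_view accepts every string type.
std::size_t count_char(std::string_view text, char c) {
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), c));
}
```

The same count_char can then be called with a string literal, a std::string, or a fixed_name, without any temporary std::string being created.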


Mixing in-place containers

Let’s consider the following code:

using MetadataTag = std::string;
using MetadataValue = std::experimental::any;

using Metadata = std::pair<MetadataTag, MetadataValue>;

struct MetadataTree 
{
    template <typename StringT, typename ValueT>
    void Add(StringT&& str, ValueT&& value)
    {
        _metadata.emplace_back(std::forward<StringT>(str), std::forward<ValueT>(value));
    }

    std::vector<Metadata> _metadata;
};


This object is very similar to QVariantMap, which is a QMap<QString, QVariant> (QVariant is actually not like std::variant, but similar to std::any). I use this pattern a lot, as it is convenient to use a string to index various objects. One issue with it is speed: having to allocate memory for many std::string objects is a major obstacle.

Its equivalent with in-place containers could be:

using MetadataTag = inplace_string<15>;
using MetadataValue = static_any<16>;

using Metadata = std::pair<MetadataTag, MetadataValue>;

struct MetadataTree 
{
    template <typename StringT, typename ValueT>
    void Add(StringT&& str, ValueT&& value)
    {
        _metadata.emplace_back(std::forward<StringT>(str), std::forward<ValueT>(value));
    }

    static constexpr std::size_t MaxCapacity = 8;
    boost::container::static_vector<Metadata, MaxCapacity> _metadata;
};

Now, let’s benchmark these two. I know, it is unfair, but we all love looking at numbers! For that, let’s take a simple usage of our MetadataTree class:

template <typename TreeT>
TreeT GetTree(int i, double d, bool b)
{
  TreeT tree;
  tree.Add("metadata1", i);
  tree.Add("metadata2", d);
  tree.Add("metadata3", b);
  return tree;
}

… and the result (time, instructions and cycles per iteration):

Test                        Time (ns)          INS          CYC 
---------------------------------------------------------------
MetadataTree                      536        2,205        1,010 
in-place MetadataTree               3            3            2


You can find the benchmark on my GitHub. Here is the full output of perf:

$ perf stat -e cycles,instructions,cache-misses ./inplace_examples notinplace
notinplace3000000

 Performance counter stats for './inplace_examples notinplace':

     1 005 689 341 cycles                   
     2 209 217 853 instructions              #    2,20  insns per cycle        
            20 059 cache-misses                                                

       0,531293829 seconds time elapsed

$ perf stat -e cycles,instructions,cache-misses ./inplace_examples inplace
inplace3000000

 Performance counter stats for './inplace_examples inplace':

         2 540 791 cycles                   
         2 384 036 instructions              #    0,94  insns per cycle        
            10 970 cache-misses                                                

       0,001584907 seconds time elapsed
David Gross


Thoughts from a Wall Street developer

A blog about C++, with an emphasis on low-latency, performance measurements and system programming.
