QStringView Diaries: Zero-Allocation String Splitting

栏目: IT技术 · 发布时间: 4年前

内容简介:After four months of intensive development work, I am happy to announce that theWhile the version in Qt will be Qt 6-only, KDAB will release this tool for Qt 5 as part of itsThis is a good time to recapitulate what QStringTokenizer is all about.

After four months of intensive development work, I am happy to announce that the first QStringTokenizer commits have landed in what will eventually become Qt 6.0. The docs should show up, soon.

While the version in Qt will be Qt 6-only, KDAB will release this tool for Qt 5 as part of its KDToolBox productivity suite . Yes, that means the code doesn’t require C++17 and works perfectly fine in pure C++11.

This is a good time to recapitulate what QStringTokenizer is all about.

QStringTokenizer: The Zero-Allocation String Splitter

Three years ago, when QStringView was first merged for Qt 5.10 , I already wrote that we wouldn’t want to have a method like QString::split() on QStringView. QStringView is all about zero memory allocations, and split() returns an owning container of parts, say QVector , allocating memory.

So how do you return the result of string splitting, if not in a container? You take a cue from C++20’s std::ranges and implement a Lazy Sequence . A Lazy Sequence is like a container, except that it’s elements aren’t stored in memory, but calculated on the fly. That, in C++20 coroutine terms, is called a Generator .

So, QStringTokenizer is a Generator of tokens, and, apart from its inputs, holds only constant memory.

Here’s the example from 2017, now in executable form:

const QString s = ~~~;
for (QStringView line : QStringTokenizer{s, u'\n'})
    use(line);

Except, we beefed it up some:

const std::u16string s = ~~~;
for (QStringView line : QStringTokenizer{s, u'\n'})
    use(line);

Oh, and this also works now:

const QLatin1String s = ~~~;
for (QLatin1String line : QStringTokenizer{s, u'\n'})
    use(line);

QStringTokenizer: The Universal String Splitter

When I initially conceived QStringTokenizer in 2017, I thought it would just work on QStringView and that’d be it. But the last example clearly shows that it also supports splitting QLatin1String . How is that possible?

This is where C++17 comes in, on which Qt 6.0 will depend. C++17 brought us Class Template Argument Deduction (CTAD) :

std::mutex m;
std::unique_lock lock(m); // not "std::unique_lock<std::mutex> lock(m);"

And that’s what we used in the examples above. In reality, QStringTokenizer is a template, but the template arguments are deduced for you.

So, this is how QStringTokenizer splits QStrings as well as QLatin1Strings: in the first case, it’s QStringTokenizer<QStringView, QChar>, in the second, QStringTokenizer<QLatin1String, QChar>. But be warned: you should never, ever, explicitly specify the template arguments yourself, as you will likely get it wrong, because they’re subtle and non-intuitive. Just let the compiler do its job. Or, if you can’t rely on C++17, yet, you can use the factory function qTokenize():

const QLatin1String s = ~~~;
for (QLatin1String line : qTokenize(s, u'\n'))
    use(line);

QStringTokenizer: The Safe String Splitter

One thing I definitely wanted to avoid is dangling references a la QStringBuilder :

auto expr = QString::number(42) % " is the answer"; // decltype(expr) is QStringBuilder<~~~, ~~~>
QString s = expr; // oops, accessing the temporary return value of QString::number(), since deleted

The following must work:

for (QStringView line : QStringTokenizer{widget->text(), u'\n'})
    use(line);

But since the ranged for loop there is equivalent to

{
   auto&& __range = QStringTokenizer{widget->text(), u'\n'};
   auto __b = __range.begin(); // I know, this is not the full truth
   auto __e = __range.end();   // it's what happens for QStringTokenizer, though!
   for ( ; __b != __e; ++__b) {
     QStringView line = *__b;
     use(line);
   }
}

if QStringTokenizer simply operated on QStringView or QLatin1String, the following would happen: The __range variable keeps the QStringTokenizer object alive throughout the for loop (ok!), but the temporary returned from widget->text() would have been destroyed in line 3, even before we enter the for loop (oops).

This is not desirable, but what can we do against it? The solution is as simple as it is complex: detect temporaries and store them inside the tokenizer.

Yes, you heard that right: if you pass a temporary (“rvalue”) owning container to QStringTokenizer, the object will contain a copy (moved from the argument if possible) to extend the string’s lifetime to that of the QStringTokenizer itself.

Future

Now that we have developed the technique, we very strongly expect it to be used in Qt 6.0 for QStringBuilder, too.

By Qt 6.0, we expect QStringTokenizer to also handle the then-available QUtf8StringView as haystack and needle, as well as QRegularExpression and std::boyer_moore_searcher and std::boyer_moore_horspool_searcher as needles. We might also re-implement it as a C++20 coroutine on compilers that support them, depending on how much more performance we’ll get out of it.

Conclusion

QStringTokenizer splits strings, with zero memory allocations, universally, and safely. Get it for free right now from KDToolBox , and you can future-proof your code with an eye towards Qt 6.

About KDAB

If you like this blog and want to read similar articles, consider subscribing viaour RSS feed.

Subscribe to KDAB TV for similar informative short video content.

KDAB provides market leading software consulting and developmentservices andtraining in Qt, C++ and 3D/OpenGL.Contact us.


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

文明之光(第四册)

文明之光(第四册)

吴军 / 人民邮电出版社 / 2017-3 / 69.00元

计算机科学家吴军博士继创作《浪潮之巅》、《数学之美》之后,将视角拉回到人类文明史,以他独具的观点从对人类文明产生了重大影响却在过去被忽略的历史故事里,选择了有意思的几十个片段特写,有机地展现了一幅人类文明发展的画卷。《文明之光》系列创作历经整整四年,本书为其第四卷。 作者所选的创作素材来自于十几年来在世界各地的所见所闻,对其内容都有着深刻的体会和认识。《文明之光》系列第四册每个章节依然相对独......一起来看看 《文明之光(第四册)》 这本书的介绍吧!

XML、JSON 在线转换
XML、JSON 在线转换

在线XML、JSON转换工具

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换