An Extraterrestrial Guide to C++20 Text Formatting

栏目: IT技术 · 发布时间: 4年前

内容简介:In C++20, we have a new and cool way to do text formatting. It’s more like Python style and combines C-StyleThis is a guest post from Victor Zverovich.

An Extraterrestrial Guide to C++20 Text Formatting

In C++20, we have a new and cool way to do text formatting. It’s more like Python style and combines C-Style printf and with modern C++ type-safety. In this guest post written by the author of the proposal - Victor Zverovich - you’ll learn how to use this new technique!

  • The C++20 formatting library
  • Custom formatting functions
  • Date and time formatting

This is a guest post from Victor Zverovich.

Victor is a software engineer at Facebook working on the Thrift RPC framework and the author of the popular {fmt} library , a subset of which is proposed into C++20 as a new formatting facility. He is passionate about open-source software, designing good APIs, and science fiction.You can find Victor online on Twitter , StackOverflow , and GitHub .

Victor originally wrote that blog post for Fluent C++ , but this one is heavily updated with the information about C++20.

Intro

(with apologies to Stanisław Lem)

An Extraterrestrial Guide to C++20 Text Formatting

Consider the following use case: you are developing the Enteropia[2]-first Sepulka[3]-as-a-Service (SaaS) platform and have server code written in C++ that checks the value of requested sepulka’s squishiness received over the wire and, if the value is invalid, logs it and returns an error to the client. Squishiness is passed as a single byte and you want to format it as a 2-digit hexadecimal integer, because that is, of course, the Ardrite[1] National Standards Institute (ANSI) standard representation of squishiness. You decide to try different formatting facilities provided by C++ and decide which one to use for logging.

First you try iostreams:

#include <cstdint>
#include <iomanip>
#include <iostream>

void log_error(std::ostream& log, std::uint_least8_t squishiness) {
  log << "Invalid squishiness: "
      << std::setfill('0') << std::setw(2) << std::hex
      << squishiness << "\n";
}

The code is a bit verbose, isn’t it? You also need to pull in an additional header, <iomanip> , to do even basic formatting. But that’s not a big deal.

However, when you try to test this code (inhabitants of Enteropia have an unusual tradition of testing their logging code), you find out that the code doesn’t do what you want. For example,

log_error(std::cout, 10);

prints

Invalid squishiness: 0

Which is surprising for two reasons: first it prints one character instead of two and second the printed value is wrong. After a bit of debugging, you figure out that iostreams treat the value as a character on your platform and that the extra newline in your log is not a coincidence. An even worse scenario is that it works on your system, but not on the one of your most beloved customers.

So you add a cast to fix this which makes the code even more verbose (see the code @Compiler Explorer )

log << "Invalid squishiness: "
    << std::setfill('0') << std::setw(2) << std::hex
    << static_cast<unsigned>(squishiness) << "\n";

Can the Ardrites do better than that?

Yes, they can.

Format strings

Surprisingly, the answer comes from the ancient 1960s (Gregorian calendar) Earth technology, format strings (in a way, this is similar to the story of coroutines). C++ had this technology all along in the form of the printf family of functions and later rediscovered in std::put_time .

What makes format strings so useful is expressiveness. With a very simple mini-language, you can easily express complex formatting requirements. To illustrate this, let’s rewrite the example above using printf :

#include <cstdint>
#include <cstdio>

void log_error(std::FILE* log, std::uint_least8_t squishiness) {
  std::fprintf(log, "Invalid squishiness: %02x\n", squishiness);
}

Isn’t it beautiful in its simplicity? Even if you’ve somehow never seen printf in your life, you can learn the syntax in no time. In contrast, can you always remember which iostreams manipulator to use? Is it std::fill or std::setfill ? Why std::setw and std::setprecision and not, say, std::setwidth or std::setp ?

A lesser-known advantage of printf is atomicity. A format string and arguments are passed to a formatting function in a single call which makes it easier to write them atomically without having an interleaved output in the case of writing from multiple threads.

In contrast, with iostreams, each argument and parts of the message are fed into formatting functions separately, which makes synchronization harder. This problem was only addressed in C++20 with the introduction of an additional layer of std::basic_osyncstream .

However, the C printf comes with its set of problems which iostreams addressed:

Safety:C varargs are inherently unsafe, and it is a user’s responsibility to make sure that the type of information is carefully encoded in the format strings. Some compilers issue a warning if the format specification doesn’t match argument types, but only for literal strings. Without extra care, this ability is often lost when wrapping printf in another API layer such as logging. Compilers can also lie to you in these warnings.

Extensibility:you cannot format objects of user-defined types with printf.

With the introduction of variadic templates and constexpr in C++11, it has become possible to combine the advantages of printf and iostreams. This has finally been done in C++20 formatting facility based a popular open-source formatting library called {fmt} .

The C++20 formatting library

Let’s implement the same logging example using C++20 std::format :

#include <cstdint>
#include <format>
#include <iostream>

void log_error(std::ostream& log, std::uint_least8_t squishiness) {
  log << std::format("Invalid squishiness: {:02x}\n", squishiness);
}

As you can see, the formatting code is similar to that of printf with a notable difference being {} used as delimiters instead of %. This allows us and the parser to find format specification boundaries easily and is particularly essential for more sophisticated formatting (e.g. formatting of date and time).

Unlike standard printf , std::format supports positional arguments i.e. referring to an argument by its index separated from format specifiers by the : character:

log << std::format("Invalid squishiness: {0:02x}\n", squishiness);

Positional arguments allow using the same argument multiple times.

Otherwise, the format syntax of std::format which is largely borrowed from Python is very similar to printf ’s. In this case format specifications are identical (02x) and have the same semantics, namely, format a 2-digit integer in hexadecimal with zero padding.

But because std::format is based on variadic templates instead of C varargs and is fully type-aware (and type-safe), it simplifies the syntax even further by getting rid of all the numerous printf specifiers that only exist to convey the type information. The printf example from earlier is in fact incorrect exhibiting an undefined behaviour. Strictly speaking, it should have been

#include <cinttypes> // for PRIxLEAST8
#include <cstdint>
#include <cstdio>

void log_error(std::FILE* log, std::uint_least8_t squishiness) {
  std::fprintf(log, "Invalid squishiness: %02" PRIxLEAST8 "\n",
               squishiness);
}

Which doesn’t look as appealing. More importantly, the use of macros is considered inappropriate in a civilized Ardrite society.

Here is a (possibly incomplete) list of specifiers made obsolete: hh, h, l, ll, L, z, j, t, I, I32, I64, q, as well as a zoo of 84 macros:

Prefix intx_t int_leastx_t int_fastx_t intmax_t intptr_t
d PRIdx PRIdLEASTx PRIdFASTx PRIdMAX PRIdPTR
i PRIix PRIiLEASTx PRIiFASTx PRIiMAX PRIiPTR
u PRIux PRIuLEASTx PRIuFASTx PRIuMAX PRIuPTR
o PRIox PRIoLEASTx PRIoFASTx PRIoMAX PRIoPTR
x PRIxx PRIxLEASTx PRIxFASTx PRIxMAX PRIxPTR
X PRIXx PRIXLEASTx PRIXFASTx PRIXMAX PRIXPTR

In fact, even x in the std::format example is not an integer type specifier, but a hexadecimal format specifier, because the information that the argument is an integer is preserved. This allows omitting all format specifiers altogether to get the default (decimal for integers) formatting:

log << std::format("Invalid squishiness: {}\n", squishiness);

User-defined types

Following a popular trend in the Ardrite software development community, you decide to switch all your code from std::uint_least8_t to something stronger-typed and introduced the squishiness type:

enum class squishiness : std::uint_least8_t {};

Also, you decide that you always want to use ANSI-standard formatting of squishiness which will hopefully allow you to hide all the ugliness in operator<< :

std::ostream& operator<<(std::ostream& os, squishiness s) {
  return os << std::setfill('0') << std::setw(2) << std::hex
            << static_cast<unsigned>(s);
}

Now your logging function looks much simpler:

void log_error(std::ostream& log, squishiness s) {
  log << "Invalid squishiness: " << s << "\n";
}

Then you decide to add another important piece of information, sepulka security number (SSN) to the log, although you are afraid it might not pass the review because of privacy concerns:

void log_error(std::ostream& log, squishiness s, unsigned ssn) {
  log << "Invalid squishiness: " << s << ", ssn=" << ssn << "\n";
}

To your surprise, SSN values in the log are wrong, for example:

log_error(std::cout, squishiness(0x42), 12345);

gives

Invalid squishiness: 42, ssn=3039

After another debugging session, you realize that the std::hex flag is sticky, and SSN ends up being formatted in hexadecimal. So you have to change your overloaded operator<< to

std::ostream& operator<<(std::ostream& os, squishiness s) {
  std::ios_base::fmtflags f(os.flags());
  os << std::setfill('0') << std::setw(2) << std::hex
     << static_cast<unsigned>(s);
  os.flags(f);
  return os;
}

A pretty complicated piece of code just to print out an SSN in decimal format.

std::format follows a more functional approach and doesn’t share formatting state between the calls. This makes reasoning about formatting easier and brings performance benefits because you don’t need to save/check/restore state all the time.

To make squishiness objects formattable you just need to specialize the formatter template and you can reuse existing formatters:

#include <format>
#include <ostream>

template <>
struct std::formatter<squishiness> : std::formatter<unsigned> {
  auto format(squishiness s, format_context& ctx) {
    return format_to(ctx.out(), "{:02x}", static_cast<unsigned>(s));
  }
};

void log_error(std::ostream& log, squishiness s, unsigned ssn) {
  log << std::format("Invalid squishiness: {}, ssn={}\n", s, ssn);
}

You can see the message "Invalid squishiness: {}, ssn={}\n" as a whole, not interleaved with << , which is more readable and less error-prone.

Custom formatting functions

Now you decide that you don’t want to log everything to a stream but use your system’s logging API instead. All your servers run the popular on Enteropia GNU/systemd operating system where GNU stands for GNU’s, not Ubuntu, so you implement logging via its journal API. Unfortunately, the journal API is very user-unfriendly and unsafe. So you end up wrapping it in a type-safe layer and making it more generic:

#include <systemd/sd-journal.h>
#include <format> // no need for <ostream> anymore

void vlog_error(std::string_view format_str, std::format_args args) {
  sd_journal_send("MESSAGE=%s", std::vformat(format_str, args).c_str(),
                  "PRIORITY=%i", LOG_ERR, nullptr);
}

template <typename... Args>
inline void log_error(std::string_view format_str,
                      const Args&... args) {
  vlog_error(format_str, std::make_format_args(args...));
}

Now you can use log_error as any other formatting function and it will log to the system journal:

log_error("Invalid squishiness: {}, ssn={}\n",
          squishiness(0x42), 12345);

The reason why we don’t directly call sd_journal_send in log_error , but rather have the intermediary vlog_error is to make vlog_error a normal function rather than a template and avoiding instantiations for all the combinations of argument types passed to it. This dramatically reduces binary code size. log_error is a template but because it is inlined and doesn’t do anything other than capturing the arguments, it doesn’t add much to the code size either.

The std::vformat function performs the actual formatting and returns the result as a string which you then pass to sd_journal_send . You can avoid string construction with std::vformat_to which writes to an output iterator, but this code is not performance critical, so you decide to leave it as is.

Date and time formatting

Finally you decide to log how long a request took and find out that std::format makes it super easy too:

void log_request_duration(std::ostream& log,
                                std::chrono::milliseconds ms) {
  log << std::format("Processed request in {}.", ms);
}

This writes both the duration and its time units, for example:

Processed request in 42ms.

std::forma supports formatting not just durations but all chrono date and time types via expressive format specifications based on strftime , for example:

std::format("Logged at {:%F %T} UTC.",
            std::chrono::system_clock::now());

Beyond std::format

In the process of developing your SaaS system you’ve learnt about the features of C++20 std::format , namely format strings, positional arguments, date and time formatting, extensibility for user-defined types as well as different output targets and statelessness, and how they compare to the earlier formatting facilities.

Note to Earthlings: your standard libraries may not yet implement C++20 std::format but don’t panic: all of these features and much more are available in the open-source {fmt} library} . Some additional features include:

  • formatted I/O
  • high-performance floating-point formatting
  • compile-time format string checks
  • better Unicode support
  • text colors and styles
  • named arguments

All of the examples will work in {fmt} with minimal changes, mostly replacing std::format with fmt::format and <format> with <fmt/core.h> or other relevant include.

Glossary

  • [1] Ardrites – intelligent beings, polydiaphanohedral, nonbisymmetrical and pelissobrachial, belonging to the genus Siliconoidea, order Polytheria, class Luminifera.
  • [2] Enteropia – 6th planet of a double (red and blue) star in the Calf constellation
  • [3] Sepulka – pl: sepulki, a prominent element of the civilization of Ardrites from the planet of Enteropia; see “Sepulkaria”
  • [4] Sepulkaria – sing: sepulkarium, establishments used for sepuling; see “Sepuling”
  • [5] Sepuling – an activity of Ardrites from the planet of Enteropia; see “Sepulka”

The picture and references come from the book "Star Diaries" by Stanislaw Lem.


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

算法设计与分析

算法设计与分析

王红梅 / 清华大学 / 2006-7 / 23.00元

《算法设计与分析》(普通高校本科计算机专业特色教材精选)将计算机经典问题和算法设计技术很好地结合起来,系统地介绍了算法设计技术及其在经典问题中的应用。全书共12章,第1章介绍了算法的基本概念和算法分析方法,第2章从算法的观点介绍了NP完全理论,第3章~~第11章分别介绍了蛮力法、分治法、减治法、动态规划法、贪心法、回溯法、分支限界法、概率算法和近似算法等算法设计技术,第12章基于图灵机计算模型介绍......一起来看看 《算法设计与分析》 这本书的介绍吧!

HTML 压缩/解压工具
HTML 压缩/解压工具

在线压缩/解压 HTML 代码

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换

正则表达式在线测试
正则表达式在线测试

正则表达式在线测试