内容简介:TheI finally got some time to clean up my library based on aSignal-slots is a form of
The signals library achieves clean API design, standard conformity, the best possible runtime performance defeating virtual function, while also securing recursion- and modification-safety, all with just below 400 lines of code.
I finally got some time to clean up my library based on a previous research
. Despite having implemented observer patterns of all different flavors, every time I learned a bit more - a new pitfall in the use case, a new optimization opportunity that was overlooked, or trade-offs I haven’t previously thought of. Learning Boost::signals2 lets me rethink the usability design; Despite its (relatively) high latency, it does a great job hiding the differences between signal types into a plain connection
class, and thus the function signature is repeated to the bare minimum - exactly once in the callback body. I’ve also realized that a performant library does not necessarily need to be absolutely fastest (in the sense of overhead, not algorithmic complexity) in every interface it provides, because those not on the critical path don’t matter to the overall performance; They are the perfect opportunities to trade speed for easier use and flexibility.
Virtual Function Observer Pattern
Signal-slots is a form of observer pattern that dates back to GoF design patterns. If you have worked with any "event driven" codebase, the following code may look familiar:
void CStreamEngine::OnSystemEvent(ESystemEvent event, UINT_PTR wparam, UINT_PTR lparam) { switch (event) { case ESYSTEM_EVENT_GAME_POST_INIT_DONE: { // unpause the streaming engine, when init phase is done PauseStreaming(false, -1); break; } case ESYSTEM_EVENT_LEVEL_LOAD_PREPARE: ... 10 other cases ...
The well-known implementation of observer pattern stores a set (or alternatively a list or array) of objects, all inheriting from the same interface. This straightforward approach has many issues, but let me only focus on the performance aspect for now:
-
Bad iteration performance (hash table or list): with the typical
unordered_set
orlist
implementations, the cache misses would add up to the invocation overhead. - Bad removal performance (array): If we use arrays, iteration would be optimal but removal becomes the new problem. Every time the listener unregisters itself from the source, which often happens in a dynamic world, it has to linear-search and erase the object, which leads to worst-case O(N^2) runtime. Swapping the element to be removed with the end would help reduce the complexity, but the linear lookup remains a bottleneck.
-
Unnecessary calls. The observer pattern is quite verbose considering that it relies on the collaboration of many delicate code snippets: to actually call
OnSystemEvent()
, we would also needISystemEvent
,std::vector<ISystemEvent*>
, and typically alsoRegisterSystemEventListener()
,RemoveSystemEventListener()
andSendSystemEvent()
, while keeping in mind that the names shall match and not conflict with any other event. The whole pattern is so tedious that at some point, people realize they could as well embed a convenientenum ESystemEvent
to control the internal flow. If there are 20 commonly usedESystemEvent
s, instead of storing the objects into 20 different, much smaller lists, we would save them into a large list and call everyone for every event that ever happend in the game, even if they are only interested in one or two cases.
Fast Delegate, Revisited
Don Clugston’s Fast Delegate Library was probably the fastest approach of invoking an object’s member function bound at runtime, though the library involved compiler- and platform-specific hacks. Those were necessary because there was no standard data structure, or even a unified size to describe all possible kinds of member function pointers. The simple reason why it is the fastest possible way is that under the hood, the processor doesn’t care about the type but only the addresses of the function and its arguments, thus the fewer steps it needs to acquire the instructions and data, the better prefetching and the shorter delay. If the calling convention is consistent within the program being executed, there is no reason why casting the pointer back to the function wouldn’t work. Using all these compiler tricks, it allows storing a type-erased, “fat” pointer to the bound member function. Then a typical interface would take an object and a member method:
signal.connect(obj, &my_class::callback); //signal and callback have identical signatures
This is where my exploration started. After more than a decade, there must be a better, standard-conform solution to create a 0-overhead delegate with the help of all the modern C++ features. std::function
is nice, but is at least as slow as
virtual functions because of its virtual-based type erasure implementation. (As a side topic, I believe std::function doesn't have to be virtual-based, but I'm not aware of alternative implementation that focuses on reducing the overhead
except my implementation of CRYENGINE small function
)
Edit: I was corrected that
Gcc doesn’t use virtual functions in std::function (as I remember). There is a union of fn pointer + buffer.
Kudos to DerShokus!
Note that the signal needs to store a fat pointer to the actual member function, which might or might not conform to the same format. Actually, it’s not quite true that the bound member function is only known at runtime. The programmer’s intention is clearly known when the code was written - in most cases, the second argument &my_class::callback
is simply a constant. The issue here is the lack of enforced compile-time argument evaluation, even with the newest standard. Fortunately, C++17 has provided a new way: taking the function pointer as a non-type template parameter with template<auto>
.
But NTTP is nothing new? You might ask. Indeed, it has been there all the time and has always supported member function pointers (PMF), but PMFs that belong to distinct classes are of different types , which means they wouldn’t fit into the template parameter of a fixed type.
template<void(???::*)()> void connect(); // there was no way to define the type of this NTTP // we need to support all of the following: connect<&Foo::callback>(this); connect<&Bar::onclick>(this);
For a long time, the only workaround was asking the user to substitute the class type, so that the PMF can be defined based on the now-known class type:
//Bad. template<typename A...> struct signal<void(A...)> { template<class C, void(C::*ptr)(A...)> //2nd parameter is based on the 1st void connect(C*); //usage: sig.connect<my_class, &my_class::callback>(obj); };
Now with
C++17 template<auto>
we could instead write the following code -
//Better. template<typename A...> struct signal<void(A...)> { template<auto ptr, class C> void connect(C*); //usage: sig.connect<&my_class::callback>(obj); };
I’ve explained the details of this technique in a previous blog Partial Specializing Template Auto . The user would still deal with the cryptic PMF syntax at the interface, but that’s about as good as a member function callback can get without using a macro. I’ll explain further below how lambdas are able to significantly improve the usability without adding overhead like it does in std::function or other functor containers.
Taking the PMF by template, rather than function parameter, allows storing a generated static function or capture-less lambda that internally calls the PMF with the correct type. This technique is often used to store class constructors or destructors, whose addresses cannot be directly taken, or when the type information needs to be preserved while the difference at the interface level is erased. Now, instead of storing the PMF of unknown size, we could save the static / capture-less function pointer void(*)(void*, A...)
which has a predefined signature and a fixed size. No more undefined behaviors!
//PMF does not need to be captured because it’s a template argument func = [](void* obj, A ... args) {((*reinterpret_cast<C**>(obj))->*PMF)(args...); });
Another important advantage is optimization. The invocation (obj.*pmf)(args...)
only extracts the actual function at the time of calling, since it needs to support virtual functions. However, when pmf
is a compile-time constant, such as &my_class::callback
, the compiler would be able to fold it as if it was directly calling obj.callback(args...)
. It usually emits the optimal assembly that either inlines the function call, or contains a statically predictable
jump, which approximately reaches the performance of inlined-calls through instruction prefetching.
Lambdas
Clearly after C++11, any library dealing with functions would need to support lambda. In the context of signal-slots, lambdas (or pre-C++11 functors) are quite different from member functions:
this operator()
class game { /*...*/ }; fteng::signal<void(const game& instance)> game_created; class subsystem{ //Connects a signal with a lambda capturing 'this' fteng::connection on_game_created = game_created.connect([this](const game& instance){ std::cout << "Game is created.\n"; }); };
A lambda object capturing this
has exactly the size of a pointer, which fits perfectly into the delegate. The callback in the example above should run as fast as a member function. However, if the lambda is larger than the size of pointer, the object has to be stored separately. In contrast, the call operator can still be directly saved and thus benefits from triggering fewer cache misses.
Next week I'm going to talk about the design choices of containers, the RAII connection
object, move-semantic support and modification safety!
The code is available on Github and can be directly tested here .
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Scalability Rules
Martin L. Abbott、Michael T. Fisher / Addison-Wesley Professional / 2011-5-15 / USD 29.99
"Once again, Abbott and Fisher provide a book that I'll be giving to our engineers. It's an essential read for anyone dealing with scaling an online business." --Chris Lalonde, VP, Technical Operatio......一起来看看 《Scalability Rules》 这本书的介绍吧!
HTML 编码/解码
HTML 编码/解码
RGB CMYK 转换工具
RGB CMYK 互转工具