内容简介:I tend to think about invariants and preconditions a lot. Pretty much every class has invariants, and most functions have preconditions. I don't think they are complicated concepts, but somehow they seem to confuse people anyway, so I decided it was time t
I tend to think about invariants and preconditions a lot. Pretty much every class has invariants, and most functions have preconditions. I don't think they are complicated concepts, but somehow they seem to confuse people anyway, so I decided it was time to write down some of my thoughts.
Invariants
A class invariant is something that is always true for every instance of that class. Users of objects of that class never see an object for which the invariant is not true; it is true for all states of all objects of that class at all times. This is what makes it invariant.
In order to perform the required operations, class member functions may temporarily break invariants, and then restore them afterwards. Unless you allow concurrent invocations of member functions on the same object, or deliberately pass the object to another function from inside a member function, this is OK, as the outside world will not be able to interact with the object when it is in such a state. It does mean you need to be especially careful when calling other functions from inside a member function, to ensure that the invariants still hold.
Invariants are important. If you writing a function that operates on an object, the only thing that is guaranteed to be true about it is that the class invariants hold, unless you choose to impose additional preconditions.
Preconditions
Preconditions are those that must
be true when a function is called. Every
operation on a class has the implicit precondition that the class invariants
hold, but operations can add additional preconditions. For example, you always
call v.empty()
to check if a vector is empty, but you only call v.front()
to
retrieve the first element if the vector is not empty. v[i]
and v.at(i)
differ only in their preconditions; v[i]
requires the vector to have more than i
elements, whereas v.at(i)
is specified to work for any vector, throwing if i
is out of range.
Some people like to write classes with two-stage construction — first you
default-construct the object, then you call x.init()
or similar to "finish
off" the initialization, and you can't call any other function on that object
until after the initialization is complete. Though I don't like this pattern it
is valid code; every member function has a precondition that x.init()
has been
called.
Some people like to say that such a class can have "invariants" that do not hold until after initialization is complete. I think this is a misuse of the term; invariants must always hold for every object of a class, outside the internals of its member functions. As I just stated above, what such a class really has is a precondition on every function, not an invariant.
If I write a function
void do_stuff(X& x){ // my code here }
then I can rely on the invariants of X
holding, but cannot rely on anything
else unless I add preconditions to do_stuff
.
Likewise, it should not be possible to do anything to an object that breaks its invariants. If you can then either you gave a bug, or they are not really invariants, just preconditions for most operations.
Some people like to write classes that have operations to transfer the internals from one object to another, leaving the source in a special "emptier than empty" state. This can be good for efficiency, especially when doing otherwise would require allocating resources to return the source to the plain "empty" state, and it is likely that the source will be destroyed immediately anyway.
Again, this is a perfectly reasonable thing to do, particularly when the resources are expensive to allocate and performance of the code is important. The problem comes when people then want to say that the class invariants don't hold for this state. Just as with the pre-initialization state above, I think this is a misuse of the term; they are not invariants, it is just that most operations have a precondition that the object is not "emptier than empty".
Move semantics
Since C++11, C++ has had the concept of "moving" objects, transferring resources
from the source to the destination. This is an important facility for
expressiveness of code and efficiency. Objects that own resources such as std::unique_ptr
, std::thread
or std::fstream
can now be moved
,
transferring ownership of the resource from the source to the destination. This
allows such objects to be put in containers, and transferred between scopes,
which can greatly simplify code. Likewise, containers like std::vector
can be
moved, transferring the entire set of contained objects from one vector to
another without reallocation. These facilities are of great importance for
writing clean, efficient code.
One thing all these operations have in common is that after a move, the class invariants still hold for the source object. This should go without saying: the source is still an object of its type, so of course the invariants hold.
The issue is with things like std::list
, where some implementations allocate a
sentinel node for the head/tail of the list in the default constructor, which is
thus transferred during move operations, so the move constructor has to allocate
a new sentinel node for the source, in order to ensure the class invariants
still hold. A similar thing occurs with anything that uses the pimpl idiom: if
the implementation pointer is transferred then either the class invariants must
allow for there to be a null implementation pointer, or a new implementation
object must be allocated.
Some people therefore argue that move operations should allow the source to be left with broken invariants, as a hollow shell of an object, in an "emptier than empty" state. I think this is misuse of the term "invariants" . By all means, steal the internals and leave the source with a null pointer to the internals, or whatever. However, you now need to change the invariants of your class to allow this, or they are not invariants , because they would no longer apply to all objects of that class. There is no need to update any of the functions on your class to handle this new state, but if you do not do so, then any functions that don't work for objects in this "emptier than empty" state now have an additional precondition that the object is not in that state.
This is all perfectly fine, objects with such states are plentiful, and have
legitimate reasons for existing, but it is important to accept that their
functions now have preconditions. My do_stuff
function above can only call
member functions that are safe to use in such a state unless it too has a
precondition that the object x
is not in such a state.
From a user perspective, it would be nice to be able to query such a state, so I
could know what operations are permitted. For example, std::future
provides a
member function valid
so you can call f.valid()
before trying to do anything
with it, and std::thread
provides the joinable
member function, which can be
used to see if the object holds a thread handle. My do_stuff
function can then
call x.is_emptier_than_empty()
in order to check for the special state and
take appropriate action. That said, this is a "nice to have": the absence of
such a function doesn't mean that the state can't exist, just that it's
potentially harder to deal with.
Interaction with other code
If you pass an object to a function or class, then you need to know what that
function requires of your object and ensure that those expectations are met. If
the function expects your object to be a container with at least one element and
you pass an empty container, then you've broken the expectations and your code
has a bug. Likewise, if the function expects to be able to call x.foo()
on
your object, but foo cannot be called on an "emptier than empty" object, and you
passed such an object to the function, your code has a bug.
The difficulty comes when such a state arises as a consequence of other
actions. If x.foo()
can leave x
in an "emptier than empty" state, then do_stuff
had better be prepared to handle the scenario, otherwise there is a
bug somewhere. Where the bug is depends on the documentation: if do_stuff
requires that you can always perform certain actions on the object it is passed
as a parameter, and those actions require that the object is not "emptier than
empty", then the bug is in the caller, or maybe the implementation of class X
. If there is no such documentation, the bug is in do_stuff
.
Note that requiring that x.foo()
can only be called with objects that are not
"emptier than empty" is a precondition. It should be perfectly acceptable for do_stuff
to call any functions on x
that do not have preconditions, even if x
ends up in an "emptier than empty" state.
This is the exactly the same scenario we get with other code. For example if you
pass a vector to a function that requires that the vector isn't empty, the
function can erase the last element without a problem. If it wishes to erase the
last element again
, thus removing two elements in total, then it needs to
check for and handle the case that the first erasure left the vector
empty. Calling v.clear()
or v.empty()
or v.push_back(y)
would be fine
without checking, as those functions have no preconditions.
If x.foo()
is instead spelled y=std::move(x)
, then nothing changes: it is
perfectly fine for x
to end up in an "emptier than empty" state if do_stuff
knows how to handle such a state, or doesn't care, because it doesn't touch x
again.
One option is for do_stuff
to say that after a move-assignment like this, it
can't rely on x
having any particular value, but it is
still an instance of
class X
, so its invariants must hold, and therefore any operation without a
precondition can be used. It can therefore call x.is_emptier_than_empty()
and
behave appropriately.
The other option is for do_stuff
to place stronger
requirements on x
, such
as requiring that it does not end up "emptier than empty" after a move, or even
requiring that it has a specific state.
Valid but unspecified states
The standard library has chosen option 1: after move, objects must be valid - i.e. still actually be objects of the expected type, and not destroyed - but they have an unspecified state , so the function cannot rely on any specific value, and can therefore only perform operations without preconditions, until it has verified that the preconditions for other operations hold.
This holds both for standard library types such as std::string
or std::vector<int>
, but also for user defined types that you use with the
standard library. If you write
std::string s="hello"; std::string s2=std::move(s);
then s
was the source of a move operation, and thus is in a valid, but
unspecified state. For implementations that use the Small String Optimization,
such that short strings are stored entirely within the string object itself,
without requiring dynamic allocation, this might mean that s
still
has the
value hello
after the move, because then it doesn't have to store an empty
string value in s
as well as copying the contents to s2
. It is also possible
that the implementation might clear s
, for consistency with longer
strings. Both are valid choices, as in either case s
is still a std::string
object, and the exact details of the state are unspecified.
Likewise, if you write
std::vector<MyWidget> v=create_some_widgets(); v.insert(v.begin(),MyWidget());
then that call to v.insert
is going to have to move the widgets around in
order to make room at the beginning for the extra one. The library requires that
moving widgets leaves them in a valid, but unspecified, state. In this case,
that means that having moved a widget from one position to another, it is OK to
move another widget on top of the first one, as that is the only operation the
vector will perform anyway. If you pass your object to a standard library
function that does something other than move things around (such as std::sort
or std::remove_if
), then you need to check that the other operations the
function might do can still be done on a moved-from object. By calling the
library function you are stating that your objects meet (and will continue to
meet) any preconditions you have imposed on the operations that the library
function specification says it might perform.
Invariants and Concurrency
Right back at the beginning of this article I stated that "Users of objects of that class never see an object for which the invariant is not true", but also that "class member functions may temporarily break invariants, and then restore them afterwards". These two things don't work together very well if an object can be accessed concurrently from multiple threads, and this is one aspect that makes writing concurrent code hard, especially if you try to use fine-grained locks or atomic operations.
Consider a simple class Names
which holds two vectors, one for first names and
one for last names:
class Names{ std::vector<std::string> firstNames,lastNames; };
I want the invariant
of this class to be that the number of elements in firstNames
and lastNames
is always the same, so that I can add operations to
this class knowing that is always true. There is a strong argument that this
ought to be a std::vector<std::pair<std::string,std::string>>
rather than two
vectors, but assume that the class author has a legitimate reason for the
separate vectors.
At first glance, a member function to add an entry is quite simple:
void Names::addEntry(std::string const& first,std::string const& last){ firstNames.push_back(first); lastNames.push_back(last); }
However, even for the single-threaded case, this isn't good: if lastNames.push_back(last)
throws an exception, then our invariant is broken,
as we successfully added an element to firstNames
but not to lastNames
. We
therefore need to handle that case:
void Names::addEntry(std::string const& first,std::string const& last){ firstNames.push_back(first); try{ lastNames.push_back(last); } catch(...){ firstNames.resize(lastNames.size()); throw; } }
Now, if lastNames.push_back(last)
throws, then std::vector
guarantees that lastNames
is unchanged, so we can ensure our invariant holds again by
shrinking firstNames
back to the same size, and the single-threaded case is
now sorted. If our Names
object is only ever accessed by a single thread, then
the invariant holds at all times outside the internals of a member function.
What about the concurrent case? If we call Names::addEntry
from multiple
threads, then everything is broken: std::vector
is not safe for concurrent
access from multiple threads, so we have data races and undefined
behaviour. Using a ThreadSafeVector
class instead which provides the
operations we need but is safe for concurrent access removes these data races
and the undefined behaviour, but doesn't fix the overall problem. Thread
safety is not composable
. In this case, the invariants are still broken during
the call to Names::addEntry
, so a concurrent call will see an object
with broken invariants: the thread safety of the two vectors doesn't matter if
the second thread can see firstNames
and lastNames
with different sizes.
We can fix the problem by using a mutex: the mutex lock prevents the second
thread from accessing the internals of the object until the first thread has
finished, so the second thread cannot see the state with a broken invariant. It
also allows us to revert back to std::vector
rather than ThreadSafeVector
since the individual vectors are only ever accessed by one thread at a time:
class Names{ std::mutex mutex; std::vector<std::string> firstNames,lastNames; public: void addEntry(std::string const& first,std::string const& last){ std::lock_guard guard(mutex); firstNames.push_back(first); try{ lastNames.push_back(last); } catch(...){ firstNames.resize(lastNames.size()); throw; } } };
This is one of the big reasons why lock-free data structures are so hard to write: we cannot just stick a mutex lock on the outside to ensure other threads never see broken invariants. Instead, we must ensure that the invariants hold at all times, even during the operation of member functions . For this reason, lock-free data structures are often designed so that as much as possible is done off to the side, without modifying the core data structures, and then either the entire change is committed with a single atomic operation, or care is taken to ensure that the invariants still hold after each incremental step.
End note
Invariants are important . They are vital to working with objects, and without them it is much harder to write correct code. However, you have to choose your invariants carefully: invariants that make the rest of your code easier to reason about can have a cost, so you have to be aware of the trade-offs. Like everything in programming it is up to you what trade-off you choose: simplicitly vs performance is a common choice, but it can also be a choice of which operations are fast and which are slow. Whatever you choose, there will be cases where the other choice was more appropriate.
Whatever you choose, you must be honest with yourself, and the users of your class. Don't claim something is an invariant just because it holds most of the time; it must hold all of the time. Obviously, you can make it so a precondition of a function that objects it operates on are only in certain states, and then write your program logic to ensure that the conditions are met, but don't confuse the two.
以上所述就是小编给大家介绍的《Invariants and Preconditions》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
莱昂氏UNIX源代码分析
(澳)John Lions / 尤晋元 / 机械工业出版社 / 2000-7-1 / 49.00
本书由上、下两篇组成。上篇为UNIX版本6的源代码,下篇是莱昂先生对UNIX操作系统版本6源代码的详细分析。本书语言简洁、透彻,曾作为未公开出版物广泛流传了二十多年,是一部杰出经典之作。本书适合UNIX操作系统编程人员、大专院校师生学习参考使用。一起来看看 《莱昂氏UNIX源代码分析》 这本书的介绍吧!