Generic<Programming>: Move Constructors
Andrei Alexandrescu
As I'm sure you know very well, creating, copying around, and destroying temporary objects is the favorite indoor sport of your C++ compiler. Temporaries get created at the drop of a hat, and unfortunately that considerably impacts the performance of C++ programs. Indeed, temporaries are often credited as one of the top sources of inefficiency in C++ programs [1].
It would be nice if you could write:
    vector<string> ReadFile();
    vector<string> vec = ReadFile();
or:
    string s1, s2, s3;
    ...
    s1 = s2 + s3;
Alas, if you care about efficiency, you need to refrain from using such code. The temporaries created by ReadFile() and by operator+, respectively, will be copied to the destination and then thrown away ... what a waste!
To solve this, you need to rely on less elegant conventions. For example, you might pass the destination as an argument to the function:
    void ReadFile(vector<string>& dest);
    vector<string> dest;
    ReadFile(dest);
Quite annoying. Worse, operators don't give you that option, so if you want to deal with large objects efficiently, you have to limit yourself to operators that don't create temporaries:
    string s1, s2, s3;
    ...
    s1 = s2;
    s1 += s3;
These awkward idioms often creep up within large teams working on large programs, imposing a continuous annoyance that curbs the joy of writing code and adds to the total number of lines. Wouldn't it be nice if we could return values from functions, use operators, and pass around temporaries, in confidence that time is not wasted on this create/copy/destroy spree?
Wouldn't it?
Actually it's not about "would be nice" only. There is huge community pressure to solve the unnecessary copying problem. The interest in the subject is staggering. A formal proposal for a language-based solution has already been submitted to the standardization committee [2]. Discussions are ravaging the Usenet, and the article you're now reading has been intensely reviewed.
This article shows you how you can address the problem of unnecessary copying in existing C++. There is no 100% satisfactory solution, but a remarkable degree of purity can be achieved. We will build together, step by step, a powerful framework that helps eliminate unnecessary copying of temporaries from your programs. The solution is not 100% transparent, but it does eliminate all unnecessary copies, and is encapsulated enough to serve as a reliable placeholder until years from now when a cleaner, language-based, approach will hopefully be standardized and implemented.
Temporaries and the "Move Constructor"
After fighting temporaries for a while, people have realized that eliminating the actual temporaries is not really the point in most cases. Most of the time, the point is to eliminate unnecessary copying of temporaries. Allow me to detail this issue a bit.
Most "expensive-to-copy" data structures store their data in the form of pointers or handles. Typical examples are: a String type that holds a size and a char*, a Matrix type that holds a couple of integral dimensions and a double*, or a File type that stores a handle.
As you see, the cost in copying String, Matrix, or File does not come from copying the actual data members; it comes from duplicating the data referred to by the pointer or the handle.
So given that our purpose is to eliminate copying, a good way of doing that is to detect a temporary. In cynical terms: given that an object is a goner anyway, we might just as well use it as an organ donor while it's still fresh.
But what's a temporary, by the way? We'll come up with a heretic definition:
An object is considered temporary in a context if and only if the only operation executed on that object upon exiting that context is the destructor.
The context can be an expression or a lexical scope (such as a function body).
The C++ Standard doesn't define temporaries, but it assumes that they are anonymous temporaries (such as the values returned by functions). By our (more general) definition, named stack-allocated variables defined inside a function are temporary all right. We'll later use this insight to our advantage.
Consider this run-of-the-mill implementation of a String class:
    class String
    {
        char* data_;
        size_t length_;
    public:
        ~String()
        {
            delete[] data_;
        }
        String(const String& rhs)
            : data_(new char[rhs.length_]), length_(rhs.length_)
        {
            std::copy(rhs.data_, rhs.data_ + length_, data_);
        }
        String& operator=(const String&);
        ...
    };
Here the cost of copying consists largely of duplicating data_, that is, allocating new memory and copying it. It would be so nice if we could detect that rhs is actually a temporary. Consider the following pseudo-C++ code:
    class String
    {
        ... as before ...
        String(temporary String& rhs)
            : data_(rhs.data_), length_(rhs.length_)
        {
            // reset the source string so it can be destroyed
            // don't forget that destructors are still executed
            // on temporaries
            rhs.data_ = 0;
        }
        ...
    };
The fictitious overloaded constructor String(temporary String&) would enter in action when you create a String from an object that's temporary by our definition (such as the one returned by a function call). Then, the constructor performs a move of rhs to the object under construction, by simply copying the pointer (without duplicating the memory chunk it points to). Last but not least, the move constructor resets the source pointer rhs.data_. This way, when the temporary is destroyed, delete[] will be harmlessly applied to a null pointer.
An important detail is that rhs.length_ is not set to zero after the move construction. This is incorrect from a pedantic viewpoint (we have a broken String with data_ == 0 and length_ != 0), but is a good pretext to make a point. The state in which rhs is left need not be consistent, merely destroyable. This is because the one and only operation that will ever be applied to rhs is the destructor, nothing else. So as long as rhs is safely destroyable, it does not have to look like a valid string at all.
Move construction is a good solution for eliminating unnecessary copying of temporaries. We only have one little problem: there's no temporary keyword within the C++ language.
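The "destroyable, not consistent" idea can be tried out in today's C++ without the fictitious keyword. What follows is only a sketch: the MoveTag argument and the MoveStealsBuffer helper are hypothetical names invented here to stand in for "temporary", and copying is disabled simply to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class String {
    char* data_;
    std::size_t length_;

    String(const String&);              // copying is not needed in this sketch
    String& operator=(const String&);
public:
    struct MoveTag {};                  // hypothetical stand-in for "temporary"

    explicit String(const char* s)
        : data_(new char[std::strlen(s)]), length_(std::strlen(s)) {
        std::memcpy(data_, s, length_);
    }
    ~String() { delete[] data_; }       // delete[] on a null pointer is a no-op

    // "Move constructor": steal the buffer, then leave rhs merely destroyable.
    String(String& rhs, MoveTag) : data_(rhs.data_), length_(rhs.length_) {
        rhs.data_ = 0;                  // rhs.length_ is deliberately left stale
    }

    const char* data() const { return data_; }
    std::size_t length() const { return length_; }
};

// Returns true if the destination reuses the source buffer (no duplication).
bool MoveStealsBuffer() {
    String source("organ donor");
    const char* original = source.data();
    String dest(source, String::MoveTag());
    assert(source.data() == 0);         // source is destroyable, not consistent
    return dest.data() == original && dest.length() == 11;
}
```

Both objects are destroyed at the end of MoveStealsBuffer, yet the buffer is freed exactly once, because the source's destructor sees a null pointer.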
(It should be noted that detection of temporaries doesn't help all classes. Sometimes, all data is stored straight in the container. Consider:
    class FixedMatrix
    {
        double data_[256][256];
    public:
        ... operations ...
    };
For such a class, actually copying the sheer sizeof(FixedMatrix) bytes is the costly operation, and detecting temporary objects doesn't help.)
Past Solutions
Unnecessary copying is a long-standing problem within the C++ community. There are two lines of attack, one from a coding/library writing direction, and the other from a language definition/compiler writing direction.
From a language/compiler perspective, we have the "return value optimization," in short, RVO. RVO is expressly allowed by the C++ language definition [3]. Basically, of all functions you could ever write, your C++ compiler can assume it knows what one of them does. That function is the copy constructor, and the compiler assumes that the copy constructor copies.
Exactly because it assumes that, the compiler can eliminate unnecessary copies. For example, consider:
    vector<string> ReadFile()
    {
        vector<string> result;
        ... fill result ...
        return result;
    }
    vector<string> vec = ReadFile();
A smart compiler can pass vec's address as a hidden argument to ReadFile and create result at exactly that address. So the code generated from the source above looks like this:
    void ReadFile(void* __dest)
    {
        // use placement new to build a vector
        // at the address dest
        vector<string>& result = *new(__dest) vector<string>;
        ... fill result ...
    }
    // assume proper alignment
    char __buf[sizeof(vector<string>)];
    ReadFile(__buf);
    vector<string>& vec = *reinterpret_cast<vector<string>*>(__buf);
RVO has a couple of different flavors, but the gist is the same: the compiler eliminates a call to the copy constructor by simply constructing the function return value in the final destination.
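If you want to play with the transformation by hand, here is a compilable approximation. It is a sketch and not what any particular compiler emits: the names Lines, ReadFileInto, and CountLinesBuiltInPlace are made up for the occasion, the callee returns the pointer produced by placement new (an addition of this sketch), and alignas assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

typedef std::vector<std::string> Lines;

// Conceptual shape of the transformed callee: the caller passes raw storage
// and the result is constructed directly inside it.
Lines* ReadFileInto(void* dest) {
    Lines* result = new (dest) Lines;   // constructs in place
    result->push_back("first line");
    result->push_back("second line");
    return result;
}

std::size_t CountLinesBuiltInPlace() {
    // Raw, suitably aligned storage for exactly one Lines object.
    alignas(Lines) unsigned char buffer[sizeof(Lines)];
    Lines* vec = ReadFileInto(buffer);
    assert(static_cast<void*>(vec) == static_cast<void*>(buffer));
    const std::size_t count = vec->size();
    vec->~Lines();                      // hand-built object, hand-destroyed
    return count;
}
```

The vector object lives at the caller's address from the start, so there is nothing left to copy when the function returns.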
Unfortunately, implementing RVO is not as easy as it might seem. Consider a slightly modified version of ReadFile:
    vector<string> ReadFile()
    {
        if (error) return vector<string>();
        if (anotherError)
        {
            vector<string> dumb;
            dumb.push_back("This file is in error.");
            return dumb;
        }
        vector<string> result;
        ... fill result ...
        return result;
    }
Now there's not one local variable that needs to be mapped to the final result, there are several. Some are named (dumb, result) and some are unnamed temporaries. Needless to say, confronted with such a situation, many optimizers would give up and rely on the conservative and less efficient approach.
Even if you want to write "straight" code that would not confuse RVO implementations, you'll be disappointed to hear that each compiler, and often each compiler version, has its own rules for detecting and applying RVO. Some apply RVO only to functions returning unnamed temporaries (the simplest form of RVO). The more sophisticated ones also apply RVO when there's a named result that the function returns (the so-called Named RVO, or NRVO).
In essence, when writing code, you can count on RVO being portably applied to your code depending on how you exactly write the code (under a very fluid definition of "exactly"), the phase of the moon, and the size of your shoes.
But wait, there's less. Oftentimes the compiler cannot apply RVO even if it aches to. Consider this slightly changed call to ReadFile():
    vector<string> vec;
    vec = ReadFile();
As innocent as this change seems, it makes a huge difference. Now instead of the copy constructor, we're calling the assignment operator, which is a different beast. Unless your compiler's optimization skills border on pure magic, now you can surely kiss your RVO goodbye: vector<T>::operator=(const vector<T>&) expects a const reference to a vector, so a temporary will be returned by ReadFile(), bound to the const reference, copied into vec, and thrown away. Unnecessary temporaries hit again!
From a coding perspective, a technique that has been recommended for a long time is COW (copy-on-write) [4], which is a technique based on reference counting.
COW has several advantages, one of which is that it detects and eliminates unnecessary copies. For example, when a function returns, the returned object has a reference count of 1. Then you copy it, which bumps its reference count to 2. Finally, you destroy the temporary, and the reference count goes back to 1, at which point the destination is the only owner of the data. No actual copy has been made.
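The 1, 2, 1 dance of the reference count is easy to watch in a toy class. The sketch below is not how any real std::string is implemented; CowString, its members, and CountAfterTemporaryDies are hypothetical names, and the class is deliberately single-threaded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class CowString {
    struct Rep {
        std::string text;
        std::size_t refs;
        explicit Rep(const std::string& t) : text(t), refs(1) {}
    };
    Rep* rep_;
    void Release() { if (--rep_->refs == 0) delete rep_; }
public:
    explicit CowString(const std::string& t) : rep_(new Rep(t)) {}
    // "Copying" shares the representation and bumps the count.
    CowString(const CowString& rhs) : rep_(rhs.rep_) { ++rep_->refs; }
    CowString& operator=(const CowString& rhs) {
        ++rhs.rep_->refs;               // increment first: safe for self-assignment
        Release();
        rep_ = rhs.rep_;
        return *this;
    }
    ~CowString() { Release(); }

    // Writers detach from shared data first: the "copy on write" part.
    void Append(const std::string& s) {
        if (rep_->refs > 1) {
            Rep* fresh = new Rep(rep_->text);
            --rep_->refs;
            rep_ = fresh;
        }
        rep_->text += s;
    }
    std::size_t UseCount() const { return rep_->refs; }
    const std::string& Str() const { return rep_->text; }
    bool SharesWith(const CowString& other) const { return rep_ == other.rep_; }
};

// Mimics "function returns, result is copied, temporary is destroyed".
std::size_t CountAfterTemporaryDies() {
    CowString* temp = new CowString("payload");  // plays the returned temporary
    assert(temp->UseCount() == 1);
    CowString dest(*temp);                       // the copy only bumps the count
    assert(dest.UseCount() == 2);
    delete temp;                                 // the temporary goes away
    return dest.UseCount();                      // dest is the sole owner again
}
```

No character data is duplicated along the way; duplication happens only if somebody calls Append while the data is still shared.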
Unfortunately, reference counting also has many drawbacks: thread-safety problems, overhead of its own, and many hidden gotchas [4]. COW is so unwieldy that, despite its advantages, recent STL implementations don't use reference counting for std::string, even though std::string's interface was intentionally designed to support reference counting!
Several idioms for "non-duplicable" objects have been developed, of which auto_ptr is the most refined one. auto_ptr is easy to use correctly, but, unfortunately, just as easy to use incorrectly. The solution discussed in this article extends the techniques used in defining auto_ptr.
Mojo
Mojo (Move of Joint Objects) is a coding technique and a small framework for eliminating unnecessary copying of temporary objects. Mojo works by discriminating between temporary objects and legitimate, "non-temporary," objects.
Passing Arguments to Functions
An interesting analysis prompted by Mojo is a scrutiny of the conventions used for passing arguments to functions. The common advice of the pre-Mojo era goes:
- If the function intends to change the argument as a side effect, take it by reference/pointer to a non-const object. Example:
void Transmogrify(Widget& toChange); void Increment(int* pToBump);
- If the function doesn't modify its argument and the argument is of primitive type, take it by value. Example:
double Cube(double value);
- Otherwise, the argument is (or could be, in case you define a template) a user-defined type and must not be mutated, so take it by reference to const. Example:
    String& String::operator=(const String& rhs);
    template <class T> void vector<T>::push_back(const T&);
The third rule's intent is to avoid accidental copying of large objects. However, sometimes this very rule forces an unnecessary copy instead of preventing it! Consider you have a function like Connect below:
    void Canonicalize(String& url);
    void ResolveRedirections(String& url);
    void Connect(const String& url)
    {
        String finalUrl = url;
        Canonicalize(finalUrl);
        ResolveRedirections(finalUrl);
        ... use finalUrl ...
    }
Connect takes a reference to const as an argument and, presto, creates a copy of it. Then it further processes the copy.
This function exhibits a const that stands in the way of efficiency. Connect's declaration says: "I don't need a copy; a reference to const would suffice" while the body actually does create a copy. So if you now say:
    String MakeUrl();
    ...
    Connect(MakeUrl());
then you can count on MakeUrl() returning a temporary, which will be copied and then destroyed: the dreaded unnecessary copy pattern. For a compiler to optimize away the copy, it has to do the Herculean job of (1) getting access to Connect's definition (hard with separately compiled modules), (2) parsing Connect's definition to develop an understanding of it, and (3) altering Connect's behavior so that the temporary is fused with finalUrl.
Say now you change Connect as follows:
    void Connect(String url) // notice call by value
    {
        Canonicalize(url);
        ResolveRedirections(url);
        ... use url ...
    }
From the viewpoint of Connect's callers, there is absolutely no difference: although you changed the syntactic interface, the semantic interface stays the same. To the compiler, this syntactic change makes all the difference in the world. Now the compiler has more leeway for taking care of the url temporary. For example, in the example above:
Connect(MakeUrl());
the compiler doesn't have to be really smart to fuse the temporary returned by MakeUrl with the temporary needed by Connect. Indeed, it would be harder work to do otherwise! Ultimately, the very result of MakeUrl will be altered and used inside Connect. The former version was choking the compiler, preventing it from performing any optimizations. This version smoothly cooperates with the compiler.
The downside of the new setting is that now calls to Connect might generate more machine code. Consider:
    String someUrl = ...;
    Connect(someUrl);
In this case, the first version would simply pass a reference to someUrl. The second version would create a copy of someUrl, call Connect, and destroy that copy. This code-size overhead increases with the static number of calls to Connect. On the other hand, the calls involving a temporary such as Connect(MakeUrl()) can just as well generate less code in the second version. At any rate, it is unlikely that the size difference would create a problem.
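A copy counter makes the comparison concrete. This is a sketch under assumptions: String here is a minimal stand-in, ConnectByRef, ConnectByValue, and CountCopies are names invented for the experiment, and the zero-copy result for the temporary relies on the compiler fusing the temporary with the parameter (mainstream compilers do this by default, and C++17 requires it for this form).

```cpp
#include <cassert>
#include <string>

// Minimal String stand-in that counts deep copies.
struct String {
    std::string text;
    static int copies;
    explicit String(const char* s) : text(s) {}
    String(const String& rhs) : text(rhs.text) { ++copies; }
};
int String::copies = 0;

void Canonicalize(String& url) { url.text += "/"; }

// First version: reference to const, then an internal copy.
std::string ConnectByRef(const String& url) {
    String finalUrl = url;
    Canonicalize(finalUrl);
    return finalUrl.text;
}

// Second version: by value, so the parameter itself is the working copy.
std::string ConnectByValue(String url) {
    Canonicalize(url);
    return url.text;
}

String MakeUrl() { return String("http://moderncppdesign.com"); }

// Number of String copy-constructor calls for one call to Connect.
int CountCopies(bool byValue, bool fromTemporary) {
    String someUrl("http://moderncppdesign.com");
    String::copies = 0;
    if (fromTemporary) {
        if (byValue) ConnectByValue(MakeUrl()); else ConnectByRef(MakeUrl());
    } else {
        if (byValue) ConnectByValue(someUrl); else ConnectByRef(someUrl);
    }
    return String::copies;
}
```

With an lvalue argument both versions make one copy; with a temporary, the by-reference version still makes one, while the by-value version makes none.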
So we identified a different set of recommendations:
3.1. If the function always makes a copy of its argument inside, take it by value.
3.2. If the function never makes a copy of its argument, take it by reference to const.
3.3. If the function sometimes makes a copy of its argument and if you care about efficiency, follow the Mojo protocol.
The only thing left to do is developing the "Mojo protocol," whatever that is.
The main idea is to overload the same function (such as Connect) with the intent of discriminating between temporary and non-temporary values. (The latter are also known as "lvalues" for historical reasons: colloquially, lvalues could appear on the left-hand side of an assignment.)
Now in starting to overload Connect, an idea would be to define Connect(const String&) to catch "genuine" constant objects. This, however, would be a mistake because this declaration will "eat" all String objects, be they lvalues or temporaries. So the first good idea is to not declare a function that accepts a const reference, because it swallows all objects like a black hole.
A second try is to define Connect(String&) in an attempt to catch non-const lvalues. This works well, and in particular const values and unnamed temporaries can't be "eaten" by this overload, which is a good start. Now we only have to differentiate between const objects and non-const temporaries.
To do this, the technique we apply is to define two "type sugar" classes ConstantString and TemporaryString, and to define conversion operators from String to those objects:
    class String;
    // "type sugar" for constant Strings
    struct ConstantString
    {
        const String* obj_;
    };
    // "type sugar" for temporary Strings
    // (explanation coming)
    struct TemporaryString : public ConstantString {};
    class String
    {
    public:
        ... constructors, destructors, operations, you name it ...
        operator ConstantString() const
        {
            ConstantString result;
            result.obj_ = this;
            return result;
        }
        operator TemporaryString()
        {
            TemporaryString result;
            result.obj_ = this;
            return result;
        }
    };
So now String defines two conversion operators. One notable difference between them is that operator TemporaryString() is not a const member function, so it doesn't apply to const String objects.
Now say you define the following three overloads:
    // binds to non-const temporaries
    void Connect(TemporaryString);
    // binds to all const objects (lvalues AND temporaries)
    void Connect(ConstantString);
    // binds to non-const lvalues
    void Connect(String& str)
    {
        // just forward to the other overload
        Connect(ConstantString(str));
    }
Here's how it all works. Constant String objects are "attracted" by Connect(ConstantString). There is no other binding that could work; the other two work for non-const Strings only.
Temporary objects can't go to Connect(String&). They could, however, go to either Connect(TemporaryString) or Connect(ConstantString), and the former overload must be chosen unambiguously. That's the reason for deriving TemporaryString from ConstantString, a trick that deserves some attention.
Consider for a moment that ConstantString and TemporaryString were totally independent types. Then, when prompted to copy a temporary object, the compiler would be equally motivated to go either:
    operator TemporaryY() -> Y(TemporaryY)
or:
    operator ConstantY() const -> Y(ConstantY)
Why the equal motivation? This is because the non-const to const conversion is "frictionless" as far as selecting member functions is concerned.
The need, therefore, is to give the compiler more "motivation" to choose the first route than the second. That's where the inheritance kicks in. Now the compiler says: "Ok, I guess I could go through ConstantString or TemporaryString... but wait, the derived class TemporaryString is a better match!"
The rule in action here is that matching a derived class is considered better than matching a base class when selecting a function from an overloaded set.
Finally, an interesting twist: the inheritance doesn't necessarily have to be public. Access rules are orthogonal to overloading rules.
Let's see how Connect works on an example:
    String s1("http://moderncppdesign.com");
    // invoke Connect(String&)
    Connect(s1);
    // invoke operator TemporaryString()
    // followed by Connect(TemporaryString)
    Connect(String("http://moderncppdesign.com"));
    const String s4("http://moderncppdesign.com");
    // invoke operator ConstantString() const
    // followed by Connect(ConstantString)
    Connect(s4);
As you see, we achieved the main goal we wanted: we can make a difference between temporary objects and all other objects. This is the gist of Mojo.
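To see the three-way dispatch with your own eyes, the overloads can be made to report which route was taken. The sketch below assumes a trivial String wrapping std::string; the Route enumeration and the RouteFor... helpers are hypothetical additions, and the overloads return a Route instead of connecting to anything.

```cpp
#include <cassert>
#include <string>

class String;

struct ConstantString { const String* obj_; };
struct TemporaryString : public ConstantString {};

class String {
    std::string text_;
public:
    explicit String(const char* s) : text_(s) {}
    operator ConstantString() const {
        ConstantString result;
        result.obj_ = this;
        return result;
    }
    operator TemporaryString() {      // non-const: never applies to const Strings
        TemporaryString result;
        result.obj_ = this;
        return result;
    }
};

enum Route { FromTemporary, FromConstant, FromLvalue };

Route Connect(TemporaryString) { return FromTemporary; }  // non-const temporaries
Route Connect(ConstantString)  { return FromConstant; }   // all const objects
Route Connect(String&)         { return FromLvalue; }     // non-const lvalues

String MakeUrl() { return String("http://moderncppdesign.com"); }

Route RouteForLvalue() {
    String s1("http://moderncppdesign.com");
    return Connect(s1);
}

Route RouteForTemporary() {
    return Connect(MakeUrl());
}

Route RouteForConst() {
    const String s4("http://moderncppdesign.com");
    return Connect(s4);
}
```

If you make TemporaryString an independent struct instead of a derived one, the call in RouteForTemporary should stop compiling because the two conversions become equally good, which is the ambiguity described above.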
There are some less-than-stellar aspects, most of which we'll set out to fix. First off, there's a minor code duplication: Connect(String&) and Connect(ConstantString) must basically do the same thing. The code above solves the issue by forwarding from the first overload to the second.
Second, let's face it, writing two little classes to give each type some Mojo doesn't sound very attractive, so let's start making things a little more generic so as to make them easier to use. We define a namespace mojo in which we put two generic constant and temporary classes:
    namespace mojo
    {
        template <class T>
        class constant // type sugar for constants
        {
            const T* data_;
        public:
            explicit constant(const T& obj) : data_(&obj)
            {
            }
            const T& get() const
            {
                return *data_;
            }
        };

        template <class T>
        // type sugar for temporaries
        class temporary : private constant<T>
        {
        public:
            explicit temporary(T& obj) : constant<T>(obj)
            {
            }
            T& get() const
            {
                return const_cast<T&>(constant<T>::get());
            }
        };
    }
Let's also define a base class mojo::enabled that defines the two operators:
    template <class T>
    struct enabled // inside mojo
    {
        operator temporary<T>()
        {
            return temporary<T>(static_cast<T&>(*this));
        }
        operator constant<T>() const
        {
            return constant<T>(static_cast<const T&>(*this));
        }
    protected:
        enabled() {} // intended to be derived from
        ~enabled() {} // intended to be derived from
    };
With this scaffolding in place, the task of "mojoing" a class becomes considerably simpler:
    class String : public mojo::enabled<String>
    {
        ... constructors, destructors, operations, you name it ...
    public:
        String(mojo::temporary<String> tmp)
        {
            String& rhs = tmp.get();
            ... perform a destructive copy of rhs into *this ...
        }
    };
This is the Mojo protocol for passing arguments to functions.
Sometimes, things seem to work together so nicely, you get a nice design artifact without having worked towards it. True, those perky situations are in short supply, which makes them all the more valuable.
It just happens that with Mojo's design we can detect very easily whether a class supports Mojo or not. Simply write:
    namespace mojo
    {
        template <class T>
        struct traits
        {
            enum { enabled = Loki::SuperSubclassStrict< enabled<T>, T >::value };
        };
    }
Loki readily offers the mechanics for detecting whether a type is derived from another [5].
Now you can find out whether an arbitrary type X was designed for the Mojo protocol by saying mojo::traits<X>::enabled. This detection mechanism is very important in generic code, as we'll soon see.
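If you don't have Loki at hand, the same detection can be approximated with the standard library. Mind the assumptions: std::is_base_of (C++11) is swapped in for Loki::SuperSubclassStrict, the trait member is renamed is_enabled so that it doesn't clash with the enabled template, the conversion operators are omitted, and DescribeCopyStrategy is a hypothetical helper that only illustrates branching on the trait.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace mojo {
    template <class T>
    struct enabled {                 // conversion operators omitted for brevity
    protected:
        enabled() {}
        ~enabled() {}
    };

    template <class T>
    struct traits {
        // std::is_base_of stands in for Loki::SuperSubclassStrict here.
        static const bool is_enabled =
            std::is_base_of< enabled<T>, T >::value;
    };
}

class String : public mojo::enabled<String> {};

// Generic code can pick a strategy from the trait.
template <class T>
const char* DescribeCopyStrategy() {
    return mojo::traits<T>::is_enabled ? "move" : "copy";
}
```

A mojoed String reports "move", while std::string and int, which know nothing about Mojo, report "copy".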
Optimizing Returning Values from Functions
Now that we have passing arguments right, let's see how to extend Mojo to optimize returning values from functions. Again, the goal is portable efficiency: 100% elimination of unnecessary copies, without dependence on one particular RVO implementation.
Let's first see what the common advice says. For good reasons, some authors recommend constifying return values as well [7]. Continuing the old rules:
4. When a function returns a user-defined object by value, return a const value. Example:
const String operator+(const String& lhs, const String& rhs);
The idea underlying rule 4 is to make user-defined operators behave much as built-in operators by forbidding wrong expressions such as if (s1 + s2 = s3), when the intent is to say if (s1 + s2 == s3). If operator+ returns a const value, this particular bug will be detected at compile time. However, other authors [6] recommend against returning const values.
On a philosophical note, any return value is transitory par excellence; it's an ephemerid that just was created and will disappear soon. Then, why force operator+'s client to get a constant value? What's constant about this butterfly? Seen from this perspective, const temporary looks like an oxymoron, a contradiction in terms. Seen from a practical perspective, const temporaries force copying at destination.
Assuming we now agree it's best to keep const away from the return value if efficiency is important, how do we convince the compiler to move the result of a function to its destination instead of copying it?
When copying an object of type T, the copy constructor is invoked. Given that the copy constructor is a function like any other, it would appear that we can just apply the same ideas as above, leading to the following setting:
    class String : public mojo::enabled<String>
    {
        ...
    public:
        String(String&);
        String(mojo::temporary<String>);
        String(mojo::constant<String>);
    };
This is a very nice setting, except for a little detail: it doesn't work.
Remember when I said: "the copy constructor is a function like any other?" Well, I lied. The copy constructor is a special function in annoying ways. In particular, if for a type X you define X(X&) in lieu of X(const X&), then the following code doesn't work:
    void FunctionTakingX(const X&);
    FunctionTakingX(X()); // Error!
                          // Can't find X(const X&)!
This badly disables X, so we are forced to include the String(const String&) constructor. Now if you'll allow me to quote this very article, at a point I said: "So the first good idea is to not declare a function that accepts a const reference, because it swallows all objects like a black hole."
Can you say "conflict of interest"?
Clearly, copy construction needs special treatment. The idea here is to create a new type, fnresult, which serves as a "mover" for String objects. Here are the steps we need to take:
1. Define fnresult such that a function that previously returned T will now return fnresult<T>. For this change to be transparent to callers, fnresult<T> must be convertible to T.
2. Establish move semantics for fnresult: whenever an fnresult<T> object is copied, the T contained inside is moved.
3. Similarly to operator constant and temporary, provide a conversion operator to fnresult in class mojo::enabled.
4. A mojoed class (such as String in our example) defines a constructor String(mojo::fnresult<String>) that completes the move.
The definition of fnresult looks like this:
    namespace mojo
    {
        template <class T>
        class fnresult : public T
        {
        public:
            // The cast below is valid given that
            // nobody ever really creates a
            // const fnresult object
            fnresult(const fnresult& rhs)
                : T(temporary<T>(const_cast<fnresult&>(rhs)))
            {
            }
            explicit fnresult(T& rhs) : T(temporary<T>(rhs))
            {
            }
        };
    }
Because fnresult<T> inherits T, step 1 is taken care of: an fnresult<T> converts to a T. Then, copying an fnresult<T> object implies moving its T subobject by forcing a conversion to temporary<T>, thus taking care of step 2.
As mentioned, we add a conversion to enabled that returns an fnresult. The final version of enabled looks like this:
    template <class T>
    struct enabled
    {
        operator temporary<T>()
        {
            return temporary<T>(static_cast<T&>(*this));
        }
        operator constant<T>() const
        {
            return constant<T>(static_cast<const T&>(*this));
        }
        operator fnresult<T>()
        {
            return fnresult<T>(static_cast<T&>(*this));
        }
    protected:
        enabled() {} // intended to be derived from
        ~enabled() {} // intended to be derived from
    };
Finally, String defines the constructor mentioned in step 4. Here's String with all its constructors:
    class String : public mojo::enabled<String>
    {
        ...
    public:
        // COPY rhs
        String(const String& rhs);
        // MOVE tmp.get() into *this
        String(mojo::temporary<String> tmp);
        // MOVE res into *this
        String(mojo::fnresult<String> res);
    };
Now consider the following function:
    mojo::fnresult<String> MakeString()
    {
        String result;
        ...
        return result;
    }
    ...
    String dest(MakeString());
The route between MakeString's return statement and dest is: result -> String::operator fnresult<String>() -> fnresult<String>(const fnresult<String>&) -> String::String(fnresult<String>).
A compiler using RVO can eliminate the fnresult<String>(const fnresult<String>&) call in the middle of the chain. What's most important, however, is that no function involved performs a true copy: they all are defined such that the actual content of result smoothly moves to dest. There is no memory allocation and no memory copying involved.
Now, as you see, there are two, maximum three, move operations. It might be possible that for certain types and under certain conditions, you are better off (speed-wise) with one copy than with three moves. There is an important difference, though: copying might fail (throw an exception), while the move never fails.
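The whole return route fits in one compilable sketch. Several things are assumptions made here for brevity: constant<T> is left out (so temporary<T> stands alone), String counts its deep copies in a static deepCopies member, and Steal, Equals, and ResultArrivesWithoutDeepCopy are hypothetical helpers. The constructors are defined after the class so that fnresult<String> is only instantiated once String is complete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace mojo {
    template <class T> class fnresult;

    template <class T>
    class temporary {                        // simplified: no constant<T> base
        T* data_;
    public:
        explicit temporary(T& obj) : data_(&obj) {}
        T& get() const { return *data_; }
    };

    template <class T>
    struct enabled {
        operator temporary<T>() { return temporary<T>(static_cast<T&>(*this)); }
        operator fnresult<T>()  { return fnresult<T>(static_cast<T&>(*this)); }
    protected:
        enabled() {}
        ~enabled() {}
    };

    template <class T>
    class fnresult : public T {
    public:
        // Copying an fnresult moves the T inside it.
        fnresult(const fnresult& rhs)
            : T(temporary<T>(const_cast<fnresult&>(rhs))) {}
        explicit fnresult(T& rhs) : T(temporary<T>(rhs)) {}
    };
}

class String : public mojo::enabled<String> {
    char* data_;
    std::size_t length_;
    String& operator=(const String&);        // not needed in this sketch
    void Steal(String& rhs) {                // destructive copy
        data_ = rhs.data_;
        length_ = rhs.length_;
        rhs.data_ = 0;
    }
public:
    static int deepCopies;
    explicit String(const char* s);
    String(const String& rhs);               // COPY rhs
    String(mojo::temporary<String> tmp);     // MOVE tmp.get() into *this
    String(mojo::fnresult<String> res);      // MOVE res into *this
    ~String() { delete[] data_; }
    bool Equals(const char* s) const {
        return length_ == std::strlen(s) && std::memcmp(data_, s, length_) == 0;
    }
};
int String::deepCopies = 0;

String::String(const char* s)
    : data_(new char[std::strlen(s)]), length_(std::strlen(s)) {
    std::memcpy(data_, s, length_);
}
String::String(const String& rhs)
    : data_(new char[rhs.length_]), length_(rhs.length_) {
    std::memcpy(data_, rhs.data_, length_);
    ++deepCopies;
}
String::String(mojo::temporary<String> tmp) { Steal(tmp.get()); }
String::String(mojo::fnresult<String> res) { Steal(res); }

mojo::fnresult<String> MakeString() {
    String result("moved, not copied");
    return result;                           // goes through operator fnresult<String>()
}

// Returns true if dest received the text without any deep copy.
bool ResultArrivesWithoutDeepCopy() {
    String::deepCopies = 0;
    String dest(MakeString());
    return String::deepCopies == 0 && dest.Equals("moved, not copied");
}
```

Whether the compiler elides the intermediate fnresult copy or not, the copy constructor String(const String&) is never called along the route.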
Scaling Up
Ok, we got Mojo working, and quite nicely, on individual classes. Now how about scaling Mojo up to compound objects that contain many other objects, some of which are mojoed as well?
The task is to "pass down" a move constructor from a class to its members. Consider, for example, embedding class String above inside a class Widget:
    class Widget : public mojo::enabled<Widget>
    {
        String name_;
    public:
        Widget(mojo::temporary<Widget> src) // source is a temporary
            : name_(mojo::as_temporary(src.get().name_))
        {
            Widget& rhs = src.get();
            ... use rhs to perform a destructive copy ...
        }
        Widget(mojo::fnresult<Widget> src) // source is a function result
            : name_(mojo::as_temporary(src.name_))
        {
            Widget& rhs = src;
            ... use rhs to perform a destructive copy ...
        }
    };
The initialization of name_ in the destructive constructor uses an important Mojo helper function:
    namespace mojo
    {
        template <class T>
        struct traits
        {
            enum { enabled = Loki::SuperSubclassStrict< enabled<T>, T >::value };
            typedef typename Loki::Select<
                enabled, temporary<T>, T&>::Result temporary;
        };
        template <class T>
        inline typename traits<T>::temporary as_temporary(T& src)
        {
            typedef typename traits<T>::temporary temp;
            return temp(src);
        }
    }
All as_temporary does is force creation of a temporary from an lvalue. This way, the move constructor of name_ is invoked for the destination object.
If String is mojoed, Widget takes advantage of that; if not, a straight copy is performed. In other words: if String derives from mojo::enabled<String>, then as_temporary returns a mojo::temporary<String>. Otherwise, as_temporary(String& src) is simply the identity function that takes a String& and returns the same String&.
We availed ourselves of another Loki feature: Select<condition, T, U>::Result is either T or U, depending on whether the Boolean condition is true or false, respectively [5].
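Here is how the pass-down might look with standard-library stand-ins. Treat it as a sketch under assumptions: std::is_base_of and std::conditional (C++11) replace the Loki facilities, the trait members are renamed is_enabled and temporary_type to avoid clashing with the enabled and temporary templates, constant<T> and fnresult<T> are omitted, and String keeps its text in a std::string with copy and move counters added purely for observation.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace mojo {
    template <class T>
    class temporary {                        // simplified: no constant<T> base
        T* data_;
    public:
        explicit temporary(T& obj) : data_(&obj) {}
        T& get() const { return *data_; }
    };

    template <class T>
    struct enabled {
        operator temporary<T>() { return temporary<T>(static_cast<T&>(*this)); }
    protected:
        enabled() {}
        ~enabled() {}
    };

    template <class T>
    struct traits {
        static const bool is_enabled =
            std::is_base_of< enabled<T>, T >::value;
        // temporary<T> for mojoed types, plain T& for "classic" ones.
        typedef typename std::conditional<
            is_enabled, mojo::temporary<T>, T&>::type temporary_type;
    };

    template <class T>
    typename traits<T>::temporary_type as_temporary(T& src) {
        typedef typename traits<T>::temporary_type temp;
        return temp(src);
    }
}

class String : public mojo::enabled<String> {
    std::string text_;                       // stands in for an expensive buffer
public:
    static int copies;
    static int moves;
    explicit String(const char* s) : text_(s) {}
    String(const String& rhs) : text_(rhs.text_) { ++copies; }
    String(mojo::temporary<String> tmp) { text_.swap(tmp.get().text_); ++moves; }
    const std::string& text() const { return text_; }
};
int String::copies = 0;
int String::moves = 0;

class Widget : public mojo::enabled<Widget> {
    String name_;                            // mojoed member: gets moved
    std::string label_;                      // "classic" member: gets copied
public:
    Widget(const char* name, const char* label) : name_(name), label_(label) {}
    Widget(mojo::temporary<Widget> src)
        : name_(mojo::as_temporary(src.get().name_)),
          label_(mojo::as_temporary(src.get().label_)) {}
    const std::string& name() const { return name_.text(); }
    const std::string& label() const { return label_; }
};

// Returns true if moving a Widget moved name_ and copied label_.
bool MemberMoveIsPassedDown() {
    String::copies = 0;
    String::moves = 0;
    Widget a("gadget", "blue");
    Widget b(mojo::as_temporary(a));         // explicit move of an lvalue
    return String::copies == 0 && String::moves == 1
        && b.name() == "gadget" && a.name().empty()
        && b.label() == "blue" && a.label() == "blue";
}
```

The same Widget constructor works unchanged whether a member type is mojoed or not; only the cost differs.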
Application: auto_ptr's Cousin and Mojoed Containers
Consider a class mojo_ptr that disables a couple of constructors by making them private:
    class mojo_ptr : public mojo::enabled<mojo_ptr>
    {
        mojo_ptr(const mojo_ptr&); // const sources are NOT accepted
    public:
        // source is a temporary
        mojo_ptr(mojo::temporary<mojo_ptr> src)
        {
            mojo_ptr& rhs = src.get();
            ... use rhs to perform a destructive copy ...
        }
        // source is a function's result
        mojo_ptr(mojo::fnresult<mojo_ptr> src)
        {
            mojo_ptr& rhs = src;
            ... use rhs to perform a destructive copy ...
        }
        ...
    };
This class has an interesting behavior. You can't copy const objects of that class. You also can't copy lvalues of that class, ouch! But you can copy (with move semantics) temporary objects of that class, and you can explicitly move an object to another by writing:
mojo_ptr ptr1; mojo_ptr ptr2 = mojo::as_temporary(ptr1);
That's not too big a deal by itself; auto_ptr could have done that by simply making auto_ptr(auto_ptr&) private. The interesting part is actually not about mojo_ptr, but rather how, by using as_temporary, you can build efficient containers storing "classic" types, general mojoed types, and mojo_ptr alike. All such a container has to do is to use as_temporary whenever it needs to shuffle elements around. For "classic" types, as_temporary is the identity function that does nothing; for mojo_ptr, as_temporary is the function that facilitates smooth moving. The move and uninitialized_move function templates (see the attached code) come in very handy, too.
In standard terms, mojo_ptr is neither "copyable" nor "assignable." However, mojo_ptr could be considered as part of a new category of types, called "moveable." This is an important new category that also might include locks, files, and other non-duplicable handles.
The result? If you ever wished for an "owning container" à la vector< auto_ptr<Widget> > with safe, clear semantics, you just got it with sugar on top. Also, mojoized vectors scale well when containing expensive-to-copy types, such as vector< vector<string> >.
Conclusion
Mojo is a technique and a compact framework for eliminating unnecessary copying of temporaries. Mojo works by detecting temporaries and guiding them through a different function overload than lvalues. By doing so, the function taking a temporary can perform a destructive copy on it, confident that no other code is going to use that temporary.
Mojo applies if client code follows a set of simple rules for passing arguments to, and returning values from, functions.
Mojo defines a separate mechanism for eliminating copying in case of function returns.
The extra machinery and type drudgery make Mojo less than 100% transparent to client code; however, the degree of integration is pretty good for a library-based solution. In the best of all worlds, Mojo will stand as a robust placeholder until a more elegant, language-based feature is standardized and implemented.
Acknowledgements
Mojo has been intensely scrutinized and has had a short, but intense childhood.
David Abrahams made salient contributions to the implementation of move constructors. Rani Sharoni pointed out subtle bugs. Peter Dimov sent Mojo back to the whiteboard by figuring out a fatal problem in an earlier design.
Gary Powell worked a lot to extend Mojo to work with inheritance, and Evgeny Karpov simplified code considerably in the presence of template functions. I hope we'll be able to discuss these improvements in a future article.
Howard Hinnant, Peter Dimov, and Dave Abrahams deserve credit for the proposal to add move constructors to the language.
Many, many enthusiastic volunteers all over the world reviewed this article. Thank you all! I would like to especially mention Walter E. Brown, David Brownell, Marshall Cline, Peter Dimov, Mirek Fidler, Daniel Frey, Dirk Gerrits, Fredrik Hedman, Craig Henderson, Howard Hinnant, Kevin S. Van Horn, Robin Hu, Grzegorz Jakacki, Sorin Jianu, Jozef Kosoru, Rich Liebling, Ray Lischner, Eric Niebler, Gary Powell, William Roeder, Maciej Sinilo, Dan Small, Alf P. Steinbach, Tommy Svensson, David Vandevoorde, Ivan Vecerina, Gerhard Wesp, and Yujie Wu.
Bibliography
[1] Dov Bulka and David Mayhew. Efficient C++: Performance Programming Techniques, (Addison-Wesley, 1999).
[2] Howard E. Hinnant, Peter Dimov, and Dave Abrahams. "A Proposal to Add Move Semantics Support to the C++ Language," ISO/IEC JTC1/SC22/WG21 C++, document number N1377=02-0035, September 2002, <http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2002/n1377.htm>.
[3] "Programming Languages - C++," International Standard ISO/IEC 14882, Section 12.2.
[4] Herb Sutter. More Exceptional C++ (Addison-Wesley, 2002).
[5] Andrei Alexandrescu. Modern C++ Design (Addison-Wesley, 2001).
[6] John Lakos. Large-Scale C++ Software Design (Addison-Wesley, 1996), Section 9.1.9.
[7] Herb Sutter. Exceptional C++ (Addison-Wesley, 2000).
About the Author
Andrei Alexandrescu is a Ph.D. student at University of Washington in Seattle, and author of the acclaimed book Modern C++ Design. He may be contacted at [email protected]ge.com. Andrei is also one of the featured instructors of The C++ Seminar (<http://thecppseminar.com>).