Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

Discriminated Unions (I)


Believe me: in spite of the appearances, if you were looking for an article on programming, you're in the right place. It's not about discrimination against work unions. This installment of "Generic<Programming>" discusses the discriminated union data type.

Discriminated union is a concept with many aliases: disjoint union, variant type, or algebraic type. Regardless of what you call them, discriminated unions come in very useful as data abstractions in a variety of applications. Many languages (mostly the functional languages) rely on discriminated unions for pretty much all interesting data structures. It is possible that you will use discriminated unions with C++ in the near future as well, because...

But what is a discriminated union, and what's in it for you? You will find much more about this soon, but first, per popular request, allow me to make an addendum to my previous column.

An Add-Hoc Addendum to Ad-Hoc Visitation

The previous installment of "Generic<Programming>" [1] defined a class template AdHocVisitor, which implements Visitor-like functionality over a class hierarchy post factum. (In Latin, "post factum" stands for "without modifying that class hierarchy.")

AdHocVisitor's semantics are pretty simple. Say you instantiate:

typedef AdHocVisitor<cons<TextArea, VectorGraphics, 
    Bitmap>::type>
    DocElementVisitor;

Then, AdHocVisitor's workings will ensure proper visitation: when you call DocElementVisitor::StartVisit, the appropriate overload Visit(TextArea*), Visit(VectorGraphics*), or Visit(Bitmap*) will be automatically invoked.

The mechanism through which AdHocVisitor performs this argument type deduction is equivalent to an if-else chain; that is, the code does something like:

if (TextArea* pTextArea = dynamic_cast<TextArea*>(p))
{
    Visit(pTextArea);
}
else if (VectorGraphics* pVectorGraphics =
    dynamic_cast<VectorGraphics*>(p))
{
    Visit(pVectorGraphics);
}
else if (Bitmap* pBitmap = dynamic_cast<Bitmap*>(p))
{
    Visit(pBitmap);
}
else
{
    throw "Unknown type passed";
}

As my previous column explains, the way AdHocVisitor implements the cascading tests is through a recursive template that operates on a typelist, but the net semantics is the same as that of the above chain of if-else statements.

Now imagine that later you define a new class FormattedTextArea that inherits TextArea. Naturally, you would append a type to the DocElementVisitor's definition so that you can visit FormattedTextArea objects as well:

typedef AdHocVisitor<cons<TextArea, VectorGraphics, 
    Bitmap, FormattedTextArea>::type>
    DocElementVisitor;

When visiting with DocElementVisitor, the added dynamic_cast<FormattedTextArea*> test will be placed at the bottom of the if-else food chain, which means that it will never succeed — any FormattedTextArea "Is-A" TextArea, so dynamic_cast<TextArea*> works. Consequently, the code will always think that a FormattedTextArea is really a TextArea and will visit it as such. The last test "starves" because is always preempted by the precedent one. This never helps.

A simple solution is to shuffle the typelist so that FormattedTextArea comes before TextArea:

typedef AdHocVisitor<cons< FormattedTextArea, TextArea, 
    VectorGraphics, Bitmap>::type>
    DocElementVisitor;

A more elaborate solution automates this task. Indeed, Loki comes with a nice DerivedToFront typelist shuffler that does exactly what we need here. You need to look up Loki's code [2] or the explanations [3] for the full story, but the usage is remarkably simple:

typedef DerivedToFront<cons< FormattedTextArea, TextArea, 
    VectorGraphics, Bitmap>::type>
    ShuffledHierarchy;
typedef AdHocVisitor<ShuffledHierarchy::Result>
    DocElementVisitor;

So this is pretty much the addendum. Now, when writing the previous column, the word limit was looming close, and I decided not to address this warning, but kept the focus on typelists, not on hierarchy visitation subtleties. In doing this, I was reminded of a remark made by Scott Meyers in an email:

"Too many digressions [...] prevent readers from absorbing the full impact of what you want to say, because they get distracted. [...] When writing, I think it's important to maintain focus."

Sometimes real life is more fun than the best sitcom, because the first person to send an email about the previous column's omission was ... Scott Meyers himself. The second person was Uwe Behrens. Scott really deserves an extra bonus because he also knew all about the DerivedToFront-based solution. Obviously, Uwe hasn't read Modern C++ Design [3]; otherwise he'd be much more optimistic about this approach than his comments suggest:

"I have used similar schemes in the past for my own applications, including implementations of the Visitor pattern, and during that, I have learned about a serious flaw in this approach. And I learnt it the hard way (i.e., in night-long sessions with a debugger)."

To conclude, the bad news is that I didn't mention an important issue in "Typelists and Applications" [1], and the good news is that that issue has an automated solution [2, 3]. Thanks much to Scott Meyers and Uwe Behrens.

So What Are Discriminated Unions?

Ok, back to le menu du jour. There's a paper on discriminated unions [4] that is the basis of the article you're now reading. That paper, however, is written for a narrow, specialized audience — so many readers asked for a more detailed presentation.

A discriminated union is a data type that's able to store one value of a bounded collection at a time and to offer typesafe set and access primitives for that value.

There are a couple of keywords here that deserve close attention. First off, the offered access must be typesafe, which means that the C vintage union feature, although it seems to qualify, fails because it doesn't offer that typesafe access — in other words, it's not discriminate.

union SomeData
{
    int anInteger_;
    double aDouble_;
    char* aPointer_;
};

The compiler simply stores the three member variables in overlapped memory and doesn't stop you from illegal uses such as:

// Likely a silent reinterpret_cast
SomeData dt;
dt.aPointer_ = "Hello, cruel world of undiscriminated unions";
int weird = dt.anInteger;

The second important point to make is about the bounded aspect of the collection of types. That marks an important difference between full-blown dynamic polymorphism and discriminated unions. You can think of a pointer to a base class as some sort of a union, because it can point to anything derived from that base class. However, the set is unbounded — you don't know the set of possible types beforehand. In contrast, a discriminated union only stores a bounded set of possible types, set known at its definition point. Some of those types, however, can of course be polymorphic themselves.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.