Sunday, November 13, 2011

I did Two Shoes Weekend II: The Shoesening

Two Shoes Weekend is an event put on by my friend Casey. The basic idea is that you do two projects (two shoes, if you will) over a weekend. It is inspired by Shoes, which is a _why the Lucky Stiff thing, but you don't have to use Shoes. This is the second run of Two Shoes Weekend. Last time I almost did something, but I got distracted. At the time, I was using Vista (mock my operating system all you want, it's not germane to story time), and the Windows version of Shoes had some issues there. Specifically, the background of the window was an inky black darkness, impervious to light. You could "fix" this problem by lying to Shoes and saying you were on XP. Unfortunately, this triggered the Vista bug that the code creating the inky blackness was designed to avoid: an abominable amount of flickering that would give you a headache. Part of my day job involves wrangling GDI and Windows, so I started to take a look at the code. I didn't really accomplish anything by doing this, and the weekend largely ended with me having little to nothing to show for it.

This year, I decided to avoid that whole class of problem by making one of my shoes Shoes itself. More specifically, I took one part IronRuby and one part WPF and whipped up a reasonable facsimile of Shoes. It has the "advantage" of being pure Ruby, since the IronRuby guys did all the work of binding Ruby to the CLR for me, and the WPF guys did all the hard work of implementing the moral equivalent of "slots" for me. I took a rather haphazard approach though, basically implementing the bits that were needed for some initial sample projects. It wasn't until relatively late in the weekend that I actually started going through the manual and adding the features in the right way. It will probably never replace real Shoes for the Shoes use case (for one thing, you have to install IronRuby to run it), but it is perhaps interesting for slapping a GUI onto some existing .NET functionality. Shoes seems remarkably well-suited to being implemented in WPF: stack might as well be a StackPanel, and flow might as well be a WrapPanel, and that's what I did, really. WPF's pervasive tendency to use URIs meant things like image "http://thing.png" worked with almost no effort on my part.

My second shoe was pretty boring, but I felt compelled to do two. I made a "lonely blog", aka a diary. It displays a stream of text and images that it saves/loads from a local file on your computer. It looks like this:

You enter either image urls or commentary and click the appropriate button. The save button saves the blog, and the entries will appear the next time you load it.

All in all, a fun time. I haven't done anything in Ruby in a while, so that was fun. The code for the WPF/IronRuby Shoes work-a-like is here, and the lonely blog Shoes app is here.

Sunday, June 26, 2011

Lambda didn't (or won't) kill the expression template star

Now that C++0x is on its way, and many compilers already implement lambdas, you might think the venerable "functor" is on its way out. Well, aside from the obvious reason of supporting legacy code, that's simply not true. operator() is here to stay, and I will tell you why.

Lambdas, though convenient, are opaque. You can't peek inside of a lambda or peel back its inner workings, and the vast majority of the time, that's fine: you don't need to. The same is true for normal named functions and member functions. On occasion, however, a slightly more transparent tool can be useful, and operator() can help you build such translucent functions.

Let me try to justify this with an example. LLVM is a library for building compiler backends. It is written in C++, and it happens to have a coding standards document. One of the style points is "Turn Predicate Loops into Predicate Functions".

Basically, instead of code like

bool pred = false;
for(unsigned i = 0; i < list.size(); ++i)
{
    if( list[i].satisfies() )
    {
        pred = true;
        break;
    }
}

if( pred )
{
    ...
}


You should have code like

if( TestPred(list) )
{
    ...
}


They have their justifications, which you can read if you're interested. This really isn't about whether this is a good idea or not, but rather how to write TestPred. There is an obvious approach:
bool TestPred(const std::vector<Something>& list)
{
    for(unsigned i = 0; i < list.size(); ++i)
    {
        if( list[i].satisfies() ) return true;
    }
    return false;
}


The problem with this is that you can't reuse the predicate. Or rather, you can, but only in a very specific way. If you need to test a std::list of Something instead, you need to either copy the list or duplicate the code. Worse, if you don't know what the code is, you don't even have duplication as an option.

A brief aside: yes, a solution to this mess would be to just use std::find_if in the first place. This is just the motivating example that came to mind when I happened to be reading the LLVM coding guidelines, so bear with me.

So we can change TestPred to operate on iterators. Now we can use it with list, vector, etc.

template <typename FwdIter>
bool TestPred(FwdIter begin, FwdIter end)
{
    for(FwdIter i = begin; i != end; ++i)
    {
        if( i->satisfies() ) return true;
    }
    return false;
    // or return find_if(begin, end, [](const Something& s) { return s.satisfies(); }) != end
}


But before, we had the nice feature of being able to call this on a std::vector directly. We could add overloads for that, but that gets tedious, especially since we're writing one of these functions every time we have a new predicate to loop over. We can use templates to get that std::vector overload back (and more: std::list, for example, and your plain old array) by writing the code for iteration once and plugging in the varying i->satisfies part.


template<typename F>
struct RangedAnyPredicate
{
    typedef F Tester;
    template<typename FwdIter>
    bool operator()(FwdIter begin, FwdIter end) const
    {
        for(FwdIter i = begin; i != end; ++i)
        {
            if(F::test(*i))
            {
                return true;
            }
        }
        return false;
    }
};

struct satisfier
{
    typedef const Something& testee_type;
    static bool test(const Something& s)
    {
        return s.satisfies();
    }
};

static RangedAnyPredicate<satisfier> TestPred;


Here we've extracted out just the looping part. TestPred will behave almost identically to our templated-on-a-pair-of-iterators version of the function (exceptions being things like: we can't pass this TestPred around as a function pointer). This is more typing than the previous version, of course; even with multiple predicates, the overhead of a struct for each one is high. Where it starts to pay dividends is when we provide more overloads for free. We're going to need some traits to do that, and again, while this is a bunch of code up front, some of these traits are independently useful and already exist.

We want to be able to call this on two iterators, on a container with begin and end methods, and on a standard array. We can distinguish between the two argument and single argument forms easily with the normal overloading mechanism. For distinguishing between containers with begin and end and standard arrays we'll assume anything not an array should have a begin and an end.

So to detect arrays we'll use an is_array trait:

template<typename T>
struct is_array
{ enum {val = false}; };

template<typename T, long N>
struct is_array<T [N]>
{ enum {val = true}; };


We can then use this information to decide to either call TestPred(arg.begin(), arg.end()) or TestPred(arg, arg + N).


template<bool C>
struct do_test;

template<>
struct do_test<true>
{
    template <typename Self, typename T, long N>
    static bool go(const Self& self, T (&a)[N])
    {
        return self(a, a + N);
    }
};

template<>
struct do_test<false>
{
    template<typename Self, typename C>
    static bool go(const Self& self, const C& c)
    {
        return self(c.begin(), c.end());
    }
};

...
// back inside RangedAnyPredicate
template<typename Container>
bool operator()(const Container& c) const
{
    return do_test<is_array<Container>::val>::go(*this, c);
}
...


Now we've got overloads for vector, list, and any other container that might come along and behave like a standard sequence, as well as old fashioned arrays.

But there's more. You may have noticed we included a Tester typedef in RangedAnyPredicate. This is where the ability to make a functor less opaque than a regular function or a lambda starts to be useful. Let's say we have two predicates, PredA and PredB. And furthermore, the coding guidelines were followed and we have TestPredA and TestPredB functions to work with. If we want to find out if a list has an element that satisfies PredA or PredB we can do the obvious:

if( TestPredA(list) || TestPredB(list) )
{
    ...
}


But this is C++! The code above has the potential to needlessly traverse the list twice. We've got to be able to do something about that, right? Well, not if TestPredA and TestPredB are regular functions. We can write a third function that does the or-ing, but it can't peek inside the functions. If, however, we use RangedAnyPredicate to define TestPredA and TestPredB, we can extend them as needed. First we'll need a predicate to represent the or-ing of the internal predicates of A and B:


template<typename Lhs, typename Rhs>
struct OrCombiner
{
    typedef typename Lhs::testee_type testee_type;
    static bool test(testee_type testee)
    {
        return Lhs::test(testee) || Rhs::test(testee);
    }
};


Next we use a little operator-overloading-fu in RangedAnyPredicate to make this easy to use:

// Back in RangedAnyPredicate
template<typename TestF>
RangedAnyPredicate< OrCombiner<F, TestF> > operator|(RangedAnyPredicate<TestF>) const
{
    return RangedAnyPredicate< OrCombiner<F, TestF> >();
}


Now our code becomes


if( (TestPredA | TestPredB)(list) )
{
    ...
}


and only one pass is made over the list. If you want different semantics, you can simply use the prior code. The same can be done for logical and as well, of course.

This was of course a contrived example, but I hope I've demonstrated how the functor still has its place, even in a world of lambdas.

Thursday, June 02, 2011

Taking Herb Sutter out of Context (or type system assisted type safety)

I was watching this interview with Herb Sutter, and an off-hand comment was made about how you need GC (or automatic memory management) for "type safety". What is "type safety"? I'm going to use a very informal definition: we say a program is "type safe" if, whenever a value of type T* is not null, it is correct to interpret the memory it refers to as a T. This is pretty hand-wavy, but good enough for this exercise.

Now you might say "Well duh, malloc returns void*", but let's instead imagine a C++ variant with no reinterpret_cast (or C casts) no void *, no malloc and no placement or global operator new.

Is this a "type safe" language? Or does it need GC to take it all the way?

Consider the following program, X and Y are classes unrelated by inheritance:


X* x = new X;
x->something();
delete x;
Y* y = new Y;
X x2;
*x = x2;


I just called X::operator= on a Y with no casts. I was able to do this because this hypothetical implementation happens to be super aggressive with regards to reusing recently freed memory. Now, this is undefined behavior in C++ and would presumably be so in our hypothetical "type safe(er)" subset.

But is that useful? If you're the kind of person who fervently believes in nasal-demon-style consequences for undefined behavior, and that it is foolish to ask for more, you can skip to the end: any undefined behavior in the program can scramble it, so you don't need to rely on "aggressive" heap managers and plausible explanations.

So then what we need to do is either a) define this behavior, b) disallow it, or c) refine "undefined behavior" to allow for nasal demons et al. but not for breaking the type safety of our programs.

Option C is easily said, and since a language spec is just things you say, it might even be easily done (although saying it specifically and clearly would probably be difficult in practice).

We could try to define this behavior. What if delete also set the pointer to null (and we made dereferencing null defined behavior; let's say it calls std::terminate)? This obviously won't be sufficient:

X* a = new X ;
X* b = a;
delete a;


A delete like this could also only ever operate on lvalues.


delete i_only_love_for_your_sideeffects();


would become illegal.

The b = a program snippet is also a counter example to a delete that takes its argument out of scope.

Maybe we can fix this by messing with our "aggressive" allocator. We could specify that once an X, always an X (if not necessarily the same X). That is to say our allocator would be specified as reserving blocks of memory for specific types, once reserved a block could not be reused for allocations of different types.

This places some constraints on our allocator implementation, and is somewhat dissatisfying. But it does satisfy our hand-wavy definition of type-safety. Part of the problem is that our definition of type-safety doesn't seem to let us write safer programs. A pointer that unpredictably becomes an alias to another object without any assignment will probably result in a buggy program, even if we can confidently state the type it aliases.

Except this allocator doesn't even give us that. Remember we're talking about C++ (or a C++ variant anyway) here. Have you ever called a virtual method from a constructor? What about a pure virtual method? What about from a destructor? By the time delete is done our T might not be a T anymore. Assuming a less aggressive allocator (or the program just never news another T) we're left with a type-unsafe program again.

delete seems to be a serious problem. The way garbage collection handles it is that it doesn't let you delete anything. Can we do the same? We could just remove delete from the language. Then we're fine, as long as we have infinite memory. As a practical matter though, we need some way to reclaim unused memory, since we don't have infinite memory. Oops, garbage collection again.

So getting rid of delete is tantamount to GC. Can we keep it somehow? Earlier we discussed the idea of having delete null out its argument. That didn't work because of aliasing. Sounds like we need to get rid of aliasing.

To simplify eliminating aliasing we're going to remove some features from the language. Don't worry: later we'll try and put them back.

We're going to get rid of the address of operator and references (basically any use of the ampersand that doesn't involve Boolean logic). By removing references we just throw away a whole slew of opportunities for aliasing. Ditching the address of operator means the only way of manufacturing a new pointer value is with, well, new. We'll allow aliasing NULL (or we can treat it as syntax if you like).

Ok first we'll do the obvious way of disallowing aliasing: no copy constructor or assignment operator for pointer types. This is clearly useless as you can't even pass pointers to functions then.

Let's get a little more complicated then: when a pointer typed variable appears on the right hand side of an assignment or as a parameter to a function, that variable is no longer considered to be in scope.

It's as though

A * x = new A;
A * y = x;

were transformed to

A * y = NULL;
{
    A * x = new A;
    y = x;
}


As you can see, this is a purely mechanical transform, and can be enforced at compile-time. Except for one little wrinkle: global pointers. So ban 'em, we don't need them. Or we can be a little nicer and just ban assigning from them, but that has its own wrinkle. Note that this restriction is infectious, no global structs with pointer members or members with pointer members.

As a convenience, we can allow assigning to an already-assigned-from pointer variable, treating it as though we've introduced a new variable with the same type and name. This allows code like:

x = munge(x);


We have to be careful with conditionals: a simple rule is to require that the same variables are assigned-from in all branches of the conditional, and this can be checked at compile-time. This is some more pain for our programmers but such is life.

We're now type-safe but still manually managing memory. Once you say

delete x;


x goes out of scope, so we can't use the value x referred to after it's been deleted, and we know there are no aliases to it.


T* sneaky(T* x)
{
    delete x;
    return x; // compiler error here
}

T * x = ...;
T * y = x;
delete x; // compiler error here
y->stuff();


As you may have realized by now I didn't invent this idea, it's called linear types and there's been a fair amount of research into it which you can look up with your favorite search engine. In this case we've made all pointers linear types, in languages that support this for "real" they'll typically have non-linear versions of the types as well.

The usual example actually isn't memory (because GC makes using linear types for this redundant) but rather file handles. A linear file handle lets you make sure you've got an open handle before you try to read or write to it, and ensures you don't close a handle twice, etc. I think you can imagine how that would work by this point.

Sunday, May 22, 2011

Management (or the art of delegation)

Microsoft's .NET has a method called GetFunctionPointerForDelegate. It allows you to turn a managed delegate into a function pointer that can be passed around to native code. A delegate is usually a method bound to a particular instance of a class.

This sort of thing can be useful outside of the managed world as well. Many APIs that take callbacks allow you to pass in some additional user-specified data, usually as a void* or similar. However, sometimes they don't provide this capability. There are a few ways around this involving some variety of globals or thread-local storage, but this becomes awkward if we need to use more than one instance of this callback at a time.

AsmJit is a cute little C++ library for doing runtime code-gen for x86 and x64. In the following example I'm going to use it to let me use a member function as a callback. This code is for x86 using the cdecl calling convention, on Linux.


#include "AsmJit.h"
#include <iostream>

typedef void (*callback)(int);

struct A
{
    int m_accum;
    void doIt(int x)
    {
        m_accum += x;
        std::cout << "A: " << x << " " << m_accum << std::endl;
    }
};

void A_trampoline(A* a, int x)
{
    a->doIt(x);
}

void takes_callback(callback cb)
{
    cb(1);
    cb(2);
    cb(3);
}

int main()
{
    using namespace AsmJit;
    A a = A();

    Assembler assemblr;

    assemblr.push(dword_ptr(esp, 4)); // push the original int arg
    assemblr.push(imm(reinterpret_cast<intptr_t>(&a))); // push a
    assemblr.call(imm(reinterpret_cast<intptr_t>(&A_trampoline))); // call A_trampoline(&a, [esp + 4])
    assemblr.add(esp, 8); // clean up the stack, cdecl is caller cleans up
    assemblr.ret(); // return

    callback cb = function_cast<callback>(assemblr.make());
    takes_callback(cb);
}


Here takes_callback is one of those lame callback-taking APIs that don't let you pass around any user-specified info. Using a little bit of x86 assembly, we create a thunk that adds an additional argument (the instance of A) and forwards to A's doIt method.

If we run this we get as output:

A: 1 1
A: 2 3
A: 3 6


Now an interesting thing to consider is, what would this code look like with different calling conventions? In particular, Microsoft's __thiscall is interesting. __thiscall is like Microsoft's __stdcall, except for the quirk that the this pointer is passed in the ecx register. __stdcall is like cdecl, except the callee is responsible for cleaning up the stack. Knowing this, and assuming our callback function was expected to be __stdcall our thunk code could look like (pseudo code)

mov ecx, &a
jmp A::doIt


Only two instructions! Furthermore, this template of two instructions works regardless of the arity of the member function we're forwarding to, and we don't have to copy any stack arguments or shuffle them around like we needed to for cdecl. Given the thunk uses a jmp rather than a call, the thunk itself doesn't appear on the call stack during a stack trace; it looks rather as though the member function had been called directly. This is arguably a nice feature from a debugging standpoint (is this a dynamically generated function on the stack, or did we go off into the woods?). It almost looks like __thiscall was at least partially designed with this use case in mind.

Sunday, March 13, 2011

Memory leaks and other bugs

It was pointed out to me that my last post had a bug (one that I wasn't aware of). Specifically, pure_modify leaks the old value. It is a simple fix:



template<typename F>
T pure_modify(F f)
{
    std::auto_ptr<T> mem;
    T* old_val; // not an auto_ptr because if we throw we have not taken ownership.
                // We either threw from user code, or from new. And throwing from
                // user code would be wrong anyway (impure)
    AdapterF<F,T> adapter;
    adapter.f = f;
    adapter.mem = &mem;
    adapter.old_val = &old_val;
    m_impl.pure_modify(adapter);
    delete old_val;
    return *mem.release();
}


We'll also need to adjust AdapterF for this

template<typename F, typename V>
struct AdapterF
{
    F f;
    std::auto_ptr<V>* mem;
    V** old_val;
    void* operator()(void *in)
    {
        V r = f(*(*old_val = static_cast<V*>(in)));
        mem->reset(new V(r));
        return static_cast<void*>(mem->get());
    }
};



Once pure_modify returns, we know that the last pointer value in old_val belongs to us, because we've successfully replaced it with something else, and it becomes our responsibility to clean it up.

So then, what was the other bug I alluded to last time? That was the so-called "ABA" problem. Basically, with this code I treat pointers as being unique identities for the life of the process. However, all heap implementations reuse memory (that's sort of the point, if you didn't reuse it you wouldn't need free) so two different objects can have the same address at different points during the lifetime of the program.

Two threads walk into an MVar. They both read the value and begin their operation. Thread A's operation is particularly CPU-intensive. Thread B is done lickety-split and puts a new value in. Meanwhile, Thread C comes along, and his operation is also fast. Thread B returned the old block of memory to the heap manager, so when Thread C finishes, it happens to grab that very same block for its result and puts it back into the MVar. Thread A finally finishes up, sees the pointer value is the same as last time, and happily shoves his modification in as the new value, completely unaware he has just wiped out the effects of two prior modifications.

And these sorts of things are the reasons why I'm probably not going to go off and implement this kind of abstraction at my day job. That and pure_modify's constraints aren't all that C++-friendly and it's hard to imagine good use cases. Some kind of atomic bignum library maybe?

Wednesday, February 02, 2011

Purity, laziness, and stupid compiler tricks

So now we have a pointer-sized (on x86-32 Linux, anyway), type-safe concurrency primitive in our MVar<T>. I'm going to add an additional operation, and we're going to talk about the ramifications.

First, we need to add it to ptrmvar32:



template<typename F>
void* pure_modify(F f)
{
    uintptr_t curr;
    while(1) {
        curr = m_val;
        if( curr == 0 )
        {
            if( __sync_bool_compare_and_swap(&m_val, 0, 1) )
                futex_wait(&m_val, 1);
        } else if( curr == 1 ) {
            futex_wait(&m_val, 1);
        } else {
            void *new_val = f(reinterpret_cast<void*>(curr & ~1));
            if( __sync_bool_compare_and_swap(&m_val, curr, reinterpret_cast<uintptr_t>(new_val)) )
                return new_val;
        }
    }
}


We're going to add this to the type-safe version as well before we get into it, so don't worry if this looks confusing.


private:
    template<typename F, typename V>
    struct AdapterF
    {
        F f;
        std::auto_ptr<V>* mem;
        void* operator()(void *in)
        {
            V r = f(*static_cast<V*>(in));
            mem->reset(new V(r));
            return static_cast<void*>(mem->get());
        }
    };

public:
    template<typename F>
    T pure_modify(F f)
    {
        std::auto_ptr<T> mem;
        AdapterF<F,T> adapter;
        adapter.f = f;
        adapter.mem = &mem;
        m_impl.pure_modify(adapter);
        return *mem.release();
    }


Because this is C++, there is some noise with regards to memory management, but it should be pretty clear what this new operation does. It takes the current value of the MVar, applies a function to the value, and returns a new value that is put into the MVar. Now, the analogous operation for Haskell is modifyMVar/modifyMVar_. Interestingly, the type of the function that the Haskell modifyMVar takes is


a -> IO a


This is because Haskell's modifyMVar has semantics more like the following:


template<typename F>
T haskell_modify(F f)
{
    T val = take();
    struct ValRestorer {
        T val;
        bool restore;
        MVar<T> *self;
        ~ValRestorer() { if(restore) self->put(val); }
        ValRestorer(T val_, MVar<T> *self_) : val(val_), restore(true),
            self(self_) {}

        void Dismiss() { restore = false; }
    } restorer(val, this);
    T new_val = f(val);
    put(new_val);
    restorer.Dismiss();
    return new_val;
}

In other words, the point of Haskell's modify is to make the update transactional in the face of exceptions in the modifying code. To properly use modify, you have to make sure all the threads take from the MVar before they put to it.

I called attention to the type of modifyMVar earlier, and also by naming my method pure_modify. As should be obvious from the code, the user-supplied function may be called many times (although we hope it isn't in the common case). In Haskell, the user-supplied function would have to have the type


a -> a


This is wonderful! Wonderful because we can actually express it. Among other things, we know the user-supplied function won't take any other locks, for instance, so modification is deadlock free. The other great thing is that we can feel comfortable calling it as many times as we need, because being a pure function, the observable effect is that it atomically modified the value in the MVar.

Often when we think about purity and how it enables laziness (or is it the other way around?), we think in terms of functions being called less often (sometimes we even think laziness implies automatic memoization; this is not the case, and if you think about the memory usage you'll quickly realize why). But here, we can leverage the same properties of pure code to spend less time blocked or waiting, and hopefully, in the common case, waste less CPU time.

Finally, see if you can spot what classic "lock-free programming" bug I've introduced into this code (or bugs? There's at least one) by adding this pure_modify operation.

Sunday, January 16, 2011

Implementing Haskell-style MVars in C++ using futexes

Last time we talked about what Linux futexes were. Now I want to use them to implement a concept from Haskell called an MVar. An MVar can be thought of as a concurrent queue with a size of 1; that is, it is a box which you can put things in, and take things out of, safely from multiple threads. There are two basic operations:

  • put(T value)

  • T take()


When the MVar is empty, take'ing blocks; when it is full, put'ing blocks. Real Haskell MVars have guarantees with regards to fairness; we're not going to address those here. In fact, we're going to do a "cute" implementation, and it has several flaws. First, it's only going to work on x86-32; this will not be a portable implementation. But this lets us be "cute": our MVar is going to be only the size of an int. Below is `ptrmvar32`, a 32-bit void* MVar implementation that we'll use to build our MVar<T>. NULL will represent empty, and we'll use the LSB to note whether we need to wake anyone up (using only one bit for this is one of the flaws).


#ifndef PTRMVAR32_H
#define PTRMVAR32_H
#include <stdint.h>
#include <assert.h>
class ptrmvar32
{
    uintptr_t m_val;
    static char s_pointers_are_32_bits[sizeof(void*) == 4 ? 1 : -1];
    static void futex_wait(void *, uintptr_t);
    static void futex_wake(void *);
public:
    ptrmvar32(void *initial_value) : m_val(reinterpret_cast<uintptr_t>(initial_value))
    {
        assert(m_val);
    }

    ptrmvar32() : m_val(0)
    {}

    void* take()
    {
        uintptr_t curr;
        while(1) {
            curr = m_val;
            if( curr == 0 )
            {
                if( __sync_bool_compare_and_swap(&m_val, 0, 1) )
                    futex_wait(&m_val, 1);
            } else if( curr == 1 ) {
                futex_wait(&m_val, 1);
            } else {
                if( __sync_bool_compare_and_swap(&m_val, curr, 0) )
                {
                    if( curr & 1 )
                    {
                        futex_wake(&m_val);
                    }
                    curr &= ~1;
                    return reinterpret_cast<void*>(curr);
                }
            }
        }
    }

    void put(void *val)
    {
        const uintptr_t new_val = reinterpret_cast<uintptr_t>(val);
        uintptr_t curr;
        while(1) {
            curr = m_val;
            if( curr == 0 )
            {
                if( __sync_bool_compare_and_swap(&m_val, 0, new_val) )
                    return;
            } else if( curr == 1 ) {
                if( __sync_bool_compare_and_swap(&m_val, 1, new_val) ) {
                    futex_wake(&m_val);
                    return;
                }
            } else {
                if( __sync_bool_compare_and_swap(&m_val, curr, curr | 1) ) {
                    futex_wait(&m_val, curr | 1);
                }
            }
        }
    }

    void* unsafe_value()
    {
        return reinterpret_cast<void*>( m_val & ~1);
    }
};
#endif

ptrmvar32.cpp contains the implementation of futex_wake and futex_wait, which are wrappers around the futex syscall:


#include "ptrmvar32.h"
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <limits>

void ptrmvar32::futex_wait(void *on, uintptr_t when)
{
    syscall(__NR_futex, on, FUTEX_WAIT, when, NULL, 0, 0);
}

void ptrmvar32::futex_wake(void *on)
{
    syscall(__NR_futex, on, FUTEX_WAKE, std::numeric_limits<int32_t>::max(), NULL, 0, 0);
}



As you can see, futex_wake wakes up everyone, not just one waiter. This is because someone could be waiting to take and someone could be waiting to put, and if we wake the wrong one of the two we'll never wake up. If this situation does actually occur, we end up in thundering herd territory.

Next we build our typesafe, templated version on top of this:


#include "ptrmvar32.h"

template<typename T>
class MVar
{
    ptrmvar32 m_impl;

public:
    MVar() : m_impl()
    {}

    MVar(const T& val) : m_impl(static_cast<void*>(new T(val)))
    {}

    void put(const T& val)
    {
        T* p = new T(val);
        m_impl.put(static_cast<void*>(p));
    }

    T take()
    {
        T *p = static_cast<T*>(m_impl.take());
        T r(*p);
        delete p;
        return r;
    }

    ~MVar()
    {
        delete static_cast<T*>(m_impl.unsafe_value());
    }
};


Now, put and take use new and delete; really we should be using a thread-local allocator here, but I got too lazy to implement that part. Maybe next time. Ignoring that, you can see that this is a very lightweight primitive, only taking 4 bytes. A pthread-based implementation would take at least a mutex and a condition variable, probably two condition variables to avoid the problem of waking up the wrong threads that this implementation is susceptible to. However, this is implemented the way it is to set up for next time, where we are going to talk about the implications of laziness and purity, and the operation that Haskell doesn't provide but could.

Here's a sample program making use of the MVars:


#include "mvar_template.h"
#include <string>
#include <iostream>
#include <pthread.h>

using namespace std;

namespace
{
    MVar<string> in;
    MVar<string> out;

    void *worker(void *)
    {
        cout << "waiting for value" << endl;
        string v = in.take();
        cout << "got value: " << v << endl;
        v += " gotten";
        out.put(v);
        return NULL;
    }
}

int main()
{
    cout << sizeof(in) << endl;
    string s = "in";
    pthread_t t;
    pthread_create(&t, NULL, &worker, NULL);
    pthread_detach(t);
    in.put(s);
    string r = out.take();
    cout << "got " << r << endl;
}