Thursday, June 02, 2011

Taking Herb Sutter out of Context (or type system assisted type safety)

I was watching this interview with Herb Sutter and an off-hand comment was made about how you need GC (or automatic memory management) for "type safety". What is "type safety"? I'm going to use a very informal definition, that is, we say a program is "type safe" if a value of T* is not null it is correct to interpret the memory it refers to as a T. This is pretty hand-wavy, but good enough for this exercise.

Now you might say "Well duh, malloc returns void*", but let's instead imagine a C++ variant with no reinterpret_cast (or C casts) no void *, no malloc and no placement or global operator new.

Is this a "type safe" language? Or does it need GC to take it all the way?

Consider the following program, X and Y are classes unrelated by inheritance:


X* x = new X;
x->something();
delete x;
Y* y = new Y;
X x2;
*x = x2;


I just called X::operator= on a Y with no casts. I was able to do this because this hypothetical implementation happens to be super aggressive with regards to using recently freed memory. Now, this is undefined behavior in C++ and would presumably be so in our hypothetical "type safe(er)" subset.

But is that useful? If you're the kind of person who fervently believes in nasal demon style consequences for undefined behavior and that it is foolish to ask for more you can skip to the end, because any undefined behavior in the program can scramble it, you don't need to rely on "aggressive" heap managers and plausible explanations.

So then what we need to do is either a) define this behavior, or b) disallow it or c) refine"undefined behavior" to allow for nasal demons et. al. but not for breaking the type safety of our programs.

Option C is easily said, and since a language spec. is just things you say, might even be easily done (although saying it specifically and clearly would probably be difficult in practice).

We could try to define this behavior. What if delete also set the pointer to null (and we made dereferencing null defined behavior, it calls std::terminate let's say)? This obviously won't be sufficient:

X* a = new X ;
X* b = a;
delete a;


A delete like this could also only ever operate on lvalues.


delete i_only_love_for_your_sideeffects();


would become illegal.

The b = a program snippet is also a counter example to a delete that takes its argument out of scope.

Maybe we can fix this by messing with our "aggressive" allocator. We could specify that once an X, always an X (if not necessarily the same X). That is to say our allocator would be specified as reserving blocks of memory for specific types, once reserved a block could not be reused for allocations of different types.

This places some constraints on our allocator implementation, and is somewhat dissatisfying. But it does satisfy our hand-wavy definition of type-safety. Part of the problem is that our definition of type-safety doesn't seem to let us write safer programs. A pointer that unpredictably becomes an alias to another object without any assignment will probably result in a buggy program, even if we can confidently state the type it aliases.

Except this allocator doesn't even give us that. Remember we're talking about C++ (or a C++ variant anyway) here. Have you ever called a virtual method from a constructor? What about a pure virtual method? What about from a destructor? By the time delete is done our T might not be a T anymore. Assuming a less aggressive allocator (or the program just never news another T) we're left with a type-unsafe program again.

delete seems to be a serious problem. The way garbage collection handles it is, they don't let you delete anything. Can we do the same? We could just remove delete from the language. Then we're fine as long as we have infinite memory. As a practical matter though, we need some way to reclaim the unused memory, since we don't have infinite memory. Oops garbage collection again.

So getting rid of delete is tantamount to GC. Can we keep it somehow? Earlier we discussed the idea of having delete null out its argument. That didn't work because of aliasing. Sounds like we need to get rid of aliasing.

To simplify eliminating aliasing we're going to remove some features from the language. Don't worry: later we'll try and put them back.

We're going to get rid of the address of operator and references (basically any use of the ampersand that doesn't involve Boolean logic). By removing references we just throw away a whole slew of opportunities for aliasing. Ditching the address of operator means the only way of manufacturing a new pointer value is with, well, new. We'll allow aliasing NULL (or we can treat it as syntax if you like).

Ok first we'll do the obvious way of disallowing aliasing: no copy constructor or assignment operator for pointer types. This is clearly useless as you can't even pass pointers to functions then.

Let's get a little more complicated then: when a pointer typed variable appears on the right hand side of an assignment or as a parameter to a function, that variable is no longer considered to be in scope.

It's as though

A * x = new A;
A * y = x;

were transformed to

A * y = NULL;
{
A * x = new A;
y = x;
}


As you can see, this is a purely mechanical transform, and can be enforced at compile-time. Except for one little wrinkle: global pointers. So ban 'em, we don't need them. Or we can be a little nicer and just ban assigning from them, but that has its own wrinkle. Note that this restriction is infectious, no global structs with pointer members or members with pointer members.

As a convenience we can allow assigning to an assigned from pointer variable and treating it as though we've introduced a new variable with same type and name. This allows code like:

x = munge(x);


We have to be careful with conditionals: a simple rule is to require that the same variables are assigned-from in all branches of the conditional, and this can be checked at compile-time. This is some more pain for our programmers but such is life.

We're now type-safe but still manually managing memory. Once you say

delete x;


x goes out of scope, so we can't use the value x referred to after it's been deleted, and we know there are no aliases to it.


T* sneaky(T* x)
{
delete x;
return x; // compiler error here
}

T * x = ...;
T * y = x;
delete x; // compiler error here
y->stuff();


As you may have realized by now I didn't invent this idea, it's called linear types and there's been a fair amount of research into it which you can look up with your favorite search engine. In this case we've made all pointers linear types, in languages that support this for "real" they'll typically have non-linear versions of the types as well.

The usual example actually isn't memory (because GC makes using linear types for this redundant) but rather file handles. A linear file handle let's you make sure you've got an open handle before you try to read or write to it, and insures you don't close a handle twice, etc. I think you can imagine how that would work by this point.

No comments: