Monday, April 27, 2009

INVARIANCE, COVARIANCE & CONTRAVARIANCE

Lately, the world of C# developers has been wondering about the meaning of two unusual words: covariance and contravariance.

There are plenty of articles out there right now that try to explain the concepts behind those words, mainly by example, and why they matter more for the upcoming version 4.0 of C#.

To mention just a few articles on this topic:

Since this subject is new for many and, let’s be honest here, it is NOT THAT easy to understand at first sight, I have decided to add my two cents by writing yet another article (explaining it the way I understand it), in the hope that it will eventually help others get it once and for all.

Ok, enough introduction! Let’s just begin …

From a design point of view, whenever we want to refer to inheritance, we would have to use words like “generalization” and “specialization”. But in practice, or in terms of implementation, we do use the word “inheritance” itself, plus the following two terms: “base types” and “derived types”.

In what follows, I'll use the latter group of terms for the sake of easier understanding. So let’s start with some basic concepts, shall we?

(I) Basic Grounds

In theory, when talking about reference types, it is stated that a base type is bigger than a derived type because a reference of the base type can hold either an instance of that type or an instance of any of its derived types.

Therefore, a derived type is smaller than its base type because a reference of the derived type cannot hold an instance of the base type.

This is usually written as T >= S, where T is the base type and S is its derived type. For instance,

Object >= String

Now, we are dealing with covariance when the reference to an object is declared either as its real type or as one of its base types (and by “real” I mean the type used to create the instance, for example: “new Foo()”).

In other words, a derived type is covariant with its base type, since the direction of assignment (in what follows I will refer to this as “movement”), going from a derived type to a base type, is allowed to happen.

Following this rationale, we are therefore dealing with contravariance when the “movement” goes the opposite way.

And this is probably the most difficult concept to picture. Let’s hope this example clarifies it: if you have a delegate that, say, takes a string as an input parameter and returns a bool, then you can pass a method that takes an object as an input parameter and returns the same type, in this example: a bool. Why? Because a method that can deal with any object (the base type) can certainly deal with a string (one of its derived types), and a string is all the delegate will ever hand to it (we will see an example of this later).
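To make that delegate example concrete, here is a minimal sketch. The names StringTest and IsNotNull are hypothetical (I made them up just for illustration); the only thing that matters is that the delegate takes a string while the method takes an object.

```csharp
using System;

public static class ContravarianceSketch
{
    // Hypothetical delegate: takes a string, returns a bool.
    public delegate bool StringTest(string text);

    // Takes the "bigger" type (object) and returns the same type the delegate expects (bool).
    public static bool IsNotNull(object value)
    {
        return value != null;
    }

    public static void Main()
    {
        // Allowed: the delegate only ever passes strings, and IsNotNull accepts any object.
        StringTest test = IsNotNull;

        Console.WriteLine(test("hello")); // True
        Console.WriteLine(test(null));    // False
    }
}
```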

Finally, there’s a third concept: if “no movement” is allowed to happen, then we are dealing with invariance.

For instance, to prevent conflicting variance, mutable arrays “should” always be invariant in their element type. If an array of strings could be treated as an array of objects (covariance), then you would be able to insert into it any object that is not a string. And conversely, if an array of objects could be treated as an array of strings (contravariance), then you would be able to read from it an object that is not a string.

Thus, from object >= string, we could then infer that object[] >= string[], ONLY IF both arrays were immutable.

All of the above is in line with Liskov’s Substitution Principle, especially in the sense that the way an object is referenced in a program must never alter any of its (“desirable”) properties.

(II) The Problem

In a statically typed language like C#, “type-safety” implies that the compiler is capable of catching “casting” errors in the code at compile time.

As explained above, mutable arrays should be invariant to enforce type-safety. But this is not the case for arrays in C#, which are covariant in their element type. Puzzled? Then read the next paragraphs.

This means that “an error” in the source code that should be always caught by the C# compiler has been turned into an exception that can only be detected at runtime by the CLR!

Why? To answer this question, from now on let’s define three classes, which many seem to like to use in code samples on the subject lately: Animal, Cat and Dog, where Cat and Dog both derive from the same base class: Animal.

Well? Storing a Cat in an array of Animals should always be allowed, but with array covariance the array behind an Animal[] reference could really be an array of Dogs.

Take for instance the following code:

   Animal[] animals = new Dog[10];
   animals[1] = new Cat();

What do you expect will happen in the above code? Surely the compiler will detect an error here, right? Right?!

Wrong! Believe it or not, this code compiles without errors and only throws an exception (ArrayTypeMismatchException) at runtime, on any version of C#!
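If you want to see it with your own eyes, the following sketch wraps the two lines above in a small program. The empty Animal, Cat and Dog classes and the StoreCatInDogArray helper are my own scaffolding for the test, nothing more.

```csharp
using System;

public class Animal { }
public class Cat : Animal { }
public class Dog : Animal { }

public static class ArrayCovarianceDemo
{
    // Returns true when the CLR rejects the store at runtime.
    public static bool StoreCatInDogArray()
    {
        Animal[] animals = new Dog[10];   // compiles: arrays are covariant
        try
        {
            animals[1] = new Cat();       // compiles too, but the CLR checks the real element type
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(StoreCatInDogArray()); // True
    }
}
```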

If this is wrong, why is it allowed, then? Because the compiler knows that both derived types are Animals, and therefore the implicit conversion is allowed for an array declared to hold Animals, even though the actual array instance was created to contain a different derived type with the same base type (in the above code: Dog[10]).

Remember that in C# there is an implicit conversion from a derived type to its base type because it’s a type-safe operation, but the same does not apply the other way around. And thus, you must always use an explicit cast to convert a base type back to a derived type.

Unfortunately, this problem has existed in C# since version 1 and will remain in version 4 (shhhhh! … don’t say it loud, but it’s a Java-related thing).

But let’s move on to what will change …
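A short sketch of both directions, again with my own placeholder classes and a hypothetical IsReallyACat helper. The implicit conversion never fails, while the explicit cast is checked at runtime and can throw.

```csharp
using System;

public class Animal { }
public class Cat : Animal { }
public class Dog : Animal { }

public static class CastDemo
{
    // Safe probe: "as" yields null instead of throwing when the real type does not match.
    public static bool IsReallyACat(Animal animal)
    {
        return (animal as Cat) != null;
    }

    public static void Main()
    {
        Animal animal = new Cat();      // derived -> base: implicit, always safe
        Cat cat = (Cat)animal;          // base -> derived: explicit, checked at runtime
        Console.WriteLine(IsReallyACat(cat));

        Animal other = new Dog();
        try
        {
            Cat oops = (Cat)other;      // compiles, but the real type is Dog
            Console.WriteLine(oops);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("A Dog is not a Cat");
        }
    }
}
```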

In the previous section I talked about delegates. Well, in C#, delegates are covariant in their reference-type return values and contravariant in their reference-type parameters. Puzzled again? Just read on …

Continuing with this animal thing, let’s take a look at the following example for covariance:

   public delegate Animal MyDelegate(int i);

   MyDelegate myDelegate = new MyDelegate(MyMethod);

   public Cat MyMethod(int i) { … }

If the declared delegate asks for a method with an Animal as its return type, the compiler knows that a Cat is an Animal and, thanks to the implicit conversion from a derived type to its base class, you can pass a method that returns a Cat.

However, the opposite is not true. If the delegate’s return type is a Cat, you cannot pass a method that returns an Animal, because the caller would need an explicit cast to get a Cat back. Why? Because the returned Animal object could really have been created as a Dog!

Now, let’s move on to an example of delegate contravariance for parameters:
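Here is a runnable version of the covariance example, as a sketch. The Name property and the method body are assumptions of mine, since the original snippet elides the body; MyDelegate and MyMethod are kept as in the snippet (made static so they fit in one class).

```csharp
using System;

public class Animal { public virtual string Name { get { return "Animal"; } } }
public class Cat : Animal { public override string Name { get { return "Cat"; } } }

public static class ReturnCovarianceDemo
{
    public delegate Animal MyDelegate(int i);

    // Returns the "smaller" type; the delegate promises only an Animal.
    public static Cat MyMethod(int i)
    {
        return new Cat();
    }

    public static void Main()
    {
        MyDelegate myDelegate = new MyDelegate(MyMethod);
        Animal animal = myDelegate(0);
        Console.WriteLine(animal.Name); // Cat
    }
}
```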

   public delegate int MyDelegate(Cat myCat);

   MyDelegate myDelegate = new MyDelegate(MyMethod);

   public int MyMethod(Animal myAnimal) { … }

If the declared delegate asks for a method that expects a Cat as a parameter, then you can pass a method that expects its base class instead. Why? Because a Cat is an Animal, so a method that knows how to deal with any Animal also knows how to deal with a Cat. So anything that can be passed as a Cat can safely be treated as an Animal, as long as the declared delegate and the passed method have compatible return types (in the example above, both return an integer).

Understood. But why is this contravariance? We are in fact reversing the direction of assignment, since we are passing a method that takes an Animal as a parameter to a function pointer (or delegate) that expects a Cat as an argument.

Remember the definition of contravariance? We are passing a bigger type to a smaller type here. So we are reversing the direction of assignment.

As with the example for covariance, the opposite is not true. If the delegate expects an Animal as a parameter, you cannot treat it as a Cat in the passed method, since again there is no guarantee at all that the passed Animal parameter is a Cat. It could be a Dog, instead.
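And a runnable sketch of the contravariance example. The Legs property and the CountLegs helper are hypothetical additions of mine, only there so the method has something to return.

```csharp
using System;

public class Animal { public int Legs { get { return 4; } } }
public class Cat : Animal { }

public static class ParameterContravarianceDemo
{
    public delegate int MyDelegate(Cat myCat);

    // Accepts the "bigger" type; it only uses what every Animal offers.
    public static int MyMethod(Animal myAnimal)
    {
        return myAnimal.Legs;
    }

    public static int CountLegs(Cat cat)
    {
        MyDelegate myDelegate = new MyDelegate(MyMethod);
        return myDelegate(cat);
    }

    public static void Main()
    {
        Console.WriteLine(CountLegs(new Cat())); // 4
    }
}
```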

A great example to fully understand this involves two delegates, one expecting a method with a MouseEventArgs parameter and the other a method with a KeyEventArgs parameter; both could refer instead to one method expecting a parameter of type EventArgs. Meaning? In that situation, the same behavior is expected in both cases. So, as long as you don’t care about the added functionality of the two “derived” classes, you could just deal with both arguments equally using the common base type.

Ok, if this does work for delegates in current versions of C#, where’s the problem to solve then? Well, the above-mentioned rule is not applicable when assigning generic delegates, which are always invariant in C# 3.0, for both the parameter and return types (and the same applies to interfaces).
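The real MouseEventArgs and KeyEventArgs classes live in Windows Forms, so to keep this sketch self-contained I use two made-up stand-ins, ClickArgs and KeyArgs, both deriving from EventArgs. The idea is the same: one handler that takes the base type serves both delegates.

```csharp
using System;

// Hypothetical stand-ins for MouseEventArgs and KeyEventArgs.
public class ClickArgs : EventArgs { }
public class KeyArgs : EventArgs { }

public static class SharedHandlerDemo
{
    public static int Calls;

    // One handler for both events: it only needs the common base type.
    public static void OnAnything(object sender, EventArgs e)
    {
        Calls++;
    }

    public static void Main()
    {
        EventHandler<ClickArgs> onClick = OnAnything;
        EventHandler<KeyArgs> onKey = OnAnything;

        onClick(null, new ClickArgs());
        onKey(null, new KeyArgs());

        Console.WriteLine(Calls); // 2
    }
}
```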

Two examples:

  • You cannot return an IEnumerable<string> if the method returns IEnumerable<object> (that would be covariance), and
  • You cannot use an Action<object> delegate to replace an Action<string> delegate (that would be contravariance). Please bear in mind that I’m referring here to assigning, say:
   Action<Cat> myDelegate = new Action<Animal>
   ( myAnimal => myAnimal.DoSomethingWithTheAnimal() );

And not to something like the following, which of course works just fine:

   Action<Cat> myDelegate = MyMethod;

   public void MyMethod(Animal myAnimal) { … }

And yes, please! Try it with C# 3.0 if you doubt my words.
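If you are stuck with an existing Action<Animal> instance in C# 3.0, one possible way around the invariance is to wrap it in a new delegate of the expected type. This is just a sketch: the Adapt helper and the Fed counter are names I invented for the example.

```csharp
using System;

public class Animal { public int Fed; }
public class Cat : Animal { }

public static class InvarianceWorkaround
{
    // Wraps an Action<Animal> so it can be used where an Action<Cat> is expected.
    public static Action<Cat> Adapt(Action<Animal> animalAction)
    {
        // Action<Cat> direct = animalAction;   // rejected by C# 3.0: Action<T> is invariant there
        return cat => animalAction(cat);         // the Cat is implicitly converted to Animal on the call
    }

    public static void Main()
    {
        Action<Animal> feed = animal => animal.Fed++;
        Action<Cat> feedCat = Adapt(feed);

        Cat cat = new Cat();
        feedCat(cat);
        Console.WriteLine(cat.Fed); // 1
    }
}
```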

(III) The Solution

So, what's all this fuss about the new use of the existing reserved words "in" and "out" in the upcoming C# 4.0?

For reference types:

  • in = contravariance (the type parameter is only used for arguments passed in),
  • out = covariance (the type parameter is only used for returned values).

From the examples above:

  • Covariance: IEnumerable<T> will become IEnumerable<out T>, so a method declared to return an IEnumerable<object> will in fact be able to return an IEnumerable<string>, and
  • Contravariance: with Action<in T>, you will be able to assign an Action<object> delegate whenever you expect an Action<string> one.
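The two bullets above, plus the same idea applied to your own interfaces, can be sketched as follows. It needs a C# 4.0 compiler; IProducer, IConsumer, CatShelter and Vet are hypothetical types I wrote for the occasion, while IEnumerable<T> and Action<T> are the real ones.

```csharp
using System;
using System.Collections.Generic;

public class Animal { }
public class Cat : Animal { }

// "out": T only comes out of the interface, so it can be covariant.
public interface IProducer<out T> { T Produce(); }

// "in": T only goes into the interface, so it can be contravariant.
public interface IConsumer<in T> { void Consume(T item); }

public class CatShelter : IProducer<Cat>
{
    public Cat Produce() { return new Cat(); }
}

public class Vet : IConsumer<Animal>
{
    public int Seen;
    public void Consume(Animal animal) { Seen++; }
}

public static class VarianceDemo
{
    public static void Main()
    {
        IProducer<Animal> producer = new CatShelter();   // covariance: Cat -> Animal
        Vet vet = new Vet();
        IConsumer<Cat> consumer = vet;                   // contravariance: Animal -> Cat
        consumer.Consume(new Cat());
        Console.WriteLine(producer.Produce() is Cat);    // True
        Console.WriteLine(vet.Seen);                     // 1

        IEnumerable<object> objects = new List<string> { "a", "b" };
        foreach (object o in objects) Console.WriteLine(o);

        Action<object> printAny = o => Console.WriteLine(o);
        Action<string> printString = printAny;           // allowed thanks to Action<in T>
        printString("hello");
    }
}
```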

I advise you to check the articles listed at the very beginning of this post for complete code samples on the subject.

Phew, this is it! As you can see, the theory behind these concepts is not that easy to understand, but it’s not that difficult either, so I really hope you have found this explanation useful to finally accomplish that task.

‘till next time,
~Pete

1 comment:

  1. Karl Bluemlinger, 7:14 AM, May 13, 2009

    Good article to start, great links, thanks.
