Polymorphism in C++ - By ProfMariaMichael

2012
The Four Polymorphisms in C++
When people talk about polymorphism in C++ they usually mean using a derived class
through a base class pointer or reference, which is called subtype polymorphism. But they often
forget that there are other kinds of polymorphism in C++, such as parametric polymorphism, ad-hoc polymorphism and coercion polymorphism.
These polymorphisms also go by different names in C++,

Subtype polymorphism is also known as runtime polymorphism.

Parametric polymorphism is also known as compile-time polymorphism.

Ad-hoc polymorphism is also known as overloading.

Coercion is also known as (implicit or explicit) casting.
In this article I'll illustrate all four polymorphisms through examples in C++ and also explain
why they go by these other names.
Subtype Polymorphism (Runtime Polymorphism)
Subtype polymorphism is what everyone understands when they say "polymorphism" in C++. It's the
ability to use derived classes through base class pointers and references.
Here is an example. Suppose you have various cats like these felines,
www.profmariamichael.com
Polymorphic cats on a mat by James Halliday.
Since they all belong to the Felidae biological family, and they all should be able to meow, they can be
represented as classes inheriting from the Felid base class and overriding the meow pure virtual function,
// file cats.h
class Felid {
public:
virtual void meow() = 0;
};
class Cat : public Felid {
public:
void meow() { std::cout << "Meowing like a regular cat! meow!\n"; }
};
class Tiger : public Felid {
public:
void meow() { std::cout << "Meowing like a tiger! MREOWWW!\n"; }
};
class Ocelot : public Felid {
public:
void meow() { std::cout << "Meowing like an ocelot! mews!\n"; }
};
Now the main program can use Cat, Tiger and Ocelot interchangeably through Felid (base class)
pointer,
#include <iostream>
#include "cats.h"
void do_meowing(Felid *cat) {
cat->meow();
}
int main() {
Cat cat;
Tiger tiger;
Ocelot ocelot;
do_meowing(&cat);
do_meowing(&tiger);
do_meowing(&ocelot);
}
Here the main program passes pointers to cat, tiger and ocelot to the do_meowing function, which
expects a pointer to Felid. Since they are all Felids, the program calls the right meow function for
each felid and the output is:
Meowing like a regular cat! meow!
Meowing like a tiger! MREOWWW!
Meowing like an ocelot! mews!
Subtype polymorphism is also called runtime polymorphism for a good reason. The resolution of
polymorphic function calls happens at runtime through an indirection via the virtual table. Another way of
explaining this is that the compiler does not locate the address of the function to be called at compile time;
instead, when the program is run, the function is called by dereferencing the right pointer in the virtual
table.
In type theory it's also known as inclusion polymorphism.
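To make the indirection a bit more concrete, here is a minimal sketch of the idea written by hand with function pointers. The names FelidVTable, FelidObject, call_meow and vtable_demo are hypothetical and exist only for this illustration; real compilers lay out their tables differently, so treat it as a model of the mechanism rather than what g++ actually emits.

```cpp
#include <cassert>
#include <string>

struct FelidObject;

// Hand-written stand-in for a compiler-generated virtual table:
// one function pointer per virtual function.
struct FelidVTable {
    std::string (*meow)(const FelidObject *self);
};

// Every "object" starts with a pointer to the table of its real type.
struct FelidObject {
    const FelidVTable *vtable;
};

std::string cat_meow(const FelidObject *)   { return "meow!"; }
std::string tiger_meow(const FelidObject *) { return "MREOWWW!"; }

const FelidVTable cat_vtable   = { cat_meow };
const FelidVTable tiger_vtable = { tiger_meow };

// The call site does not know the concrete type. It loads the function
// pointer from the table at run time and calls through it.
std::string call_meow(const FelidObject *obj) {
    return obj->vtable->meow(obj);
}

void vtable_demo() {
    FelidObject cat   = { &cat_vtable };
    FelidObject tiger = { &tiger_vtable };
    assert(call_meow(&cat) == "meow!");
    assert(call_meow(&tiger) == "MREOWWW!");
}
```

The same call_meow code produces different results depending on which table the object points to, which is the essence of a virtual call.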
Parametric Polymorphism (Compile-Time Polymorphism)
Parametric polymorphism provides a means to execute the same code for any type. In C++ parametric
polymorphism is implemented via templates.
One of the simplest examples is a generic max function that finds the maximum of its two arguments,
#include <iostream>
#include <string>
template <class T>
T max(T a, T b) {
return a > b ? a : b;
}
int main() {
std::cout << ::max(9, 5) << std::endl; // 9
std::string foo("foo"), bar("bar");
std::cout << ::max(foo, bar) << std::endl; // "foo"
}
Here the max function is polymorphic on type T. Note, however, that it doesn't work on pointer types
because comparing pointers compares the memory locations and not the contents. To get it working for
pointers you'd have to specialize the template for pointer types and that would no longer be parametric
polymorphism but would be ad-hoc polymorphism.
Since parametric polymorphism happens at compile time, it's also called compile-time polymorphism.
Ad-hoc Polymorphism (Overloading)
Ad-hoc polymorphism allows functions with the same name to act differently for each type. For example,
given two ints, the + operator adds them together. Given two std::strings, it concatenates
them. This is called overloading.
Here is a concrete example that implements the function add for ints and strings,
#include <iostream>
#include <string>
int add(int a, int b) {
return a + b;
}
std::string add(const char *a, const char *b) {
std::string result(a);
result += b;
return result;
}
int main() {
std::cout << add(5, 9) << std::endl;
std::cout << add("hello ", "world") << std::endl;
}
Ad-hoc polymorphism also appears in C++ if you specialize templates. Returning to the previous example
about the max function, here is how you'd write a max for two const char *,
#include <cstring> // for std::strcmp
template <>
const char *max(const char *a, const char *b) {
return std::strcmp(a, b) > 0 ? a : b;
}
Now you can call ::max("foo", "bar") to find the maximum of the strings "foo" and "bar".
Coercion Polymorphism (Casting)
Coercion happens when an object or a primitive is cast into another object type or primitive type. For
example,
float b = 6; // int gets converted to float implicitly
int a = 9.99; // double gets truncated to int implicitly (a is 9)
Explicit casting happens when you use C's type-casting expressions, such as (unsigned int *)
or (int), or C++'s static_cast, const_cast, reinterpret_cast, or dynamic_cast.
Coercion also happens if the constructor of a class isn't explicit, for example,
#include <iostream>
class A {
int foo;
public:
A(int ffoo) : foo(ffoo) {}
void giggidy() { std::cout << foo << std::endl; }
};
void moo(A a) {
a.giggidy();
}
int main() {
moo(55);
// prints 55
}
If you made the constructor of A explicit, that would no longer be possible. It's usually a good idea to
make single-argument constructors explicit to avoid accidental conversions.
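Here is a small sketch of what changes once the constructor is marked explicit. The class ExplicitA and the functions moo_value and explicit_demo are hypothetical names made up for this illustration, and the compile-time checks assume a C++11 or later compiler.

```cpp
#include <cassert>
#include <type_traits>

// Same shape as class A above, but with an explicit constructor.
class ExplicitA {
    int foo;
public:
    explicit ExplicitA(int ffoo) : foo(ffoo) {}
    int value() const { return foo; }
};

int moo_value(ExplicitA a) { return a.value(); }

// An int no longer converts to ExplicitA implicitly, so moo_value(55)
// would not compile. Direct construction still works.
static_assert(!std::is_convertible<int, ExplicitA>::value,
              "implicit conversion is disabled by explicit");
static_assert(std::is_constructible<ExplicitA, int>::value,
              "direct construction is still allowed");

void explicit_demo() {
    // The conversion now has to be spelled out at the call site.
    assert(moo_value(ExplicitA(55)) == 55);
}
```

The caller has to write ExplicitA(55), so a stray int can no longer turn into an object by accident.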
Also if a class defines conversion operator for type T, then it can be used anywhere where type T is
expected.
For example,
class CrazyInt {
int v;
public:
CrazyInt(int i) : v(i) {}
operator int() const { return v; } // conversion from CrazyInt to int
};
The CrazyInt class defines a conversion operator to type int. Now if we had a function, let's
say print_int, that took an int as an argument, we could also pass it an object of type CrazyInt,
#include <iostream>
void print_int(int a) {
std::cout << a << std::endl;
}
int main() {
CrazyInt b = 55;
print_int(999);
// prints 999
print_int(b);
// prints 55
}
Subtype polymorphism that I discussed earlier is actually also coercion polymorphism because the
derived class gets converted into base class type.
Introduction
Inheritance, encapsulation and polymorphism are undoubtedly the cornerstones of OOP/OOD in
general and C++ in particular.
When programming C, it is very easy to remember how things work. You know that when you add
an int variable to a structure it mostly grows by four bytes. You know that long is either four or eight
bytes long depending on the architecture you’re working with.
Things are less obvious when moving to C++. OOP brings more abstractions to the program. As a
result you are no longer sure if a+b sums two numbers or calls some overloaded operator method
that concatenates contents of two files together.
In this article, I would like to give you a short insight into what’s going on behind the scenes. In
particular we’ll see how the three whales of OOP work in C++.
Things that I am going to show in this article may differ from compiler to compiler. I will talk mostly
about g++ (version 4.2.3). Note however, that same ideas apply everywhere.
Encapsulation
As you know, encapsulation is a principle by which same entity, the object, encapsulates data and
methods that manipulate the data. You may be surprised to find out that underneath, class methods
are just plain functions.
How methods work
In C++ there’s one fundamental difference between plain functions and class methods. Class
methods receive one additional argument: the pointer to the object whose data the
method is expected to manipulate. I.e. the first argument to a method is the this pointer.
How this is passed depends on the calling convention. Some 32-bit x86 compilers speed things up by
passing this in a dedicated CPU register (ECX under Microsoft's thiscall convention) instead of on the
stack. On x86_64 there is nothing special about it: this travels in the first argument register like
any other argument (RDI on Linux, RCX on Windows).
Otherwise, objects know nothing about methods that operate on them.
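The idea can be sketched by writing the hidden argument out by hand. Counter, Counter_add and this_demo are hypothetical names used only for this illustration; the free function is a model of what a method boils down to, not the exact code a compiler generates.

```cpp
#include <cassert>

struct Counter {
    int count;
    void add(int n) { count += n; }   // ordinary method, this is implicit
};

// Free-function equivalent: the object pointer that a method receives
// implicitly is passed explicitly as the first argument.
void Counter_add(Counter *self, int n) { self->count += n; }

void this_demo() {
    Counter a = { 0 };
    Counter b = { 0 };
    a.add(5);            // the compiler passes &a behind the scenes
    Counter_add(&b, 5);  // the same thing written out by hand
    assert(a.count == b.count);
}
```

Both calls end up doing the same work on the object they were given, which is why the object itself does not need to store anything about its methods.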
How overloading works
Another thing that we have to take care of in C++ is how to distinguish
between some_function() and some_class::some_function(),
or between some_class::some_function( int ) and some_class::some_function(). I.e. what’s the
difference between two methods with the same name that receive a different number and type of
arguments? What is the difference between a method and a function that have the same name?
Obviously, out of the linker, compiler and preprocessor, the linker is the one that should be aware of the
above difference. This is because we may have some_function() in some distant object file. The linker is
the component that should find this distant function and connect the call to the function with the
actual function. The linker uses the function name as a unique identifier of the function.
To make things work, g++, like any other modern compiler, mangles the name of the
method/function and makes sure that:
1. Mangled method name includes name of the class it belongs to (if it belongs to any class).
2. Mangled method name includes number and type of arguments method receives.
3. Mangled method name includes namespace it belongs to.
With these three, some_class::some_function() and some_function() will have totally different
mangled names. See the following code sample.
namespace some_namespace
{
    class some_class
    {
    public:
        some_class() { }
        void some_method() { }
    };
};

class some_class
{
public:
    some_class() { }
    void some_method() { }
};

void some_method()
{
    int a;
}
g++ will turn:
void some_class::some_method() into _ZN10some_class11some_methodEv
void some_namespace::some_class::some_method() into _ZN14some_namespace10some_class11some_methodEv
void some_method() into _Z11some_methodv
Adding an integer argument to void some_method() will turn it
from _Z11some_methodv to _Z11some_methodi.
How mangling solves the problem
So when you create two methods with the same name but different arguments, the compiler turns
them into two functions with different names. Later, when the linker links the code together, it doesn’t
know that these are two methods of the same class. From the linker’s standpoint, these are two different
functions.
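If you want to check mangled names yourself, g++ and clang ship a demangling routine in the non-standard header cxxabi.h. The sketch below assumes such a compiler; the helper names demangle and demangle_demo are hypothetical and made up for this illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>   // g++/clang specific, not part of standard C++
#include <string>

// Turns a mangled symbol name back into a readable declaration.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char *mangled) {
    int status = 0;
    char *readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && readable) ? readable : "";
    std::free(readable);   // the buffer was allocated with malloc
    return result;
}

void demangle_demo() {
    // Same name, different arguments: two distinct symbols for the linker.
    assert(demangle("_Z11some_methodv") == "some_method()");
    assert(demangle("_Z11some_methodi") == "some_method(int)");
    // The class name is part of the symbol as well.
    assert(demangle("_ZN10some_class11some_methodEv")
           == "some_class::some_method()");
}
```

The number and type of arguments, the class and the namespace can all be read back from the symbol, which is exactly the information the linker needs to tell the functions apart.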
Structure and size of the object
You probably already know that C++ classes and good old C structures are nearly the same thing.
Perhaps the only difference is that all class members are private unless specified otherwise, whereas
all structure members are public by default.
When looking at the memory layout of the object, it is very similar to C structure.
Differences begin when you add virtual methods. Once you add virtual methods to the class, the
compiler will create a virtual methods table for the class. Then it will place a pointer to the table at the
beginning of each instance of this class.
So, bear in mind that once your class has virtual methods, each object of this class will be at least four or
eight bytes (depending on whether you build for 32-bit or 64-bit) bigger.
Actually, the pointer to the virtual methods table does not have to be at the beginning of the object. It is
just handy to keep it at the beginning, so g++ and most modern compilers do it this way.
Adding virtual methods to the class will also increase the amount of RAM your program consumes and
its size on your hard drive.
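The size difference is easy to observe with sizeof. The sketch below uses hypothetical types Plain and WithVirtual; the exact numbers depend on the compiler and platform (padding can add more than one pointer), so the checks only state what holds on common implementations.

```cpp
#include <cassert>
#include <cstddef>

struct Plain {
    int value;
    void touch() { ++value; }          // non-virtual: costs nothing per object
};

struct WithVirtual {
    int value;
    virtual void touch() { ++value; }  // virtual: object carries a table pointer
    virtual ~WithVirtual() {}
};

// Extra bytes per object caused by adding virtual methods.
std::size_t virtual_overhead() {
    return sizeof(WithVirtual) - sizeof(Plain);
}

void size_demo() {
    assert(sizeof(WithVirtual) > sizeof(Plain));
    // On common implementations the overhead is at least one pointer.
    assert(virtual_overhead() >= sizeof(void *));
}
```

Note that the cost is paid once per object no matter how many virtual methods the class has, because only the pointer to the table is stored in the object.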
How inheritance and polymorphism work
Let’s say we have two classes, A and B. Class B inherits from class A.
#include <iostream>

using namespace std;

class A
{
public:
    A() { a_member = 0; }
    int a_member;
};

class B : public A
{
public:
    B() : A() { b_member = 0; };
    int b_member;
};

int main()
{
    A *a = new B;
    a->a_member = 10;

    return 0;
}
The interesting thing to notice here is that a actually points to an instance of class B. When
accessing a_member, we’re actually accessing the a_member that is defined in class A but
belongs to class B (via inheritance). To make this happen, the compiler has to make sure that the common
part of both classes (a_member in our case) is located at the same offset in the object.
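A quick way to see this is to compare addresses. In the sketch below Base, Derived, base_is_at_offset_zero and layout_demo are hypothetical names; the language standard does not strictly promise this layout for such classes, but g++ and other mainstream compilers place the base part first under single inheritance.

```cpp
#include <cassert>

struct Base {
    int base_member;
};

struct Derived : Base {
    int derived_member;
};

// True if the Base part sits at the very start of a Derived object,
// i.e. converting Derived * to Base * does not change the address.
bool base_is_at_offset_zero() {
    Derived d;
    d.base_member = 0;
    d.derived_member = 0;
    Base *as_base = &d;
    return static_cast<void *>(as_base) == static_cast<void *>(&d);
}

void layout_demo() {
    Derived d;
    d.derived_member = 0;
    Base *b = &d;
    b->base_member = 10;          // written through the base pointer...
    assert(d.base_member == 10);  // ...and visible through the derived object
    assert(base_is_at_offset_zero());
}
```

Because the shared member lives at the same offset either way, code compiled against Base works unchanged on a Derived object.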
Now what if we have some virtual methods?
How basic polymorphism works
Let’s change our example a bit and add some virtual methods.
#include <iostream>

using namespace std;

class A
{
public:
    A() { a_member = 0; }
    virtual void reset() { a_member = 0; }
    void set_a_member( int a ) { a_member = a; }
    int get_a_member() { return a_member; }
protected:
    int a_member;
};

class B : public A
{
public:
    B() : A() { b_member = 0; }
    virtual void reset() { a_member = b_member = 0; }
    virtual void some_virtual_method() { }
    void set_b_member(int b ) { b_member = b; }
    int get_b_member() { return b_member; }
protected:
    int b_member;
};

int main()
{
    B *b = new B;
    A *a = b;

    b->set_b_member( 20 );
    b->set_a_member( 10 );

    a->reset();   // virtual call: runs B::reset() even though a is an A *

    cout << b->get_a_member() << " " << b->get_b_member() << endl;

    return 0;
}
If you compile and run this program it will obviously print “0 0”. But how, you may ask. After all, we
did a->reset(). Without an understanding of polymorphism we could think that we’re calling the method
that belongs to class A.
The reason it works is that when the compiler sees code that dereferences a pointer to A it expects
a certain internal object structure, and when it dereferences a pointer to B it expects a different object
structure. Let us take a look at both of them.
However, even more important here is the structure of the virtual methods tables of both classes.
It is because of the virtual methods table structure that the compiler knows which virtual method to call.
When it generates the code that dereferences a pointer to A, it expects that the first entry in the virtual
methods table of the object will be a pointer to the right reset() routine. It doesn’t really care if the pointer
actually points to a B object. It will call the first method of the virtual methods table anyway.
How multiple inheritance works
Multiple inheritance makes things much more complicated. The problem is that when class C inherits
from both A and B, we should have both members of class A and class B in the instance of class C.
#include <cstdio>

class A
{
public:
    A() { a_member = 0; }
protected:
    int a_member;
};

class B
{
public:
    B() { b_member = 0; }
protected:
    int b_member;
};

class C : public A, public B
{
public:
    C() : A(), B() { c_member = 0; }
protected:
    int c_member;
};

int main()
{
    C c;

    A *a1 = &c;   // implicit conversion: same address as &c
    B *b1 = &c;   // implicit conversion: adjusted to the B part of c

    // reinterpret_cast keeps the address as is; b2 must not be dereferenced
    A *a2 = reinterpret_cast<A *>( &c );
    B *b2 = reinterpret_cast<B *>( &c );

    // %p expects void *, so cast each pointer explicitly
    printf( "%p %p %p\n", (void *)&c, (void *)a1, (void *)b1 );
    printf( "%p %p %p\n", (void *)&c, (void *)a2, (void *)b2 );

    return 0;
}
Once we cast a pointer to class C into a pointer to class B, we cannot keep the value of the pointer as is because
the first fields in the object are occupied by fields defined in class A (a_member). Therefore, when we do
the casting we have to do a very special kind of casting: the one that changes the actual value of the
pointer.
If you compile and run the above code snippet, you will see that all the values are the same except for
b1, which should be 4 bytes bigger than the other values.
This is what the casting (an implicit conversion in our case) does: it increments the value of the pointer to
make sure that it points to the beginning of the part of the object inherited from B.
In case you wonder what other types of casting will do, here is a short description.
Difference between different casting types
There are five types of casting in C++.
1. reinterpret_cast<>()
2. static_cast<>()
3. dynamic_cast<>()
4. const_cast<>()
5. C style cast.
I guess you know already what const_cast<>() does. Also, it is only a compile-time cast. For pointers within
a class hierarchy, a C style cast behaves the same as static_cast<>(). This leaves us with three types of casting.
1. reinterpret_cast<>()
2. static_cast<>()
3. dynamic_cast<>()
From the above example we learn that reinterpret_cast<>() does nothing to the pointer value and
leaves it as is.
static_cast<>() and dynamic_cast<>() both modify the value of the pointer. The difference between the two
is that the latter relies on RTTI to see if the cast is legal: it looks inside the object to see if it truly
belongs to the type we’re trying to cast to. static_cast<>(), on the other hand, simply adjusts
the value of the pointer by an offset known at compile time.
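The three casts can be compared side by side in a small sketch. Left, Right, Both and cast_demo are hypothetical names; the classes are given virtual destructors because dynamic_cast needs polymorphic types, and the address checks reflect how g++ and other common compilers lay out the object.

```cpp
#include <cassert>

struct Left  { int left_member;  virtual ~Left()  {} };
struct Right { int right_member; virtual ~Right() {} };
struct Both : Left, Right { int both_member; };

void cast_demo() {
    Both both;
    Right plain_right;

    // static_cast applies the fixed, compile-time offset of the Right part.
    Right *via_static = static_cast<Right *>(&both);

    // reinterpret_cast keeps the address unchanged. The result does not
    // point at a real Right object, so it is only compared, never used.
    Right *via_reinterpret = reinterpret_cast<Right *>(&both);
    assert(static_cast<void *>(via_reinterpret) == static_cast<void *>(&both));
    assert(static_cast<void *>(via_static) != static_cast<void *>(&both));

    // dynamic_cast checks the real type of the object at run time (RTTI).
    Right *really_both = via_static;
    Right *only_right = &plain_right;
    assert(dynamic_cast<Both *>(really_both) == &both);   // succeeds
    assert(dynamic_cast<Both *>(only_right) == nullptr);  // fails safely
}
```

A static_cast from only_right to Both * would compile just as happily and hand back a bogus pointer, which is why dynamic_cast is the safer choice when the real type is not known.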
Polymorphism and multiple inheritance
Things get even more complicated when we have virtual methods in each one of the classes A,
B and C that we already met. Let’s add the following virtual methods to the classes.
virtual void set_a( int new_a ) { a_member = new_a; }
To class A.
virtual void set_b( int new_b ) { b_member = new_b; }
To class B and
virtual void set_c( int new_c ) { c_member = new_c; }
To class C.
You could have assumed that even in this case class C objects will have only one virtual methods
table, but this is not true. When you static_cast a class C object into a class B object, the class B object
must have its own virtual methods table. If we want to use the same casting method as with regular
objects (that is, adding a few bytes to the pointer to reach the right portion of the object), then we have no
choice but to place another virtual methods table pointer in the middle of the object.
As a result, you can have many different virtual methods tables for the same class. The above
diagram shows a very simple case of inheritance and the truth is that it does not get more complicated
than this. Take a look at the following, more complex, class hierarchy.
It may surprise you, but the structure of the class X object will be quite simple. In our previous example
the inheritance hierarchy had two branches. This one has three:
1. A-C-F-X
2. D-G-X
3. B-E-H-X
All end up with X of course. They are a little longer than in our previous example, but there is nothing
special about them. The structure of the object will be the following:
As a rule of thumb, g++ (and friends) calculates the branches that lead to the target class, class X in
our case. Next it creates a virtual methods table for each branch and places all virtual methods from
all classes in the branch into the virtual methods table. This includes pointers to the virtual methods of the
class itself.
Let's project this rule onto our last example: the A-C-F-X branch virtual methods table will include
pointers to virtual methods from classes A, C, F and X. The same goes for the other two branches.
What if we try something even more complicated
The thing is that you can’t. Let's say we try to create an even more complicated hierarchy by changing
class D from our previous example to inherit from class C.
This will create ambiguous inheritance, and the compiler will not hesitate to tell you so once you run into it.
This is because now class X will have all members of classes A and C twice:
once via the A-C-F-X branch and once via the A-C-D-G-X branch. It will not tell you that there’s
a problem right away, at the class definition. Instead, once you try to reference one of the members of X inherited
from either A or C, g++ will tell you that it has two variations of the same member/method and that it
does not know which one of them to call.
This is what the g++ output would be if you try to compile this file.
main.cc: In function 'int main()':
main.cc:110: error: request for member 'set_a' is ambiguous
main.cc:29: error: candidates are: virtual void A::set_a(int)
main.cc:29: error:                 virtual void A::set_a(int)
All this because I was trying to do x.set_a( 20 ); in line 110.
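The same situation can be reproduced with a much smaller hierarchy. The sketch below is a simplified stand-in for the one discussed above, with hypothetical names Root, PathOne, PathTwo, Bottom and ambiguity_demo; it shows that the object really holds two copies of the inherited member and that naming the branch tells the compiler which one you mean.

```cpp
#include <cassert>

struct Root {
    int a_member;
    Root() : a_member(0) {}
    void set_a(int new_a) { a_member = new_a; }
};

struct PathOne : Root {};
struct PathTwo : Root {};

// Bottom contains two separate Root parts, one per branch.
struct Bottom : PathOne, PathTwo {};

void ambiguity_demo() {
    Bottom x;
    // x.set_a(20);        // error: request for member 'set_a' is ambiguous
    x.PathOne::set_a(20);  // naming the branch picks one of the two copies
    x.PathTwo::set_a(30);
    assert(x.PathOne::a_member == 20);
    assert(x.PathTwo::a_member == 30);
}
```

The two assignments land in different places, which confirms that each branch carries its own copy of everything inherited from the common ancestor.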
Few words about C++ constructors
I guess you know what constructors are good for. In light of what we’ve seen, you may ask yourself
who builds all those virtual methods tables and who writes the right pointer into the object.
Obviously the compiler builds all the virtual methods tables. And the constructor is the one that writes the
pointer to the right virtual methods table into the object. This is another reason why you cannot call a constructor directly: you
don’t want to mess with the virtual methods tables.
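One visible consequence of the constructor setting up the table pointer is that a virtual call made inside a base class constructor never reaches the derived override. The sketch below uses hypothetical names BaseShape, DerivedShape and constructor_demo to show it.

```cpp
#include <cassert>
#include <string>

struct BaseShape {
    std::string seen_in_constructor;
    BaseShape() {
        // While BaseShape() runs, the object still refers to BaseShape's
        // table, so this call cannot reach the override in DerivedShape.
        seen_in_constructor = name();
    }
    virtual ~BaseShape() {}
    virtual std::string name() const { return "BaseShape"; }
};

struct DerivedShape : BaseShape {
    // By the time construction of DerivedShape finishes, the table
    // pointer refers to DerivedShape's table.
    std::string name() const { return "DerivedShape"; }
};

void constructor_demo() {
    DerivedShape d;
    assert(d.seen_in_constructor == "BaseShape");

    BaseShape &as_base = d;
    assert(as_base.name() == "DerivedShape");  // normal virtual dispatch
}
```

Each constructor in the chain points the object at its own class's table before its body runs, so the object only becomes a full DerivedShape at the very end of construction.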