Behind The Scenes of Dynamic Polymorphism

Polymorphism is a very useful concept that makes software design flexible, which is why object-oriented programming languages use it so frequently. But have you ever wondered how the runtime environment really handles all that elegance behind the curtains? This article focuses on the data structures and mechanisms used to implement dynamic polymorphism (through inheritance).
The examples use C++ so that the concepts can be explained more clearly. The C++ standard does not actually require vtables, but mainstream compilers implement virtual functions with them.

As you may already know, in C++ this is handled with the virtual keyword: a virtual function declared in a base class can be overridden by subclasses that inherit from that base class.
The data structure that makes this work is the virtual table.

A virtual table, or vtable, is a data structure made of function pointers. Each entry holds the memory address of a virtual function.
In C++, once a function is declared virtual, it stays virtual in all subclasses, even if a subclass overrides it without repeating the virtual keyword.
Another important point is that the compiler generates a vtable for a class only if that class or one of its base classes has at least one virtual function.
vtables are generated per class, not per object. All objects of a given class share the same vtable.

So the next question is how an object finds the vtable of its class. The compiler handles that. For every class that has a vtable, the compiler adds a hidden data member called the vptr (virtual pointer), which is simply a pointer to the corresponding vtable. Like any other data member, it holds nothing meaningful until it is set, so the compiler also inserts hidden code into the class's constructors that initializes the vptr to point at the right vtable. Where the vptr is placed (as the first or the last member) depends on the compiler; GCC, Clang, and MSVC typically put it at the start of the object.

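As mentioned above, a vtable is just a table of function pointers. Because its contents are fully known when the program is compiled and linked, it is stored as constant data, typically in a read-only data section (.rodata or .data.rel.ro with GCC and Clang on ELF platforms, .rdata with MSVC). It does not live in the .bss area, which holds zero-initialized static variables.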

Each class has its own vtable (if its class hierarchy has virtual methods). Let's see how the vtables are populated for the following example.

#include <cstdio>

class Animal
{
   public:
             virtual void move(){ std::printf("Animal moving\n"); }
             virtual void sleep(){ std::printf("Animal sleeping\n"); }
             void eat(){ std::printf("Animal eating\n"); }        // non-virtual
};
class Dog: public Animal
{
   public:
             void move() override { std::printf("Dog moving\n"); }  // overrides Animal::move
             virtual void bark(){ std::printf("Dog barking\n"); }   // new virtual function
             void eat(){ std::printf("Dog eating\n"); }             // hides Animal::eat, no dynamic dispatch
};
int main()
{
   Dog dog;
   Animal *a = &dog;   // base-class pointer to a Dog object
   a->move();  // prints Dog moving
   a->eat();   // prints Animal eating
   a->sleep(); // prints Animal sleeping

   return 0;
}
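In the above code, the vtables are conceptually as follows (the exact entry order and any extra bookkeeping slots depend on the compiler):
  • Animal's vtable: move -> Animal::move, sleep -> Animal::sleep
  • Dog's vtable: move -> Dog::move, sleep -> Animal::sleep, bark -> Dog::bark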

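Note that non-virtual functions such as eat() get no vtable entry. The call a->eat() is resolved at compile time from the static type of a (Animal*), so Animal::eat() runs even though the object is a Dog.
Also note that since Dog does not override sleep(), a->sleep() runs the base class's version.
Finally, note that move() is virtual in Dog as well, even though Dog's declaration does not repeat the virtual keyword.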

So the vtable consulted at runtime is the one belonging to the class of the actual object, not the class of the pointer.
In the above case it is the Dog class's vtable. Because move() is virtual in Animal, it remains virtual in Dog, so Dog's vtable has an entry for move(), and that entry points to Dog::move().

The vtable of a class gets entries for:
  • virtual functions inherited from its base classes
  • virtual functions declared in the class itself
So in our example, the virtual functions that are declared:
  • in the base class only (sleep())
  • in both the base and derived classes (move())
  • in the derived class only (bark())
all get entries in the derived class's vtable.

In the derived class's vtable, each entry points to a function chosen as follows.
If the virtual function for that entry is defined:
  • In the base class only (sleep()) -> the entry points to the base class's function
  • In both base and derived classes (move()) -> the entry points to the derived class's function
  • In the derived class only (bark()) -> the entry points to the derived class's function
Whether a call is virtual is decided at compile time. For a virtual call, the compiler emits code that, at runtime, reads the vptr from the object, looks up the vtable entry assigned to that function, and calls the function that entry points to.

Abstract classes are built on the same mechanism.

In C++, when a class has at least one pure virtual function, the whole class becomes abstract and no object of that class can be created. A derived class must override all of those pure virtual functions before objects of the derived class can be created.

A pure virtual method is declared as follows:

virtual void move() = 0;

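The = 0 marks the function as having no implementation in this class. Conceptually, its vtable entry points to nothing (a null, or '0', address). In practice, compilers fill that slot with a pointer to a small runtime helper (for example __cxa_pure_virtual with GCC and Clang, _purecall with MSVC) that terminates the program if it is ever reached, for instance through an indirect call made from a base-class constructor.

Since this is a virtual function, it still gets an entry in the vtable, but that entry has no real function behind it, which makes the vtable incomplete. That is why objects of an abstract class cannot be created; the compiler enforces this rule at compile time.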
