Cache Coherency and Volatile Data

Before going into the details of how to change the code to withstand the effects of cache coherence, let's briefly look at what cache coherence means.
Cache coherency is the consistency of shared data across multiple cores. Cache coherency is a good thing, and we need it in multi-core environments. But if we are not aware of the related issues while programming, unexpected behavior can occur at run time. The issue arises in multi-core environments because each core has its own local cache. Without cache coherency, each core could read a different value from its local cache. So we need coherency among the local caches, but we also need to be aware of what might go wrong.



Let's consider the above diagram: a system with two cores, where the same program runs two threads T1 and T2, distributed so that processor P1 runs T1 and processor P2 runs T2.

The memory location X holds a variable that is shared between the threads. If one thread updates location X, the other thread should be able to see the new value immediately; otherwise the results being produced may contain faults.

So let's consider an example of what happens when cache coherency is not present.

Time | Event (processors reading and writing)          | L1 of P1 (local cache) | L1 of P2 (local cache) | L3 (shared memory)
-----+--------------------------------------------------+------------------------+------------------------+-------------------
 0   |                                                  |                        |                        | X=10
 1   | Processor P1 reads X=10                          | X=10                   |                        | X=10
 2   | Processor P2 reads X=10                          | X=10                   | X=10                   | X=10
 3   | Processor P1 writes X=20 (assume write-through)  | X=20                   | X=10                   | X=20
 4   | Processor P2 reads X=10 (reads the wrong value)  | X=20                   | X=10                   | X=20


At first, X=10 in the shared memory (L3 in this case; it could also be main memory). At time unit 1, P1 reads the value X=10 from the shared memory into its local cache. At time unit 2, P2 does the same. At time unit 3, P1 writes X=20 to its local cache, and since we assume a write-through cache, L3 is updated to X=20 as well. However, the copy in P2's local cache still says X=10, so that value is now stale. At time unit 4, when P2 reads X from its local cache, it reads X=10, which is not the newest value, so any calculation P2 performs with X may be wrong.
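To make the scenario concrete, here is a minimal sketch using POSIX threads: one thread plays the role of T1 on P1 and updates the shared variable, while the other plays the role of T2 on P2 and reads it. The names t1_body and t2_body (and the use of usleep) are illustrative and not from the original example, and strictly speaking the unsynchronized access is a data race in C; the code is only meant to mirror the timeline in the table.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

int X = 10;                         /* shared location, like X in the table      */

static void *t1_body(void *arg)     /* runs (say) on P1: writes the new value    */
{
    (void)arg;
    X = 20;                         /* time unit 3: P1 writes X=20               */
    return NULL;
}

static void *t2_body(void *arg)     /* runs (say) on P2: reads the shared value  */
{
    (void)arg;
    usleep(1000);                   /* give T1 a chance to write first           */
    /* On hardware without cache coherency this read could still return the
     * stale value 10 from P2's local cache, as in time unit 4 of the table. */
    printf("T2 sees X = %d\n", X);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, t1_body, NULL);
    pthread_create(&t2, NULL, t2_body, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}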

Several cache coherence protocols have been developed to maintain coherency between the local caches in multi-core environments. Some of those coherency protocols are:
  • Snooping protocol
  • Directory-based protocol
I will talk about those protocols in a future post. For the moment, what they do is this: whenever a new value is written to a memory location like X in a local cache, they make sure that the other processors also see the updated value when they need it (these protocols involve broadcasting invalidation messages, broadcasting the data, or keeping a directory of references in the outermost cache, and so on). So in the end, the goal is to provide the newest value of a particular memory location to any other processor that holds that location in its local cache, whenever it is needed.

As mentioned above, a memory location in the cache may change without any action by the running program, because maintaining coherency is handled by the hardware, not by the application. This becomes a problem if the compiler gets too clever and applies certain optimizations to the code: at compile time, the compiler has no way of knowing that such changes can happen.

Let's consider the following example:

int X=10;

while( X==10 ){
        
          /* Do something */
      
}

In the above code, the compiler may decide that X cannot change at run time, since nothing in the code modifies it. The compiler might therefore optimize the loop condition, treating X==10 as always true. Because of this optimization, X does not have to be fetched from memory on each iteration of the while loop.

So the updated code would look something like this.

int X=10;

while( true ){
        
          /* Do something */
      
}

In reality, though, X can be changed by something outside this piece of code, for example by an interrupt. And in the cache coherency situation, the value of X in this core's local cache might change simply because another core has already written a new value to its copy of X and the coherence protocol propagated it. This is done not by the program but by the hardware.

So in order to avoid such issues, we need to instruct the compiler, using a special keyword, not to optimize accesses to that variable.

That keyword is "volatile". So in the above example we can use,

volatile int X=10;
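As a slightly fuller illustration, here is a minimal sketch of the classic, well-defined use of volatile: a flag that is modified outside the normal flow of the program, in this case by a signal handler. The names keep_running and handle_sigint are illustrative and not from the original example. Without the volatile qualifier, the compiler would be free to hoist the load of the flag out of the loop, exactly the optimization described above.

#include <signal.h>
#include <stdio.h>

/* volatile: the compiler must re-read this flag on every use and may not
 * cache it in a register or assume it never changes. */
static volatile sig_atomic_t keep_running = 1;

static void handle_sigint(int sig)
{
    (void)sig;
    keep_running = 0;              /* written from outside the loop's code path */
}

int main(void)
{
    signal(SIGINT, handle_sigint);

    while (keep_running) {         /* a fresh load of keep_running each pass    */
        /* Do something */
    }

    puts("Loop exited after SIGINT");
    return 0;
}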

Here are some important notes on using volatile (a short sketch of the struct-related points follows this list).
  • If we apply volatile to a composite data type, all the members of that composite type become volatile.
    • If we apply it to a struct, every access to the struct's data members is treated as volatile.
    • If we apply volatile to a single struct member instead, only accesses to that member are treated as volatile; the remaining members can still be optimized normally.
  • volatile does not make accesses atomic.
    • If the data type is too large to be read or written with a single instruction (for example, a struct longer than the maximum width a single instruction can fetch on the host machine's architecture), another core can still observe a partially updated value even though the variable is declared volatile.
  • A variable declared volatile is read from memory each time it is used; the compiler will not cache its value in a register or optimize the accesses away. Because of this, using volatile makes the program run a little slower.
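As a minimal sketch of the struct-related notes above (the type names device_regs and reading are hypothetical, used only for illustration): applying volatile to a whole struct object makes every member access volatile, while qualifying a single member affects only that member.

#include <stdio.h>

struct device_regs {
    int status;
    int data;
};

/* The whole object is volatile: every access to regs.status or regs.data
 * is a volatile access and will not be optimized away. */
volatile struct device_regs regs = { 0, 0 };

/* Only one member is volatile: accesses to 'ready' are volatile, while
 * 'data' can still be optimized normally. */
struct reading {
    volatile int ready;
    int data;
};

int main(void)
{
    regs.status = 1;               /* volatile write: always performed          */
    int s = regs.status;           /* volatile read: always fetched from memory */

    struct reading r = { 0, 42 };
    int flag = r.ready;            /* volatile read of the qualified member     */
    (void)flag;

    printf("status = %d, data = %d\n", s, r.data);
    return 0;
}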

So, those are the basics we need to know about using the volatile keyword in our programs.

Comments

  1. I am not sure about 'The variables named as volatile would be fetched from the memory each time they are needed unless stored in a register'. If the hardware is taking care of cache coherency, does specifying volatile actually force the variable to be loaded from memory? I think specifying a variable as volatile only avoids optimization at the compiler level; it has nothing to do with memory or caches.

    Replies:
    1. No, that is actually not true. For more info, please refer to this discussion:
      https://www.quora.com/What-does-volatile-in-C-programming-language-signify

      volatile tells the compiler not to cache the variable in a register, but rather read it from memory every time its value is needed, and write it to memory every time its value is updated.

      You use volatile when a variable may be updated by any force external to the program, as for an I/O register.

    2. Cache coherency related issues cannot be entirely solved by this. There are both hardware-level and software-level solutions. In this post I have only discussed a simple scenario where you can avoid issues that may be caused by certain compiler-level optimizations. For further info, please refer to this post:
      https://www.ques10.com/p/13222/what-is-cache-coherence-problem-and-how-it-can-be-/?

  2. Absolutely correct, Deepak. Even I keep telling people about this. Cache coherency is a hardware problem; it is invisible even to the OS.

    Replies:
    1. Yes, cache coherency related issues cannot be entirely solved by this. There are both hardware-level and software-level solutions. In this post I have only discussed a simple scenario where you can avoid issues that may be caused by certain compiler-level optimizations. For further info, please refer to this post:
      https://www.ques10.com/p/13222/what-is-cache-coherence-problem-and-how-it-can-be-/?

