

A memory fence is a type of barrier instruction that causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the fence. This typically means that operations issued prior to the fence are guaranteed to be performed before operations issued after it. Memory fences are necessary because most modern CPUs employ performance optimizations that can result in out-of-order execution. This reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but it can cause unpredictable behavior in concurrent programs unless carefully controlled. The example below shows the kind of mistake that happens without a memory fence.

Unpredictable behavior without memory fence

When writing lock-free code in C or C++, one must often take special care to enforce correct memory ordering. Intel lists several such surprises in Volume 3, §8.2.3 of their x86/64 Architecture Specification, and the following one is really the best way to illustrate CPU reordering. Suppose you have four integers X, Y, r1 and r2 somewhere in memory, all initially 0. Two processors, running in parallel, execute the following memory operations:

    Processor 1        Processor 2
    -----------        -----------
    X = 1              Y = 1
    r1 = Y             r2 = X

Each processor stores 1 to X and Y respectively; then processor 1 assigns the value of Y to r1, and processor 2 assigns the value of X to r2. Now, no matter which processor writes 1 to memory first, it's natural to expect the other processor to read that value back, which means we should end up with either r1 = 1 or r2 = 1, or perhaps both. But according to Intel's specification, that won't necessarily be the case. The specification says it's legal for both r1 and r2 to equal 0 at the end of this example - a counter-intuitive result, to say the least.

One way to understand this is that Intel x86/64 processors, like most out-of-order processor families, are allowed to reorder memory operations according to certain rules. For instance, processor 1 may execute r1 = Y first and X = 1 afterwards, and if processor 2's load r2 = X also runs before processor 1's store X = 1, both r1 and r2 end up 0. The table below shows the extreme circumstance in which both processor 1 and processor 2 reorder their memory operations:

    Processor 1        Processor 2
    -----------        -----------
    r1 = Y             r2 = X
    X = 1              Y = 1
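
To make the tables concrete, here is one way to write the two transactions in portable C11; this is our own sketch, not code from the article. Relaxed atomics are used so the language adds no ordering of its own - this is the classic store-buffering pattern, in which the r1 = r2 = 0 outcome is genuinely observable on x86/64 hardware:

    #include <stdatomic.h>

    atomic_int X, Y;        /* shared, both initially 0 */
    int r1, r2;             /* per-run results */

    void processor1(void)   /* runs on one core */
    {
        atomic_store_explicit(&X, 1, memory_order_relaxed);
        /* Nothing orders the store before the load, so the store can sit
         * in the CPU's store buffer while the load completes first. */
        r1 = atomic_load_explicit(&Y, memory_order_relaxed);
    }

    void processor2(void)   /* runs on another core, the mirror image */
    {
        atomic_store_explicit(&Y, 1, memory_order_relaxed);
        r2 = atomic_load_explicit(&X, memory_order_relaxed);
    }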
It's all well and good to be told this kind of thing might happen, but there's nothing like seeing it with your own eyes. That's why we've written a small sample program to show this type of reordering actually happening. The relevant parts of ordering.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>
    #include <semaphore.h>

    // Set it to 1 to prevent CPU reordering
    #define USE_CPU_FENCE 0

    ...
            // Random delay
            ...
            // ----- THE TRANSACTION! -----
            X = 1;                                // (Y = 1 in thread 2)
    #if USE_CPU_FENCE
            asm volatile("mfence" ::: "memory");  // Prevent CPU reordering
    #else
            asm volatile("" ::: "memory");        // Prevent compiler reordering
    #endif
            r1 = Y;                               // (r2 = X in thread 2)

            sem_post(&endSema);                   // Notify transaction complete
    ...
        // Spawn the threads
        pthread_t thread1, thread2;
        pthread_create(&thread1, NULL, thread1Func, (void *)&ID1);
        pthread_create(&thread2, NULL, thread2Func, (void *)&ID2);

        // Repeat the experiment ad infinitum
        int detected = 0;
        for (int iterations = 1; ; iterations++)
        {
            ...
        }

As you can see in ordering.c, X, Y, r1 and r2 are all global variables, and POSIX semaphores are used to coordinate the beginning and the end of each loop.
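
Since the listing above is abridged, the following is a complete, self-contained variant that compiles as-is. It is our own reconstruction under stated assumptions - rand_r() stands in for whatever random-delay source the original uses, and the ID arguments mirror the excerpt but only seed the delay - so treat it as a sketch of the experiment rather than the article's exact file:

    /* ordering_full.c - a compact, self-contained variant of the experiment */
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>
    #include <semaphore.h>

    // Set it to 1 to prevent CPU reordering
    #define USE_CPU_FENCE 0

    sem_t beginSema1, beginSema2, endSema;
    int X, Y;                           // shared flags, reset every iteration
    int r1, r2;                         // per-iteration results

    static void randomDelay(unsigned *seed)
    {
        while (rand_r(seed) % 8 != 0) {}    // short busy-wait of random length
    }

    static void *thread1Func(void *param)
    {
        unsigned seed = *(int *)param;
        for (;;)
        {
            sem_wait(&beginSema1);          // wait for the main thread's signal
            randomDelay(&seed);
            // ----- THE TRANSACTION! -----
            X = 1;
    #if USE_CPU_FENCE
            asm volatile("mfence" ::: "memory");  // prevent CPU reordering
    #else
            asm volatile("" ::: "memory");        // prevent compiler reordering only
    #endif
            r1 = Y;
            sem_post(&endSema);             // notify transaction complete
        }
        return NULL;
    }

    static void *thread2Func(void *param)   // mirror image of thread 1
    {
        unsigned seed = *(int *)param;
        for (;;)
        {
            sem_wait(&beginSema2);
            randomDelay(&seed);
            Y = 1;
    #if USE_CPU_FENCE
            asm volatile("mfence" ::: "memory");
    #else
            asm volatile("" ::: "memory");
    #endif
            r2 = X;
            sem_post(&endSema);
        }
        return NULL;
    }

    int main(void)
    {
        sem_init(&beginSema1, 0, 0);
        sem_init(&beginSema2, 0, 0);
        sem_init(&endSema, 0, 0);

        // Spawn the threads
        int ID1 = 1, ID2 = 2;
        pthread_t thread1, thread2;
        pthread_create(&thread1, NULL, thread1Func, (void *)&ID1);
        pthread_create(&thread2, NULL, thread2Func, (void *)&ID2);

        // Repeat the experiment ad infinitum
        int detected = 0;
        for (int iterations = 1; ; iterations++)
        {
            X = 0;                      // reset the shared flags
            Y = 0;
            sem_post(&beginSema1);      // start both transactions
            sem_post(&beginSema2);
            sem_wait(&endSema);         // wait for both to complete
            sem_wait(&endSema);
            if (r1 == 0 && r2 == 0)     // both loads beat both stores
            {
                detected++;
                printf("%d reorders detected after %d iterations\n",
                       detected, iterations);
            }
        }
        return 0;
    }

The semaphores keep each iteration in lock step: the main thread resets X and Y, releases both workers, and reads r1 and r2 only after both have posted endSema, while the random delays shift the two transactions against each other until the reordering is caught.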
To prevent compiler reordering, we also embed inline assembly in the C program: asm volatile("" ::: "memory") emits no machine instruction, but the "memory" clobber forbids the compiler from moving loads and stores across it. Besides, we define USE_CPU_FENCE as 0, which means no CPU memory fence is used: the mfence branch is compiled out, and the processor itself remains free to reorder the store and the load inside each transaction.
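
Note that mfence is specific to x86/64. A portable alternative - our aside, not something ordering.c uses - is C11's atomic_thread_fence from <stdatomic.h>: placed between each thread's store and load, a sequentially consistent fence rules out the r1 = r2 = 0 result, and on x86/64 compilers typically lower it to mfence or an equivalent locked instruction:

    #include <stdatomic.h>

    extern atomic_int X, Y;     /* the shared flags from the earlier sketch */
    extern int r1;

    void processor1_fenced(void)
    {
        atomic_store_explicit(&X, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);  /* portable stand-in for mfence */
        r1 = atomic_load_explicit(&Y, memory_order_relaxed);
    }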

The experiment can be built with optimizations enabled and the POSIX threads library linked:

    $ gcc ordering.c -o ordering -O2 -lpthread

Berkeley UPC memory fence code
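
As a hedged illustration of a language-level fence in Berkeley UPC (the snippet below is ours; the producer/consumer framing and all identifiers are assumptions, not this section's original code), UPC programs use the upc_fence statement, which acts as a null strict access and orders every earlier shared access before every later one:

    /* fence_upc.c - hypothetical example; compile with Berkeley UPC, e.g.:
     *   upcc fence_upc.c -o fence_upc */
    #include <upc_relaxed.h>    /* shared accesses default to relaxed */
    #include <stdio.h>

    shared int data;            /* static shared objects start out zero */
    shared int flag;

    int main(void)
    {
        if (MYTHREAD == 0)              /* producer */
        {
            data = 42;
            upc_fence;                  /* order the data store before the flag store */
            flag = 1;
        }
        else if (MYTHREAD == 1)         /* consumer */
        {
            while (flag == 0) {}        /* spin until the producer publishes */
            upc_fence;                  /* order the flag load before the data load */
            printf("data = %d\n", data);
        }
        return 0;
    }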
