TLS
Thread Local Storage (TLS) is one of the
better ways to work in a parallelized system: give each thread its own data, so
that the code touching that data is effectively non-parallel.
TLS is a private copy of some data that every thread has. This copy is
saved in the thread's Stack just like local variables in a function, and since
it is on the thread's private Stack, other threads do not use it. All
threads share the address space of the parent process, so the Stacks
that belong to these threads are visible to all other threads; a Stack is
private only because no other thread goes looking in it. The Stack is a
buffer in memory. The special thing about the Stack is that it is managed
automatically by the CPU through its call, return, push and pop instructions.
Calling a function pushes the return address onto the Stack.
Local variables and some function parameters are also placed on the Stack. When
the code no longer needs the data, it is released from the Stack, an action
called pop.
See this possible Stack frame:
| Address | Meaning |
| 1024 | Start of Stack |
| 1020 | Second data |
| 1016 | Third data |
| ....... | ........ |
| 512 | Return Address Func1 |
| 508 | Parameter 1 (int) |
| 504 | Parameter 2 (double) |
| 496 | Local Variable (int) |
| 492 | Return Address Func2 |
| 488 | Return Address Func3 |
| 484 | End of Stack |
The Stack has a pointer that starts at the highest address
and counts down. When it goes below the lowest address reserved for the Stack we
get a Stack Overflow. The table above shows the state of a thread that is
currently running the code in function Func3, which was called from Func2,
which was called from Func1. Here is the equivalent code:
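// Forward declarations, so Func1 and Func2 can call functions defined below.
int Func2 ( );
int Func3 ( );

int Func1 ( int param1, double param2 )
{
    int localVar = 5;   // stored in Func1's part of the Stack
    Func2( );
    return 0;
}

int Func2 ( )
{
    Func3();
    return 0;
}

int Func3 ( )
{
    return 0;
}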
The thread is now executing the code in function Func3,
yet the variables local to Func1 and the parameters sent to Func1 are still
sitting on the Stack. In other words, the thread can reach data that is buried
deep down in the Stack no matter which function is currently running.
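A minimal sketch of that point, assuming we hand the address down explicitly (standard C++ has no way to index the Stack by offset). The function names `Func1Demo`, `Func2Demo` and `Func3Demo` are hypothetical variants of the functions above, not part of the original code:

```cpp
#include <cassert>

// Hypothetical variants of Func1..Func3 that pass the address of a local
// variable down the call chain.
int Func3Demo(int* pOuterLocal)
{
    // Func1Demo has not returned yet, so its local variable is still on
    // the Stack and can be read and modified from here.
    *pOuterLocal += 1;
    return *pOuterLocal;
}

int Func2Demo(int* pOuterLocal)
{
    return Func3Demo(pOuterLocal);
}

int Func1Demo(int param1, double param2)
{
    (void)param1;
    (void)param2;
    int localVar = 5;        // lives in Func1Demo's part of the Stack
    Func2Demo(&localVar);    // modified two calls deeper
    return localVar;         // 6 after the nested calls
}
```

Calling `Func1Demo(1, 2.0)` returns 6: the change made inside `Func3Demo` landed in a variable that belongs to `Func1Demo`.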
Now think about the following function as the main function
of all the threads in my application (this code is C++):
void* ThreadMainProc ( void* pThParam )
{
    MyStruct* pMyStruct = new MyStruct;   // using malloc() in C
    Func1( 1, 2.0 );                      // arbitrary arguments, they only start the flow
    delete pMyStruct; pMyStruct = NULL;   // using free() in C
    return NULL;
}
The code above is the main thread function. It allocates an
object in memory and stores the pointer to this object in a local variable. At
this point we start the execution flow by calling the function Func1.
The local variable is popped off the Stack only when the main thread function
ThreadMainProc returns; until then it remains on the Stack. This
means that every function the thread runs could, in principle, reach this
pointer and access the object in memory.
If I have a few threads starting on the main function above,
then each of them allocates its own object, so every thread has its own
private copy of the object.
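As a rough illustration, here is a sketch that starts several threads on the same entry code using `std::thread`. The `counter` field, the helper name `RunThreads` and the result vector are assumptions made for the example. Each thread increments its own object without any lock, and every thread still ends with exactly the expected count:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct MyStruct { int counter = 0; };   // hypothetical contents

// Starts threadCount threads on the same code and returns the final
// counter each thread saw in its own object.
std::vector<int> RunThreads(int threadCount, int increments)
{
    std::vector<int> results(static_cast<std::size_t>(threadCount), 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&results, i, increments] {
            MyStruct* pMyStruct = new MyStruct;   // pointer lives on this thread's Stack
            for (int n = 0; n < increments; ++n)
                ++pMyStruct->counter;             // no lock: the object is private
            results[static_cast<std::size_t>(i)] = pMyStruct->counter;  // own slot only
            delete pMyStruct;
        });
    }
    for (std::thread& t : threads)
        t.join();
    return results;
}
```

With `RunThreads(4, 1000)` every entry in the result is 1000, because no thread ever touches another thread's object.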
The offset of pMyStruct in the Stack is the
same for all these threads, so if we access this data by its offset in the
Stack we know that we are accessing the copy that is private to the thread
that is currently running. See the following explanation.
Here is the Stack image:
| Address | Meaning |
| 1024 | Start of Stack |
| 1020 | Return address from ThreadMainProc |
| 1016 | Parameter pThParam |
| 1012 | pMyStruct local variable |
| 1008 | Return Address Func1 |
| 1004 | Parameter 1 (int) |
| 996 | Parameter 2 (double) |
| 992 | Local Variable (int) |
| 988 | Return Address Func2 |
| 984 | Return Address Func3 |
| 980 | End of Stack |
Suppose that my code could do this:
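int Func3 ( )
{
    // Pseudocode: STACK[ 1012 ] stands for "the value at offset 1012 of the
    // Stack of the thread that is running right now".
    MyStruct* pMyStruct = (MyStruct*) STACK[ 1012 ];
    // ... do things with pMyStruct ...
    return 0;
}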
Now every thread has its own copy of the object, and we don't
need to look for that copy; we just use it.
This is what the operating system (or library) does for us.
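In C++11 and later, the `thread_local` keyword is one way to get this behaviour from the language itself. The sketch below is an assumption about how the idea could be written, not the mechanism described here. The names `t_pMyStruct`, `ThreadBody` and `EachThreadSeesItsOwnValue` are hypothetical:

```cpp
#include <cassert>
#include <thread>

struct MyStruct { int value = 0; };   // hypothetical contents

// One pointer per thread; the compiler and runtime find the right copy.
static thread_local MyStruct* t_pMyStruct = nullptr;

int Func3()
{
    // Same source line in every thread, a different object in each one.
    return t_pMyStruct->value;
}

// Hypothetical thread body: stores seed in a private object, reads it back
// through Func3 and reports the value through pResult.
void ThreadBody(int seed, int* pResult)
{
    t_pMyStruct = new MyStruct;
    t_pMyStruct->value = seed;
    *pResult = Func3();
    delete t_pMyStruct;
    t_pMyStruct = nullptr;
}

bool EachThreadSeesItsOwnValue()
{
    int r1 = 0, r2 = 0;
    std::thread t1(ThreadBody, 1, &r1);
    std::thread t2(ThreadBody, 2, &r2);
    t1.join();
    t2.join();
    return r1 == 1 && r2 == 2;
}
```

Func3 takes no parameters and searches no table, yet each thread reads back the value it stored itself.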
We have an API to ask the system to reserve a position in the
stack, and we receive the offset of the new data.
We have an API to write and read data at this stack location.
We cannot store a pointer to the stack location because every thread has its
own Stack and every Stack starts at a different address. What we do save and
reuse is the offset in the stack.
The allocation of the Stack location is performed once, and
then every thread calls an API to work with its copy of the object.
A thread does not need to deallocate its stack location when it exits, but it
should deallocate the object that its TLS pointer is pointing to.
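On POSIX systems these steps map onto `pthread_key_create`, `pthread_setspecific` and `pthread_getspecific` (Windows has `TlsAlloc`, `TlsSetValue` and `TlsGetValue` in the same role). The following is a sketch under assumptions: the `value` field, the helper `RunTwoThreads` and the way results are reported back through `pThParam` are invented for the example. The key plays the part of the saved offset, and the destructor passed to `pthread_key_create` frees each thread's object at thread exit:

```cpp
#include <cassert>
#include <pthread.h>

struct MyStruct { int value; };   // hypothetical contents

static pthread_key_t  g_tlsKey;   // the saved "offset", shared by all threads
static pthread_once_t g_tlsOnce = PTHREAD_ONCE_INIT;

// Called by the library at thread exit if the thread's slot is not NULL.
static void DestroyMyStruct(void* p) { delete static_cast<MyStruct*>(p); }

static void CreateKey() { pthread_key_create(&g_tlsKey, DestroyMyStruct); }

int Func3()
{
    // Read this thread's copy; no pointer was passed down the call chain.
    MyStruct* pMyStruct = static_cast<MyStruct*>(pthread_getspecific(g_tlsKey));
    return pMyStruct->value;
}

void* ThreadMainProc(void* pThParam)
{
    pthread_once(&g_tlsOnce, CreateKey);          // the slot is allocated once
    int* pInOut = static_cast<int*>(pThParam);
    MyStruct* pMyStruct = new MyStruct;
    pMyStruct->value = *pInOut;
    pthread_setspecific(g_tlsKey, pMyStruct);     // write this thread's slot
    *pInOut = Func3() * 2;                        // report back to the caller
    return NULL;                                  // DestroyMyStruct frees the object
}

// Hypothetical helper: runs two threads and checks each used its own value.
bool RunTwoThreads()
{
    int a = 10, b = 20;
    pthread_t ta, tb;
    pthread_create(&ta, NULL, ThreadMainProc, &a);
    pthread_create(&tb, NULL, ThreadMainProc, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a == 20 && b == 40;
}
```

The slot holds only a pointer; the object it points to can be as large as needed.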
The TLS API usually allocates only the size of a pointer per slot, because
TLS storage is a very limited resource that is shared between all threads in the
application and all libraries, DLLs, ActiveX controls, modules, system hooks,
etc.
A good example of using TLS is the Standard Output
(puts(), printf(), cout, etc.). When we print from several
threads at the same time, the outputs overlap. Try this:
int Func ( )
{
    puts("Here is the output:\n");
    printf("Data: %s\n", str_whatever);
    // C++
    cout << "Here is the output:" << endl
         << "Data: "
         << str_whatever << endl;
    return 0;
}
Every function call can be raced by another thread (in C++,
every << operator is a function call). In fact, the output of the code above
can be interrupted at almost any point by output from a different thread that
uses the same mechanism.
The Standard Output is buffered I/O, which means that there
is one output stream for the application and one output buffer to write to. This
buffer is shared by all threads. If every thread had its own copy of the
stream and buffer, then it could work on its private copy without any locking
and lock access to the screen only when actually printing to it, for example
when there is a new line or when a Flush function is called explicitly.
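A sketch of this idea, assuming C++11 `thread_local` for the private buffer and a `std::mutex` for the shared stream. The names `t_line`, `FlushLine`, `PrintReport` and `RunWriters` are hypothetical, and a string stream stands in for the screen so the result can be inspected:

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

static std::mutex g_outputLock;                  // guards the one shared stream
static thread_local std::ostringstream t_line;   // each thread's private buffer

// Copies the private buffer to the shared stream under the lock, then clears it.
void FlushLine(std::ostream& sink)
{
    std::lock_guard<std::mutex> guard(g_outputLock);
    sink << t_line.str() << '\n';
    t_line.str("");
}

void PrintReport(std::ostream& sink, int id)
{
    t_line << "Here is the output: ";   // no lock: the buffer is private
    t_line << "Data: " << id;
    FlushLine(sink);                    // the lock is held only while printing
}

// Runs threadCount writers against one shared stream and returns what was written.
std::string RunWriters(int threadCount)
{
    std::ostringstream sink;            // stands in for the Standard Output
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([&sink, i] { PrintReport(sink, i); });
    for (std::thread& t : threads)
        t.join();
    return sink.str();
}
```

The order of the lines still depends on scheduling, but each line arrives whole, because a thread's partial output never leaves its private buffer.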
TLS is a preferred mechanism for both parallelism and clean
code design.