Multi-threading in C (POSIX Threads)
What is a Thread?
The concept of a "procedure" that runs independently from its main program may best describe a thread.
To go one step further, imagine a main program (a.out) that contains a number of procedures. Then imagine all of these procedures being able to be scheduled to run simultaneously and/or independently by the operating system. That would describe a "multi-threaded" program.
How is this accomplished?
Before understanding a thread, one first needs to understand a UNIX process. A process is created by the operating system, and requires a fair amount of "overhead". Processes contain information about program resources and program execution state, including:
Process ID, process group ID, user ID, and group ID
Environment
Working directory.
Program instructions
Registers
Stack
Heap
File descriptors
Signal actions
Shared libraries
Inter-process communication tools (such as message queues, pipes, semaphores, or shared memory).
Threads use and exist within these process resources, yet are able to be scheduled by the operating system and run as independent entities largely because they duplicate only the bare essential resources that enable them to exist as executable code.
This independent flow of control is accomplished because a thread maintains its own:
Stack pointer
Registers
Scheduling properties (such as policy or priority)
Set of pending and blocked signals
Thread-specific data
So, in summary, in the UNIX environment a thread:
Exists within a process and uses the process resources
Has its own independent flow of control as long as its parent process exists and the OS supports it
Duplicates only the essential resources it needs to be independently scheduled
May share the process resources with other threads that act equally independently (and dependently)
Dies if the parent process dies: when the process terminates, all of its threads terminate with it
Is "lightweight" because most of the overhead has already been accomplished through the creation of its process.
Because threads within the same process share resources:
Changes made by one thread to shared system resources (such as closing a file) will be seen by all other threads.
Two pointers having the same value point to the same data.
Reading and writing to the same memory locations is possible, and therefore requires explicit synchronization by the programmer.
Shared Memory Model
All threads have access to the same global, shared memory
Threads also have their own private data
Programmers are responsible for synchronizing access (protecting) globally shared data.
Thread-safeness
Thread-safeness, in a nutshell, refers to an application's ability to execute multiple threads simultaneously without "clobbering" shared data or creating "race" conditions.
For example, suppose that your application creates several threads, each of which makes a call to the same library routine:
This library routine accesses/modifies a global structure or location in memory.
As each thread calls this routine it is possible that they may try to modify this global structure/memory location at the same time.
If the routine does not employ some sort of synchronization constructs to prevent data corruption, then it is not thread-safe.
The implication to users of external library routines is that if you aren't 100% certain the routine is thread-safe, then you take your chances with problems that could arise.
Recommendation: Be careful if your application uses libraries or other objects that don't explicitly guarantee thread-safeness. When in doubt, assume that they are not thread-safe until proven otherwise. One way to guard against problems is to "serialize" the calls to the uncertain routine, so that only one thread at a time can be inside it.
POSIX threads (pthreads)
The POSIX thread library is a standards-based thread API for C/C++. It allows one to spawn a new concurrent flow of control within a process. It is most effective on multi-processor or multi-core systems, where the new flow can be scheduled to run on another processor and so gain speed through parallel processing. Threads require less overhead than "forking" or spawning a new process because the system does not initialize a new virtual memory space and environment for the process.
Thread Creation and Termination:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *print_message_function( void *ptr );

int main(void)
{
    pthread_t thread1, thread2;
    const char *message1 = "Thread 1";
    const char *message2 = "Thread 2";
    int iret1, iret2;

    /* Create independent threads each of which will execute function */
    iret1 = pthread_create( &thread1, NULL, print_message_function, (void*) message1);
    iret2 = pthread_create( &thread2, NULL, print_message_function, (void*) message2);

    /* pthread_create returns 0 on success and an error number on failure */
    if (iret1 != 0 || iret2 != 0) {
        fprintf(stderr, "pthread_create failed\n");
        exit(EXIT_FAILURE);
    }

    /* Wait till threads are complete before main continues. Unless we */
    /* wait we run the risk of executing an exit which will terminate  */
    /* the process and all threads before the threads have completed.  */
    pthread_join( thread1, NULL);
    pthread_join( thread2, NULL);

    printf("Thread 1 returns: %d\n",iret1);
    printf("Thread 2 returns: %d\n",iret2);
    exit(0);
}

void *print_message_function( void *ptr )
{
    const char *message;
    message = (const char *) ptr;
    printf("%s \n", message);
    return NULL;   /* the start routine must return a void pointer */
}
Compile:
C compiler: cc pthread1.c -lpthread
or C++ compiler: g++ pthread1.c -lpthread
(With GCC or Clang, the -pthread option can be used in place of -lpthread.)
Run: ./a.out
Results (the two threads run concurrently, so the first two lines may appear in either order):
Thread 1
Thread 2
Thread 1 returns: 0
Thread 2 returns: 0
Details:
In this example the same function is used in each thread. The arguments are different. The functions need not be the same.
Threads terminate by explicitly calling pthread_exit, by letting the function return, or by a call to the function exit which will terminate the process including any threads.
Function call: pthread_create
int pthread_create(pthread_t * thread,
const pthread_attr_t * attr,
void * (*start_routine)(void *),
void *arg);
Arguments:
thread - returns the thread id. (pthread_t is an opaque type; in glibc on Linux it is an unsigned long int defined in bits/pthreadtypes.h)
attr - Set to NULL if default thread attributes are used. (Otherwise, initialize a pthread_attr_t object with pthread_attr_init and set its attributes with the pthread_attr_set* functions.) Attributes include:
detached state (joinable? Default: PTHREAD_CREATE_JOINABLE. Other option: PTHREAD_CREATE_DETACHED)
scheduling policy (real-time? SCHED_FIFO, SCHED_RR. Default: SCHED_OTHER)
scheduling parameter
inheritsched attribute (PTHREAD_INHERIT_SCHED: inherit scheduling from the creating thread; this is the default on Linux with glibc. PTHREAD_EXPLICIT_SCHED: use the policy and parameter set in attr)
scope (Kernel threads: PTHREAD_SCOPE_SYSTEM User threads: PTHREAD_SCOPE_PROCESS Pick one or the other not both.)
guard size
stack address (See unistd.h and bits/posix_opt.h _POSIX_THREAD_ATTR_STACKADDR)
stack size (minimum PTHREAD_STACK_MIN, defined in limits.h)
start_routine - pointer to the function to be threaded. The function takes a single argument, a pointer to void, and returns a pointer to void.
arg - pointer to the argument passed to start_routine. To pass multiple arguments, send a pointer to a structure.
Function call: pthread_exit
void pthread_exit(void *retval);
Arguments:
retval - Return value of thread.
This routine terminates the calling thread. The pthread_exit function never returns. If the thread is not detached, the thread id and return value may be examined from another thread by using pthread_join.
Note: the memory that retval points to must not be local to the terminating thread (for example, a variable on its stack); otherwise it would cease to exist once the thread terminates.
References:
https://www.geeksforgeeks.org/multithreading-c-2/
https://computing.llnl.gov/tutorials/pthreads/
https://www.cs.cmu.edu/afs/cs/academic/class/15492-f07/www/pthreads.html