CPSC405 Fall 2011 Project 3
A User-Level Thread Package
Due Friday 5 October
This project and writeup are from Professor Mike Dahlin, The University of Texas at Austin. The link to the original assignment is http://www.cs.utexas.edu/users/dahlin/Classes/UGOS/labs/labULT/proj-ULT.html
1. Threads
1.1 Quick Overview
Threads and processes are key operating system abstractions for concurrency. To gain a deeper understanding of how these abstractions are constructed, this project asks you to build the core of a user-level threads package. Building kernel-level processes and threads would not be much different, but we’ll do this project at user level since installing new kernels on the instructional machines would be problematic.
Threads provide the illusion that different parts of your program are executing concurrently. Over the years, a model of executing multithreaded programs has emerged as a de facto standard. In this model, threads share the code, heap, and runtime system. Each thread, however, has a separate stack and, naturally, a separate set of CPU registers. This programming model also provides synchronization primitives so that different threads can coordinate access to shared resources.
1.2 Project Scope
You will construct a library of functions that define a user-level threads package. Using your library, a program can create threads, destroy them, and let the threads control the scheduling among themselves. A program using your library will typically look like:
main(int argc, char ** argv)
{
    // Some initialization
    // Create threads
    // wait for threads to finish
    // exit
}

// "Main" procedure for thread i
root_i (...)
{
    // do some work
    // yield
    // repeat as necessary
    // return (implicit thread destruction)
}
where “root_i” is a “root function” that the ith thread calls to start executing.
When a thread is created, it is “assigned” a “root function” from which it starts executing. The thread can perform useful work within that function, then yield the CPU (voluntarily or involuntarily) to another thread, and repeat this sequence as necessary. The thread can, of course, call other functions from within its “root” function. A thread can destroy itself explicitly or implicitly. Explicit destruction is done by calling the thread library. The thread is implicitly destroyed when its root function returns. Additionally, to give the program more control, a thread may destroy other threads as well.
1.3 User-level v. kernel threads
For practical reasons, this project is done at user level: you will construct a set of functions that your program will call to provide the illusion of concurrency. Modern operating systems provide kernel threads where similar functions are provided as system calls to the kernel rather than function calls to a library. Both types of threads use the same core techniques for providing the concurrency abstraction; you would build kernel threads in essentially the same way you build user threads in this project. Also, kernel processes are built using these techniques.
There are a few differences between kernel and user threads. Mainly:
Multiprocessing. Whereas user level threads provide the illusion of concurrency, on machines with multiple processors, kernel level threads can provide actual concurrency. This is because for user level threads, the kernel schedules the user process on one CPU and the user-level threads package multiplexes threads of control within the process. For kernel threads, the kernel is aware of different threads of control, and it can simultaneously schedule different threads from the same process on different processors.
Note: A key simplifying assumption for this project is that you will allow programs to multiplex some number (e.g., m) of user level threads on one kernel thread. This means that at most one user level thread is running at a time and that your runtime system has complete control over the interleaving of user level threads with each other. More sophisticated systems implement m on n threads packages where m user level threads are multiplexed across n kernel threads.
Asynchronous I/O. When a user-level thread makes a system call that blocks (e.g., reading a file from disk), the kernel scheduler moves the process to the Waiting state and will not schedule it until the I/O has completed. Thus, even if there are other user-level threads within that process, they have to wait, too. Conversely, when a kernel thread blocks for a system call, the kernel scheduler is aware that other threads in the same process may be ready to run. Thus, some kernel threads may be running while others are waiting for I/O.
Timer interrupts. In this project, to simulate the timer interrupts that cause the scheduler to switch from one thread or process to another, we will use POSIX signals. In your implementation, the threads library will “turn off interrupts” by blocking delivery of these signals using system calls. However, there is nothing to prevent the threads, themselves, from “turning off interrupts” the same way. Thus, even though we will implement “preemptive” threads, a “malicious” thread could turn off interrupts and not be preempted until it calls Yield(), thus hogging the CPU. Note that kernel schedulers don’t have this problem: only the privileged code in the kernel can turn off the real timer interrupts.
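For concreteness, “blocking delivery of these signals” boils down to a call like sigprocmask(). The interruptsOn()/interruptsOff() routines provided in interrupt.c (see Section 4) presumably do something along these lines for you, so treat the following only as an illustrative sketch of the mechanism:

#include <signal.h>

/* Block delivery of SIGALRM ("turn off interrupts"). */
static void blockAlarm(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Allow delivery of SIGALRM again ("turn interrupts back on"). */
static void unblockAlarm(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
}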
1.4 Provided code
We provide code to help you get started in the git repository: git://github.com/zacharski/labULT.git
2. Thread context
Each thread has per-thread state that represents the working state of the thread—the thread’s program counter, local variables, stack, etc. A thread context is the subset of this state that must be saved/restored from the processor when switching threads. (To avoid copying the entire stack, the thread context includes a pointer to the stack, not the entire stack.) Your library will store the thread context in a thread control block, a structure that holds the state your library keeps track of for each thread.
2.1 Saving/Restoring Thread Context
When a thread yields the CPU, the threads library must save the thread’s current context. The current context contains all the registers that belong to the thread. The library restores the saved context later when the thread gets its turn to run on the processor. Additionally, the library creates a fresh context and a new stack when it creates a thread.
Fortunately, the C runtime system allows an application to retrieve its current context and store it at a memory location and to set its current context to a predetermined value from a memory location. Your library will make use of two existing library calls: getcontext and setcontext.
Study the manual pages for these two calls. Notice that getcontext() saves the current context into a structure of type ucontext_t (a struct ucontext). So, if you allocate a ucontext_t in memory and pass a pointer to that memory to a call to getcontext(), the current registers and other context will be stored to that memory. Later, you can call setcontext() to copy that state from that memory back to the processor, restoring the saved state.
Look in sys/ucontext.h to find the details for the fields of a struct ucontext. On Ubuntu Linux machines, this file is located in /usr/include/sys/ucontext.h.
Finish implementing parseUcontext.c. Make sure you understand how a thread’s context is saved.
You will use getcontext and setcontext in two ways. First, to suspend a currently running thread (so another can run), you will use getcontext to save its state and later use setcontext to restore its state. Second, to create a new thread, you will use getcontext to create a valid context while leaving the current thread running; you (the current thread, actually) will then change a few registers in this valid context to initialize it as a new thread and put the new thread onto the ready list. At some point the new thread will be run by calling setcontext on its context.
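To see how these two calls interact, here is a minimal standalone sketch (not part of the provided code) showing that getcontext() effectively “returns” a second time after setcontext() jumps back to the saved point. The volatile flag is just one illustrative way to distinguish the two returns:

#include <stdio.h>
#include <ucontext.h>

int main(void)
{
    ucontext_t ctx;
    volatile int restored = 0;   /* volatile so the value survives the jump */

    getcontext(&ctx);            /* execution also resumes here after setcontext(&ctx) */
    if (!restored) {
        restored = 1;
        printf("first return from getcontext: context saved\n");
        setcontext(&ctx);        /* jump back to the line after getcontext() */
    }
    printf("second return: context was restored\n");
    return 0;
}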
2.2 Changing Thread Context
As noted above, to create a new thread you can’t just make a copy of the current thread’s context. You need to make a copy and then change three things:
1) You need to change the program counter to point to the function that the thread should run.
2) You need to allocate and initialize a new stack.
3) You need to change the stack pointer to point to the top of the new stack.
In the real world, you would take advantage of an existing library function, makecontext(), to make the first and third changes. The advantage of using that function is that it abstracts away the details of how a context is saved in memory, which simplifies things and helps portability. In this class, the disadvantage is that it abstracts away those same details, which might leave you vague on exactly what’s going on.
In the spirit of “there is no magic”, for this project you may not use makecontext(). Instead, you must manipulate the fields in the saved ucontext_t directly.
You will change the program counter to point to a stub function that the thread should run.
You will use malloc() to allocate a new stack, and you will initialize the new stack to include arguments to the stub function.
You will change the stack pointer to point to the top of the new stack. (Warning: in x86, stacks grow down!)
What is the stub function? How does the stack work? Read on.
2.3 Stub function:
When you create a new thread, you want it to run some “root” function that defines the work you want it to do. A thread is destroyed implicitly when it returns from its “root” function, much like the main program thread is destroyed when it returns from the “main” program. So, rather than have your thread begin by running the root function directly, a simple implementation of this feature is to start the thread initially in a “stub” function that calls the actual root function of the thread (much like main is actually called from the crt0 stub in UNIX). Then your root function has somewhere to return to, should it return. This arrangement would look like:
stub(void (*root)(void *), void *arg)
{
    // thread starts here
    Tid ret;
    root(arg); // call root function
    ret = ULT_DestroyThread(ULT_SELF);
    assert(ret == ULT_NONE); // we should only get here if we are the last thread.
    exit(0); // all threads are done, so process should exit
}
In the above code, the argument root is a pointer to the root function that describes the real work the thread should do; notice that in C, a function’s name refers to the address of its code in memory. arg is the argument to pass to that function; we’ll have the root function take a pointer to an arbitrary type as an argument so that you can pass the root function a pointer to whatever you want. ULT_DestroyThread, ULT_SELF, and ULT_NONE are defined below.
2.4 On contexts and calling conventions
The context variable contains many data fields, but you need only deal with two of them: the stack pointer and the program counter. Other than that, you don’t need to worry about the fields within the context variable, as long as you do not tamper with them. Also, only use context variables that have been initialized through a getcontext call; otherwise you may see bizarre behavior.
Under the C calling conventions in x86, here’s what things look like while any given function is executing:
[Figure: layout of an x86 stack frame during a function call. Image from unixwiz.net]
Notice that as a procedure executes, it can allocate local space by moving the stack pointer, and it can find local variables, parameters, return addresses, and the return %ebp by indexing relative to the (fixed) %ebp register.
To make a procedure call, the compiler pushes parameters onto the stack from right to left, saves the current instruction pointer (eip) onto the stack, and changes the instruction pointer to the called procedure. The called procedure then saves the old frame pointer (ebp) onto the stack and sets the new frame pointer to point to the old frame pointer at the top of the stack. Now the called function is free to shove other stuff onto the stack (e.g., local variables, register spills, saved registers, the frames of called procedures) because it can still locate stuff (especially return information) relative to the fixed frame pointer (ebp).
To return to the caller, a procedure simply copies the frame pointer (ebp) to the stack pointer (esp), pops the top stack item into ebp to restore the old ebp, and uses the ret instruction to pop the old instruction pointer off the stack into the processor’s instruction register (eip), returning control to the calling function.
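Putting Sections 2.2–2.4 together, here is one possible sketch of initializing a new thread’s context; it is not a required design. It assumes 32-bit x86 Linux with _GNU_SOURCE defined so that the REG_EIP/REG_ESP indices into uc_mcontext.gregs are visible, assumes ULT_MIN_STACK comes from ULT.h, and uses the stub() from Section 2.3; the ThreadControlBlock layout and the helper name are made up for illustration:

#define _GNU_SOURCE          /* exposes REG_EIP / REG_ESP on 32-bit x86 Linux */
#include <stdlib.h>
#include <ucontext.h>
#include "ULT.h"             /* assumed to define ULT_MIN_STACK */

typedef struct {
    ucontext_t context;      /* saved registers for this thread */
    void      *stackBase;    /* what malloc() returned, freed at destroy time */
} ThreadControlBlock;

void stub(void (*root)(void *), void *arg);   /* the stub from Section 2.3 */

int initThreadContext(ThreadControlBlock *tcb, void (*root)(void *), void *arg)
{
    /* Start from a valid context; we then overwrite only EIP and ESP. */
    getcontext(&tcb->context);

    tcb->stackBase = malloc(ULT_MIN_STACK);
    if (tcb->stackBase == NULL)
        return -1;           /* the caller would report ULT_NOMEMORY */

    /* Build the frame that stub() expects on entry, remembering that x86
     * stacks grow down.  Under the cdecl convention the callee sees:
     *   esp+0 : return address (unused; stub exits instead of returning)
     *   esp+4 : first argument  (root)
     *   esp+8 : second argument (arg)
     */
    unsigned long *top = (unsigned long *)((char *)tcb->stackBase + ULT_MIN_STACK);
    *(--top) = (unsigned long)arg;
    *(--top) = (unsigned long)root;
    *(--top) = 0;            /* fake return address */

    tcb->context.uc_mcontext.gregs[REG_EIP] = (greg_t)(unsigned long)stub;
    tcb->context.uc_mcontext.gregs[REG_ESP] = (greg_t)(unsigned long)top;
    return 0;
}

The exact register names inside uc_mcontext differ across architectures and libc versions, so verify them against what you saw while finishing parseUcontext.c rather than trusting this sketch.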
3. Cooperative Thread Package Application Program Interface (API)
In this project you will build a user-level threads package. A key simplifying assumption (relaxed in the extra credit portion below) is that threads are cooperative—each thread runs until it explicitly releases the CPU to another thread by yielding or exiting. In contrast, preemptive threading systems allow a scheduler to interrupt a running thread at any time and switch the CPU to a different thread.
The thread package provides several function calls that give application programs a degree of control over thread management. In addition, there are a few conventions that application programs must adhere to in order to ensure proper and safe operation. The functions that constitute the User-level threads (ULT) API can be found in ULT.h and are summarized here; a short usage example follows the summary:
Tid ULT_Yield(Tid tid): This function suspends the caller and activates the thread given by the identifier tid. The caller is put on the ready queue and can be scheduled again later in the same fashion. The value of tid may be the identifier of any available thread, or any of the following constants:
The constant ULT_ANY tells the thread system to invoke any thread on the ready queue. A sane policy is to run the thread at the head of the ready queue.
The constant ULT_SELF tells the thread system to continue the execution of the caller. This turns the function call into a no-op, but it may be useful for debugging purposes.
The function returns the identifier of the thread that took control as a result of the function call. Note that the caller does not get to see the result until it gets its turn to run (later). The function may also fail, in which case the caller resumes immediately. To indicate the reason for failure, the call returns one of these constants:
The constant ULT_INVALID alerts the caller that the identifier tid does not correspond to a valid thread.
The constant ULT_NONE alerts the caller that there are no more threads, other than the caller, available to run (in response to a call with tid set to ULT_ANY).
Tid ULT_CreateThread(void (*fn)(void *), void *arg): This function creates a thread whose starting point is the function fn. arg is a pointer that will be passed to the function when the thread starts executing; arg thus allows arguments to be passed to the function. Upon success, the function returns a thread identifier of type Tid. If the function fails, it returns a value that indicates the reason for failure as follows:
The constant ULT_NOMORE alerts the caller that the thread package cannot create more threads.
The constant ULT_NOMEMORY alerts the caller that the thread package could not allocate memory to create a stack of the desired size.
The created thread is put on a ready queue but does not start execution yet. The caller of the function continues to execute after the function returns.
Tid ULT_DestroyThread(Tid tid): This function destroys the thread whose identifier is tid. The caller continues to execute and receives the result of the call. The value of tid may be the identifier of any available thread, or any of the following constants:
The constant ULT_ANY tells the thread system to destroy any thread except the caller. While this may sound draconian, this function can help in dealing with drastic situations where a thread detects a fatal error that cannot be handled and the only recourse is to stop the program.
The constant ULT_SELF tells the thread system to destroy the caller. In this case, the system destroys the caller and reclaims its resources. The function obviously does not return in this case. Instead, some other ready thread gets scheduled.
Be careful. It is dangerous to use memory once it has been freed. In particular, you should not free the stack of the currently running thread while it is still running. (Yet you still need to make sure that that stack eventually gets deallocated. Think about this a bit!) You should convince yourself that your program would work even if you used a debugging malloc library that overwrites a block with dummy data when that block is free()’d.
Be careful. If you destroy a thread that is holding a lock, deadlock may occur. For this reason, it is usually best to only use DestroyThread to have each thread destroy itself (not other threads.) In fact, the Java library deprecates the equivalent function in their thread library. Note that Java’s alternative makes use of the richer exception model that Java has compared to C. We include the more general form here partly as a design exercise and because you may need it for some of your tests.
Upon success, the function returns the identifier of the destroyed thread. The function may also fail, returning one of these constants:
The constant ULT_INVALID alerts the caller that the identifier tid does not correspond to a valid thread.
The constant ULT_NONE alerts the caller that there are no more other threads available to destroy (in response to a call with tid set to ULT_ANY).
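To make the API concrete, here is a small cooperative program using the calls above. It is only a usage sketch: the thread name and messages are illustrative, and it relies on the behavior described in this section (in particular, that returning from the root function destroys the thread via the stub):

#include <stdio.h>
#include "ULT.h"

static void hello(void *arg)
{
    const char *name = arg;
    printf("%s: first run\n", name);
    ULT_Yield(ULT_ANY);                 /* hand the CPU to some other ready thread */
    printf("%s: second run, returning now\n", name);
    /* returning destroys this thread implicitly (see the stub in Section 2.3) */
}

int main(int argc, char **argv)
{
    Tid tid = ULT_CreateThread(hello, "child");
    if (tid == ULT_NOMORE || tid == ULT_NOMEMORY) {
        fprintf(stderr, "could not create a thread\n");
        return 1;
    }

    /* Keep yielding until no thread other than main remains. */
    while (ULT_Yield(ULT_ANY) != ULT_NONE)
        ;

    printf("main: all other threads have finished\n");
    return 0;
}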
3.1 Structure of and requirements on your solution:
Your library must maintain a “thread control block” for each thread that is running in the system. This is similar to the process control block that an operating system implements to support process management. In addition, your library must maintain a queue of the threads ready to run, so that it can service the application program’s function calls. Note that application programs do not explicitly call a function to initialize the library data structures. Therefore, your library must ensure that any necessary initialization happens the first time any library function is called.
As noted above, your library must use getcontext() and setcontext() to save and restore thread context state, but it may not use makecontext or any other existing code to manipulate a thread’s context; you need to write the code to do that yourself.
Each thread should have a stack of at least ULT_MIN_STACK and your implementation should support the creation of exactly ULT_MAX_THREADS threads by a program (including the initial main thread). Your implementation must not statically allocate all stacks at initialization time. Instead, you must dynamically allocate a stack whenever a new thread is created (and delete one each time a thread is destroyed.)
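As a rough illustration of these requirements (not a prescribed design), the bookkeeping might look something like the sketch below. Every name here is made up, and a different queue implementation would work just as well:

#include <stdlib.h>
#include <ucontext.h>
#include "ULT.h"                       /* assumed to define Tid */

typedef struct Tcb {
    Tid         tid;
    ucontext_t  context;               /* saved registers when not running */
    void       *stackBase;             /* malloc()'d stack; NULL for main  */
    struct Tcb *next;                  /* link in the ready queue          */
} Tcb;

static Tcb *readyHead = NULL, *readyTail = NULL;   /* FIFO ready queue */
static Tcb *running = NULL;                        /* currently executing thread */
static int  initialized = 0;

/* Called at the top of every ULT_* entry point: the very first call wraps
 * the already-running main thread in a TCB so it can be yielded to and from
 * like any other thread.  (Error handling omitted for brevity.) */
static void maybeInit(void)
{
    if (initialized)
        return;
    initialized = 1;
    running = calloc(1, sizeof(*running));
    running->tid = 0;                  /* give the initial thread identifier 0 */
    running->stackBase = NULL;         /* main's stack belongs to the process  */
}

static void readyEnqueue(Tcb *t)
{
    t->next = NULL;
    if (readyTail) readyTail->next = t; else readyHead = t;
    readyTail = t;
}

static Tcb *readyDequeue(void)
{
    Tcb *t = readyHead;
    if (t) {
        readyHead = t->next;
        if (readyHead == NULL) readyTail = NULL;
    }
    return t;
}

A fixed-size array of ULT_MAX_THREADS TCBs indexed by Tid is an equally reasonable choice, as long as the stacks themselves are still allocated dynamically.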
3.2 Hints, leading questions, and advice
This project does not require you to write a large number of lines of code. It does require you to think carefully about the code you write. Before you dive into writing code, it will pay to spend time planning and understanding the code you are going to write. If you think the problem through from beginning to end, this project will not be too hard. If you try to hack your way out of trouble, you will spend many frustrating nights in the lab.
As a start, here are some questions you should answer before you write code.
Getcontext “returns” twice: once when you create a context and again when you switch to that context. What action will you take in each case? How will you tell which case you are in?
Most threads are created with ULT_CreateThread(), but the initial thread is there before your library is invoked. Nonetheless, the original thread must be able to ULT_Yield() to let other threads run, and other threads must be able to ULT_Yield() and let the original thread run.
In fact, a strongly recommended first milestone is to get ULT_Yield(ULT_SELF) to work for the initial thread (where your implementation stores the caller’s state to a thread control block in the ready queue and then restores that state from the thread control block). Get this working before you try to implement ULT_CreateThread() or ULT_DestroyThread().
Also note that when the initial thread in a C process returns, it calls the exit() system call, which causes the OS to destroy the process (even if you have other user level threads in the process that want to run.)
A hard bug to find would be an overflow or underflow of the stack you allocate. How might such a bug manifest itself? What defensive programming strategies can you use to detect stack overflow in a more controlled manner as the system runs?
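One common defensive option (a sketch, not a requirement) is to plant a known “canary” value at the far end of each allocated stack and verify it on every context switch; the names below are made up:

#include <stdio.h>
#include <stdlib.h>
#include "ULT.h"                       /* for Tid */

#define STACK_CANARY 0xDEADBEEFUL      /* arbitrary marker value */

/* Called once when a thread's stack is allocated.  stackBase is the pointer
 * returned by malloc(), i.e. the lowest address of the stack; x86 stacks
 * grow down toward it, so an overflow will scribble here first. */
static void plantCanary(void *stackBase)
{
    *(unsigned long *)stackBase = STACK_CANARY;
}

/* Called on every context switch.  Failing loudly here is far easier to
 * debug than letting a trashed stack silently corrupt another thread. */
static void checkCanary(void *stackBase, Tid tid)
{
    if (*(unsigned long *)stackBase != STACK_CANARY) {
        fprintf(stderr, "ULT: stack overflow detected in thread %d\n", (int)tid);
        abort();
    }
}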
Use a debugger. As an exercise, put a breakpoint at the instruction after you copy the current thread’s state using getcontext(). You can print the current values of the registers (in gdb, type “info registers”).
(gdb) info registers
eax 0x0 0
ecx 0x0 0
edx 0x804b224 134525476
ebx 0xf7f49ff4 -134963212
esp 0xffee72f0 0xffee72f0
ebp 0xffee7318 0xffee7318
esi 0xf7f8dce0 -134685472
edi 0x0 0
eip 0x8048d15 0x8048d15
eflags 0x246 [ PF ZF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
You can print the values stored in your structs (e.g., in gdb I use “print/x myTcb->context” to print the context stored by a call to getcontext()).
You may find this particularly useful in making sure that the state you “restore” when you first run a newly-created thread makes sense.
3.3 Tests
Once your library is done, the tests in doTest.c should run. Grade.sh will compare your results to the expected results in doTest.expected:
> make doTest
…
> ./doTest
…
Done.
>
These tests are provided as a guide to help you with your debugging. Note that successfully running these tests is not a guarantee that your solution is correct or that it will get a good grade. We will also look at source code, run other tests, etc.
You should write additional tests of your own. The worst consequence of a subtle bug is not a bad grade; it is the many hours you may end up suffering in the lab. A mark of a good programmer is a good testing strategy — spend some time adding to the tests we provide.
4. Extra credit (20%) Preemptive multi-threading
In this optional phase of the project, you will extend the cooperative multi-threaded system (where ULT_Yield() causes control to be passed from one thread to the next) to make it a pre-emptive system where simulated “timer interrupts” cause the system to switch from one thread to another.
In the files interrupt.h and interrupt.c we provide code to install a signal handler routine that will be called whenever the process receives a SIGALRM signal. This code also uses the alarm() system call to cause the system to periodically send SIGALRM signals to the process. Make the program showHandler and run it to see how this works.
Your task is to modify the interruptHandler() routine (and your threads system) to move the currently executing thread to the ready queue and to move some thread off the ready queue to become the new currently executing thread.
Shared data structures. Note that interrupt signals can be sent to your process at any time even when a thread is in the middle of a ULT_Yield(), ULT_DestroyThread(), or ULT_CreateThread() call. It is a very bad idea to allow multiple threads to access shared variables (such as your ready queue) at the same time. You should therefore ensure that only one thread can be in a critical section (accessing global variables) of the thread library at a time. A simple way to do this is to disable interrupts when you enter procedures of the thread library and enable interrupts when you leave. In the interrupt.c/.h files, we provide the routines interruptsOn() and interruptsOff() to accomplish this.
Hint: think carefully about the invariants you want to maintain about when interrupts are on and when they are off. Note that because of thread context switches, the thread that turns off interrupts may not be the one that turns them on. Maintain the right invariants, and you’ll have no trouble dealing with this.
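As one illustration of that invariant, consider wrapping a library entry point as sketched below. This is only a sketch: check interrupt.h for the actual signatures of interruptsOn()/interruptsOff() (they are assumed here to take no arguments), and doYield() is a hypothetical helper standing in for your existing cooperative yield logic:

#include "ULT.h"
#include "interrupt.h"

static Tid doYield(Tid wantTid);   /* hypothetical: your cooperative yield logic (not shown) */

Tid ULT_Yield(Tid wantTid)
{
    Tid ret;

    interruptsOff();           /* enter the library's critical section        */
    ret = doYield(wantTid);    /* may context-switch away and back            */
    interruptsOn();            /* executed by whichever thread resumes here,  */
                               /* which is exactly the invariant to maintain  */
    return ret;
}

One concrete instance: a brand-new thread first runs in the stub, having inherited whatever interrupt state was in force at the switch, so the stub may need to turn interrupts back on before calling the root function.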
4.1 Tests
Once phase 2 is done, the tests in doTest2.c should run. Grade.sh will compare your results to the expected output. Again, you should not count on these tests to be comprehensive, and you should add your own tests to protect your grade and your sanity.
4.2 Additional notes and disclaimers
If you attempt the extra credit, be sure to include a text file README-XC that documents your approach to help me understand and evaluate it.
If you attempt the extra credit, be sure to save a working version of the lab first. You may want to make use of git to track multiple versions of your code.
The number of points available here is small relative to the work you will have to do. Don’t do this extra credit portion of the assignment for the points — do it for your own interest and benefit. Note that I will not be able to provide much assistance on the extra credit (I’ll be focusing my attention on people still working on the main part of the assignment.) Also note that I won’t spend much time grading the extra credit work you turn in. There are two implications: (1) It either is right or it isn’t. If it isn’t, I am not going to spend time trying to identify places where I can give you partial credit. (2) You must provide concise, clear documentation of your design in README-XC and you must provide and document a systematic, comprehensive, convincing test plan for your code. It is not my job to test your code and find problems; it is your job to test your code and convincingly demonstrate that it works. If I cannot easily and quickly understand what you did, you will not receive extra credit points.
5. On Programming and Logistics
The following guidelines should help smooth the process of delivering your project. You can help us a great deal by observing the following:
You will work on this project in groups of two. If you cannot find a partner, contact me immediately!
I will grade the project on Ubuntu Linux 11.04. Ensure that your project works there! If you do development on another platform, you do so at your own risk. The statement “it worked on my home machine” will not be considered in the grading process. (Note: it should be possible to do most development on other flavors of Unix, OSX, and perhaps even Windows/Cygwin, and you are welcome to do so. But, managing any difficulties porting to Linux is your job, and you should allow time for doing so.)
After you finish your work, please remove object files before submitting your work.
Usage: % make clean
Do not include object files in your submission!!
Zip or tar your files and then submit the archive via email (subject: 405 project 3) to
Select reasonable names for your files and variables. This way, you make grading easier.
Your files should never refer to absolute path names. For example, to include foo.h, do not write:
#include "/home/raz/fall2011/cs405/labs/labULT/foo.h" /* poor style */
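Instead, refer to headers by a path relative to the project directory, for example:

#include "foo.h" /* good: resolved relative to the project directory */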
You must provide a documentation file README.txt that explains briefly your solution and any assumptions that you made. The code, itself, should contain the bulk of your documentation in the form of well-considered comments.
Thorough testing is the mark of a good programmer. In addition to turning in the code for the system specified above, you should also turn in a set of test programs that exercise the functionality of your system. Your documentation should list these programs, explain how to run them, explain the importance of each test, and summarize the expected results of each. I will deduct fewer points for errors that your test system flags than for errors that it does not catch but that we do.
You are encouraged to reuse your own code that you might have developed in previous courses to handle things such as queues, sorting, etc. Alternatively, you may use other publicly available code for basic data structures, etc.
You may not use code that subsumes the heart of this project (e.g., you should not base your solution on wrappers of or code taken from the posix thread library). If in doubt, ask.
You are encouraged to discuss the problem with your colleagues. However, you must follow the collaboration restrictions described in the syllabus. For example, you are not allowed to look at anyone else’s code, and no one else may look at yours. When you are talking with others, you should not be looking at your code, and vice versa.
If you find that the problem is under specified, please make reasonable assumptions and document them in the documentation file.
This project’s main difficulty is in conceptualizing the solution. Once you overcome that hurdle, you will be surprised at how relatively simple the implementation is!
Start early, I mean it!!!
Points Awarded – max 350XP
If your code passes doTest you will get at least 200 XP.