
Saturday, July 12, 2008

Threading in C#

Overview and Concepts

C# supports parallel execution of code through multithreading. A thread is an independent execution path, able to run simultaneously with other threads.

A C# program starts in a single thread created automatically by the CLR and operating system (the "main" thread), and is made multi-threaded by creating additional threads. Here's a simple example and its output:

All examples assume the following namespaces are imported, unless otherwise specified:

using System;
using System.Threading;

class ThreadTest {
  static void Main() {
    Thread t = new Thread (WriteY);
    t.Start();                          // Run WriteY on the new thread
    while (true) Console.Write ("x");   // Write 'x' forever
  }
 
  static void WriteY() {
    while (true) Console.Write ("y");   // Write 'y' forever
  }
}

xxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
...

The main thread creates a new thread t on which it runs a method that repeatedly prints the character y. Simultaneously, the main thread repeatedly prints the character x.

The CLR assigns each thread its own memory stack so that local variables are kept separate. In the next example, we define a method with a local variable, then call the method simultaneously on the main thread and a newly created thread:

static void Main() {
  new Thread (Go).Start();      // Call Go() on a new thread
  Go();                         // Call Go() on the main thread
}
 
static void Go() {
  // Declare and use a local variable - 'cycles'
  for (int cycles = 0; cycles < 5; cycles++) Console.Write ('?');
}

??????????

A separate copy of the cycles variable is created on each thread's memory stack, and so the output is, predictably, ten question marks.

Threads share data if they have a common reference to the same object instance. Here's an example:

class ThreadTest {
 bool done;
 
 static void Main() {
   ThreadTest tt = new ThreadTest();   // Create a common instance
   new Thread (tt.Go).Start();
   tt.Go();
 }
 
 // Note that Go is now an instance method
 void Go() {
   if (!done) { done = true; Console.WriteLine ("Done"); }
 }
}

Because both threads call Go() on the same ThreadTest instance, they share the done field. This results in "Done" being printed once instead of twice:

Done

Static fields offer another way to share data between threads. Here's the same example with done as a static field:

class ThreadTest {
 static bool done;    // Static fields are shared between all threads
 
 static void Main() {
   new Thread (Go).Start();
   Go();
 }
 
 static void Go() {
   if (!done) { done = true; Console.WriteLine ("Done"); }
 }
}

Both of these examples illustrate another key concept – that of thread safety (or, rather, lack of it!) The output is actually indeterminate: it's possible (although unlikely) that "Done" could be printed twice. If, however, we swap the order of statements in the Go method, then the odds of "Done" being printed twice go up dramatically:

static void Go() {
  if (!done) { Console.WriteLine ("Done"); done = true; }
}

Done
Done (usually!)

The problem is that one thread can be evaluating the if statement right as the other thread is executing the WriteLine statement – before it's had a chance to set done to true.

The remedy is to obtain an exclusive lock while reading and writing to the common field. C# provides the lock statement for just this purpose:

class ThreadSafe {
  static bool done;
  static object locker = new object();
 
  static void Main() {
    new Thread (Go).Start();
    Go();
  }
 
  static void Go() {
    lock (locker) {
      if (!done) { Console.WriteLine ("Done"); done = true; }
    }
  }
}

When two threads simultaneously contend a lock (in this case, locker), one thread waits, or blocks, until the lock becomes available. In this case, it ensures only one thread can enter the critical section of code at a time, and "Done" will be printed just once. Code that's protected in such a manner – from indeterminacy in a multithreading context – is called thread-safe.

Temporarily pausing, or blocking, is an essential feature in coordinating, or synchronizing, the activities of threads. Waiting for an exclusive lock is one reason for which a thread can block. Another is if a thread wants to pause, or Sleep, for a period of time:

Thread.Sleep (TimeSpan.FromSeconds (30));         // Block for 30 seconds

A thread can also wait for another thread to end, by calling its Join method:

Thread t = new Thread (Go);           // Assume Go is some static method
t.Start();
t.Join();                             // Wait (block) until thread t ends

A thread, while blocked, doesn't consume CPU resources.

How Threading Works

Multithreading is managed internally by a thread scheduler, a function the CLR typically delegates to the operating system. A thread scheduler ensures all active threads are allocated appropriate execution time, and that threads that are waiting or blocked (for instance, on an exclusive lock or on user input) do not consume CPU time.

On a single-processor computer, a thread scheduler performs time-slicing: rapidly switching execution between each of the active threads. This results in "choppy" behavior, such as in the very first example, where each block of a repeating x or y character corresponds to a time-slice allocated to the thread. Under Windows XP, a time-slice is typically in the tens-of-milliseconds region, chosen to be much larger than the CPU overhead of actually switching context between one thread and another (which is typically in the few-microseconds region).

On a multi-processor computer, multithreading is implemented with a mixture of time-slicing and genuine concurrency – where different threads run code simultaneously on different CPUs. It's almost certain there will still be some time-slicing, because of the operating system's need to service its own threads – as well as those of other applications.

A thread is said to be preempted when its execution is interrupted due to an external factor such as time-slicing. In most situations, a thread has no control over when and where it's preempted.

Threads vs. Processes

All threads within a single application are logically contained within a process – the operating system unit in which an application runs.

Threads have certain similarities to processes – for instance, processes are typically time-sliced with other processes running on the computer in much the same way as threads within a single C# application. The key difference is that processes are fully isolated from each other; threads share (heap) memory with other threads running in the same application. This is what makes threads useful: one thread can be fetching data in the background, while another thread is displaying the data as it arrives.

When to Use Threads

A common application for multithreading is performing time-consuming tasks in the background. The main thread keeps running, while the worker thread does its background job. With Windows Forms or WPF applications, if the main thread is tied up performing a lengthy operation, keyboard and mouse messages cannot be processed, and the application becomes unresponsive. For this reason, it’s worth running time-consuming tasks on worker threads even if the main thread has the user stuck on a “Processing… please wait” modal dialog in cases where the program can’t proceed until a particular task is complete. This ensures the application doesn’t get tagged as “Not Responding” by the operating system, enticing the user to forcibly end the process in frustration! The modal dialog approach also allows for implementing a "Cancel" button, since the modal form will continue to receive events while the actual task is performed on the worker thread. The BackgroundWorker class assists in just this pattern of use.

In the case of non-UI applications, such as a Windows Service, multithreading makes particular sense when a task is potentially time-consuming because it’s awaiting a response from another computer (such as an application server, database server, or client). Having a worker thread perform the task means the instigating thread is immediately free to do other things.

Another use for multithreading is in methods that perform intensive calculations. Such methods can execute faster on a multi-processor computer if the workload is divided amongst multiple threads. (One can test for the number of processors via the Environment.ProcessorCount property.)
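As a rough sketch of dividing a workload, the following splits a simple summation into one chunk per processor, using Environment.ProcessorCount to decide how many worker threads to create. The class and method names (ParallelSum, Sum) are hypothetical, and summing integers is only a stand-in for a genuinely intensive calculation; for work this small, the cost of creating threads may well outweigh any gain.

```csharp
using System;
using System.Threading;

class ParallelSum {
  // Sums 0..n-1 by splitting the range into one chunk per processor.
  // Each worker writes only to its own slot in 'partials', so no lock is
  // needed; the main thread combines the results after joining every worker.
  public static long Sum (int n) {
    int workerCount = Environment.ProcessorCount;
    long[] partials = new long [workerCount];
    Thread[] workers = new Thread [workerCount];
    int chunk = (n + workerCount - 1) / workerCount;   // ceiling division

    for (int w = 0; w < workerCount; w++) {
      int index = w;                         // per-iteration copies captured
      int start = index * chunk;             // by the anonymous method below
      int end = Math.Min (start + chunk, n);
      workers[w] = new Thread (delegate() {
        long total = 0;
        for (int i = start; i < end; i++) total += i;
        partials[index] = total;
      });
      workers[w].Start();
    }

    foreach (Thread t in workers) t.Join();  // wait for every worker to finish

    long sum = 0;
    foreach (long p in partials) sum += p;
    return sum;
  }

  static void Main() {
    Console.WriteLine (Sum (1000000));       // 499999500000
  }
}
```

Giving each worker its own result slot sidesteps shared writes entirely; the only coordination needed is the final Join.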

A C# application can become multi-threaded in two ways: either by explicitly creating and running additional threads, or using a feature of the .NET framework that implicitly creates threads – such as BackgroundWorker, thread pooling, a threading timer, a Remoting server, or a Web Services or ASP.NET application. In these latter cases, one has no choice but to embrace multithreading. A single-threaded ASP.NET web server would not be cool – even if such a thing were possible! Fortunately, with stateless application servers, multithreading is usually fairly simple; one's only concern perhaps being to provide appropriate locking mechanisms around data cached in static variables.

When Not to Use Threads

Multithreading also comes with disadvantages. The biggest is that it can lead to vastly more complex programs. Having multiple threads does not in itself create complexity; it's the interaction between the threads that creates complexity. This applies whether or not the interaction is intentional, and can result in long development cycles, as well as an ongoing susceptibility to intermittent and non-reproducible bugs. For this reason, it pays to keep such interaction in a multi-threaded design simple – or not use multithreading at all – unless you have a peculiar penchant for re-writing and debugging!

Multithreading also comes with a resource and CPU cost in allocating and switching threads if used excessively. In particular, when heavy disk I/O is involved, it can be faster to have just one or two worker threads performing tasks in sequence, rather than having a multitude of threads each executing a task at the same time. Later we describe how to implement a Producer/Consumer queue, which provides just this functionality.

Creating and Starting Threads

Threads are created using the Thread class’s constructor, passing in a ThreadStart delegate – indicating the method where execution should begin. Here’s how the ThreadStart delegate is defined:

public delegate void ThreadStart();

Calling Start on the thread then sets it running. The thread continues until its method returns, at which point the thread ends. Here’s an example, using the expanded C# syntax for creating a ThreadStart delegate:

class ThreadTest {
  static void Main() {
    Thread t = new Thread (new ThreadStart (Go));
    t.Start();   // Run Go() on the new thread.
    Go();        // Simultaneously run Go() in the main thread.
  }
  static void Go() { Console.WriteLine ("hello!"); }
}

In this example, thread t executes Go() – at (much) the same time the main thread calls Go(). The result is two near-instant hellos:

hello!
hello!

A thread can be created more conveniently using C#'s shortcut syntax for instantiating delegates:

static void Main() {
  Thread t = new Thread (Go);    // No need to explicitly use ThreadStart
  t.Start();
  ...
}
static void Go() { ... }

In this case, a ThreadStart delegate is inferred automatically by the compiler. Another shortcut is to use an anonymous method to start the thread:

static void Main() {
  Thread t = new Thread (delegate() { Console.WriteLine ("Hello!"); });
  t.Start();
}

A thread has an IsAlive property that returns true after its Start() method has been called, up until the thread ends.

A thread, once ended, cannot be re-started.
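A quick way to see both points, using nothing beyond the Thread class itself: the sketch below checks IsAlive before and after the thread runs, then tries to start the ended thread again, which throws a ThreadStateException. RestartDemo and RestartThrows are illustrative names.

```csharp
using System;
using System.Threading;

class RestartDemo {
  // Returns true if calling Start a second time throws ThreadStateException
  public static bool RestartThrows() {
    Thread t = new Thread (delegate() { Console.WriteLine ("Working..."); });
    Console.WriteLine (t.IsAlive);   // False: Start hasn't been called yet
    t.Start();
    t.Join();                        // wait for the thread to end
    Console.WriteLine (t.IsAlive);   // False: the thread has ended
    try {
      t.Start();                     // an ended thread cannot be re-started
      return false;
    }
    catch (ThreadStateException) {
      return true;
    }
  }

  static void Main() {
    Console.WriteLine ("Restart threw: " + RestartThrows());
  }
}
```

To run the same work again, create a new Thread object for it.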

Passing Data to ThreadStart

Let’s say, in the example above, we wanted to better distinguish the output from each thread, perhaps by having one of the threads write in upper case. We could achieve this by passing a flag to the Go method: but then we couldn’t use the ThreadStart delegate because it doesn’t accept arguments. Fortunately, the .NET framework defines another version of the delegate called ParameterizedThreadStart, which accepts a single object argument as follows:

public delegate void ParameterizedThreadStart (object obj);

The previous example then looks like this:

class ThreadTest {
  static void Main() {
    Thread t = new Thread (Go);
    t.Start (true);             // == Go (true) 
    Go (false);
  }
  static void Go (object upperCase) {
    bool upper = (bool) upperCase;
    Console.WriteLine (upper ? "HELLO!" : "hello!");
  }
}

hello!
HELLO!

In this example, the compiler automatically infers a ParameterizedThreadStart delegate because the Go method accepts a single object argument. We could just as well have written:

Thread t = new Thread (new ParameterizedThreadStart (Go));
t.Start (true);

A feature of using ParameterizedThreadStart is that we must cast the object argument to the desired type (in this case bool) before use. Also, there is only a single-argument version of this delegate.
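One common workaround for the single-argument limit is to bundle several values into one object and cast it back inside the thread method. GreetingArgs is a hypothetical helper class invented for this sketch; its Result field is only there so the caller can inspect what the worker produced after calling Join.

```csharp
using System;
using System.Threading;

// Hypothetical argument class: bundles several values into the single
// object that ParameterizedThreadStart can carry
class GreetingArgs {
  public string Text;
  public bool Upper;
  public string Result;   // filled in by the worker for the caller to inspect
}

class MultiArgDemo {
  static void Go (object state) {
    GreetingArgs args = (GreetingArgs) state;   // one cast recovers every value
    args.Result = args.Upper ? args.Text.ToUpper() : args.Text;
    Console.WriteLine (args.Result);
  }

  public static string Run (string text, bool upper) {
    GreetingArgs args = new GreetingArgs();
    args.Text = text;
    args.Upper = upper;
    Thread t = new Thread (Go);   // Go (object) means ParameterizedThreadStart is inferred
    t.Start (args);
    t.Join();                     // after Join, the worker's write to Result is visible
    return args.Result;
  }

  static void Main() {
    Run ("hello!", true);         // prints HELLO!
  }
}
```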

An alternative is to use an anonymous method to call an ordinary method as follows:

static void Main() {
  Thread t = new Thread (delegate() { WriteText ("Hello"); });
  t.Start();
}
static void WriteText (string text) { Console.WriteLine (text); }

The advantage is that the target method (in this case WriteText) can accept any number of arguments, and no casting is required. However, one must take into account the outer-variable semantics of anonymous methods, as is apparent in the following example:

static void Main() {
  string text = "Before";
  Thread t = new Thread (delegate() { WriteText (text); });
  text = "After";
  t.Start();
}
static void WriteText (string text) { Console.WriteLine (text); }

After


Anonymous methods open the grotesque possibility of unintended interaction via outer variables if they are modified by either party subsequent to the thread starting. Intended interaction (usually via fields) is generally considered more than enough! Outer variables are best treated as read-only once thread execution has begun – unless one's willing to implement appropriate locking semantics on both sides.
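One way to follow that advice, sketched under the assumption that the thread only needs the value as it stood when the thread was created, is to copy the outer variable into a separate local that nothing else modifies, and capture the copy instead. The names here (CaptureCopyDemo, copy, printed) are illustrative.

```csharp
using System;
using System.Threading;

class CaptureCopyDemo {
  static string printed;   // records what the worker wrote, for inspection

  static void WriteText (string text) { printed = text; Console.WriteLine (text); }

  public static string Run() {
    string text = "Before";
    string copy = text;    // private copy, captured by the anonymous method
    Thread t = new Thread (delegate() { WriteText (copy); });
    text = "After";        // modifying 'text' no longer affects the thread
    t.Start();
    t.Join();
    return printed;
  }

  static void Main() { Run(); }   // prints Before
}
```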

Another common system for passing data to a thread is by giving Thread an instance method rather than a static method. The instance object’s properties can then tell the thread what to do, as in the following rewrite of the original example:

class ThreadTest {
  bool upper;
 
  static void Main() {
    ThreadTest instance1 = new ThreadTest();
    instance1.upper = true;
    Thread t = new Thread (instance1.Go);
    t.Start();
    ThreadTest instance2 = new ThreadTest();
    instance2.Go();        // Main thread – runs with upper=false
  }
 
  void Go() { Console.WriteLine (upper ? "HELLO!" : "hello!"); }
}

Naming Threads

A thread can be named via its Name property. This is of great benefit in debugging: as well as being able to Console.WriteLine a thread’s name, Microsoft Visual Studio picks up a thread’s name and displays it in the Debug Location toolbar. A thread’s name can be set at any time, but only once: attempts to subsequently change it will throw an exception.

The application’s main thread can also be assigned a name – in the following example the main thread is accessed via the CurrentThread static property:

class ThreadNaming {
  static void Main() {
    Thread.CurrentThread.Name = "main";
    Thread worker = new Thread (Go);
    worker.Name = "worker";
    worker.Start();
    Go();
  }
  static void Go() {
    Console.WriteLine ("Hello from " + Thread.CurrentThread.Name);
  }
}

Hello from main
Hello from worker

Foreground and Background Threads

By default, threads are foreground threads, meaning they keep the application alive for as long as any one of them is running. C# also supports background threads, which don’t keep the application alive on their own – terminating immediately once all foreground threads have ended.

Changing a thread from foreground to background doesn’t change its priority or status within the CPU scheduler in any way.

A thread's IsBackground property controls its background status, as in the following example:

class PriorityTest {
  static void Main (string[] args) {
    Thread worker = new Thread (delegate() { Console.ReadLine(); });
    if (args.Length > 0) worker.IsBackground = true;
    worker.Start();
  }
}

If the program is called with no arguments, the worker thread runs in its default foreground mode and blocks on the ReadLine statement, waiting for the user to hit Enter. Meanwhile, the main thread exits, but the application keeps running because a foreground thread is still alive.

If on the other hand an argument is passed to Main(), the worker is assigned background status, and the program exits almost immediately as the main thread ends – terminating the ReadLine.

When a background thread terminates in this manner, any finally blocks are circumvented. As circumventing finally code is generally undesirable, it's good practice to explicitly wait for any background worker threads to finish before exiting an application – perhaps with a timeout (this is achieved by calling Thread.Join). If for some reason a renegade worker thread never finishes, one can then attempt to abort it, and if that fails, abandon the thread, allowing it to die with the process (logging the conundrum at this stage would also make sense!)

Having worker threads as background threads can then be beneficial, for the very reason that it's always possible to have the last say when it comes to ending the application. Consider the alternative – a foreground thread that won't die – preventing the application from exiting. An abandoned foreground worker thread is particularly insidious with a Windows Forms application, because the application will appear to exit when the main thread ends (at least to the user) but its process will remain running. In the Windows Task Manager, it will have disappeared from the Applications tab, although its executable filename will still be visible in the Processes tab. Unless the user explicitly locates and ends the task, it will continue to consume resources and perhaps prevent a new instance of the application from starting or functioning properly.

A common cause for an application failing to exit properly is the presence of “forgotten” foreground threads.
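The shutdown advice given above (wait for background workers, with a timeout, before exiting) can be sketched as follows. The stopRequested flag is a hypothetical cooperative stop signal that isn't part of the original discussion, and the abort step is deliberately left out; the sketch just waits via Thread.Join with a timeout and logs if the worker fails to finish, leaving a background thread to die with the process.

```csharp
using System;
using System.Threading;

class ShutdownDemo {
  static volatile bool stopRequested;   // hypothetical flag the worker polls

  static void Work() {
    try {
      while (!stopRequested) Thread.Sleep (50);   // simulate periodic work
    }
    finally {
      Console.WriteLine ("Worker cleanup ran");   // runs because we wait for the worker
    }
  }

  // Starts a background worker, asks it to stop, then waits up to 'timeout'.
  // Returns false if the worker didn't finish in time.
  public static bool RunAndShutDown (TimeSpan timeout) {
    Thread worker = new Thread (Work);
    worker.IsBackground = true;   // can't keep the process alive on its own
    worker.Start();

    Thread.Sleep (200);           // let the worker run briefly
    stopRequested = true;
    bool finished = worker.Join (timeout);
    if (!finished)                // log the conundrum and abandon the thread
      Console.WriteLine ("Worker did not finish; abandoning it");
    return finished;
  }

  static void Main() {
    RunAndShutDown (TimeSpan.FromSeconds (5));
  }
}
```

Because the worker is a background thread, a worker that ignores the request can't stop the process from exiting once Main returns.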

Thread Priority

A thread’s Priority property determines how much execution time it gets relative to other active threads in the same process, on the following scale:

enum ThreadPriority { Lowest, BelowNormal, Normal, AboveNormal, Highest }

This becomes relevant only when multiple threads are simultaneously active.

Setting a thread’s priority to high doesn’t mean it can perform real-time work, because it’s still limited by the application’s process priority. To perform real-time work, the Process class in System.Diagnostics must also be used to elevate the process priority as follows (I didn't tell you how to do this):

Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

ProcessPriorityClass.High is actually one notch short of the highest process priority: Realtime. Setting one's process priority to Realtime instructs the operating system that you never want your process to be preempted. If your program enters an accidental infinite loop you can expect even the operating system to be locked out. Nothing short of the power button will rescue you! For this reason, High is generally considered the highest usable process priority.

If the real-time application has a user interface, it can be undesirable to elevate the process priority because screen updates will be given excessive CPU time – slowing the entire computer, particularly if the UI is complex. (Although at the time of writing, the Internet telephony program Skype gets away with doing just this, perhaps because its UI is fairly simple). Lowering the main thread’s priority – in conjunction with raising the process’s priority – ensures the real-time thread doesn’t get preempted by screen redraws, but doesn’t prevent the computer from slowing, because the operating system will still allocate excessive CPU to the process as a whole. The ideal solution is to have the real-time work and user interface in separate processes (with different priorities), communicating via Remoting or shared memory. Shared memory requires P/Invoking the Win32 API (web-search CreateFileMapping and MapViewOfFile).

Exception Handling

Any try/catch/finally blocks in scope when a thread is created are of no relevance once the thread starts executing. Consider the following program:

public static void Main() {
  try {
    new Thread (Go).Start();
  }
  catch (Exception ex) {
    // We'll never get here!
    Console.WriteLine ("Exception!");
  }
}
 
static void Go() { throw null; }

The try/catch statement in this example is effectively useless, and the newly created thread will be encumbered with an unhandled NullReferenceException. This behavior makes sense when you consider a thread has an independent execution path. The remedy is for thread entry methods to have their own exception handlers:

public static void Main() {
   new Thread (Go).Start();
}
 
static void Go() {
  try {
    // ...
    throw null;      // this exception will get caught below
    // ...
  }
  catch (Exception ex) {
    // Typically log the exception, and/or signal another thread
    // that we've come unstuck
    // ...
  }
}

From .NET 2.0 onwards, an unhandled exception on any thread shuts down the whole application, meaning ignoring the exception is generally not an option. Hence a try/catch block is required in every thread entry method – at least in production applications – in order to avoid unwanted application shutdown in case of an unhandled exception. This can be somewhat cumbersome – particularly for Windows Forms programmers, who commonly use the "global" exception handler, as follows:

using System;
using System.Threading;
using System.Windows.Forms;
 
static class Program {
  static void Main() {
    Application.ThreadException += HandleError;
    Application.Run (new MainForm());
  }
 
  static void HandleError (object sender, ThreadExceptionEventArgs e) {
    // Log exception, then either exit the app or continue...
  }
}

The Application.ThreadException event fires when an exception is thrown from code that was ultimately called as a result of a Windows message (for example, a keyboard, mouse or "paint" message) – in short, nearly all code in a typical Windows Forms application. While this works perfectly, it lulls one into a false sense of security – that all exceptions will be caught by the central exception handler. Exceptions thrown on worker threads are a good example of exceptions not caught by Application.ThreadException (the code inside the Main method is another – including the main form's constructor, which executes before the Windows message loop begins).

The .NET framework provides a lower-level event for global exception handling: AppDomain.UnhandledException. This event fires when there's an unhandled exception in any thread, and in any type of application (with or without a user interface). However, while it offers a good last-resort mechanism for logging untrapped exceptions, it provides no means of preventing the application from shutting down – and no means to suppress the .NET unhandled exception dialog.

In production applications, explicit exception handling is required on all thread entry methods.
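As a minimal sketch of that requirement, the entry method below catches its own exception, logs it, and stores it in a field so the main thread can see that the worker came unstuck. The workerError field is a hypothetical signaling mechanism; a real application might instead raise an event or set a status flag.

```csharp
using System;
using System.Threading;

class SafeEntryDemo {
  static Exception workerError;   // hypothetical field used to signal the failure

  static void Go() {
    try {
      throw null;                 // throws a NullReferenceException
    }
    catch (Exception ex) {
      Console.WriteLine ("Worker failed: " + ex.GetType().Name);   // log it
      workerError = ex;           // let the main thread know we've come unstuck
    }
  }

  public static Exception Run() {
    Thread t = new Thread (Go);
    t.Start();
    t.Join();                     // after Join, the write to workerError is visible
    return workerError;
  }

  static void Main() {
    Exception e = Run();
    Console.WriteLine (e == null ? "No error" : "Main thread saw: " + e.GetType().Name);
  }
}
```

Since the exception never escapes Go, the application keeps running.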


Blocking

When a thread waits or pauses as a result of using a blocking construct (such as Sleep, Join or a contended lock), it's said to be blocked. Once blocked, a thread immediately relinquishes its allocation of CPU time, adds WaitSleepJoin to its ThreadState property, and doesn’t get re-scheduled until unblocked. Unblocking happens in one of four ways (the computer's power button doesn't count!):

  • by the blocking condition being satisfied
  • by the operation timing out (if a timeout is specified)
  • by being interrupted via Thread.Interrupt
  • by being aborted via Thread.Abort

A thread is not deemed blocked if its execution is paused via the (deprecated) Suspend method.

Sleeping and Spinning

Calling Thread.Sleep blocks the current thread for the given time period (or until interrupted):

static void Main() {
  Thread.Sleep (0);                       // relinquish CPU time-slice
  Thread.Sleep (1000);                    // sleep for 1000 milliseconds
  Thread.Sleep (TimeSpan.FromHours (1));  // sleep for 1 hour
  Thread.Sleep (Timeout.Infinite);        // sleep until interrupted
}

More precisely, Thread.Sleep relinquishes the CPU, requesting that the thread is not re-scheduled until the given time period has elapsed. Thread.Sleep(0) relinquishes the CPU just long enough to allow any other active threads present in a time-slicing queue (should there be one) to be executed.


Thread.Sleep is unique amongst the blocking methods in that it suspends Windows message pumping within a Windows Forms application, or COM environment on a thread for which the single-threaded apartment model is used. This is of little consequence with Windows Forms applications, in that any lengthy blocking operation on the main UI thread will make the application unresponsive – and is hence generally avoided – regardless of whether or not message pumping is "technically" suspended. The situation is more complex in a legacy COM hosting environment, where it can sometimes be desirable to sleep while keeping message pumping alive. Microsoft's Chris Brumme discusses this at length in his web log.

Joining a Thread

You can block until another thread ends by calling Join:

class JoinDemo {
  static void Main() {
    Thread t = new Thread (delegate() { Console.ReadLine(); });
    t.Start();
    t.Join();    // Wait until thread t finishes
    Console.WriteLine ("Thread t's ReadLine complete!");
  }
}

The Join method also accepts a timeout argument – in milliseconds, or as a TimeSpan – returning false if the Join timed out before the thread ended. Join with a timeout functions rather like Sleep – in fact the following two lines of code are almost identical:

Thread.Sleep (1000);
Thread.CurrentThread.Join (1000);
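To illustrate Join's return value, this sketch joins the same thread twice: first with a timeout shorter than the worker's one-second sleep, then with a generous one. The timings are arbitrary, chosen only so that the first Join should time out and the second should succeed.

```csharp
using System;
using System.Threading;

class JoinTimeoutDemo {
  public static bool[] Run() {
    Thread t = new Thread (delegate() { Thread.Sleep (1000); });   // worker runs ~1 second
    t.Start();
    bool first = t.Join (100);                         // times out: worker still sleeping
    bool second = t.Join (TimeSpan.FromSeconds (5));   // worker ends well within 5 seconds
    return new bool[] { first, second };
  }

  static void Main() {
    bool[] results = Run();
    Console.WriteLine (results[0] + " " + results[1]); // typically: False True
  }
}
```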


Locking and Thread Safety

Locking enforces exclusive access, and is used to ensure only one thread can enter particular sections of code at a time. For example, consider the following class:

class ThreadUnsafe {
  static int val1, val2;
 
  static void Go() {
    if (val2 != 0) Console.WriteLine (val1 / val2);
    val2 = 0;
  }
}

This is not thread-safe: if Go was called by two threads simultaneously it would be possible to get a division by zero error – because val2 could be set to zero in one thread right as the other thread was in between executing the if statement and Console.WriteLine.

Here’s how lock can fix the problem:

class ThreadSafe {
  static object locker = new object();
  static int val1, val2;
 
  static void Go() {
    lock (locker) {
      if (val2 != 0) Console.WriteLine (val1 / val2);
      val2 = 0;
    }
  }
}

Only one thread can lock the synchronizing object (in this case locker) at a time, and any contending threads are blocked until the lock is released. If more than one thread contends the lock, they are queued – on a “ready queue” and granted the lock on a first-come, first-served basis as it becomes available. Exclusive locks are sometimes said to enforce serialized access to whatever's protected by the lock, because one thread's access cannot overlap with that of another. In this case, we're protecting the logic inside the Go method, as well as the fields val1 and val2.

A thread blocked while awaiting a contended lock has a ThreadState of WaitSleepJoin. Later we discuss how a thread blocked in this state can be forcibly released via another thread calling its Interrupt or Abort method. This is a fairly heavy-duty technique that might typically be used in ending a worker thread.

C#’s lock statement is in fact a syntactic shortcut for a call to the methods Monitor.Enter and Monitor.Exit, within a try-finally block. Here’s what’s actually happening within the Go method of the previous example:

Monitor.Enter (locker);
try {
  if (val2 != 0) Console.WriteLine (val1 / val2);
  val2 = 0;
}
finally { Monitor.Exit (locker); }

Calling Monitor.Exit without first calling Monitor.Enter on the same object throws an exception (a SynchronizationLockException).

Monitor also provides a TryEnter method that allows a timeout to be specified, either in milliseconds or as a TimeSpan. The method then returns true if a lock was obtained, or false if no lock was obtained because the method timed out. TryEnter can also be called with no timeout argument, which "tests" the lock, timing out immediately if the lock can’t be obtained right away.
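The following sketch exercises both forms of TryEnter against a lock held by another thread. The lockHeld flag is an assumption added only to make the demo deterministic (the main thread waits until the holder actually owns the lock), and the timeouts are arbitrary.

```csharp
using System;
using System.Threading;

class TryEnterDemo {
  static readonly object locker = new object();
  static volatile bool lockHeld;   // set by the holder once it owns the lock

  // Returns { result of a 100 ms TryEnter while the lock is held,
  //           result of a no-timeout TryEnter after the holder has finished }
  public static bool[] Run() {
    Thread holder = new Thread (delegate() {
      lock (locker) {
        lockHeld = true;
        Thread.Sleep (1000);               // hold the lock for about a second
      }
    });
    holder.Start();
    while (!lockHeld) Thread.Sleep (10);   // wait until the holder owns the lock

    bool whileHeld = Monitor.TryEnter (locker, TimeSpan.FromMilliseconds (100));
    if (whileHeld) Monitor.Exit (locker);  // only exit a lock we actually entered

    holder.Join();                         // the holder has now released the lock
    bool afterRelease = Monitor.TryEnter (locker);   // no timeout: just test the lock
    if (afterRelease) Monitor.Exit (locker);

    return new bool[] { whileHeld, afterRelease };
  }

  static void Main() {
    bool[] r = Run();
    Console.WriteLine (r[0] + " " + r[1]);
  }
}
```

Pairing every successful TryEnter with exactly one Monitor.Exit avoids the exception described above for exiting a lock that was never entered.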

Choosing the Synchronization Object

Any object visible to each of the partaking threads can be used as a synchronizing object, subject to one hard rule: it must be a reference type. It’s also highly recommended that the synchronizing object be privately scoped to the class (i.e. a private instance field) to prevent an unintentional interaction from external code locking the same object. Subject to these rules, the synchronizing object can double as the object it's protecting, such as with the list field below:

class ThreadSafe {
  List <string> list = new List <string>();
 
  void Test() {
    lock (list) {
      list.Add ("Item 1");
      ...

A dedicated field is commonly used (such as locker, in the example prior), because it allows precise control over the scope and granularity of the lock. Using the object or type itself as a synchronization object, i.e.:

lock (this) { ... }

or:

lock (typeof (Widget)) { ... } // For protecting access to statics

is discouraged because it potentially offers public scope to the synchronization object.

Thread-Safety and .NET Framework Types

Locking can be used to convert thread-unsafe code into thread-safe code. A good example is with the .NET framework – nearly all of its non-primitive types are not thread safe when instantiated, and yet they can be used in multi-threaded code if all access to any given object is protected via a lock. Here's an example, where two threads simultaneously add items to the same List collection, then enumerate the list:

class ThreadSafe {
  static List <string> list = new List <string>();
 
  static void Main() {
    new Thread (AddItems).Start();
    new Thread (AddItems).Start();
  }
 
  static void AddItems() {
    for (int i = 0; i < 100; i++)
      lock (list)
        list.Add ("Item " + list.Count);
 
    string[] items;
    lock (list) items = list.ToArray();
    foreach (string s in items) Console.WriteLine (s);
  }
}

In this case, we're locking on the list object itself, which is fine in this simple scenario. If we had two interrelated lists, however, we would need to lock upon a common object – perhaps a separate field, if neither list presented itself as the obvious candidate.

Enumerating .NET collections is also thread-unsafe in the sense that an exception is thrown if another thread alters the list during enumeration. Rather than locking for the duration of enumeration, in this example, we first copy the items to an array. This avoids holding the lock excessively if what we're doing during enumeration is potentially time-consuming.

Here's an interesting supposition: imagine if the List class was, indeed, thread-safe. What would it solve? Potentially, very little! To illustrate, let's say we wanted to add an item to our hypothetical thread-safe list, as follows:

if (!myList.Contains (newItem)) myList.Add (newItem);

Whether or not the list was thread-safe, this statement is certainly not! The whole if statement would have to be wrapped in a lock – to prevent preemption in between testing for containership and adding the new item. This same lock would then need to be used everywhere we modified that list. For instance, the following statement would also need to be wrapped – in the identical lock:

myList.Clear();

to ensure it did not preempt the former statement. In other words, we would have to lock almost exactly as with our thread-unsafe collection classes. Built-in thread safety, then, can actually be a waste of time!
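
Here is a minimal sketch of what that external locking looks like, assuming a hypothetical UniqueList wrapper around an ordinary List<string>. The test-and-add runs as one unit under a single lock, and Clear takes the identical lock so it cannot slip in between. Four threads racing to add the same 1000 items still end up with exactly 1000 entries.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class UniqueList {
  // Hypothetical wrapper: every compound operation on 'items' takes the same lock.
  readonly object locker = new object();
  readonly List<string> items = new List<string>();

  public bool AddIfAbsent (string newItem) {
    lock (locker) {
      if (items.Contains (newItem)) return false;   // test...
      items.Add (newItem);                          // ...and act, with no preemption point between
      return true;
    }
  }

  public void Clear() {
    lock (locker) items.Clear();          // identical lock, so it cannot interleave with AddIfAbsent
  }

  public int Count {
    get { lock (locker) return items.Count; }
  }

  public static int RunDemo() {
    UniqueList list = new UniqueList();
    Thread[] threads = new Thread [4];
    for (int i = 0; i < threads.Length; i++) {
      threads [i] = new Thread (() => {
        for (int n = 0; n < 1000; n++) list.AddIfAbsent ("Item " + n);
      });
      threads [i].Start();
    }
    foreach (Thread t in threads) t.Join();
    return list.Count;
  }

  static void Main() {
    Console.WriteLine (RunDemo());        // 1000: no duplicates despite four writers
  }
}
```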

One could argue this point when writing custom components – why build in thread-safety when it can easily end up being redundant?

There is a counter-argument: wrapping a custom lock around an object works only if all concurrent threads are aware of, and use, the lock, which may not be the case if the object is widely scoped. The worst scenario crops up with static members in a public type. For instance, imagine that the static property DateTime.Now on the DateTime struct were not thread-safe, and that two concurrent calls could result in garbled output or an exception. The only way to remedy this with external locking might be to lock the type itself, with lock(typeof(DateTime)), around every call to DateTime.Now. That would work only if all programmers agreed to do it, which is unlikely, given that locking a type is considered by many to be a Bad Thing!

For this reason, static members on the DateTime struct are guaranteed to be thread-safe. This is a common pattern throughout the .NET framework – static members are thread-safe, while instance members are not. Following this pattern also makes sense when writing custom types, so as not to create impossible thread-safety conundrums!
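
A custom type following that convention might look like the sketch below. The Counter class is hypothetical: its static state is protected by a private lock, so callers never need to coordinate, while its instance members do no locking and leave that to whoever shares an instance (here, by locking on the instance itself, as with the list earlier).

```csharp
using System;
using System.Threading;

// Hypothetical type: static members are thread-safe, instance members are not.
class Counter {
  static readonly object staticLocker = new object();
  static int totalCreated;                 // shared by every thread

  int count;                               // per-instance state, no internal lock

  public Counter() {
    lock (staticLocker) totalCreated++;    // touches static state, so it locks
  }

  // Thread-safe static member: callable from anywhere without coordination.
  public static int TotalCreated {
    get { lock (staticLocker) return totalCreated; }
  }

  // Not thread-safe: threads sharing one instance must lock around these.
  public void Increment() { count++; }
  public int Count { get { return count; } }

  public static void RunDemo (out int created, out int sharedCount) {
    Counter shared = new Counter();
    int before = TotalCreated;
    Thread[] threads = new Thread [4];
    for (int i = 0; i < threads.Length; i++) {
      threads [i] = new Thread (() => {
        for (int n = 0; n < 1000; n++) {
          new Counter();                       // static path: no caller locking needed
          lock (shared) shared.Increment();    // instance path: the caller supplies the lock
        }
      });
      threads [i].Start();
    }
    foreach (Thread t in threads) t.Join();
    created = TotalCreated - before;           // 4000
    sharedCount = shared.Count;                // 4000
  }

  static void Main() {
    int created, sharedCount;
    RunDemo (out created, out sharedCount);
    Console.WriteLine (created + " " + sharedCount);
  }
}
```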

Interrupt and Abort

A blocked thread can be released prematurely in one of two ways:

  • via Thread.Interrupt
  • via Thread.Abort

This must happen via the activities of another thread; the waiting thread is powerless to do anything in its blocked state.

Interrupt

Calling Interrupt on a blocked thread forcibly releases it, throwing a ThreadInterruptedException, as follows:

class Program {
  static void Main() {
    Thread t = new Thread (delegate() {
      try {
        Thread.Sleep (Timeout.Infinite);
      }
      catch (ThreadInterruptedException) {
        Console.Write ("Forcibly ");
      }
      Console.WriteLine ("Woken!");
    });
 
    t.Start();
    t.Interrupt();
  }
}

Forcibly Woken!

Interrupting a thread only releases it from its current (or next) wait: it does not cause the thread to end (unless, of course, the ThreadInterruptedException is unhandled!)

If Interrupt is called on a thread that’s not blocked, the thread continues executing until it next blocks, at which point a ThreadInterruptedException is thrown. This avoids the need for the following test:

if ((worker.ThreadState & ThreadState.WaitSleepJoin) > 0)
  worker.Interrupt();

which is not thread-safe because of the possibility of being preempted in between the if statement and worker.Interrupt.
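
The "pending" behavior can be observed directly. In this sketch (the class, the volatile go flag and the started event are illustrative assumptions), the worker is busy spinning, not blocked, when Interrupt is called. Nothing happens until it later calls Thread.Sleep, at which point the ThreadInterruptedException is thrown. No ThreadState test is needed.

```csharp
using System;
using System.Threading;

class PendingInterrupt {
  static volatile bool go;                 // lets the worker spin without blocking

  public static string Run() {
    go = false;
    string result = "not interrupted";
    using (ManualResetEvent started = new ManualResetEvent (false)) {
      Thread worker = new Thread (() => {
        started.Set();
        while (!go) { }                    // busy, not blocked: Interrupt has no effect yet
        try {
          Thread.Sleep (Timeout.Infinite); // the pending interrupt fires here
        }
        catch (ThreadInterruptedException) {
          result = "interrupted at Sleep";
        }
      });

      worker.Start();
      started.WaitOne();                   // the worker is now running its spin loop
      worker.Interrupt();                  // recorded as pending
      go = true;                           // let the worker reach its Sleep
      worker.Join();
    }
    return result;
  }

  static void Main() {
    Console.WriteLine (Run());             // interrupted at Sleep
  }
}
```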

Interrupting a thread arbitrarily is dangerous, however, because any framework or third-party methods in the calling stack could unexpectedly receive the interrupt rather than your intended code. All it would take is for the thread to block briefly on a simple lock or synchronization resource, and any pending interruption would kick in. If the method wasn't designed to be interrupted (with appropriate cleanup code in finally blocks), objects could be left in an unusable state, or resources incompletely released.
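
For contrast, here is a sketch of a method that is designed to be interrupted. The "resource" is just a hypothetical flag standing in for something like a file handle or a half-built object; the finally block restores it whether the wait completes normally or ends in a ThreadInterruptedException.

```csharp
using System;
using System.Threading;

class CleanupOnInterrupt {
  // Hypothetical stand-in for a real resource: true while "held".
  static volatile bool resourceHeld;

  static void DoWork() {
    resourceHeld = true;                   // acquire
    try {
      Thread.Sleep (Timeout.Infinite);     // a wait that another thread may interrupt
    }
    finally {
      resourceHeld = false;                // release on every exit path
    }
  }

  public static bool Run() {
    Thread worker = new Thread (() => {
      try { DoWork(); }
      catch (ThreadInterruptedException) { }   // let the thread end normally
    });
    worker.Start();
    worker.Interrupt();                    // pending if the worker hasn't blocked yet
    worker.Join();
    return resourceHeld;                   // false once the finally block has run
  }

  static void Main() {
    Console.WriteLine (Run() ? "Resource leaked" : "Resource released");
  }
}
```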

Interrupting a thread is safe when you know exactly where the thread is. Later we cover signaling constructs, which provide just such a means.

Abort

A blocked thread can also be forcibly released via its Abort method. This has an effect similar to calling Interrupt, except that a ThreadAbortException is thrown instead of a ThreadInterruptedException. Furthermore, the exception will be re-thrown at the end of the catch block (in an attempt to terminate the thread for good) unless Thread.ResetAbort is called within the catch block. In the interim, the thread has a ThreadState of AbortRequested. (Thread.Abort is supported only on the .NET Framework; on .NET Core and .NET 5 and later, calling it throws a PlatformNotSupportedException.)

The big difference, though, between Interrupt and Abort, is what happens when it's called on a thread that is not blocked. While Interrupt waits until the thread next blocks before doing anything, Abort throws an exception on the thread right where it's executing – maybe not even in your code. Aborting a non-blocked thread can have significant consequences, the details of which are explored in the later section "Aborting Threads".

Thread State

[ThreadState Diagram]

Figure 1: Thread State Diagram

One can query a thread's execution status via its ThreadState property. Figure 1 shows one "layer" of the ThreadState enumeration. ThreadState is horribly designed, in that it combines three "layers" of state using bitwise flags, the members within each layer being themselves mutually exclusive. Here are all three layers:

  • the running / blocking / aborting status (as shown in Figure 1)
  • the background/foreground status (ThreadState.Background)
  • the progress towards suspension via the deprecated Suspend method (ThreadState.SuspendRequested and ThreadState.Suspended)

In total then, ThreadState is a bitwise combination of zero or one members from each layer! Here are some sample ThreadStates:

Unstarted
Running
WaitSleepJoin
Background, Unstarted
SuspendRequested, Background, WaitSleepJoin

(The enumeration has two members that are never used, at least in the current CLR implementation: StopRequested and Aborted.)

To complicate matters further, ThreadState.Running has an underlying value of 0, so the following test does not work:

if ((t.ThreadState & ThreadState.Running) > 0) ...

and one must instead test for a running thread by exclusion or, alternatively, use the thread's IsAlive property. IsAlive, however, might not be what you want: it returns true if the thread is blocked or suspended (the only time it returns false is before the thread has started and after it has ended).
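
Testing by exclusion might look like the sketch below. IsRunning is a hypothetical helper, not a framework method: it reports "running" when none of the flags that rule it out are set, ignoring Background. The broken test from above is included to show that it is false even for ThreadState.Running itself.

```csharp
using System;
using System.Threading;

class RunningCheck {
  // Hypothetical helper: "running" by exclusion. Background is ignored,
  // since it says nothing about whether the thread is executing.
  public static bool IsRunning (ThreadState ts) {
    const ThreadState notRunning =
      ThreadState.Unstarted | ThreadState.Stopped | ThreadState.WaitSleepJoin |
      ThreadState.AbortRequested | ThreadState.Aborted | ThreadState.Suspended;
    return (ts & notRunning) == 0;
  }

  // The broken test: always false, because Running has the value 0.
  public static bool BrokenIsRunning (ThreadState ts) {
    return (ts & ThreadState.Running) > 0;
  }

  static void Main() {
    Console.WriteLine ((int) ThreadState.Running);                                      // 0
    Console.WriteLine (BrokenIsRunning (ThreadState.Running));                          // False
    Console.WriteLine (IsRunning (ThreadState.Running));                                // True
    Console.WriteLine (IsRunning (ThreadState.Background));                             // True
    Console.WriteLine (IsRunning (ThreadState.Background | ThreadState.WaitSleepJoin)); // False
  }
}
```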

Assuming one steers clear of the deprecated Suspend and Resume methods, one can write a helper method that eliminates all but members of the first layer, allowing simple equality tests to be performed. A thread's background status can be obtained independently via its IsBackground property, so only the first layer actually has useful information:

public static ThreadState SimpleThreadState (ThreadState ts)
{
  return ts & (ThreadState.Aborted | ThreadState.AbortRequested |
               ThreadState.Stopped | ThreadState.Unstarted |
               ThreadState.WaitSleepJoin);
}
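
To see the helper in action, the sketch below copies SimpleThreadState unchanged into a small demo class (the class name is illustrative) and applies it to some of the sample states listed earlier, plus a freshly created thread. Each combined value collapses to a single first-layer member, so plain equality tests work.

```csharp
using System;
using System.Threading;

class SimpleStateDemo {
  // Same helper as above: keeps only the first "layer" of flags.
  public static ThreadState SimpleThreadState (ThreadState ts) {
    return ts & (ThreadState.Aborted | ThreadState.AbortRequested |
                 ThreadState.Stopped | ThreadState.Unstarted |
                 ThreadState.WaitSleepJoin);
  }

  static void Main() {
    Console.WriteLine (SimpleThreadState (ThreadState.Background | ThreadState.Unstarted));   // Unstarted
    Console.WriteLine (SimpleThreadState (ThreadState.SuspendRequested |
                                          ThreadState.Background |
                                          ThreadState.WaitSleepJoin));                        // WaitSleepJoin
    Console.WriteLine (SimpleThreadState (ThreadState.Background));                           // Running

    Thread t = new Thread (() => { });
    Console.WriteLine (SimpleThreadState (t.ThreadState) == ThreadState.Unstarted);           // True
  }
}
```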

ThreadState is invaluable for debugging or profiling. It's poorly suited, however, to coordinating multiple threads, because no mechanism exists by which one can test a ThreadState and then act upon that information without the ThreadState potentially changing in the interim.
