[ECOS] Draft Threading Backgrounder and Guideline

John Carter john.carter@tait.co.nz
Fri Oct 29 06:36:00 GMT 2004


Greetings Folk,

I was busy doing an audit of the thread safety of our app when I
started taking notes to help me decide what were problems and what
weren't.

Anyway these notes grew and grew until I felt that they could be
useful as a guideline to our developers.

And then I started to worry: was what I was saying correct?

So I pulled out all the stuff specific just to our app into another
document and made it a bit more general.

It is deliberately written in a bossy, know it all, do-it-this-way
style, as it is essentially documenting the design decisions for this
class of app.

It is not a general guide to all possible threading architectures
available under ecos.

Some of the formatting curiosities are due to exporting it as a .txt
file from an .html document.

So here it is, for your entertainment, corrections, comments,
suggestions and flames...

I will take on board your replies and repost the corrected version.

Thank you,

John Carter                             Phone : (64)(3) 358 6639
Tait Electronics                        Fax   : (64)(3) 359 4632
PO Box 1645 Christchurch                Email : john.carter@tait.co.nz
New Zealand


Ecos Threading Backgrounder
===========================

This is a background information and guideline document for...

     * people writing,
     * in C using gcc,
     * a message driven multi-threaded application,
     * under Ecos,
     * using the bitmap scheduler.

Explicitly missing from this document are the reasons for choosing this
architecture (there are good reasons). But having chosen it, these are
the rules we need to live by to stay sane....


Who preempts what when..
=======================

     * Read about Ecos Contexts
       <http://sources.redhat.com/ecos/docs-latest/ref/kernel-overview.html#KERNEL-OVERVIEW-CONTEXTS>
       before reading this document.
     * A thread may be preempted by any higher priority thread at *any*
       stage. (An ISR may interrupt the thread; control then returns to
       the scheduler, which schedules the highest priority runnable
       thread.)
     * By "preempt at any stage" I mean literally between one machine
       code instruction and the next. Even something as simple as |++i|
       may be preempted between the load and the store.
     * A higher priority thread will only give way to a lower priority
       thread if it blocks on a cyg_blah_wait() primitive; a lower
       priority thread never preempts a higher priority one.
     * Under the bitmap scheduler, there are no threads of equal
       priority. Do *not* ever rely on this. We may later elect to move
       to the MLQ scheduler, which permits multiple threads of equal
       priority and time-slicing.
     * Any thread may be preempted at any stage by an ISR or a DSR.
     * A correctly written DSR cannot be preempted by a thread _or_
       another DSR. *You should rely on this.*
     * A DSR may be preempted at any stage by an ISR.
     * A correctly written ISR cannot be preempted by a DSR or thread.
       *You should rely on this.*
     * An ISR may be preempted by an ISR (possibly itself).

* Where I say "between contexts" I mean between any two threads, DSR's
and/or ISR's. *

An alarm handler under Ecos runs in a DSR context.
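
For example, an alarm handler registered as below runs in DSR context,
so it must obey the DSR rules above (in particular, it must not block).
A minimal sketch; the handler name and the 100-tick period are
hypothetical:

      #include <cyg/kernel/kapi.h>

      /* Runs in DSR context: must not call anything that can block
         (no cyg_mutex_lock, no cyg_*_wait). */
      static void tick_alarm_handler(cyg_handle_t alarm, cyg_addrword_t data)
      {
          /* e.g. post to a mailbox or semaphore to wake a thread */
      }

      static cyg_handle_t tick_alarm_handle;
      static cyg_alarm    tick_alarm_object;

      void start_tick_alarm(void)
      {
          cyg_handle_t counter;

          cyg_clock_to_counter(cyg_real_time_clock(), &counter);
          cyg_alarm_create(counter, tick_alarm_handler, 0,
                           &tick_alarm_handle, &tick_alarm_object);
          /* first fire 100 ticks from now, then every 100 ticks */
          cyg_alarm_initialize(tick_alarm_handle, cyg_current_time() + 100, 100);
      }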


     Inter-thread communication
     ==========================

Inter-thread communication should always be via messages.

Take care not to create mail storms by...

     * circular message paths or
     * fork bombs (receipt of one message creates two or more messages
       which create two or more, which...)

Often excessive messaging can be reduced by moving blocks of
functionality from one thread to another.
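
For concreteness, a minimal sketch of message passing between two
threads using an Ecos mailbox. The mailbox name, the event codes and the
thread bodies are hypothetical; the point is that the message is a small
self-contained value, not a pointer into the sender's data.

      #include <cyg/kernel/kapi.h>

      static cyg_handle_t event_box_handle;
      static cyg_mbox     event_box;

      void comms_init(void)
      {
          cyg_mbox_create(&event_box_handle, &event_box);
      }

      /* Sender thread: the message is a (nonzero) event code packed into
         the void* slot that the mailbox carries, so no pointers change
         hands. */
      void send_event(int event_code)
      {
          cyg_mbox_put(event_box_handle, (void *)(cyg_addrword_t)event_code);
      }

      /* Receiver thread: blocks until a message arrives. */
      void receiver_thread(cyg_addrword_t data)
      {
          for (;;) {
              int event_code = (int)(cyg_addrword_t)cyg_mbox_get(event_box_handle);
              /* ... dispatch on event_code ... */
          }
      }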


       Back channel communication between threads
       ==========================================

         Passing pointers between threads
         --------------------------------

Passing pointers between threads in messages or via statics is strongly
recommended against. This rapidly creates a deadly tangle of heap
crashes, memory leaks and race conditions.

Possible situations where this may be permissible (after careful
review) are...

     * the data-chunk is so large that memcpy'ing it would be
       prohibitively expensive.
     * The pointer is to a fixed system resource.

The alternative of moving entire blocks of functionality from one thread
to another should be considered in preference to passing pointers.

Do _not_ pass a pointer from thread A to thread B in a message unless
you can immediately overwrite all copies of that pointer in thread A. If
you do so, don't forget to mark the item pointed to as volatile.
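
If, after careful review, a pointer really must be handed over (say, a
large buffer), the hand-off should look something like the sketch below.
The buffer and the mailbox used to carry the pointer are hypothetical;
the essential points are that the item pointed to is volatile and that
thread A overwrites its only copy of the pointer immediately after the
hand-off.

      #include <cyg/kernel/kapi.h>

      extern cyg_handle_t buffer_box;              /* hypothetical mailbox */

      static volatile cyg_uint8 big_buffer[4096];  /* item pointed to is volatile */

      /* Thread A: hand ownership of the buffer to thread B and forget it. */
      void hand_over_buffer(void)
      {
          volatile cyg_uint8 *p = big_buffer;

          /* ... fill the buffer through p ... */

          cyg_mbox_put(buffer_box, (void *)p);
          p = 0;   /* overwrite thread A's only copy; thread B now owns it */
      }

      /* Thread B: becomes the sole owner when the pointer arrives. */
      void consumer_thread(cyg_addrword_t data)
      {
          for (;;) {
              volatile cyg_uint8 *buf =
                  (volatile cyg_uint8 *)cyg_mbox_get(buffer_box);
              /* ... thread B may now read and write *buf ... */
          }
      }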


         Passing data via globals or statics.
         -----------------------------------

In very rare, very well thought out, documented and reviewed cases,
inter-thread communication may be done via static variables, in which
case access should be serialized with a mutex.

"serialization" means to ensure that only one context is ever executing
certain regions of code at a time.
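
A sketch of one such reviewed case: a static counter that two threads
update, serialized with a mutex. The names are hypothetical, and every
context that touches the static must take the mutex first.

      #include <cyg/kernel/kapi.h>

      static volatile cyg_uint32 shared_counter;        /* see volatile, below */
      static cyg_mutex_t         shared_counter_mutex;

      void shared_counter_init(void)
      {
          cyg_mutex_init(&shared_counter_mutex);
      }

      /* Thread context only: cyg_mutex_lock may block. */
      void shared_counter_increment(void)
      {
          cyg_mutex_lock(&shared_counter_mutex);
          shared_counter++;                 /* load/modify/store, now serialized */
          cyg_mutex_unlock(&shared_counter_mutex);
      }

      cyg_uint32 shared_counter_read(void)
      {
          cyg_uint32 value;

          cyg_mutex_lock(&shared_counter_mutex);
          value = shared_counter;
          cyg_mutex_unlock(&shared_counter_mutex);
          return value;
      }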

Semaphores, event flags and condition variables are effective for
sequencing, not serialization. That is, if you wish to ensure only one
context accesses something, these aren't enough.

"sequencing" means to ensure one event follows another.

Acceptable reasons for passing data via statics are...

     * The data is readonly outside the startup context.
     * The data item is a fixed system resource such as a hardware
       register or DMA region.
     * The data item is very large and memcpy'ing it would be
       prohibitively expensive.

The last two reasons are only acceptable if restructuring the code so
that the item is owned by one thread or the other is not possible.
Restructuring the code is preferred to passing data via statics and
globals.

If two or more locks are acquired.....

     * to avoid deadlock, all contexts _must_ acquire them in the same
       order and release them in reverse order.
     * the harsher the lock, the shorter the duration you may hold it, so
       the harsher lock must be the inner one.

       In order of decreasing harshness we have HAL_DISABLE_INTERRUPTS,
       cyg_interrupt_mask, cyg_scheduler_lock, cyg_blah_wait.
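
A sketch of the nesting rule (lock names hypothetical): the milder mutex
is taken first and held longest, the harsher scheduler lock is the
inner, briefest one, and both are released in reverse order. Every
context that takes both locks must take them in this same order.

      #include <cyg/kernel/kapi.h>

      extern cyg_mutex_t config_mutex;      /* hypothetical outer (milder) lock */

      void update_config_shared_with_dsr(void)
      {
          cyg_mutex_lock(&config_mutex);    /* outer: may be held for a while  */

          /* ... work that only other threads care about ... */

          cyg_scheduler_lock();             /* inner: harsher, keep it brief   */
          /* ... touch the few words that a DSR also reads ... */
          cyg_scheduler_unlock();

          cyg_mutex_unlock(&config_mutex);  /* release in reverse order        */
      }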


       Use of the |volatile| keyword.
       ==============================

As far as gcc is concerned, everything is entirely single threaded. It
is entirely thread unaware. This is not a bug; this is how it should be.
Thus it is entirely within its rights to...

    1. Cache memory values in registers.
    2. Optimize away memory reads, if it knows _this_ thread does not
       change that memory location.
    3. Optimize away memory writes, if it knows it will shortly be
       overwriting that location with a new value.
    4. Move memory reads and writes out of loops.
    5. Reorder memory reads and writes to keep the CPU pipeline full.
    6. Inline function calls and perform the above optimizations across
       function call boundaries.
    7. Remember too that this is a RISC "load, do stuff, store" CPU
       architecture, so even something as simple as |++i| is /not/ atomic.

Locking primitives may protect you from race conditions, but they do not
necessarily protect you from the compiler's ignorance. To do that you
need to mark the shared data items as "|volatile|".

Thus if...

     * You have a global or static variable shared between threads or
       DSR's or ISR's, the variable must be marked as "|volatile|"
     * If you pass a pointer between threads, then the item pointed to
       must be marked "|volatile|".
     * If the memory location is an external hardware DMA location or
       control register, it must be marked "|volatile|".
     * If the order of reads or writes to the memory locations are
       important, they must be marked "|volatile|".

From When is a Volatile Object Accessed?
<http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html>

     / Thus an implementation is free to reorder and combine volatile
     accesses which occur between sequence points, but cannot do so for
     accesses across a sequence point. /

Where a sequence point is.... (quoting here from The C Book, second
edition, by Mike Banahan, Declan Brady and Mark Doran).
<http://publications.gbdirect.co.uk/c_book/chapter8/sequence_points.html>

     The sequence points laid down in the Standard are the following:

         * The point of calling a function, after evaluating its arguments.
         * The end of the first operand of the && operator.
         * The end of the first operand of the || operator.
         * The end of the first operand of the ?: conditional operator.
         * The end of each operand of the comma operator.
         * Completing the evaluation of a full expression. They are the
           following:
               o Evaluating the initializer of an auto object.
               o The expression in an ordinary statement, an expression
                 followed by semicolon.
               o The controlling expressions in do, while, if, switch or
                 for statements.
               o The other two expressions in a for statement.
               o The expression in a return statement.

Consider...

          1  uint32_t * hardwareControlRegister = (uint32_t *)0x80000000;
          2  uint32_t * hardwareDMARegister     = (uint32_t *)0x80000004;
          3  *hardwareControlRegister |= WRITABLE;
          4  *hardwareDMARegister = *hardwareDMARegister = myData;


The following problems are visible...

     * The optimizer is free to treat one of the two stores in line 4 as
       dead computation and eliminate it.
     * Line 3 may be reordered after line 4.
     * The standard doesn't say whether you mean to stuff myData into
       *hardwareDMARegister twice, or stuff it into *hardwareDMARegister
       once, read *hardwareDMARegister and stuff the result into
       *hardwareDMARegister.

This code should be....

          1  volatile uint32_t * hardwareControlRegister = (volatile uint32_t *)0x80000000;
          2  volatile uint32_t * hardwareDMARegister     = (volatile uint32_t *)0x80000004;
          3  *hardwareControlRegister |= WRITABLE;
          4  *hardwareDMARegister = myData;
          5  *hardwareDMARegister = myData;


Bugs related to failing to use the volatile keyword are fragile
"Heisenbugs". Unrelated changes (e.g. debug code) can make them appear
or disappear.


       Use of |HAL_REORDER_BARRIER|
       ----------------------------

As noted in point 5 above, the compiler can reorder code. So long as the
result (for a single thread, no funny hardware) is the correct answer,
the compiler can reorder as it pleases.

This is not a "might happen" thing, it's a "does it continuously all the
time, every blooming line" thing.

As seen above, the use of the volatile keyword only protects you / up to
the nearest sequence point /.

Thus there exists an evil non-portable macro HAL_REORDER_BARRIER in
hal_arch.h.

This truly evil hack stops the compiler, but not the CPU, from
reordering code across it.
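
For reference, the mechanics look like this; a sketch only, with
hypothetical data and flag variables:

      #include <cyg/hal/hal_arch.h>
      #include <cyg/infra/cyg_type.h>

      extern cyg_uint32 buffer[16];
      extern volatile cyg_uint32 buffer_ready_flag;

      void publish_buffer(void)
      {
          buffer[0] = 42;           /* fill the (non-volatile) data first...  */
          HAL_REORDER_BARRIER();    /* ...and stop gcc moving the flag store  */
                                    /* ahead of the data store. The CPU may   */
                                    /* still reorder; this is compiler-only.  */
          buffer_ready_flag = 1;
      }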

* I cannot think of a use for HAL_REORDER_BARRIER in non-kernel, non-SMP
ISR's, DSR's or threads. * This may merely indicate my lack of
imagination. Please expand my knowledge of the pathologies of optimizers
and threading if I am wrong.


       Important Notes from the Ecos docs
       ==================================

     * / Mutexes serve as a mutual exclusion mechanism between threads,
       and cannot be used to synchronize between threads and the
       interrupt handling subsystem. *If a critical region is shared
       between a thread and a DSR then it must be protected using
       cyg_scheduler_lock and cyg_scheduler_unlock. If a critical region
       is shared between a thread and an ISR, it must be protected by
       disabling or masking interrupts.*/

       The way I read that is... (see the sketch after this list)

           o Threads keep other threads out of their critical region with
             mutexes.
           o Threads keep DSR's out of their critical region with the
             scheduler lock (cyg_scheduler_lock).
           o DSR's _never_ preempt other DSR's.
           o Threads and DSR's keep a particular ISR out of their
             critical region with cyg_interrupt_mask.
           o Only the kernel should ever use HAL_DISABLE_INTERRUPTS.
     * /cyg_cond_wait and cyg_cond_timedwait may only be called from
       thread context since they may block. cyg_cond_signal and
       cyg_cond_broadcast may be called from thread or DSR context./
     * /cyg_semaphore_wait and cyg_semaphore_timed_wait may only be
       called from thread context because these operations may block.
       cyg_semaphore_trywait, cyg_semaphore_post and cyg_semaphore_peek
       may be called from thread or DSR context./
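
The sketch promised above: the thread side of a critical region shared
with a DSR (using cyg_scheduler_lock) and one shared with an ISR (using
cyg_interrupt_mask). The shared variables and the interrupt vector are
hypothetical.

      #include <cyg/kernel/kapi.h>

      static volatile cyg_uint32 dsr_shared_count;   /* shared with a DSR  */
      static volatile cyg_uint32 isr_shared_status;  /* shared with an ISR */

      extern cyg_vector_t uart_interrupt_vector;     /* hypothetical vector */

      /* Thread side of a critical region shared with a DSR. */
      void thread_touches_dsr_data(void)
      {
          cyg_scheduler_lock();        /* no DSR (or other thread) runs now */
          dsr_shared_count++;
          cyg_scheduler_unlock();
      }

      /* Thread side of a critical region shared with an ISR. */
      void thread_touches_isr_data(void)
      {
          cyg_interrupt_mask(uart_interrupt_vector);   /* keep that ISR out */
          isr_shared_status = 0;
          cyg_interrupt_unmask(uart_interrupt_vector);
      }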

