[ECOS] Scheduler lock in Doug Lea's malloc implementation - be real careful

Mon Sep 11 11:01:00 GMT 2006

>>>>> "Bob" == bob koninckx <bob.koninckx@o-3s.com> writes:

    Bob> Obviously, you need protection there. What I do not
    Bob> understand is why it is implemented using scheduler
    Bob> lock/unlock instead of using mutexes or semaphores. Also the
    Bob> proposed use of memory pools is not really clear to me (and
    Bob> it certainly raises portability issues when moving from one
    Bob> OS to another). As a general rule, I _always_ consider
    Bob> dynamic memory allocation an undeterministic process, so I
    Bob> _never_ use it in time critical code. It nevertheless seems
    Bob> odd to me that you shouldn't be allowed to use dynamic memory
    Bob> allocation in low priority threads with no real-time
    Bob> requirements at all, just because of timing requirements in
    Bob> higher priority threads. I see no reason why scanning the
    Bob> heap could not be interrupted (put on hold for a while) to
    Bob> give a real-time thread time to finish its job. This is not
    Bob> possible if you protect things with a big kernel lock, but
    Bob> would be if searching for memory would be protected by a
    Bob> mutex.

    Bob> Any comments ? Am I making a fundamental reasoning error here
    Bob> ? If not, I am going to try to change the implementation
    Bob> and see if it solves our problem and propose a patch.

You seem to have missed my point. Let me try again.

The current code behaves as follows:

    lock scheduler
    alloc() or free() or realloc()
    unlock scheduler

Let T_alloc be the worst case time spent in the alloc(), free() or
realloc() routines, noting that the realloc() implementations in the
heap will fail if the chunk cannot be resized and the alternative
alloc()/memcpy()/free() implementation happens elsewhere. T_alloc will
depend on factors like heap fragmentation so ideally would be measured
in the context of a real application, but some of testcases in the
memalloc package probably come close enough.

What you are proposing is:

    lock_mutex
      lock scheduler
      mutex lock code
      unlock scheduler
    alloc() or free() or realloc()
    unlock_mutex
      lock scheduler
      mutex unlock code
      unlock scheduler

Let T_lock be the time spent in the mutex lock code and T_unlock the
time spent in the mutex unlock code. These should probably be measured
for the case where the mutex is contested, priority inversion
protection triggers, and there are threads blocked on the mutex, not
for the simple uncontested case. The other important number is T_max -
the maximum time the system will spend with the scheduler locked
because of other code. This is rather tricky to measure and will
depend in part on the application and its I/O behaviour.

If T_alloc is comparable to max(T_lock, T_unlock) then there is
nothing to be gained by using a mutex rather than the scheduler lock.
You are adding cpu cycle overhead to the memory allocation routines
for no improvement in real-time performance.

If T_alloc < T_max, then switching from a scheduler lock to a mutex
lock/unlock will not change the maximum dispatch latency and will not
affect the worst case real-time characteristics. It may still affect
the average dispatch latency.

If T_alloc == T_max, i.e. the worst case dispatch latency is caused by
the memory allocation code, then we have a problem that needs
addressing.

I believe the author of the original code assumed that T_alloc was
comparable to max(T_lock, T_unlock), or at least that (T_alloc < T_max).
Hence using a scheduler lock saved some cpu cycles for every memory
allocation operation without affecting real-time behaviour. I suspect
that assumption is false but do not have any numbers to prove it.

There are several memory allocators, but the ones we are interested in
are the fixed block, variable block and dlmalloc ones. If for all
three we have T_alloc >> MAX(T_lock, T_unlock) then we should probably
switch to using a mutex by default on the grounds of improving average
dispatch latency. If for all three we have T_alloc == T_max then we
must switch to using a mutex by default to reduce the maximum dispatch
latency. Unfortunately this requires measuring T_max.

If on the other hand we find that for say the fixed memory allocator
T_alloc is much the same as MAX(T_lock, T_unlock) but dlmalloc behaves
very differently then we would need to introduce new templates, with
the old ones still using the scheduler lock and the new ones using a
mutex. 

I do not have time right now to experiment and get some real numbers.
However I would very much like to see such numbers before or
accompanying any patch that changes the behaviour. Also it would be
important to run not just the memory allocation package's own tests
but also the uITRON ones.

Bart

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss