[ECOS] The Ideas I mentioned few weeks back..

Fri Dec 14 03:47:00 GMT 2001

Hi Nick,

> > now this approach (of not letting schedular process the queue(s) even when the
> > thread's time slice is over) is fine on a single processor but not on multiple
> > processors.
>
> You cannot allow a thread with the scheduler lock to be preempted, it
> would leave the kernel in an undefined state. This is true regardless
> of whether this is a single or multiple CPU system. I am not sure what
> you really mean here.

By the time you finish reading this mail carefully and with patience :-) pretty long
mail,
you will find the answer.

> If you are talking about allowing more than one CPU into the scheduler
> at the same time, then this might be possible by using finer grained
> locking - for example a spinlock per mutex/semaphore and a spinlock
> per scheduler queue. But, this is a significant change to the way the

In the ideas mentioned i have taken case of a single prioirity queue, just to keep
things ]
simple, but to extend the ideas to multiple priority queues.. I think we will be
saying similar as above.
schedular  can run simultaneously on multiple processors, each dealing with a
different priority level
queue, w/o any botheration coz we don't have issues of aging etc here in ecos.

well... here are two of my ideas after discarding many others.. have been bit time
pressed these days and
didn't want to just post it w/o writing it properly. Still, if certain issues are not
clear, people are welcome
to comment/discuss.

-------------------------------------------------------------
PROBLEM -- If a process/thread is doing something critical during which it doesn't
want to be scheduled out, it takes a schedular lock, so even if it's time slice
gets over, it doesn't get scheduled out. even if schedular is invoked, no
processing of task queue is done.

On Single processor system, this approach is okay, but on multiprocessor system
it could lead to underutilisation of system and lower throughput. especially when
the duration of lock is comparable to timeslot duration. The idea being -- well, a
particular process/thread running on a particular processor doesn't wanna giveup the
processor for time being, but why should it stop schedular from scheduling tasks
on other processors.

In my proposed solution(s) I have tried to work with the  constraints of minimal
changes required in existing code/data-structures and also good resource utilisation.

Out of few solutions, few are discussed here in the order of improvement.
To keep things simple, only one task queue is considered and the solutions suggest
modifications/addendum in existing data structures/logic.

SOLUTION 1 -- This solution assumes that the schedular can run on any of the
processors, when the need arises, and it bothers about scheduling only on
that processor it is run on, i.e. if the schedular code is run on processor Pi
at some instant it bothers about scheduling only on that processor Pi and not
on any other Pj (j =/= i). Also assumed that time slices are not synchronous.

In this solution, NeedReschedule flag becomes an array of these flags, size
being equal to Number of Processors and similarly SchedularRunLock becomes an
array of size Number of Processors (these implementations could be bitvectors or
plain char arrays, whichever is better and convenient). A new lock (SchedSanity)
is introduced to keep sanity of schedular (task) queue, because schedular code
can run on any processor now, simultaneously.

Whenever schedular is run (either normally at the end of timeslice of the thread
running on processor Pi or because of Unlock call by thread on Pi) it also takes
Schedsaity before processing the queue. Other things remain as earlier.

The thing that is not nice here is requirement of large no. of locks, though a
suitable implementation can get around it with a "lock-vector" (bit vector),
bit positions assigned for a particular processor, exclusively.

Wrt scheduling functionality one may call it leading to SMP model, but I feel
it as distribution of scheduling, and it can equally be applied to ASMP model
as well.

SOLUTION 2 -- reduces the no. of locks required to a great extent but requires
schedular to be running on single designated processor.

Here are the assumptions/issues/implication/... in this solution no.2 ...

* schedular code runs only on one fixed processor.
  The implication of this is we don't need the lock to protect the schedular
  queue data structures (as in earlier soln), because only one entity is handling
  it now.

* It is assumed that when schedular looks at a process record, it has access
  to information about how much time out of it's slice, has elapsed, since it
  was scheduled. It should be possible to continue decrementing time value
  even after this value reaches zero.

* If time slices for various processes/threads are synchronous it will be for
  the betterment. Moreover the nature of this soln also contributes towards
  synchronising timeslices.

* Each time schedular runs, it checks for all the processors where new
  threads/processes can be scheduled. I mean here, it looks for the processors
  with threads on it that have reached the end of their time slices or need to
  be scheduled out for some reasons.

* In earlier soln, schedular could run on any processor and it was taking care
  of scheduling next thread on it only. There we needed a lock to prevent
  schedular from taking the processor away from the thread at the end of it's
  timeslice coz it was doing some critical work, in the middle of which it
  shouldn't be scheduled out.

  So the issue is somehow preventing our removal b/w lock-unlock if our time
  slice gets over. Moreover the single processor idea of "stopping schedular
  from running while you are doing something during which you don't want to
  give up processor to any other thread" is not healthy on multiple processor
  environments, as this is very likely to bring down the throughput.
  Here I am assuming no blocking during the period lock-unlock, coz it could be
  detrimental to render one/more processors unschedulable.

  In CURRENT SOLUTION we don't need these per-processor locks, but need a thread
  state value to indicate to the schedular that it shouldn't be disturbed even
  though it's time slice is over. Let me call it DNDEITSIO or in short as DND.

  If it could be possible to adjust this new state value, well and good, else
  new variable will have to be introduced in the corresponding data structures,
  looked upon by the schedular at scheduling time.

  WHO WILL SET THIS STATE VALUE? A "modified-lock call", as programmer
  shouldn't be allowed to acess schedular data structures directly.

  HOW IT WILL FUNCTION? Internally this call will take a Sched-Lock and set
  the DND state for the caller thread and then release the Sched-Lock.

  ** If we make sure that internally the pointer to it's entry in schedular queue
  is available to the thread, then setting this state value will be very small
  CONSTANT TIME OPERATON, so Sched-Lock is not held for long by this modified-lock
  call. On the other hand it will have to spend time looking for the thread
  entry (to find it and set the DND state) in the schedular queue, that is
  O(queue size) operation.

--> this modified-lock call will block for two reasons -- like schedular
already processing the queue or some other thread currently setting their
DND... state. I propose that this blocking should be treated differently than
normal blocking for IO/resource. I hope you get what i mean by this.

  ** Even schedular needs to take this lock before processing the queue. When
  schedular runs, it tries to schedule threads on all the available processors
  that are (either idle or) having currently running threads having finished
  their timeslices and DND... state not set and other scheduling constraints.
  [*See variations to this below*]
  * Schedular could be running at regular intervals.

  ** Similarly this state will be unset from DND... by modified-unlock call
  that will do little more than just clearing this DND... state as discussed
  below. To safeguard against race conditions this call will also take
  Sched-Lock internally.

This lock will be needed here to avoid race conditions -- if a thread manages
to set this state value it should be sure of not being disturbed. but if this
Sched-Lock is not there, race conditions can lead to inconsistencies.

WHAT HAPPENS WHEN MODIFIED-UNLOCK CALL IS MADE?
No problem when the timeslice of caller is not yet over.
But if it was over, issue arises -- what should be done with the caller?
already it has taken more time than was given to it.

There are various approaches to deal with this part.

[A] run the schedular again. But this could lead to schedular running again
    and again -- say many threads had managed to set the state DND.. and now
    they are unsetting their state one by one..
    This approach is not as pleasant as it entails lot of work on schedular
    side, that could possibly be avoided.

[B] This approach is worse than above (most of you will say), what about
    dividing scheduling intervals from T to (T/x) where x could be adjusted.
    So, in case the thread's time slice was over but DND.. was set, it could
    be allowed to run for maxm (T/x) duration and during this time it's
    timeslice will not be reset. modified-lock will further be modified not
    to set the state variable to DND.. if time spent is more than timeslice
    allocated already (easy check -- negative value, due to decrement at each
    time unit, normal stuff).

    In other words T --> (T/x) could be said finer granulation of time
    interval when schedular runs and timeslices will be multiple of it.

    Now there is this question, what if some process blocks on IO etc.. normal
    blocking behaviour, won't it be taken out and schedular run? YES! i haven't
    said no to that, only thing that i suggested was to reduce no. of
    schedular runs, in case of some thread having taken DND.... and couldn't
    be taken out even though it's timeslice expired.

    Combined with this fact (of schedular running when someone blocks on IO
    etc. kind of normal blockings as we are familiar with) the schedular will
    run after ((T/x)*i) again at whichever-occurs-first(((T/x)*(i+1)),someone
    blocking on IO/some normal resource)

    This version introduces bit of unfairness towards other processes, makes
    thing slightly complex, though reduces the number of time schedular has
    to process the queue. I am not in favour of this for the obvious reasons
    that you can easily see in this approach no. 2.
************************
[C] This is *approach* which is taking care of what-to-do-after-unlock issue
    in case time slice becomes zero during DND.. state set.

    When the schedular runs, we respect the DND.. state and do not take off
    that thread from processor, BUT.. (*HERE IS THE CRUX*) maintain an array
    CouldBerunOn[#processors] (initialised to all entries NULL before schedular
    starts processing queue) and for the corresponding processor entry store a
    pointer to process that could/should be run after current thread is taken
    off. Now when modified-unlock unsets DND.. state, it checks the entry in
    this array corresponding to the processor the caller thread is running on
    and if it is non-NULL (implies caller thread's timeslice already over)
    replaces this thread by next one pointed by the entry. So this doesn't
    require running of schedular and processing of entire queue and scheduling
    of next thread is quick. moreover this approach is not biased like
    approach no. [B].

    Please note that we check for non-NULLity of the array entry coz before
    processing the queue schedular sets all entries to NULL and it is set to
    non-NULL only for the cases where timeslice is over but the thread
    couldn't be taken out.

    Now, your next question is, thread may remain in DND.. state for one/more
    schedule/timeslice cycle, this will hurt the process/thread
    CouldBeRunOn[..]. It won't happen coz if that thread couldn't run because
    of above reason then it is still available for scheduling and it will be
    scheduled somewhere else in next schedular run.

--------------------------------------------------------------------------
--
regards
sandeep