This is the mail archive of the ecos-devel@sources.redhat.com mailing list for the eCos project.



Bugs in eCos SMP scheduler


Hi Nick,

Lately I have observed a new problem in the scheduler: the sched_lock count was going negative (0xFFFFFFFF/0xFFFFFFFE) while the holder of the lock was HAL_SMP_CPU_NONE (i.e. no processor owned the scheduler lock).

The cause is this:
although the sched_lock increment path makes sure that only the owner can increment the count, zero_sched_lock/set_sched_lock/get_sched_lock do not respect the notion of an owning processor for sched_lock.
This also introduces race conditions into the system, with the results shown in the scenarios below.


Consider a sample two-processor configuration, with threads T1, T2, T3, T4, ... running on a system consisting of processors P1 and P2.

- Currently no processor owns the lock and the sched_lock count is 0.
- T1 (on P2) completes its execution of thread_entry and later, in the
  user-specified thread entry function, takes the scheduler lock
  (owner = P2, count = 1).
- T2 (on P1) is executing thread_entry and calls zero_sched_lock
  (owner = NONE, count = 0).
- T1 (on P2) unlocks the scheduler __AND__ the scheduler lock count is now -1
  (0xFFFFFFFF, assuming a 32-bit data type for it); owner is NONE.
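
The steps above can be replayed in a small single-threaded model. This is only a sketch: SchedLockModel, CPU_NONE and replay_first_scenario are hypothetical names, not the eCos kernel API, and the model assumes that unlock decrements without checking the owner and that zeroing ignores the owner, as described in this mail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the scheduler lock state; not the real eCos classes.
constexpr int CPU_NONE = -1;  // stands in for HAL_SMP_CPU_NONE

struct SchedLockModel {
    std::uint32_t count = 0;
    int owner = CPU_NONE;

    // The calling CPU becomes (or stays) the owner and bumps the count.
    // Spinning on a lock held by another CPU is deliberately not modelled.
    void lock(int cpu) {
        owner = cpu;
        ++count;
    }

    // Assumption: the decrement does not check who owns the lock.
    void unlock(int /*cpu*/) {
        --count;
        if (count == 0) owner = CPU_NONE;
    }

    // Assumption: zeroing ignores the owner entirely.
    void zero(int /*cpu*/) {
        count = 0;
        owner = CPU_NONE;
    }
};

// Replays the interleaving in order and returns the final state.
SchedLockModel replay_first_scenario() {
    const int P1 = 0, P2 = 1;
    SchedLockModel s;

    s.lock(P2);                             // T1 on P2 takes the lock
    assert(s.count == 1 && s.owner == P2);

    s.zero(P1);                             // T2 on P1, inside thread_entry
    assert(s.count == 0 && s.owner == CPU_NONE);

    s.unlock(P2);                           // T1 on P2 unlocks: 0 - 1 wraps
    return s;
}
```

In this model the final state has count == 0xFFFFFFFF and owner == CPU_NONE, matching the values observed.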

Another variation of the previous scenario could be:

- T1 (on P2) takes the sched lock (owner = P2, count = 1)
- T2 (on P1) executes zero_sched_lock (owner = NONE, count = 0)
- T1 (on P2) takes the sched lock again (owner = P2, count = 1)
  ** count should have been 2 **
- T1 (on P2) unlocks the scheduler (causing it to enter unlock_inner and choose
  another thread to run).
  ** scheduling shouldn't have happened at this point **
- The next time T1 runs on any processor, it continues with its second
  scheduler unlock (which decrements the current sched lock value, irrespective
  of anyone else being the owner) and the mess continues.

  The current sched-lock increment code does a lock++ when a processor becomes
  the owner of the lock for the first time (instead of setting it to 1). Hence,
  if the sched_lock value had become -1 as in the first scenario and NO CPU was
  the owner, it will become 0 rather than 1, and the system is in a mess.

  For this small aspect, the fix is to replace "lock ++" with "lock = 1", but
  the larger problem still remains.
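
To make the difference between "lock ++" and "lock = 1" concrete, here is a minimal sketch of the two first-acquisition variants. LockState, acquire_by_increment and acquire_by_assignment are hypothetical names used for illustration; the real increment code also has to take a spinlock and deal with interrupts, which is left out here.

```cpp
#include <cassert>
#include <cstdint>

constexpr int CPU_NONE = -1;  // stands in for HAL_SMP_CPU_NONE

// Hypothetical stand-in for the scheduler lock state.
struct LockState {
    std::uint32_t count;
    int owner;
};

// Variant described as the current behaviour: the first acquisition
// increments, so it inherits whatever (possibly corrupted) count was left.
inline void acquire_by_increment(LockState& s, int cpu) {
    assert(cpu != CPU_NONE);  // a real CPU must be acquiring
    if (s.owner != cpu) {
        s.owner = cpu;
        s.count++;            // "lock ++"
    } else {
        s.count++;            // nested acquisition by the owner
    }
}

// Proposed variant: the first acquisition resets the count to 1.
inline void acquire_by_assignment(LockState& s, int cpu) {
    assert(cpu != CPU_NONE);
    if (s.owner != cpu) {
        s.owner = cpu;
        s.count = 1;          // "lock = 1"
    } else {
        s.count++;            // nested acquisition by the owner
    }
}
```

Starting from the corrupted state {count = 0xFFFFFFFF, owner = CPU_NONE}, the increment variant leaves the new owner holding a count of 0, while the assignment variant leaves it at 1. This only masks the corruption after the fact; it does nothing about the unowned zeroing that caused it.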

Possible solutions (in part) could be:

* Change the sched_lock zeroing code to check whether the processor executing it is the owner of the sched lock, and proceed with zeroing only in that case.

But this breaks the notion that every eCos thread starts with a sched_lock value of 0 (a notion carried over from non-SMP eCos). The impact of this? There might not be any.
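
A rough sketch of what the owner-checked zeroing could look like follows. LockState and zero_if_owner are hypothetical names, and the sketch assumes the caller already runs with whatever spinlock or interrupt protection the real kernel would need; it is not a patch against the actual zero_sched_lock.

```cpp
#include <cassert>
#include <cstdint>

constexpr int CPU_NONE = -1;  // stands in for HAL_SMP_CPU_NONE

// Hypothetical stand-in for the scheduler lock state.
struct LockState {
    std::uint32_t count;
    int owner;
};

// Zero the lock only when the calling CPU owns it.
// Returns true if the lock was zeroed, false if the call was ignored.
inline bool zero_if_owner(LockState& s, int cpu) {
    assert(cpu != CPU_NONE);  // a real CPU must be calling
    if (s.owner != cpu) {
        return false;         // someone else (or nobody) holds it: leave it alone
    }
    s.count = 0;
    s.owner = CPU_NONE;
    return true;
}
```

With this check, T2 on P1 can no longer wipe out a lock that T1 holds on P2. The cost is the one noted above: a thread entering thread_entry on a non-owning CPU no longer forces the count to 0.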

Unless I am missing something obvious, a fair amount of change might be required to make eCos SMP-capable, ranging from changes to unlock_inner to the possibility that the non-SMP eCos scheduling model cannot be extended to SMP at all.

I have sat on these observations and tried to analyse them a bit before writing this mail today, and I am still considering possible flaws in my analysis/understanding of eCos. But the observations mentioned in this mail (and others not mentioned) can't be explained any other way, at least as of now.

To save the reader's time, the SMP startup flow is as follows:

Cyg_Scheduler::start()
--> HAL_SMP_CPU_START
--> cyg_hal_smp_start
--> cyg_hal_smp_startup
--> cyg_kernel_smp_startup (takes scheduler lock and calls start_cpu)
--> start_cpu (gets next thread to schedule and loads it)
--> thread_entry (zeroes scheduler lock and calls actual thread entry point specified by the user during thread creation)


--
regards
sandeep
--------------------------------------------------------------------------
Walk softly and carry a megawatt laser.
--------------------------------------------------------------------------

