Nick Garnett nickg@cygnus.co.uk
Tue Apr 25 08:25:00 GMT 2000

Sergei Organov <osv@javad.ru> writes:

> Nick Garnett <nickg@cygnus.co.uk> writes:
> > Sergei Organov <osv@javad.ru> writes:
> >

> IMHO, how stack is organized and used is entirely HAL's responsibility. For me
> it seems to be possible to define HAL interface so that kernel will be
> insulated from stack direction considerations, but not possible to insulate
> HAL from these considerations. Besides the sole purpose of HAL is to isolate
> kernel from hardware details, and I believe stack direction is one of these
> hardware specific details. So I think it's better to don't use statements
> like, e.g, stack_limit+=size in the kernel, but to call the HAL in such places
> instead.

There is a delicate balance to be maintained between putting all
potentially hardware-specific code into the HAL, and keeping it small
and easy to port. The code to manage the stack is in any case common
to all architectures; duplicating it in each HAL would be redundant
and error-prone. The Cyg_HardwareThread class is intended to contain
all the code that is hardware configurable in this way.
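
To illustrate what "common code, configured to the hardware" might look
like, here is a minimal sketch. It is not the eCos source: StackRegion,
reserve() and the STACK_GROWS_UP switch are hypothetical names, and the
only thing that changes with stack direction is which end the limit
starts at and which way it moves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical configuration switch: define as 1 for a rising stack.
#ifndef STACK_GROWS_UP
#define STACK_GROWS_UP 0
#endif

// Hypothetical stand-in for the stack bookkeeping held in a thread object.
struct StackRegion {
    std::uintptr_t base;   // lowest address of the stack memory
    std::size_t    size;   // total size in bytes
    std::uintptr_t limit;  // boundary the stack pointer must not cross

    StackRegion(std::uintptr_t b, std::size_t s)
        : base(b), size(s), limit(STACK_GROWS_UP ? b + s : b) {}

    // Reserve `bytes` at the end of the stack that the stack pointer moves
    // towards. Returns the start address of the block, or 0 if it does not fit.
    std::uintptr_t reserve(std::size_t bytes) {
#if STACK_GROWS_UP
        if (bytes > limit - base) return 0;
        limit -= bytes;                  // limit moves down towards base
        return limit;
#else
        if (bytes > base + size - limit) return 0;
        std::uintptr_t block = limit;
        limit += bytes;                  // limit moves up towards the top
        return block;
#endif
    }
};
```

The direction test appears once, in shared code, rather than once per
HAL port.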

The HAL was never intended to be a complete "virtual machine"
interface for the kernel; we always intended that the kernel be able
to configure itself to the underlying hardware. Stack direction is one
of these configuration points. Completely abstracting the hardware
would also require the HAL to maintain its own per-thread data
structure. Since the HAL is partly in C and partly in assembler, we
would have the difficulty of keeping two representations of this
structure in step. We already have to do that with the
HAL_SavedRegisters structure on some architectures, and it is not
easy. We have avoided the need to maintain a per-thread structure at
present by passing the addresses of kernel fields to HAL macros when

> I also don't see how updating of stack_limit by the HAL breaks the abstraction
> layers. For me it seems to be similar to the HAL_THREAD_INIT_CONTEXT
> and HAL_THREAD_SWITCH_CONTEXT both changing stack pointer passed to them by
> the kernel. Does it mean that abstraction layers are already broken by these
> two macros? Or am I missing something?

I'm talking about having HAL-level code access fields of
kernel-defined data structures. To make this work we would need to
export to the HAL the offsets of such fields in the thread structure.
Since the code in the HAL that needs to access them is mostly in
assembler (in the FP unavailable handler, for example), this is not as
simple as including the headers.

The context macros are passed the address of the appropriate field
each time they are called; they have no knowledge of the layout of the
thread data structures.
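
A small sketch of that distinction, using hypothetical names
(KernelThread, hal_init_context, kernel_init_thread) and an arbitrary
context size rather than the real macros: the HAL-side routine is handed
a pointer to one field and never needs the structure definition or a
field offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel-side thread object; the HAL never sees this layout.
struct KernelThread {
    int            priority;
    std::uintptr_t stack_ptr;  // saved stack pointer
};

// Hypothetical HAL-side routine in the style of a context macro: it is
// given the address of the saved-SP field and nothing else.
inline void hal_init_context(std::uintptr_t *sp_field,
                             std::size_t context_bytes) {
    // Falling stack assumed: make room for the saved register context.
    *sp_field -= context_bytes;
}

// Kernel side: decides which field to expose on every call.
inline void kernel_init_thread(KernelThread &t, std::uintptr_t stack_top) {
    t.stack_ptr = stack_top;
    hal_init_context(&t.stack_ptr, 64);  // 64 is an illustrative size only
}
```

If fields of KernelThread were reordered, hal_init_context() would not
need to change, which is the property being protected here.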

> BTW, could you please explain what is the purpose of having
> Cyg_HardwareThread::increment_stack_limit() at all? I didn't find any
> references to this routine in the sources, just an implementation. Anyway
> it seems that the name of the routine doesn't match reality for architectures
> where stack grows up.

This is used by compatibility layers to allocate per-thread data
structures. The C library uses it at present, and the POSIX library
will do so in the future.

The naming of the routine is perhaps unfortunately short-sighted, but
since it is an internal API, we will just have to live with it. Rising
stack processors are sufficiently rare these days that I am not too
concerned about not being totally agnostic with respect to stack

> >
> > >
> > > As for FP support, while the issue with the interface is not resolved, I've
> > > decided to allocate space for FP context on the bottom of stack. For PPC it
> > > means all threads (even those that never use FP) need to have additional ~300
> > > bytes of stack. I'd like to have some graceful solution to the problem in the
> > > future though.
> > >
> >
> > The approach I intended to take was to "statically" allocate the first
> > FP save area during HAL_THREAD_INIT_CONTEXT() and then dynamically
> > choose to allocate further FP save areas on the stack, below the
> > standard save area, depending on whether the current thread was using
> > the FPU and whether its previous FP save area was already in
> > use. Such an allocation would be cheap to do, since it is just an
> > extra decrement of the SP, and could be done at the same time as
> > disabling the FPU for use detection. If the thread retains FPU
> > ownership then this space is unused, but that is benign, since it must
> > have enough space on the stack to allocate it anyway. This approach
> > also allows us to use FP operations in interrupts, DSR and exceptions
> > more easily.
>
> Please explain in more details what does it mean "statically"?
> Is it one global "static" area, or per-thread "static" area? Is it to be
> allocated on the thread stack? On which end of the stack?

I simply mean in HAL_THREAD_INIT_CONTEXT(), without any consideration
as to whether it will be used or not. It would be allocated at the top
of the stack just above the thread context (assuming a falling stack).
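
As a rough picture of that layout, under the falling-stack assumption:
the sizes below are illustrative placeholders, and InitialLayout and
init_layout() are hypothetical names, not part of any HAL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative sizes only; real values depend on the architecture.
constexpr std::size_t kFpSaveBytes  = 264;
constexpr std::size_t kContextBytes = 160;

struct InitialLayout {
    std::uintptr_t fp_save;  // start of the FP save area
    std::uintptr_t context;  // start of the initial thread context
    std::uintptr_t sp;       // initial stack pointer
};

// Falling stack assumed: the highest addresses are handed out first.
inline InitialLayout init_layout(std::uintptr_t stack_base,
                                 std::size_t stack_size) {
    InitialLayout l;
    std::uintptr_t top = stack_base + stack_size;
    l.fp_save = top - kFpSaveBytes;         // FP area at the very top
    l.context = l.fp_save - kContextBytes;  // thread context just below it
    l.sp      = l.context;                  // SP points at the saved context
    return l;
}
```

Every thread pays for kFpSaveBytes here whether or not it ever touches
the FPU.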

However, using cyg_thread_increment_stack_limit() will avoid having to
do this.

> >
> > However, I am still not convinced that we actually need to make this
> > change, and have become even less convinced as I have been writing
> > this message.
> >
> > I now think that the best approach to this is for the kernel to export
> > a C wrapper function: cyg_thread_increment_stack_limit() to the HAL
> > and for the HAL to call this when it needs to allocate the FP save
> > area. This makes the HAL->Kernel interface clean and well defined and
> > would be similar to existing functions exported from the kernel to the
> > HAL like cyg_interrupt_post_dsr() and interrupt_end().
>
> This may bring other complications though. HAL will need to allocate the FP
> area when it detects usage of FP by the thread. It means the routine should be
> called from the "fp unavailable" exception handler. Then it will be required
> to save all the registers that could be clobbered by the C routine and
> establish stack frame. Also, the routine should better claim to don't use FP.

This only needs to be done once per thread, when we first detect that
it has used the FPU. We can afford this; it is still probably less
costly than the save/restore of the FPU state, which we are going to do
much more often. The routine itself will just be a call to the inline
increment_stack_limit(), and should be no more than a few

> This is apparently one possible approach, but letting HAL to handle all the
> stack details seems to be more straightforward. Well, at least at first
> glance.

We are now largely stuck with the data structures and APIs we already
have. It is less disruptive to add new interfaces (such as
cyg_thread_increment_stack_limit()) than it is to make changes to
existing interfaces. If we could go back and start again, we would
probably choose to do what you suggest, together with adding a
HAL-defined thread data structure, but we no longer have that option.
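
For concreteness, a sketch of the kind of wrapper I have in mind. The
Thread class, the current_thread pointer and the wrapper's signature are
all assumptions made for illustration (falling stack assumed); only the
function name comes from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for the kernel's thread class.
class Thread {
public:
    Thread(std::uintptr_t base, std::size_t size)
        : stack_base_(base), stack_size_(size), stack_limit_(base) {}

    // Inline member: carve `size` bytes off the low end of a falling
    // stack and return the start of the block, or nullptr if it is full.
    void *increment_stack_limit(std::size_t size) {
        if (size > stack_base_ + stack_size_ - stack_limit_) return nullptr;
        void *block = reinterpret_cast<void *>(stack_limit_);
        stack_limit_ += size;
        return block;
    }

    std::uintptr_t stack_limit() const { return stack_limit_; }

private:
    std::uintptr_t stack_base_;
    std::size_t    stack_size_;
    std::uintptr_t stack_limit_;
};

// Hypothetical "current thread" pointer; a real kernel gets this from
// the scheduler.
static Thread *current_thread = nullptr;

// C-callable wrapper, so HAL code (including assembler) can call it by a
// plain symbol name. The signature is an assumption.
extern "C" void *cyg_thread_increment_stack_limit(std::size_t size) {
    return current_thread ? current_thread->increment_stack_limit(size)
                          : nullptr;
}
```

The HAL only needs the symbol name and the calling convention; the
thread layout stays private to the kernel.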

Nick Garnett
Cygnus Solutions, a Red Hat Company
Cambridge, UK
