This is the mail archive of the mailing list for the glibc project.


Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)

On Thu, Jul 16, 2015 at 12:27:10PM -0700, Andy Lutomirski wrote:
> On Thu, Jul 16, 2015 at 11:08 AM, Mathieu Desnoyers wrote:
> > ----- On Jul 14, 2015, at 5:34 AM, Ben Maurer wrote:
> >>
> >> That said, having the ability for the kernel to understand that TLS
> >> implementations are laid out using the same offset on each thread seems like
> >> something that could be valuable long term. Doing so makes it possible to build
> >> other TLS-based features without forcing each thread to be registered.
> >
> > AFAIU, using a fixed hardcoded ABI between kernel and user-space might make
> > transition from the pre-existing ABI (where this memory area is not
> > reserved) a bit tricky without registering the area, or getting a "feature"
> > flag, through a system call.
> >
> > The related question then becomes: should we issue this system call once
> > per process, or once per thread at thread creation? Issuing it once per
> > thread is marginally more costly for thread creation, but seems to be
> > easier to deal with internally within the kernel.
> >
> > We could however ensure that only a single system call is needed per new-coming
> > thread, rather than one system call per feature. One way to do this would be
> > to register an area that may contain more than just the CPU id. It could
> > consist of an expandable structure with fixed offsets. When registered, we
> > could pass the size of that structure as an argument to the system call, so
> > the kernel knows which features are expected by user-space.
>
> If we actually bit the bullet and implemented per-cpu mappings, we
> could have this be completely flexible because there would be no
> format at all.  Similarly, if we implemented per-cpu segments,
> userspace would need to agree with *itself* how to arbitrate it, but
> the kernel wouldn't need to be involved.
>
> With this kind of memory poking, it's definitely messier, which is unfortunate.

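
For reference, the expandable-structure registration quoted above could
look roughly like the sketch below. This is only an illustration under
assumptions: the structure layout, the field names, the FEAT_* flags and
features_for_size() are all made up here, and the function merely stands
in for whatever the kernel would do with the size argument.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread area. Fields are only ever appended, so every
   existing offset stays fixed across versions. */
struct thread_area {
    int32_t  cpu_id;         /* first feature: current CPU number */
    uint32_t reserved;       /* padding, keeps the next field 8-byte aligned */
    uint64_t other_feature;  /* hypothetical later addition */
};

#define FEAT_CPU_ID 0x1u
#define FEAT_OTHER  0x2u

/* Stand-in for the kernel side: a feature is enabled only if the size
   passed at registration covers the whole field that carries it. */
static unsigned features_for_size(size_t len)
{
    unsigned feats = 0;

    if (len >= offsetof(struct thread_area, cpu_id) + sizeof(int32_t))
        feats |= FEAT_CPU_ID;
    if (len >= offsetof(struct thread_area, other_feature) + sizeof(uint64_t))
        feats |= FEAT_OTHER;
    return feats;
}
```

An older libc that registers only the first 4 bytes would get just the
CPU id, while one that passes sizeof(struct thread_area) would get both
fields, still with a single system call per thread.
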
Could you recap the thread? On the libc side we have not read most of it,
so a summary would be appreciated.

Do per-cpu mappings mean that a single virtual page is mapped to a
different physical page on each cpu?

Improving tls access has been on my todo list. This would help tls
implementations on older arms and, in general, on architectures that
don't keep the tcb in a register.

My proposal is, modulo a small constant, equivalent to letting userspace
read its tid without syscall overhead: keep an array of tcbs for the first
32768 tids and make a syscall only when the tid exceeds that.
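
A rough sketch of that lookup, assuming the tid is already known cheaply.
All names here (struct tcb, tcb_table, tcb_register, tcb_lookup_slow) are
made up for illustration, and the overflow list only stands in for the
syscall fallback; it is not how the fallback would really be done.

```c
#include <stddef.h>

#define TCB_TABLE_SIZE 32768   /* tids below this take the fast path */

struct tcb {
    int tid;
    struct tcb *next_overflow;  /* chains tcbs whose tid does not fit the table */
};

static struct tcb *tcb_table[TCB_TABLE_SIZE];
static struct tcb *overflow_list;
static unsigned long slow_lookups;  /* counts how often the fallback ran */

/* Called once per thread at creation. */
static void tcb_register(struct tcb *t)
{
    if (t->tid >= 0 && t->tid < TCB_TABLE_SIZE) {
        tcb_table[t->tid] = t;
    } else {
        t->next_overflow = overflow_list;
        overflow_list = t;
    }
}

/* Stand-in for the syscall that would be made for large tids. */
static struct tcb *tcb_lookup_slow(int tid)
{
    slow_lookups++;
    for (struct tcb *t = overflow_list; t != NULL; t = t->next_overflow)
        if (t->tid == tid)
            return t;
    return NULL;
}

static struct tcb *tcb_lookup(int tid)
{
    if (tid >= 0 && tid < TCB_TABLE_SIZE)
        return tcb_table[tid];  /* one range check plus one indexed load */
    return tcb_lookup_slow(tid);
}
```

On a 64-bit machine the table costs 32768 * 8 bytes = 256 KiB of address
space, and only the pages that are actually written get touched.
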

In userspace, my proposal would be to map a page at a fixed virtual address
and store the tcb pointer in its first eight bytes. On a context switch the
kernel would save and restore those bytes along with the registers. That
would make tls access cheap: compared with a static variable it needs only
one extra load instruction.
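
To make the idea concrete, here is a small userspace simulation. It is
only a sketch under assumptions: fixed_page stands in for the page at the
fixed virtual address, sim_context_switch() stands in for what the kernel
would do, and struct tcb, struct sim_thread and current_tcb() are
hypothetical names. No such kernel interface exists in this form.

```c
#include <stddef.h>

#define SIM_PAGE_SIZE 4096

struct tcb {
    int tid;
    int errno_value;  /* example of a per-thread variable */
};

/* Stand-in for the page mapped at a fixed virtual address. Slot 0 plays
   the role of its first pointer-sized bytes (eight on 64-bit targets). */
static struct tcb *fixed_page[SIM_PAGE_SIZE / sizeof(struct tcb *)];

/* Per-thread state the kernel would keep next to the saved registers. */
struct sim_thread {
    struct tcb *saved_tcb;
};

/* Stand-in for the kernel's context switch: save the outgoing thread's
   slot and install the incoming thread's value. */
static void sim_context_switch(struct sim_thread *prev, struct sim_thread *next)
{
    prev->saved_tcb = fixed_page[0];
    fixed_page[0] = next->saved_tcb;
}

/* TLS access: one load from a known address, then a normal field access. */
static struct tcb *current_tcb(void)
{
    return fixed_page[0];
}
```

With this, an access such as current_tcb()->errno_value costs one load
from a constant address plus the field access, which is the "one extra
load" compared with a static variable.
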
