This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.

Re: Linux kernel problem -- food for thoughts

From: Daniel Jacobowitz <drow at mvista dot com>
To: Elena Zannoni <ezannoni at redhat dot com>
Cc: gdb at sources dot redhat dot com, roland at redhat dot com
Date: Wed, 16 Apr 2003 10:28:11 -0400
Subject: Re: Linux kernel problem -- food for thoughts
References: <16029.26499.985342.118733@localhost.redhat.com>

On Wed, Apr 16, 2003 at 10:24:03AM -0400, Elena Zannoni wrote:
> 
> GDB currently has a 'little problem' backtracing out of system
> calls on x86 kernels which support NPTL. I think the current public
> 2.5 kernel would make this problem show up.
> 
> Right now, if you are in a system call, the backtrace will show up as:
> 
>  0xffffe002 in ??

I was just thinking about this.  My reaction is:
  - the page needs to be readable; I vaguely remember badgering Linus
about this and getting it fixed, but it might have been someone else,
or it might not have gotten fixed.
  - GDB needs to get the location of the EH information from glibc
somehow.  My instinct is to make glibc export this in a global symbol,
just like the way we get signal numbers from linuxthreads.

How does that sound?


Note that we don't use EH (.eh_frame) information on i386 yet.  We need
to fix that.  I tried once and got distracted by another project, I think :)

> 
> Here is an explanation of the problem that Roland has provided:
> 
> ---------------
> Previously asm or C code in libc entered the kernel by setting some
> registers and using the "int $0x80" instruction.  e.g.
> 
> 00000000 <__getpid>:
>    0:	b8 14 00 00 00       	mov    $0x14,%eax
>    5:	cd 80                	int    $0x80
>    7:	c3                   	ret    
> 
> That is the function called __getpid in libc, the pre-NPTL build.  (In the
> shared library you will see this if you've run with LD_ASSUME_KERNEL=2.4.1
> so that /lib/i686/libc.so.6 is what you are using.)
> 
> In the new libc (/lib/tls/libc.so.6), that function looks like this:
> 
> 00000000 <__getpid>:
>    0:	b8 14 00 00 00       	mov    $0x14,%eax
>    5:	65 ff 15 10 00 00 00 	call   *%gs:0x10
>    c:	c3                   	ret    
> 
> %gs:0x10 is a location that has been initialized to a kernel-supplied
> special entry point address.  In the current kernels, that address is
> always 0xffffe000.  But that is not part of the ABI, which is why it's
> indirect instead of a literal "call 0xffffe000".  The kernel supplies the
> actual entry point address to libc at startup time, and nothing in the
> kernel-user interface prevents it from using a different address in each
> process if it chose to.
> 
> The reason for this is that there can be multiple ways to enter the kernel,
> not just the "int $0x80" trap instruction.  Some kernels on some hardware
> may use a different method that performs better.  By using this
> kernel-supplied entry point address, no user code has to be changed to
> select the method.  It's entirely the kernel's choice.
> 
> In all the RH kernels we have right now, the entry point page contains:
> 
> 	0xffffe000:	int $0x80
> 	0xffffe002:	ret
> 
> But user code cannot presume what this code sequence looks like exactly.
> It will be some sequence of register and stack moves and special trap
> instructions, but you have to disassemble to know exactly.  In the case
> above, the PC value seen while a thread is in the kernel is 0xffffe002.
> You can disassemble the "ret" there and see that you have to pop the PC off
> the stack to recover the caller's frame.  
> 
> Another example of what this code might look like when you disassemble it is:
> 
> 	0xffffe000:	push   %ecx
> 	0xffffe001:	push   %edx
> 	0xffffe002:	push   %ebp
> 	0xffffe003: 	mov    %esp,%ebp
> 	0xffffe005: 	sysenter 
> 	0xffffe007:	nop    
> 	0xffffe008:	nop    
> 	0xffffe009:	nop    
> 	0xffffe00a:	nop    
> 	0xffffe00b:	nop    
> 	0xffffe00c:	nop    
> 	0xffffe00d:	nop    
> 	0xffffe00e: 	jmp    0xffffe003
> 	0xffffe010:	pop    %ebp
> 	0xffffe011:	pop    %edx
> 	0xffffe012:	pop    %ecx
> 	0xffffe013:	ret    
> 
> In this example, depending on what happened inside the kernel the PC you
> usually see may be either 0xffffe00e or 0xffffe010.  If the process gets a
> signal, or you attach asynchronously, and so forth, the PC might be at any of
> the earlier instructions as well.  You cannot rely on exactly what the
> sequence is, so you must be able to disassemble from where you are and
> cope.  In this case you will most often see 0xffffe010, in which case you
> need to pop those three registers and the PC off the stack to restore the
> caller's frame.
> 
> So, these cases are like a leaf function with no debugging info.  The
> first solution idea was interpreting the epilogue code.  It will
> probably be safe to assume that it looks like epilogue code normally
> does, i.e. register pops and not any arbitrary instructions.
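
To make the epilogue-interpretation idea concrete, here is a rough sketch
in Python of what "disassemble from where you are and cope" amounts to.
It is only a toy model: the instruction table is hand-copied from the
sysenter listing quoted above, the stack is a plain dict, and the names
VSYSCALL_CODE and unwind_by_epilogue are made up for illustration.  It is
not GDB code.

```python
# Toy model of unwinding out of the vsyscall page by interpreting the
# epilogue.  The table is hand-copied from the quoted disassembly; all
# names here are hypothetical.

# address -> (mnemonic, operand, instruction length in bytes)
VSYSCALL_CODE = {
    0xffffe010: ("pop", "ebp", 1),
    0xffffe011: ("pop", "edx", 1),
    0xffffe012: ("pop", "ecx", 1),
    0xffffe013: ("ret", None, 1),
}


def unwind_by_epilogue(pc, sp, stack, code=VSYSCALL_CODE):
    """Simulate pops and the final ret starting at pc.

    stack maps 4-byte-aligned addresses to 32-bit words.
    Returns (caller_pc, caller_sp, restored_registers).
    Raises ValueError if pc is not on a pop/ret path we know about.
    """
    restored = {}
    while True:
        if pc not in code:
            raise ValueError("cannot interpret instruction at %#x" % pc)
        op, reg, length = code[pc]
        if op == "pop":
            # A pop restores one saved register and moves the stack up.
            restored[reg] = stack[sp]
            sp += 4
            pc += length
        elif op == "ret":
            # The word at the top of the stack is the caller's PC.
            return stack[sp], sp + 4, restored
        else:
            raise ValueError("unexpected %s at %#x" % (op, pc))
```

Starting at 0xffffe010 this pops three registers and then the return
address.  A PC of 0xffffe00e (the jmp) is rejected with ValueError, since
that instruction is not part of the epilogue; that gap is one weakness of
the approach.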
> 
> Another solution I was considering is to have the system somewhere provide
> DWARF unwind info matching the possible PC addresses in the vsyscall page.
> I am now pretty sure this is the way to go.  The recent development is that
> NPTL now needs .eh_frame information for these PCs as well, and Ulrich has
> made a kernel change to provide it.  The .eh_frame info for the vsyscall
> PCs is on the same read-only kernel page.  The C library now uses this as
> if the vsyscall page were a DSO with .eh_frame info to register, so that
> exception-style unwinding from any valid PC in a magic entry point works.
> 
> So, there is a .eh_frame section available for this code, and getting it
> from where it is into gdb can be done by hook or by crook.  I have the
> impression that gdb turning an available .eh_frame section into happy
> backtraces is something that might be expected real soon now.  
> Sounds like a winner.
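
For comparison, here is a sketch of the table-driven alternative, which
is roughly what unwind info boils down to: for each PC range, a rule
saying where the caller's frame starts relative to %esp and where each
saved register lives.  The rows below are derived by hand from the quoted
sysenter listing and are an assumption; they are not the contents of the
kernel's actual .eh_frame data.  The names ROWS, PAGE_END and
unwind_by_table are hypothetical.

```python
import bisect

# Each row: (first PC the row applies to,
#            CFA offset from %esp,
#            {register: offset of its saved copy from the CFA}).
# CFA is the caller's stack pointer just before the call; the return
# address always sits at CFA - 4.  Rows are hand-derived, not real data.
ROWS = [
    (0xffffe000, 4, {}),                                   # nothing pushed
    (0xffffe001, 8, {"ecx": -8}),                          # after push %ecx
    (0xffffe002, 12, {"ecx": -8, "edx": -12}),             # after push %edx
    (0xffffe003, 16, {"ecx": -8, "edx": -12, "ebp": -16}), # after push %ebp
    (0xffffe011, 12, {"ecx": -8, "edx": -12}),             # after pop %ebp
    (0xffffe012, 8, {"ecx": -8}),                          # after pop %edx
    (0xffffe013, 4, {}),                                   # after pop %ecx
]
PAGE_END = 0xffffe014  # first address past the listed code
STARTS = [row[0] for row in ROWS]


def unwind_by_table(pc, esp, stack):
    """Look up the row covering pc and apply it.

    stack maps 4-byte-aligned addresses to 32-bit words.
    Returns (caller_pc, caller_sp, restored_registers).
    """
    if not (STARTS[0] <= pc < PAGE_END):
        raise ValueError("no unwind row for pc %#x" % pc)
    # Pick the last row whose start address is <= pc.
    _, cfa_offset, saved = ROWS[bisect.bisect_right(STARTS, pc) - 1]
    cfa = esp + cfa_offset
    restored = {reg: stack[cfa + off] for reg, off in saved.items()}
    return stack[cfa - 4], cfa, restored
```

Unlike the epilogue interpreter, this handles every PC in the listing,
including 0xffffe00e and the prologue pushes, without decoding any
instructions.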
> 
> I think that elucidates all but the dreariest bits of the technical issues.
> Now the practical questions.  Oh, one dreary bit: 83172 mostly talks about
> the fact that ptrace refuses to read the 0xffffe000 page for you, which is
> presumed a prerequisite for dealing with the real can of worms (unwinding).
> 
> --------------------
> 
> 
> I think right now the public 2.5 kernel has a fix to make the page
> readable, and another one to provide the .eh_frame information. There
> is no mechanism yet to make that debug info accessible to gdb.
> 
> 
> elena
> 

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

Follow-Ups:
- Re: Linux kernel problem -- food for thoughts
  - From: Elena Zannoni
- Re: Linux kernel problem -- food for thoughts
  - From: Roland McGrath

References:
- Linux kernel problem -- food for thoughts
  - From: Elena Zannoni