This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.
Re: Partial cores using Linux "pipe" core_pattern
On Thu, 2009-05-21 at 12:32 -0400, Paul Smith wrote:
> It _feels_ to me like there's some kind of COW or similar mismanagement
> of the VM for these forked processes such that they interfere and we
> can't get a full and complete core dump when all of them are dumping at
> the same time.
Well, my feelings were way off. I did more investigation and it turns
out that what's happening is that the TIF_SIGPENDING flag is being set
during the core dump. This causes the write to the pipe to stop, and
the core dumping code makes no effort to manage errors or partial
writes. Here's the function in binfmt_elf.c that does the write:
    static int dump_write(struct file *file, const void *addr, int nr)
    {
            return file->f_op->write(file, addr, nr, &file->f_pos) == nr;
    }
If we get back anything other than exactly the number of bytes we tried
to write, we give up and return false (0).
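For comparison, the usual user-space idiom for this situation is to loop until everything has been written, retrying on short writes and on EINTR. The sketch below is only an analogy: write_all() is a hypothetical helper, not kernel code, and it is not a proposed fix for binfmt_elf.c.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the kernel): write all len bytes to fd.
 * Returns 0 on success, -1 on error with errno set.
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any byte went out: retry */
            return -1;      /* genuine error, errno already set by write() */
        }
        if (n == 0) {
            errno = EIO;    /* no progress; bail out instead of spinning */
            return -1;
        }
        p += n;             /* short write: advance past what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would use write_all(fd, buf, len) wherever it would otherwise compare a single write() return value against len.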
This definitely returns false when I see the short cores, and never when
I see "normal" cores. I modified it to see what it's getting back, and
file->f_op->write() is returning -ERESTARTSYS. So I annotated
fs/pipe.c:pipe_write() and I'm definitely getting it from this code, at
line 550 or so:
    if (signal_pending(current)) {
            if (!ret)
                    ret = -ERESTARTSYS;
            break;
    }
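To make the effect of that check concrete, here is a toy user-space model of it. It is a simplification under stated assumptions: toy_pipe_write() and its parameters are made up for illustration, the "signal" is simulated by a chunk counter rather than signal_pending(), and the real pipe_write() does far more. TOY_ERESTARTSYS mirrors the kernel's ERESTARTSYS value of 512.

```c
#include <stddef.h>

/* Same numeric value as ERESTARTSYS in the kernel's include/linux/errno.h. */
#define TOY_ERESTARTSYS 512

/*
 * Toy model (an assumption for illustration, not the kernel function):
 * copy len bytes in pieces of at most chunk bytes, and pretend a signal
 * becomes pending once signal_after chunks have been copied.
 * Returns the byte count copied so far, or -TOY_ERESTARTSYS if the
 * "signal" arrived before anything was copied.
 */
static long toy_pipe_write(size_t len, size_t chunk, size_t signal_after)
{
    long ret = 0;
    size_t chunks_done = 0;

    if (chunk == 0)
        return 0;                       /* nothing can ever be copied */

    while (len > 0) {
        if (chunks_done >= signal_after) {  /* stands in for signal_pending(current) */
            if (!ret)
                ret = -TOY_ERESTARTSYS;     /* nothing written yet: report the restart code */
            break;                          /* otherwise return the partial count */
        }
        size_t n = len < chunk ? len : chunk;
        ret += (long)n;
        len -= n;
        chunks_done++;
    }
    return ret;
}
```

With a 10000-byte request and 4096-byte chunks, a signal arriving before the first chunk yields -512, while one arriving after the first chunk yields a short count of 4096. Either way the result is not equal to nr, so a dump_write()-style comparison gives up.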
I've been posting on the linux-kernel mailing list, so this is really
just an FYI to anyone interested in following the progress; you can find
the current end of the thread here:
http://marc.info/?l=linux-kernel&m=124336093401443&w=2
So far I've failed to attract any interest from anyone on that list, but
hopefully someone who can help me figure out what to do next will
respond.
Cheers!