This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] malloc: Run fork handler as late as possible [BZ #19431]
- From: Torvald Riegel <triegel at redhat dot com>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Wed, 13 Apr 2016 14:55:11 +0200
- Subject: Re: [PATCH] malloc: Run fork handler as late as possible [BZ #19431]
- Authentication-results: sourceware.org; auth=none
- References: <56BBAF3D dot 5030905 at redhat dot com> <1455559833 dot 20971 dot 11 dot camel at localhost dot localdomain> <56C5EE32 dot 1020605 at redhat dot com> <1460485015 dot 3869 dot 267 dot camel at localhost dot localdomain> <570D4944 dot 7070501 at redhat dot com>
On Tue, 2016-04-12 at 21:15 +0200, Florian Weimer wrote:
> On 04/12/2016 08:16 PM, Torvald Riegel wrote:
> > On Thu, 2016-02-18 at 17:15 +0100, Florian Weimer wrote:
> >> On 02/15/2016 07:10 PM, Torvald Riegel wrote:
> >>>> It explicitly encodes the lock order in the implementations of fork,
> >>>> and does not rely on the registration order, thus avoiding the deadlock.
> >>>> I couldn't test the Hurd bits, but the changes look straightforward enough.
> >>> Are those changes, and thus the new scheme, documented anywhere?
> >> Fork handlers used by the implementation should be invisible to
> >> applications. The old one wasn't, and this was a bug.
> > But have you documented this and your understanding of the
> > synchronization scheme anywhere? Your explanation in the email with the
> > patch seems more detailed than the comments in the code.
> That's because I'm talking mainly about removed code and explaining a
> bug. I'm not sure how this information will be useful to future
> developers once the bug is gone. We have a regression test, which
> should avoid reintroducing precisely the same bug. For similar issues,
> we need to rely on code review and collective memory.
But you do have a new scheme, right, or are doing things in such a way
that what was formerly a bug now doesn't matter? You arrived through
some information on the conclusion that the new scheme works correctly;
isn't this worth documenting? It doesn't seem to be obvious. For
example, what about saying that fork handlers should be invisible to the
application or otherwise you'll get problem X?
Collective memory is not ideal, especially when the people working on a
> >>> I think it's really worthwhile to always update and improve the
> >>> documentation, especially if we're kind of (re)discovering knowledge
> >>> about the current implementation such as in this case. I know this is
> >>> extra work, but if this isn't documented, it's less likely anybody but
> >>> you will be aware of how this all works.
> >> Do you mean something like this inside the fork implementation?
> >> /* Acquire malloc locks. This needs to come last because fork
> >> handlers may use malloc, and the libio list lock has an indirect
> >> malloc dependency as well (via the getdelim function). */
> >> __malloc_fork_lock_parent ();
> > Not quite. IIRC this is one comment in one part of the code, but it's
> > not easy to find it from the other parts of code that are related.
> > One of the main motivations to document the synchronization scheme is to
> > avoid requiring future developers that want to modify the code to
> > closely review all the code (eg, to look for comments such as the one
> > above).
> The cycle isn't there anymore, so it would not particularly helpful to
> mention fork handlers in a comment on _IO_flush_all_lockp.
> I would even go one step further: It would be misleading because it
> implies that we are confident about the libio locking behavior and lack
> of cycles.
Then why not put that into a note? A simple TODO or XXX would suffice I
suppose, and would alert others that there's more that needs to be
reviewed. It would be bad if we can only document perfect information.
As long as we're honest about how complete/accurate we think the
information is, providing an intermediate snapshot sounds fine to me.
> But libio calls user-supplied callbacks (set by fopencookie)
> while implementation locks are acquired (again in _IO_flush_all_lockp).
> Diagnostic functions in the malloc subsystem call into libio (via
> stdio streams) while malloc locks are acquired, and libio may in turn
> use malloc. I'm sure I can find more examples with more poking.
If you can think about those, why not put a comment into a TODO in the
code? Or create a bug. At least I tend to page out things, so if we
all do that our collective memory will page out things too :)
> > (The thinking is that, simplified, in sequential code it's
> > sufficient to look at the callers or the contract of your function,
> > whereas in concurrent code you don't easily know who is actually
> > affected by a particiular lock acquisition order.)
> With callbacks, you can't be sure even with sequential code.
Sure, and there are other cases in sequential code when a logical
operation is spread across several functions. But even in those cases
there are less possible interleavings than in a typical concurrent
execution, so the state space is smaller.
> > This is made easier if there's an overview of the synchronization
> > somewhere in the code, and it is referenced in affected parts of the
> > code.
> I can add some comments to libio with what I have found, but this
> doesn't really related to the locking cycle I removed (because again,
> it's no longer there).
I agree that we don't want to document why doing something arbitrary was
a bad idea. But if it may be something that might look sensible to do
at first sight, documenting why that doesn't work helps, I think.
I won't object to the patch if you don't add a comment, but my opinion
is that we should add more details and lessons learned as comments, so
that we don't have to rely on our memory as much.
> >>>> diff --git a/malloc/malloc-private.h b/malloc/malloc-private.h
> >>>> new file mode 100644
> >>>> index 0000000..2fef840
> >>>> --- /dev/null
> >>>> +++ b/malloc/malloc-private.h
> >>> Should this perhaps be malloc-internal.h to match libc-internal.h?
> >> We also have libio/libioP.h, and include/malloc.h. But I think a
> >> separate header makes sense if its contents is truly internal.
> >> What are the preferences here?
> > I think having an internal header is good, but I can't give a definitive
> > reply regarding what the project prefers.
> See my reply to Roland. I was under the impression you objected to the
I did object to the name, but simply because so far I had only seen
cases of -internal.h. I hadn't looked at math, crypt, and login, which
use _private.h or -private.h.