Synchronization problem with posix_spawn

Corinna Vinschen corinna-cygwin@cygwin.com
Mon Aug 3 11:22:39 GMT 2020


On Aug  2 12:21, Ken Brown via Cygwin-developers wrote:
> On 8/2/2020 10:50 AM, Corinna Vinschen wrote:
> > On Aug  2 13:55, Corinna Vinschen wrote:
> > > [Moving this discussion to cygwin-developers]
> > > On Jul 31 10:10, Corinna Vinschen wrote:
> > > > On Jul 30 19:04, Ken Brown via Cygwin wrote:
> > > > > On 7/30/2020 1:17 PM, Corinna Vinschen wrote:
> > > > > > On Jul 30 13:59, Corinna Vinschen wrote:
> > > > > > > On Jul 29 19:12, Ken Brown via Cygwin wrote:
> > > > > > > > On 7/29/2020 4:17 PM, Ken Brown via Cygwin wrote:
> > > > > > > > > posix_spawn(p) returns before the spawned process is fully up and
> > > > > > > > > running.  [...]
> > > > > > > > I just took a look at the source, and I see that posix_spawn
> > > > > > > > was taken from FreeBSD.
> > > > > > > > [...]
> > > > > > > 
> > > > > > > Actually, this is a Cygwin problem.
> > > > > > > [...]
> > > > > > > IOW, we need a Cygwin-specific do_posix_spawn() using fork(2)
> > > > > > > in conjunction with some synchronization the BSD function
> > > > > > > gets "for free" by using its specific vfork(2).
> > > > > > 
> > > > > > Below is a POC implementation for a Cygwin-specific do_posix_spawn().
> > > > > > [...]
> > > > > > Can you give it a try?
> > > > > 
> > > > > It looks like something further is needed: 'wait' doesn't seem to recognize
> > > > > the spawned process.
> > > > 
> > > > Oh well.
> > > > [...]
> > > 
> > > I attached another patch.  This one is designed from the ground up and
> > > I *think* it works as desired.  I added lots of comments so the idea
> > > behind this patch should be clear enough.
> > > 
> > > Please give it a try.
> > 
> > Version 2 of the patch attached.  It occurred to me belatedly that
> > parent and child have to be synchronized prior to calling WFMO in
> > the parent.  Otherwise OpenProcess in __posix_spawn_sem_wait_and_close
> > may end up opening the exec'ed process rather than the forked child.
> 
> LGTM, and passes all the tests I could throw at it.
> 
> FYI, I noticed the posix_spawn problem because I've been regularly testing
> my FIFO code by running the test suite from the "casual" project
> (https://bitbucket.org/casualcore/).  That project uses FIFOs extensively,
> and it also uses posix_spawn.  I only realized a few days ago that some of
> the test failures I had been seeing had nothing to do with FIFOs but rather
> stemmed from the posix_spawn issue that I reported.  Those tests now all
> pass with your patch.

Great, I pushed version 3 of the patch (with an additional minor fix).
Can you check this version again, too?

> For the sake of my education, could you explain what made you decide that a
> semaphore was the right kind of synchronization object for this problem?
> (If that doesn't have an easy answer, don't worry about it.)

The problem was synchronization plus having a way to propagate an error
code up to the parent.  The semaphore value can transport information by
the fact that it can be set to any value 0 <= X <= INT_MAX.  So the
semaphore value after WFMO contains the error code "for free".

glibc uses a blocking pipe as synchronization object, as well as
transport medium for the error code.  I guess I could have done the
same, but I don't trust Windows pipes too much...
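
For illustration, the pipe technique can be sketched roughly as follows.
This is a minimal sketch under assumptions, neither glibc's actual code
nor the Cygwin patch: spawn_sync() is a hypothetical helper name, and
plain fork(2)/execvp(3) stand in for whatever the real implementations
use.  The write end of the pipe is marked close-on-exec, so the parent's
blocking read returns EOF once exec succeeded, or the child's errno if
exec failed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a glibc or Cygwin function): fork, exec
   argv[0], and return only after exec either succeeded (0) or failed
   (the child's errno). */
int
spawn_sync (pid_t *pid, char *const argv[])
{
  int fds[2];
  int err = 0;
  ssize_t n;
  pid_t child;

  assert (pid != NULL && argv != NULL && argv[0] != NULL);

  if (pipe (fds) < 0)
    return errno;
  /* Close-on-exec: a successful exec closes the write end for us. */
  if (fcntl (fds[1], F_SETFD, FD_CLOEXEC) < 0)
    {
      err = errno;
      close (fds[0]);
      close (fds[1]);
      return err;
    }

  child = fork ();
  if (child < 0)
    {
      err = errno;
      close (fds[0]);
      close (fds[1]);
      return err;
    }
  if (child == 0)
    {
      /* Child: only reaches the write if exec failed. */
      close (fds[0]);
      execvp (argv[0], argv);
      err = errno;
      n = write (fds[1], &err, sizeof err);
      (void) n;
      _exit (127);
    }

  /* Parent: block until EOF (exec succeeded) or an error code arrives. */
  close (fds[1]);
  do
    n = read (fds[0], &err, sizeof err);
  while (n < 0 && errno == EINTR);
  if (n < 0)
    err = errno;
  close (fds[0]);

  if (n != 0)
    {
      /* exec failed (or the read itself failed): reap the child. */
      waitpid (child, NULL, 0);
      return err;
    }
  *pid = child;
  return 0;
}
```

A caller would pass an argv array such as { "sh", "-c", "exit 0", NULL }
and later waitpid() on the returned pid.  The pipe serves as both the
synchronization object and the transport for the error code, which is
the same double duty the semaphore value performs in the patch.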


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer

