This is the mail archive of the cygwin mailing list for the Cygwin project.

Re: rsync over ssh hang issue understood

Lev Bishop wrote:
On 6/28/06, Darryl Miles wrote:
> See how-to-debug-cygwin.txt

Thanks for your pointers. Everything I need to get started is already covered in how-to-debug-cygwin.txt.

> indications from select(2) interface. But if no worker thread is busy
> working on that fd then you get writability back ?

> Yes, but it is very hard to get the precise unix semantics. For example, the application issues a write() which spawns off a thread that then blocks. Then the application exit()s, causing the thread to also terminate before completing its write, and the write never completes.

This is a very valid point, but not one that is a problem in the situations I'm looking at. The situation I am looking at is much more chronic.

How does overlapped I/O get around this, since you have sent the data into the kernel layer and are now waiting on a completion notification or event signalling? If the application holding the handle exits from under it, does the Win32 kernel abort the I/O in this circumstance?

What if we got around this via a fork(), not at every I/O but only if we exit while an incomplete I/O operation is still in progress. Can we:

* fork()
* reacquire the handle, as per dup()
* CloseHandle() from the dying process
* receive the I/O completion callback with an indication of failure: handle was closed!
* hand the data over to the child (of the fork()) for it to take up the mission.

Maybe there is a resident part of cygwin that could take up the mission, since a named pipe can be obtained by any process on the system. This resident part is a process outside of the lifecycle of the emulated POSIX processes.

It still would not be perfect, but I can't think of any situation where a single write call is used (two writes would be allowed to block), the data must reliably make it to the reader, and the writer exits as soon as the write returns. Pretty rare if you ask me. Even when the data is queued into a POSIX kernel there is no guarantee the reader will read it; it might sit in the buffer. Applications that need that guarantee would round-trip the other end of the pipe to be sure.

At least we should be able to _DETECT_ that incomplete pipe-write I/O is still in progress when a process exits. So maybe we can log a warning and pick up any real problem from there, rather than thinking too deeply about that rare case.

> There is also the issue of what return value to give the application
> doing the write() on the pipe. You'll have to be careful to deal with
> error conditions, SIGPIPE, etc, etc.

As cgf put it:
| If I understand the plan correctly, in the scenario where select says
| it's ok to write but it really isn't, the write would return as if it
| succeeded and a writer thread would be created which sits around
| trying to empty the pipe.

This is _EXACTLY_ the problem as I see it. We have to deal with those rules if the OS can't tell us in a reliable way that a write() will work.

The writer thread sits around trying to fill the pipe, would be more correct.

There may be other ways to deal with that write(), but as far as I understand, the NT kernel does not provide a true non-blocking mechanism to work from with pipes, i.e. one where you can offer the kernel the data and, if the buffers are full, the kernel will reject the data without blocking, leaving the application holding it. Overlapped I/O, as I understand it, does not work like this.

I have read the overlapped I/O model as documented, but my (limited) understanding is that the call to WriteFile()/WriteFileEx() can still block (and it probably will under the pipelined conditions of rsync+ssh) when the kernel can't queue new requests.

I have not read this anywhere, but surely everyone can appreciate that an application can't keep doing continuous overlapped I/O into the kernel and expect to get back ERROR_IO_PENDING every time without the call ever blocking. Something has to block, or the kernel has to give back another error equivalent to POSIX's EAGAIN. As I can't see any EAGAIN equivalent, I presume it must block when the data rate of the writer is faster than the reader end of the pipe.

This is not true non-blocking I/O as I see it. So there is actually no non-blocking API unless you use PIPE_NOWAIT, which comes with a big fat warning not to use it. Nature did not intend PIPE_NOWAIT to exist.

As cgf writes:
| The idea of using threads for pipe writing has been bounced around for
| a long time. It doesn't solve the select problem if there are
| multiple processes writing to one pipe. That is not a completely
| unusual event, unfortunately.

I don't see the problem here; each writing process will have its own worker thread taking the block.

But to pick up the point here.

The problem is within the select/poll/read/write event notification system in the same application. We need to ensure that when we signal writability on a pipe via the select/poll event mechanism, some work appears to get done at the next write() call. Maybe we can return 0? At least then we didn't block, and the application has to deal with partial writes in O_NONBLOCK mode anyway. On a real POSIX system it would never return 0 and would always accept at least PIPE_BUF, but that may still be less than the 64Kb chunk the application was trying to write in the first place.

When in blocking mode we can return EINTR (a fictitious signal notification), but then we run into the problem where the application has blocked signals. But what about signals outside the scope of POSIX, like Linux RT signals? What I'm saying by this is that there may be some signals that cannot be blocked anyway, so EINTR may still be valid. But there is probably a lot of application code which does not expect EINTR when it has already blocked all the signals it can think of.

Ah ha! Eureka moment....

What if all pipe write operations used overlapped I/O and were FIFO-serialized within cygwin? I believe WriteFileEx() can return TRUE when the I/O went through the first time, and ERROR_IO_PENDING when it's going to signal completion later. This sticks with the always-make-a-private-copy of the POSIX application's data buffer in plan A, so that's a double-buffered throughput loss for every write. Ah well.

If we get TRUE back there is no problem, business as usual next time. If we get FALSE back with ERROR_IO_PENDING, we consider that I/O to be an outstanding write on a pipe and we revoke the writability status in select.

We then call WaitForSingleObject() for the I/O completion (or we have a completion function do that work); when we get I/O completion we allow the next I/O from the FIFO through the gate. If there was no more I/O in the FIFO, we set write_ready=true and wake up selects.

This model does not rely on over-writing to find the call that would block in order to revoke writability. It just uses the I/O completion mechanism of overlapped I/O, which is how nature intended.

If throughput becomes a problem it may be possible to apply heuristics with a guesstimate of the amount of overlapped I/O the kernel can buffer before blocking. Then instead of only one I/O per fd per process, we could account for the number of outstanding bytes and revoke writability based on that threshold figure. This way multiple overlapped I/Os can be outstanding in the kernel before we throttle with select. But for now I just want to get back to a working app.

If the POSIX pipe is in blocking mode we _deliberately_ make it block until completion is signalled. If it's in non-blocking mode and we have already revoked writability, we return EAGAIN.

Thanks for your replies.

I have started to write Win32 application code to help me completely understand the various Windows I/O models and the named-pipe implementation in detail, so there is some solid ground for me to tweak the proposal based on the rules in play in the NT kernel.

