This is the mail archive of the cygwin mailing list for the Cygwin project.
readv() questions
- From: clayne at anodized dot com
- To: cygwin at cygwin dot com
- Date: Tue, 9 May 2006 00:44:20 -0700
- Subject: readv() questions
Warning - LONG and network code related - do not read if not interested
or not versed.
I'm currently trying to debug an issue where readv() seems to be filling
iovecs with bad data or otherwise overflowing when it has to deal with a
large receive buffer. I say large receive buffer because I can replicate the
issue by writev()ing 1000+ iovecs on the sending side and readv()ing on
the cygwin side continuously. I have verified multiple times that the sending
side is writev()ing the correct iovecs, with length intact and as I specified
it - however upon readv()ing the same back on the cygwin end, after a sporadic
amount of data has been transferred (usually around 100 iovecs or so),
I get a spurious iovec filled with data I did not originally send out.
[ sending-side (Linux 2.6.9-22.0.1.EL) ]
--> writev() + write(variable length, sent in preceding iovec)
--> 100mb uplink
--> 384/1500 dsl up/downlink
--> readv() + read(length derived from iovec received)
[ receiving-side (Cygwin 1.5.20s(0.155/4/2) 20060427) ]
The iovec array itself is small, 13 bytes in total:
1 byte (total length)
1 byte (variable length)
1 byte (flag)
2 bytes (header data)
4 bytes (header data)
4 bytes (header data)
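
A minimal sketch of how such a 13-byte header could be mapped onto six
iovecs, one per field. The struct and field names (pkt_header, hdr_a and so
on) are hypothetical; the post only gives the field sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical field names; only the sizes come from the layout above. */
struct pkt_header {
    uint8_t  total_len;  /* 1 byte: total length */
    uint8_t  var_len;    /* 1 byte: length of the variable part */
    uint8_t  flags;      /* 1 byte: flag */
    uint16_t hdr_a;      /* 2 bytes: header data */
    uint32_t hdr_b;      /* 4 bytes: header data */
    uint32_t hdr_c;      /* 4 bytes: header data */
};

enum { HDR_IOVCNT = 6 };

/* Point one iovec at each field, so that any padding the compiler adds
   inside the struct never reaches the wire. */
static void header_to_iov(struct pkt_header *h, struct iovec iov[HDR_IOVCNT])
{
    assert(h != NULL && iov != NULL);
    iov[0].iov_base = &h->total_len; iov[0].iov_len = sizeof h->total_len;
    iov[1].iov_base = &h->var_len;   iov[1].iov_len = sizeof h->var_len;
    iov[2].iov_base = &h->flags;     iov[2].iov_len = sizeof h->flags;
    iov[3].iov_base = &h->hdr_a;     iov[3].iov_len = sizeof h->hdr_a;
    iov[4].iov_base = &h->hdr_b;     iov[4].iov_len = sizeof h->hdr_b;
    iov[5].iov_base = &h->hdr_c;     iov[5].iov_len = sizeof h->hdr_c;
}

/* Sum of all iov_len values, i.e. the number of bytes on the wire. */
static size_t iov_total(const struct iovec *iov, int iovcnt)
{
    size_t tot = 0;
    for (int i = 0; i < iovcnt; i++)
        tot += iov[i].iov_len;
    return tot;
}
```

With this layout iov_total() over the six entries comes to 13, matching the
sizes listed above, regardless of how the struct itself is padded in memory.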
On the sending side I writev() to the network stack, and then immediately
issue another write() afterwards containing the variable length data,
whose length I stored in the header (iovec[1]). On the receiving end, same
deal, just in reverse: readv(), passing a char * in iovec[1], and relying on
readv() to fill it with the correct data received - which I then use as the
length to read() to get the variable length data following.
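
For comparison: POSIX allows readv() on a stream socket to return fewer
bytes than the iovecs can hold, so a receiver that needs the complete header
before looking at iovec[1] typically loops until everything has arrived. A
minimal sketch, assuming a blocking descriptor; readv_full is a hypothetical
helper name and it modifies the caller's iovec array in place.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Keep calling readv() until every buffer is full, EOF is seen, or an
   error occurs. Returns the total number of bytes read, or -1 on error.
   Assumes a blocking descriptor; the iovec array is modified in place. */
static ssize_t readv_full(int fd, struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    assert(iov != NULL && iovcnt > 0);
    while (iovcnt > 0) {
        ssize_t n = readv(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, just retry */
            return -1;
        }
        if (n == 0)
            break;                  /* EOF before the buffers were full */
        total += n;

        /* Step over the buffers that this call filled completely. */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* Trim the partially filled buffer so the next call resumes
           exactly where this one stopped. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return total;
}
```

A caller would compare the return value against the expected header size (13
here) and treat anything smaller as EOF in the middle of a header.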
A few things:
1. Sanity test, nobody sees anything wrong with this fairly standard
procedure, correct?
2. What exactly is the purpose of dummytest() within
/winsup/cygwin/miscfuncs.cc?
The call to check_iovec_for_read from within readv():
extern "C" ssize_t
readv (int fd, const struct iovec *const iov, const int iovcnt)
{
  extern int sigcatchers;
  const int e = get_errno ();

  int res = -1;

  const ssize_t tot = check_iovec_for_read (iov, iovcnt);
check_iovec_for_read is a macro defined as:
winsup.h:#define check_iovec_for_read(a, b) check_iovec ((a), (b), false)
The actual check_iovec() call with preceding dummytest():
162 static char __attribute__ ((noinline))
163 dummytest (volatile char *p)
164 {
165   return *p;
166 }
167 ssize_t
168 check_iovec (const struct iovec *iov, int iovcnt, bool forwrite)
169 {
170   if (iovcnt <= 0 || iovcnt > IOV_MAX)
171     {
172       set_errno (EINVAL);
173       return -1;
174     }
175
176   myfault efault;
177   if (efault.faulted (EFAULT))
178     return -1;
179
180   size_t tot = 0;
181
182   while (iovcnt != 0)
183     {
184       if (iov->iov_len > SSIZE_MAX || (tot += iov->iov_len) > SSIZE_MAX)
185         {
186           set_errno (EINVAL);
187           return -1;
188         }
189
190       volatile char *p = ((char *) iov->iov_base) + iov->iov_len - 1;
191       if (!iov->iov_len)
192         /* nothing to do */;
193       else if (!forwrite)
194         *p = dummytest (p);
195       else
196         dummytest (p);
197
198       iov++;
199       iovcnt--;
200     }
201
202   assert (tot <= SSIZE_MAX);
203
204   return (ssize_t) tot;
205 }
Lines 190 to 196 seem completely pointless to me unless I'm missing
something, which I believe to be the case here. Can someone explain it? Due
to the use of volatile and the explicit noinline attribute, I have a
feeling it's some form of memory assertion - but why?
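
One plausible reading of lines 190 to 196, offered as a sketch and not as a
statement of the Cygwin authors' intent: because check_iovec() arms a myfault
handler first, touching the last byte of each buffer would make a bad pointer
fault right there, where it can be reported as EFAULT. The volatile pointer
and the noinline attribute would then exist only to stop the compiler from
optimizing the access away. The names touch_byte and probe_iovec below are
hypothetical, the attribute is a GCC/Clang extension, and this standalone
version has no fault handler, so a bad pointer would simply crash it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Kept out of line and volatile so the load cannot be optimized away. */
static char __attribute__((noinline)) touch_byte(volatile char *p)
{
    return *p;
}

/* Touch the last byte of every non-empty buffer and return the total
   length. will_store != 0 mirrors the readv() case, where the buffers are
   about to be written into, so the byte is loaded and stored back
   unchanged. will_store == 0 mirrors the writev() case: load only. */
static size_t probe_iovec(const struct iovec *iov, int iovcnt, int will_store)
{
    size_t tot = 0;

    assert(iov != NULL && iovcnt > 0);
    for (int i = 0; i < iovcnt; i++) {
        tot += iov[i].iov_len;
        if (iov[i].iov_len == 0)
            continue;               /* nothing to touch */

        volatile char *p =
            (volatile char *)iov[i].iov_base + iov[i].iov_len - 1;
        if (will_store)
            *p = touch_byte(p);     /* load + store: must be writable */
        else
            (void)touch_byte(p);    /* load only: must be readable */
    }
    return tot;
}
```

Under that reading the probe never changes the buffer contents; it only
forces the memory access to happen early.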
Anyway, the situation *does not* happen if I run it
under strace (which smells of a race) or if I throttle the data manually
by only sending a set amount and then requesting an ack from the receiving
side (which is what I use the flags var for). If I go full unthrottled, no
acks, standard write it all to wire, read it all from wire - the s* hits the
fan.
What I believe is causing the issue is an MTU related problem. It almost
always seems to get into weirdness right around 1452 bytes transferred. I
have verified, via Ethereal, that my assertions fail (they check that
the variable length stored in the header I sent == what is stored in the
received iovec) when readv() reads data at the border of a TCP packet in
the stream (i.e. the next portion of an iovec or the next iovec entirely is
in the next packet). Ethereal also verifies that the data sent is exactly
as I had placed it on the sending stack via writev() from the sending host.
Ethereal also verifies that the problems occur as iovec data or iovecs
within the array passed to readv() span TCP packets.
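
The packet-boundary situation can be reproduced in isolation. A minimal
sketch, assuming a local AF_UNIX stream socketpair as a stand-in for the TCP
connection (that substitution is an assumption of the sketch, as is the
helper name short_readv_demo): only part of a 13-byte header has arrived when
readv() is called with room for all 13.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write only `avail` bytes of a 13-byte header into one end of a stream
   socketpair, then readv() with room for all 13 on the other end.
   Returns what readv() reports, or -1 if setup or the write failed. */
static ssize_t short_readv_demo(size_t avail)
{
    unsigned char wire[13] = {0};
    unsigned char a[1], b[1], c[1], d[2], e[4], f[4];
    struct iovec iov[6] = {
        { .iov_base = a, .iov_len = sizeof a },
        { .iov_base = b, .iov_len = sizeof b },
        { .iov_base = c, .iov_len = sizeof c },
        { .iov_base = d, .iov_len = sizeof d },
        { .iov_base = e, .iov_len = sizeof e },
        { .iov_base = f, .iov_len = sizeof f },
    };
    int sv[2];
    ssize_t n;

    assert(avail >= 1 && avail <= sizeof wire);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (write(sv[0], wire, avail) != (ssize_t)avail) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    /* Only `avail` bytes are queued; readv() returns them without waiting
       for the remaining iovecs to be filled. */
    n = readv(sv[1], iov, 6);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

On a POSIX system, calling this with 2 returns 2: the first two one-byte
iovecs are filled and the rest are left untouched, which is the same shape as
the failure seen when a header straddles two TCP packets.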
I'm slowly going through the code, which can be a mission, but I'm beginning
to wonder if this section:
void
fhandler_base::raw_read (void *ptr, size_t& ulen)
{
#define bytes_read ulen

  HANDLE h = NULL; /* grumble */
  int prio = 0; /* ditto */
  DWORD len = ulen;

  ulen = (size_t) -1;
  if (read_state)
    {
      h = GetCurrentThread ();
      prio = GetThreadPriority (h);
      SetThreadPriority (h, THREAD_PRIORITY_TIME_CRITICAL);
      signal_read_state (1);
    }
  BOOL res = ReadFile (get_handle (), ptr, len, (DWORD *) &ulen, 0);
  if (read_state)
    {
      signal_read_state (1);
      SetThreadPriority (h, prio);
    }
  if (!res)
    {
      /* Some errors are not really errors. Detect such cases here. */

      DWORD errcode = GetLastError ();
      switch (errcode)
        {
        case ERROR_BROKEN_PIPE:
          /* This is really EOF. */
          bytes_read = 0;
          break;
        case ERROR_MORE_DATA:
          /* `bytes_read' is supposedly valid. */
          break;
        case ERROR_NOACCESS:
is the culprit... There *are* some relatively spooky looking calls in there,
coming from a POSIX perspective.
But according to my MS API docs on ReadFile - it shall not return until
it has read the number of bytes requested (or times out, specified
through SetCommTimeouts I believe - although I do not see it used under
fhandler_base. I presume there is another way through the win32 API when
using sockets?):
"If hFile is not opened with FILE_FLAG_OVERLAPPED and lpOverlapped is NULL,
the read operation starts at the current file position and ReadFile does
not return until the operation is complete, and then the system updates
the file pointer."
ERROR_MORE_DATA is, not surprisingly, defined as:
"ERROR_MORE_DATA: More data is available."
The API references it here:
"If a named pipe is being read in message mode and the next message is
longer than the nNumberOfBytesToRead parameter specifies, ReadFile returns
FALSE and GetLastError returns ERROR_MORE_DATA. The remainder of the message
may be read by a subsequent call to the ReadFile or PeekNamedPipe function."
However this applies to named pipes - not necessarily sockets. But I'm
wary of this section:
case ERROR_MORE_DATA:
  /* `bytes_read' is supposedly valid. */
  break;
Mainly because I do not see anywhere where there is an explicit check in the
form of:
if (len != bytes_read) /* bytes_read is really ulen */
handle_problem();
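
At the POSIX caller level, a "requested versus returned" check of this kind
is usually written as a loop, since a short count from read() on a stream is
legal and not an error. A minimal sketch, assuming a blocking descriptor;
read_full is a hypothetical name and is not taken from the Cygwin source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes unless EOF or an error intervenes. Returns the
   number of bytes actually read (less than len only at EOF), or -1 on
   error. Assumes a blocking descriptor. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;

    assert(buf != NULL || len == 0);
    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, just retry */
            return -1;
        }
        if (n == 0)
            break;              /* EOF: report the short count */
        got += (size_t)n;       /* short read: ask for the rest */
    }
    return (ssize_t)got;
}
```

The caller then only has to distinguish three outcomes: -1 (error), a count
equal to len (complete), or a smaller count (the peer closed mid-message).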
Let's just throw out the wild assumption that win32 does something funky
when data requested via ReadFile() spans an MTU size or resides in a
following TCP packet associated with the stream - throwing an error and
saying ERROR_MORE_DATA. An example case being mine where I request 13
bytes and we get 2 for instance. Upon returning from raw_read(), not much
is done in the way of error checking there either:
Within fhandler_base::read():
raw_read (ptr + copied_chars, len);
if (!copied_chars)
  /* nothing */;
else if ((ssize_t) len > 0)
  len += copied_chars;
else
  len = copied_chars;

if (rbinary () || len <= 0)
  goto out;
My actual readv() wrapping code is very basic and standard, so I don't think
it's doing anything evil or causing a problem:
size_t n_recv_iov(int s, const struct iovec *v, size_t c, int tout)
{
    size_t br;
    int res;
    struct timeval to;
    fd_set fds, fds_m;

    FD_ZERO(&fds_m);
    FD_SET(s, &fds_m);

    while (1) {
        fds = fds_m;
        to.tv_sec = tout;
        to.tv_usec = 0;

        if ((br = readv(s, v, c)) == (size_t)-1) {
            switch (errno) {
            case EWOULDBLOCK:
            case EINTR:
                break;
            default:
                perror("readv");
                return -1;
            }
        } else {
            break;
        }

        if ((res = select(s + 1, &fds, NULL, NULL, &to)) == 0)
            return -1; /* timeout */
        else if (res == -1) {
            perror("select");
            return -1; /* never happen */
        }
    }

    return br;
}
And my call to it is basic as well:
IOV_SET(&packet[0], &byte_tl, sizeof(byte_tl));
IOV_SET(&packet[1], &byte_vl, sizeof(byte_vl));
IOV_SET(&packet[2], &byte_flags, sizeof(byte_flags));
IOV_SET(&packet[3], &nbo_s, sizeof(nbo_s));
IOV_SET(&packet[4], &nbo_t_onl, sizeof(nbo_t_onl));
IOV_SET(&packet[5], &nbo_t_ofl, sizeof(nbo_t_ofl));

for (error = 0; !error; ) {
    error = 1;

    if ((hl = n_recv_iov(s, packet, NE(packet), 60)) == (size_t)-1)
        break;

    assert(byte_vl < sizeof(byte_var));

    if ((vl = n_recv(s, byte_var, byte_vl, 60)) == (size_t)-1)
        break;
    if (hl == 0 || vl == 0)
        break;

    error = 0;

    /* process_data(); */
}
Sorry for the ultra mail, but I know for a fact that readv() on cygwin is
doing bad things when faced with a lot of data to read from the wire. Any
insights?
-cl