This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] Implement IPv6 support for GDB/gdbserver


On Wednesday, June 20 2018, Pedro Alves wrote:

> On 06/15/2018 01:24 AM, Sergio Durigan Junior wrote:
>> Changes from v1:
>> 
>> - s/hostnames/addresses/ on NEWS.
>> 
>> - Simplify functions on netstuff.c.  Add new defines for
>>   GDB_NI_MAX_ADDR and GDB_NI_MAX_PORT.  Make
>>   parse_hostname_without_prefix return a struct parsed_hostname.
>> 
>> - Use AF_UNSPEC instead of AF_INET by default on unprefixed
>>   connections.
>> 
>> - Simplify and modernize things on gdbreplay.c.
>> 
>> - Implement new GDB_TEST_SOCKETHOST mechanism for testing things with
>>   any type of hostname/address.
>> 
>> - Simplify things on boards/*.exp because of the above.
>
> Any thoughts on a gdb.server/ smoke test that connects with
> tcpv4, tcpv6, etc?  I still think it's well worth it, though
> I'm OK with not having it in this patch.

First, thanks for the review.

I'd like to do it, but I couldn't figure a way to do this smoke test.
Maybe you mean "just try to connect and see if GDB can successfully
parse the hostname"?  Sorry, I'm a bit lost here.

>> 
>>   $ ./gdb -ex 'target extended-remote localhost:1234' ./a.out
>> 
>> the user would notice a somewhat big delay before GDB was able to
>> connect to the IPv4 socket.  This happened because GDB was trying to
>> connect to the IPv6 socket first, and had to wait until the connection
>> timed out before it tried to connect to the IPv4 socket.
>
> Let me try to clarify a detail here -- AFAICS, this auto-retry scenario,
> does not kick in when at the socket level gdb gets a time out (ETIMEDOUT).
> AFAICT, the auto-retry mechanism only kicks in when the connection is
> actively refused with ECONNREFUSED (because the agent had not had enough
> time to start up and start listening yet).

Yes, that's correct.  When I wrote "... until the connection timed out",
I was referring to the auto-retry timeout.  Perhaps you'd like me to
further clarify this part of the commit message?

> Note that in current master, tcp_auto_retry is _only_ checked in
> the ECONNREFUSED cases:
>
>       /* Maybe we're waiting for the remote target to become ready to
> 	 accept connections.  */
>       if (tcp_auto_retry
> #ifdef USE_WIN32API
> 	  && err == WSAECONNREFUSED
> #else
> 	  && err == ECONNREFUSED
> #endif
> 	  && wait_for_connect (NULL, &polls) >= 0)
> 	{
> 	  close (scb->fd);
> 	  goto retry;
> 	}
>> 
>> For that reason, I had to rewrite the main loop and implement a new
>> method for handling multiple connections.  The main idea is:
>
> So I'm surprised to see this.  I'm not sure connecting 
> to multiple addresses at the same time and then discarding
> all but one is a good idea.

It's important to mention that we're just trying to connect to multiple
addresses in the case where getaddrinfo returns us these addresses.  It
is not because the *user* has provided multiple addresses.  And if
getaddrinfo just returns one address, we'll just try to connect to one
address.

>  E.g., once we have scox's
> multi-client support in, this can truly manage to connect
> multiple times to the same gdbserver.  It probably won't cause
> an issue in practice with that agent, but I'm not sure, and
> not sure about others.

Hm, but will the user be able to provide multiple addresses to connect
to via the CLI?  If not, then I guess nothing changes.  And if she will,
then we'll just have to call net_open for each address, I guess.

I'm in no way an expert in what scox is doing, so I may very well be
wrong, but at first thought I don't see a problem with my proposed
solution + his feature.

> In the previous discussions, I was thinking about something
> simpler,

To be completely honest, I have to say I tried something simpler at
first as well.  It's just that this code is a bit more complicated than
what I initially thought :-/.

> something like this in pseudo code:
>
> - refactor the net_open code that tries to connect, starting at the
>   retry: label up until where we have a successful connect (excluded)
>   into a try_connect function.  Tweak it to return immediately
>   on ECONNREFUSED instead of waiting directly.  I.e., let the caller
>   handle the wait + auto-retry logic.
>
> - Then net_open would be something like:
>
>   net_open ()
>   {
>
>     parse_connection_spec (....);
>
>     addresses = getaddrinfo (....);
>
>     do
>       {
>         number_refused = 0;
>         foreach (address in addresses)
> 	  {
> 	    res = try_connect (address);
> 	    if (res == connected_ok)
> 	      break;
> 	    else if (res == ECONNREFUSED)
> 	      number_refused++; // don't wait for auto-retry until we've tried all addresses
> 	  }
>       } while (tcp_auto_retry
> 	       && number_refused == number_addresses      // raced with server starting,
> 	       && wait_for_connect (NULL, &polls) >= 0);  // so wait and try again.
>
>    /* Got connection.  */
>    ...
>  }

Initially it seems OK, but I see at least one place where this might be
problematic.  For example, how is EINPROGRESS handled?  Given that our
sockets are all non-blocking in this code, 'connect' (almost) always
returns EINPROGRESS.  Currently, it's basically being handled by this
part:

      ...
      if (
#ifdef USE_WIN32API
	  /* Under Windows, calling "connect" with a non-blocking socket
	     results in WSAEWOULDBLOCK, not WSAEINPROGRESS.  */
	  err != WSAEWOULDBLOCK
#else
	  err != EINPROGRESS
#endif
	  )
	{
	  errno = err;
	  net_close (scb);
	  return -1;
	}

      /* Looks like we need to wait for the connect.  */
      do 
	{
	  n = wait_for_connect (scb, &polls);
	} 
      while (n == 0);
      if (n < 0)
	{
	  net_close (scb);
	  return -1;
	}
      ...

Which means that GDB will wait_for_connect while the timeout is not
reached (n == 0).  From my tests, if you start gdbserver listening on an
IPv4-only socket, and then tell GDB to connect to "localhost", the IPv6
socket will still return EINPROGRESS, even though there are no IPv6
sockets listening on the other side.  This may be problematic because it
could make GDB wait unnecessarily.  But maybe 'select' will return right
away signalling that the IPv6 socket has an error (which means we'd have
to perform a getsockopt to obtain the errno), and everything will be OK.
I'd have to test, I guess.

Another thing to consider is, how many times do we loop if we get an
errno that is not ECONNREFUSED?  Worst case scenario, there may be an
error with something internal and we'd always get some other errno, in
which case we'd loop until all addresses have been tried, which is not
going to happen.  We could stipulate a maximum number of times, but
that'd be one more number to keep track of.

[ Coming back after writing the whole e-mail... ]

All in all, your proposal seems simpler.  As I said above, I want to
create a separate branch and test it.

> So basically we'd do the try-all-getaddrinfo-addresses loop that
> everyone seems to do, with the "set tcp auto-retry on"
> delay + retry logic preserving it's original intent of
> being useful "if the remote debugging agent is launched in
> parallel with GDB; there is a race condition because the agent may not
> become ready to accept the connection before @value{GDBN} attempts to
> connect.".

With my proposed implementation, this race condition is still present,
and the retry logic's meaning is preserved.

>> diff --git a/gdb/common/netstuff.h b/gdb/common/netstuff.h
>> new file mode 100644
>> index 0000000000..687ff532b8
>> --- /dev/null
>> +++ b/gdb/common/netstuff.h
>> @@ -0,0 +1,76 @@
>> +/* Operations on network stuff.
>> +   Copyright (C) 2018 Free Software Foundation, Inc.
>> +
>> +   This file is part of GDB.
>> +
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3 of the License, or
>> +   (at your option) any later version.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>> +
>> +#ifndef NETSTUFF_H
>> +#define NETSTUFF_H
>> +
>> +#include <string>
>> +#include "common/gdb_string_view.h"
>
> In the end you're not using string_view (the copying
> is still here), so please remove the unnecessary include.

Done.

>> +
>> +/* Like NI_MAXHOST/NI_MAXSERV, but enough for numeric forms.  */
>> +#define GDB_NI_MAX_ADDR 64
>> +#define GDB_NI_MAX_PORT 16
>> +
>> +/* Helper class to guarantee that we always call 'freeaddrinfo'.  */
>> +
>> +class scoped_free_addrinfo
>> +{
>> +public:
>> +  /* Default constructor.  */
>> +  scoped_free_addrinfo (struct addrinfo *ainfo)
>> +    : m_res (ainfo)
>> +  {
>> +  }
>> +
>> +  /* Destructor responsible for free'ing M_RES by calling
>> +     'freeaddrinfo'.  */
>> +  ~scoped_free_addrinfo ();
>> +
>> +  DISABLE_COPY_AND_ASSIGN (scoped_free_addrinfo);
>> +
>> +private:
>> +  /* The addrinfo resource.  */
>> +  struct addrinfo *m_res;
>> +};
>> +
>> +/* The struct we return after parsing the hostname.  */
>> +
>> +struct parsed_hostname
>> +{
>> +  /* The hostname.  */
>> +  std::string host_str;
>> +
>> +  /* The port, if any.  */
>> +  std::string port_str;
>> +};
>> +
>> +
>> +/* Parse HOSTNAME (which is a string in the form of "ADDR:PORT") and
>> +   return a 'parsed_hostname' structure with the proper fields filled
>> +   in.  Also adjust HINT accordingly.  */
>> +extern parsed_hostname parse_hostname_without_prefix (std::string hostname,
>> +						      struct addrinfo *hint);
>> +
>> +/* Parse HOSTNAME (which is a string in the form of
>> +   "[tcp[6]:|udp[6]:]ADDR:PORT") and return a 'parsed_hostname'
>> +   structure with the proper fields filled in.  Also adjust HINT
>> +   accordingly.  */
>> +extern parsed_hostname parse_hostname (const char *hostname,
>> +				       struct addrinfo *hint);
>
> After staring at these a couple times, I'm thinking that maybe
> replacing the "hostname" in the function and struct names with
> something else may be a bit clearer, since you're not just
> parsing the host name, but also the port.
>
> Maybe call the full protocol+address+port thing a
> "connection spec", so you'd have:
>
>  struct parsed_connection_spec (or just "struct connection_spec")
>  parse_connection_spec
>  parse_connection_spec_without_prefix
>
> or:
>
>  struct parsed_connspec (or just "struct connspec")
>  parse_connspec
>  parse_connspec_without_prefix

OK, no problem.  I prefer connection_spec, so I'll use that.

>> +
>> +#endif /* ! NETSTUFF_H */
>> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
>> index a6bad13d9d..55b48309a8 100644
>> --- a/gdb/doc/gdb.texinfo
>> +++ b/gdb/doc/gdb.texinfo
>> @@ -20509,16 +20509,27 @@ If you're using a serial line, you may want to give @value{GDBN} the
>>  @code{target} command.
>>  
>>  @item target remote @code{@var{host}:@var{port}}
>> +@itemx target remote @code{@var{[host]}:@var{port}}
>>  @itemx target remote @code{tcp:@var{host}:@var{port}}
>> +@itemx target remote @code{tcp:@var{[host]}:@var{port}}
>> +@itemx target remote @code{tcp4:@var{host}:@var{port}}
>> +@itemx target remote @code{tcp6:@var{host}:@var{port}}
>> +@itemx target remote @code{tcp6:@var{[host]}:@var{port}}
>>  @itemx target extended-remote @code{@var{host}:@var{port}}
>> +@itemx target extended-remote @code{@var{[host]}:@var{port}}
>>  @itemx target extended-remote @code{tcp:@var{host}:@var{port}}
>> +@itemx target extended-remote @code{tcp:@var{[host]}:@var{port}}
>> +@itemx target extended-remote @code{tcp4:@var{host}:@var{port}}
>> +@itemx target extended-remote @code{tcp6:@var{host}:@var{port}}
>> +@itemx target extended-remote @code{tcp6:@var{[host]}:@var{port}}
>>  @cindex @acronym{TCP} port, @code{target remote}
>>  Debug using a @acronym{TCP} connection to @var{port} on @var{host}.
>> -The @var{host} may be either a host name or a numeric @acronym{IP}
>> -address; @var{port} must be a decimal number.  The @var{host} could be
>> -the target machine itself, if it is directly connected to the net, or
>> -it might be a terminal server which in turn has a serial line to the
>> -target.
>> +The @var{host} may be either a host name, a numeric @acronym{IPv4}
>> +address, or a numeric @acronym{IPv6} address (with or without the
>> +square brackets to separate the address from the port); @var{port}
>> +must be a decimal number.  The @var{host} could be the target machine
>> +itself, if it is directly connected to the net, or it might be a
>> +terminal server which in turn has a serial line to the target.
>>  
>>  For example, to connect to port 2828 on a terminal server named
>>  @code{manyfarms}:
>> @@ -20527,6 +20538,26 @@ For example, to connect to port 2828 on a terminal server named
>>  target remote manyfarms:2828
>>  @end smallexample
>>  
>> +To connect to port 2828 on a terminal server whose address is
>> +@code{2001:0db8:85a3:0000:0000:8a2e:0370:7334}, you can either use the
>> +square bracket syntax:
>> +
>> +@smallexample
>> +target remote [2001:0db8:85a3:0000:0000:8a2e:0370:7334]:2828
>> +@end smallexample
>> +
>> +@noindent
>> +or explicitly specify the @acronym{IPv6} protocol:
>> +
>> +@smallexample
>> +target remote tcp6:2001:0db8:85a3:0000:0000:8a2e:0370:7334:2828
>> +@end smallexample
>> +
>> +This last example may be confusing to the reader, because there is no
>> +visible separation between the hostname and the port number.
>> +Therefore, we recommend the user to provide @acronym{IPv6} addresses
>> +using square brackets for clarity.
>
> Thanks, this example is better than in v1, though reading it I'm thinking
> that it may be a good idea to explicitly say that GDB always interprets
> the text after the last ":" as the port separator.  Is that right?
> I.e., for GDB, there's no ambiguity at all?

Correct.

> Ideally the patch would include unit tests for these new parsing routines
> covering cases like these.

I will see to it.

>> @@ -350,18 +396,24 @@ remote_open (const char *name)
>>  #endif /* USE_WIN32API */
>>    else
>>      {
>> -      int port;
>> -      socklen_t len;
>> -      struct sockaddr_in sockaddr;
>> -
>> -      len = sizeof (sockaddr);
>> -      if (getsockname (listen_desc,
>> -		       (struct sockaddr *) &sockaddr, &len) < 0
>> -	  || len < sizeof (sockaddr))
>> +      char listen_port[16];
>
> GDB_NI_MAX_PORT ?

True, fixed.

>> +      struct sockaddr_storage sockaddr;
>> +      socklen_t len = sizeof (sockaddr);
>> +
>> +      if (getsockname (listen_desc, (struct sockaddr *) &sockaddr, &len) < 0)
>>  	perror_with_name ("Can't determine port");
>> -      port = ntohs (sockaddr.sin_port);
>>  
>> -      fprintf (stderr, "Listening on port %d\n", port);
>> +      int r = getnameinfo ((struct sockaddr *) &sockaddr, len,
>> +			   NULL, 0,
>> +			   listen_port, sizeof (listen_port),
>> +			   NI_NUMERICSERV);
>> +
>> +      if (r != 0)
>> +	fprintf (stderr, _("Can't obtain port where we are listening: %s"),
>> +		 gai_strerror (r));
>> +      else
>> +	fprintf (stderr, _("Listening on port %s\n"), listen_port);
>> +
>>        fflush (stderr);
>>  
>>        /* Register the event loop handler.  */
>> diff --git a/gdb/ser-tcp.c b/gdb/ser-tcp.c
>> index 23ef3b04b8..44b6d89cda 100644
>> --- a/gdb/ser-tcp.c
>> +++ b/gdb/ser-tcp.c
>> @@ -25,6 +25,7 @@
>>  #include "cli/cli-decode.h"
>>  #include "cli/cli-setshow.h"
>>  #include "filestuff.h"
>> +#include "netstuff.h"
>>  
>>  #include <sys/types.h>
>>  
>> @@ -39,6 +40,7 @@
>>  
>>  #ifdef USE_WIN32API
>>  #include <winsock2.h>
>> +#include <wspiapi.h>
>>  #ifndef ETIMEDOUT
>>  #define ETIMEDOUT WSAETIMEDOUT
>>  #endif
>> @@ -81,12 +83,69 @@ static unsigned int tcp_retry_limit = 15;
>>  
>>  #define POLL_INTERVAL 5
>>  
>> -/* Helper function to wait a while.  If SCB is non-null, wait on its
>> -   file descriptor.  Otherwise just wait on a timeout, updating *POLLS.
>> -   Returns -1 on timeout or interrupt, otherwise the value of select.  */
>> +/* An abstraction of a socket, useful when we want to close the socket
>> +   fd automatically when exiting a context.  */
>> +
>> +class gdb_socket
>> +{
>> +public:
>> +  /* Default constructor.  */
>
> A "default constructor" is a constructor that can be called
> with no arguments.  But this one cannot.

Ah, OK.  I'll rewrite the comment.

>> +  gdb_socket (int sock, const struct addrinfo *ainfo)
>> +    : m_socket (sock),
>> +      m_released (false),
>> +      m_ainfo (ainfo)
>> +  {
>> +  }
>> +
>> +  /* Release a socket, i.e., make sure it doesn't get closed when our
>> +     destructor is called.  */
>> +  void release ()
>> +  {
>> +    m_released = true;
>> +  }
>> +
>> +  /* Return the socket associated with this object.  */
>> +  int get_socket () const
>> +  {
>> +    return m_socket;
>> +  }
>> +
>> +  /* Return the addrinfo structure associated with this object.  */
>> +  const struct addrinfo *get_addrinfo () const
>> +  {
>> +    return m_ainfo;
>> +  }
>
> Nit: we don't tend to use a "get_" prefix in class getters.
> Would "socket()" and "addrinfo()" work?

Yes, they work.  I fixed the names.

>> +
>> +  /* Destructor.  Make sure we close the socket if it hasn't been
>> +     released.  */
>> +  ~gdb_socket ()
>> +  {
>> +    if (!m_released)
>> +      close (m_socket);
>> +  }
>> +
>> +private:
>> +  /* The socket.  */
>> +  int m_socket;
>> +
>> +  /* Whether the socket has been released or not.  If it has, then we
>> +     don't close it when our destructor is called.  */
>> +  bool m_released;
>
> Do we need m_released?  Wouldn't e.g., m_socket == -1 instead work?

Yeah, it would.  I think it's clearer to have an explicit flag
signalling that the socket has been "released", but that's a matter of
taste, I guess.

>> +
>> +  /* The addrinfo structure associated with the socket.  */
>> +  const struct addrinfo *m_ainfo;
>
> So the class does not own m_ainfo?  Should be at least mentioned
> in a comment.

That's correct, m_ainfo isn't owned by the class.  I've clarified that
now.

> I think it'd be good to use DISABLE_COPY_AND_ASSIGN explicitly
> to make it clear the class is not meant to be copiable.
>
> I think it should be movable, though.  See comment further below,
> about vector of objects.
>
>
>> +};
>> +
>> +/* Helper function to wait a while.  If SOCKET_POLL is non-null, wait
>> +   on its file descriptors.  Otherwise just wait on a timeout, updating
>> +   *POLLS.  If SOCKET_POLL and SOCKETS_READY are both non-NULL, update
>> +   SOCKETS_READY with the value of the 'write' fd_set upon successful
>> +   completion of the 'select' call.  Return -1 on timeout or
>> +   interrupt, otherwise return the value of the 'select' call.  */
>>  
>>  static int
>> -wait_for_connect (struct serial *scb, unsigned int *polls)
>> +wait_for_connect (const std::vector<std::unique_ptr<gdb_socket>> *socket_poll,
>> +		  unsigned int *polls, fd_set *sockets_ready)
>>  {
>>    struct timeval t;
>>    int n;
>> @@ -120,24 +179,39 @@ wait_for_connect (struct serial *scb, unsigned int *polls)
>>        t.tv_usec = 0;
>>      }
>>  
>> -  if (scb)
>> +  if (socket_poll != NULL)
>>      {
>> -      fd_set rset, wset, eset;
>> +      fd_set wset, eset;
>> +      int maxfd = 0;
>> +
>> +      FD_ZERO (&wset);
>> +      FD_ZERO (&eset);
>> +      for (const std::unique_ptr<gdb_socket> &ptr : *socket_poll)
>> +	{
>> +	  const gdb_socket *sock = ptr.get ();
>> +	  int s = sock->get_socket ();
>
> I don't see why you'd extract the raw pointer out of the
> unique_ptr instead of dereferencing the unique_ptr directly, like:
>
>       for (const std::unique_ptr<gdb_socket> &sock : *socket_poll)
> 	{
> 	  int s = sock->get_socket ();
>
> ?
>
> This appears in several places in the patch.

Pure ignorance.  Sorry about that, I've simplified the code now.

> But, why does the vector need to hold heap-allocated
> gdb_socket instances instead of being a vector of
> gdb_socket object, like:
>
>   std::vector<gdb_socket>
>
> ?
>
> I think you'd just need to add a move ctor (and move assign)
> to gdb_socket.
>
> Then you'd use emplace_back to push a new socket.  I.e., instead
> of:
>
>              socket_poll.push_back
>                (std::unique_ptr<gdb_socket>
>                 (new gdb_socket (sock, cur_ainfo)));
>
> you write:
>
>              socket_poll.emplace_back (sock, cur_ainfo));

My first version of the code was like that, but I was having issues with
sockets being closed in the middle of the function.  That was, of
course, because the vector was being extended and, when moving
gdb_socket's around, it was calling destructors everywhere.  I remember
thinking "I should probably implement some move semantics here", but in
the end I chose the more complicated path.  Thanks for the tips, I've
now simplified the code a bit more.

>> +
>> +	  FD_SET (s, &wset);
>> +	  FD_SET (s, &eset);
>> +	  maxfd = std::max (maxfd, s);
>> +	}
>>  
>> -      FD_ZERO (&rset);
>> -      FD_SET (scb->fd, &rset);
>> -      wset = rset;
>> -      eset = rset;
>> -	  
>>        /* POSIX systems return connection success or failure by signalling
>>  	 wset.  Windows systems return success in wset and failure in
>>  	 eset.
>> -     
>> +
>>  	 We must call select here, rather than gdb_select, because
>>  	 the serial structure has not yet been initialized - the
>>  	 MinGW select wrapper will not know that this FD refers
>>  	 to a socket.  */
>> -      n = select (scb->fd + 1, &rset, &wset, &eset, &t);
>> +      n = select (maxfd + 1, NULL, &wset, &eset, &t);
>> +
>> +      if (n > 0 && sockets_ready != NULL)
>> +	{
>> +	  /* We're just interested in possible successes, so we just
>> +	     copy wset here.  */
>> +	  *sockets_ready = wset;
>> +	}
>>      }
>>    else
>>      /* Use gdb_select here, since we have no file descriptors, and on
>> @@ -153,171 +227,271 @@ wait_for_connect (struct serial *scb, unsigned int *polls)
>>    return n;
>>  }
>>  
>> -/* Open a tcp socket.  */
>> +/* Return TRUE if there is an error associated with socket SOCK, or
>> +   FALSE otherwise.  If there's an error, set ERRNO accordingly.  */
>>  
>> -int
>> -net_open (struct serial *scb, const char *name)
>> +static bool
>> +socket_error_p (int sock)
>>  {
>> -  char hostname[100];
>> -  const char *port_str;
>> -  int n, port, tmp;
>> -  int use_udp;
>> -  struct hostent *hostent;
>> -  struct sockaddr_in sockaddr;
>> -#ifdef USE_WIN32API
>> -  u_long ioarg;
>> -#else
>> -  int ioarg;
>> -#endif
>> -  unsigned int polls = 0;
>> +  int res, err;
>> +  socklen_t len = sizeof (err);
>>  
>> -  use_udp = 0;
>> -  if (startswith (name, "udp:"))
>> -    {
>> -      use_udp = 1;
>> -      name = name + 4;
>> -    }
>> -  else if (startswith (name, "tcp:"))
>> -    name = name + 4;
>> +  /* On Windows, the fourth parameter to getsockopt is a "char *";
>> +     on UNIX systems it is generally "void *".  The cast to "char *"
>> +     is OK everywhere, since in C++ any data pointer type can be
>> +     implicitly converted to "void *".  */
>> +  res = getsockopt (sock, SOL_SOCKET, SO_ERROR, (char *) &err, &len);
>>  
>> -  port_str = strchr (name, ':');
>> +  if (err != 0)
>> +    errno = err;
>>  
>> -  if (!port_str)
>> -    error (_("net_open: No colon in host name!"));  /* Shouldn't ever
>> -						       happen.  */
>> +  return (res < 0 || err != 0) ? true : false;
>
> You can just write:
>
>  return (res < 0 || err != 0);
>
> just like we write:
>
>   if (expression)
>
> instead of :
>
>   if (expression == true)

That I know!  Heh.  This was a brain fart; the code was different before
(I was returning something else instead of true/false), so when I
changed the return value I blindly adjusted this statement.  Anyway,
fixed.

>> +}
>>  
>> -  tmp = std::min (port_str - name, (ptrdiff_t) sizeof hostname - 1);
>> -  strncpy (hostname, name, tmp);	/* Don't want colon.  */
>> -  hostname[tmp] = '\000';	/* Tie off host name.  */
>> -  port = atoi (port_str + 1);
>> +/* Helper structure containing a pair of socket and addrinfo.  */
>>  
>> -  /* Default hostname is localhost.  */
>> -  if (!hostname[0])
>> -    strcpy (hostname, "localhost");
>> +struct gdb_connect_info
>> +{
>> +  int socket;
>>  
>> -  hostent = gethostbyname (hostname);
>> -  if (!hostent)
>> -    {
>> -      fprintf_unfiltered (gdb_stderr, "%s: unknown host\n", hostname);
>> -      errno = ENOENT;
>> -      return -1;
>> -    }
>> +  const struct addrinfo *ainfo;
>> +};
>>  
>> -  sockaddr.sin_family = PF_INET;
>> -  sockaddr.sin_port = htons (port);
>> -  memcpy (&sockaddr.sin_addr.s_addr, hostent->h_addr,
>> -	  sizeof (struct in_addr));
>> +/* Iterate over the entries of AINFO and try to open a socket and
>> +   perform a 'connect' on each one.  If there's a success, return the
>> +   socket and the associated 'struct addrinfo' that succeeded.
>> +   Otherwise, return a special instance of gdb_connect_info whose
>> +   SOCKET is -1 and AINFO is NULL.
>> +
>> +   Sockets that are opened are marked as non-blocking.  When a socket
>> +   fails to connect (i.e., when 'connect' returns -1 and ERRNO is set
>> +   to EINPROGRESS), we add this socket (along with its associated
>> +   'struct addrinfo') into SOCKET_POLL.  The caller can then use
>> +   SOCKET_POLL to perform a 'select' on the sockets and check if any
>> +   of them succeeded.  */
>> +
>> +static gdb_connect_info
>> +gdb_connect (const struct addrinfo *ainfo,
>> +	     std::vector<std::unique_ptr<gdb_socket>> &socket_poll)
>> +{
>> +  gdb_connect_info ret;
>> +#ifdef USE_WIN32API
>> +  u_long ioarg;
>> +#else
>> +  int ioarg;
>> +#endif
>>  
>> - retry:
>> +  ret.socket = -1;
>> +  ret.ainfo = NULL;
>>  
>
> Note you can write:
>
>   gdb_connect_info ret = {-1, NULL};
>
> Or add in-class initialization defaults:
>
>  /* Helper structure containing a pair of socket and addrinfo.  */
>  
>  struct gdb_connect_info
>  {
>    int socket = -1;
>  
>    const struct addrinfo *ainfo = nullptr;
>  };

Ah, fair enough.  I'll go with in-class initialization.

>> -  if (use_udp)
>> -    scb->fd = gdb_socket_cloexec (PF_INET, SOCK_DGRAM, 0);
>> -  else
>> -    scb->fd = gdb_socket_cloexec (PF_INET, SOCK_STREAM, 0);
>> +  for (const struct addrinfo *cur_ainfo = ainfo;
>> +       cur_ainfo != NULL;
>> +       cur_ainfo = cur_ainfo->ai_next)
>> +    {
>> +      int sock = gdb_socket_cloexec (cur_ainfo->ai_family,
>> +				     cur_ainfo->ai_socktype,
>> +				     cur_ainfo->ai_protocol);
>>  
>> -  if (scb->fd == -1)
>> -    return -1;
>> -  
>> -  /* Set socket nonblocking.  */
>> -  ioarg = 1;
>> -  ioctl (scb->fd, FIONBIO, &ioarg);
>> +      if (sock < 0)
>> +	continue;
>>  
>> -  /* Use Non-blocking connect.  connect() will return 0 if connected
>> -     already.  */
>> -  n = connect (scb->fd, (struct sockaddr *) &sockaddr, sizeof (sockaddr));
>> +      /* Set socket nonblocking.  */
>> +      ioarg = 1;
>> +      ioctl (sock, FIONBIO, &ioarg);
>>  
>> -  if (n < 0)
>> -    {
>> +      /* Use Non-blocking connect.  connect() will return 0 if
>> +	 connected already.  */
>> +      if (connect (sock, cur_ainfo->ai_addr, cur_ainfo->ai_addrlen) == 0)
>> +	{
>> +	  if (!socket_error_p (sock))
>> +	    {
>> +	      /* Connection succeeded, we can stop trying.  */
>> +	      ret.socket = sock;
>> +	      ret.ainfo = cur_ainfo;
>> +	      break;
>> +	    }
>> +	  else
>> +	    {
>> +	      /* There was an error with the socket.  Just try the
>> +		 next one.  */
>> +	      close (sock);
>> +	      continue;
>> +	    }
>> +	}
>> +      else
>> +	{
>>  #ifdef USE_WIN32API
>> -      int err = WSAGetLastError();
>> +	  int err = WSAGetLastError();
>>  #else
>> -      int err = errno;
>> +	  int err = errno;
>>  #endif
>>  
>> -      /* Maybe we're waiting for the remote target to become ready to
>> -	 accept connections.  */
>> -      if (tcp_auto_retry
>> +	  if (tcp_auto_retry
>>  #ifdef USE_WIN32API
>> -	  && err == WSAECONNREFUSED
>> +	      /* Under Windows, calling "connect" with a
>> +		 non-blocking socket results in WSAEWOULDBLOCK,
>> +		 not WSAEINPROGRESS.  */
>> +	      && err == WSAEWOULDBLOCK
>>  #else
>> -	  && err == ECONNREFUSED
>> +	      && err == EINPROGRESS
>>  #endif
>> -	  && wait_for_connect (NULL, &polls) >= 0)
>
>
> Hard to read in the patch in the mail, but in the original code, we see
> that tcp_auto_retry was _only_ checked if we got a ECONNREFUSED:
>
>       /* Maybe we're waiting for the remote target to become ready to
> 	 accept connections.  */
>       if (tcp_auto_retry
> #ifdef USE_WIN32API
> 	  && err == WSAECONNREFUSED
> #else
> 	  && err == ECONNREFUSED
> #endif
> 	  && wait_for_connect (NULL, &polls) >= 0)
> 	{
> 	  close (scb->fd);
> 	  goto retry;
> 	}
>
>
> while with your patch, we would check for tcp_auto_retry only when
> we get EINPROGRESS:
>
> 	  if (tcp_auto_retry
> #ifdef USE_WIN32API
> 	      /* Under Windows, calling "connect" with a
> 		 non-blocking socket results in WSAEWOULDBLOCK,
> 		 not WSAEINPROGRESS.  */
> 	      && err == WSAEWOULDBLOCK
> #else
> 	      && err == EINPROGRESS
> #endif
> 	      )
> 	    {
> 	      /* If we have an "INPROGRESS" error, add the socket
> 		 to the poll of sockets we have to perform a
> 		 'select' on.  */
> 	      socket_poll.push_back
> 		(std::unique_ptr<gdb_socket>
> 		 (new gdb_socket (sock, cur_ainfo)));
> 	    }
>
>
> So I don't think I understand this.  AFAICS, you removed all references
> to ECONNREFUSED.  Can you clarify?

Yeah, I guess I've made a confusion here.  There's a difference between
"wait_for_connect (SOCKET, &polls);" and "wait_for_connect (NULL,
&polls);".  The first one will perform a 'select' on a socket that got
EINPROGRESS, in which case the 'select' is probably going to return
before the timeout.  In this case, in the original code, GDB doesn't
check for 'tcp_auto_retry'.  This makes sense, because we want to
call 'wait_for_connect' anyway.  In my code, GDB is check the value of
'tcp_auto_retry' in this case.  This is wrong,

As for ECONNREFUSED, in my proposal, if 'connect' doesn't return
EINPROGRESS, then I consider it's worth retrying.  Maybe I should indeed
be more selective and just perform the retry when errno is
ECONNREFUSED.  Also, I noticed I'm not calling 'tcp_auto_retry' when I
should, which is here:

      ...
      if (got_connection)
	{
	  /* Maybe we've been able to establish a connection.  If so,
	     just break.  */
	  break;
	}

      /* Let's wait a bit.  */
----> // HERE
      if (wait_for_connect (NULL, &polls, NULL) < 0)
	break;
      ...

So yeah, there is a problem with the current proposal.

My current fix is to:

1) Remove the check for 'tcp_auto_retry' when checking for errno ==
EINPROGRESS.

2) Add a check for errno == ECONNREFUSED, in which case we'd signal that
GDB needs to call "wait_for_connect (NULL...)" if no connection
succeeds.

>> -	{
>> -	  close (scb->fd);
>> -	  goto retry;
>> +	      )
>> +	    {
>> +	      /* If we have an "INPROGRESS" error, add the socket
>> +		 to the poll of sockets we have to perform a
>> +		 'select' on.  */
>> +	      socket_poll.push_back
>> +		(std::unique_ptr<gdb_socket>
>> +		 (new gdb_socket (sock, cur_ainfo)));
>> +	    }
>>  	}
>> +    }
>> +  return ret;
>> +}
>>  
>> -      if (
>> +/* Open a tcp socket.  */
>> +
>> +int
>> +net_open (struct serial *scb, const char *name)
>> +{
>> +  bool use_udp;
>>  #ifdef USE_WIN32API
>> -	  /* Under Windows, calling "connect" with a non-blocking socket
>> -	     results in WSAEWOULDBLOCK, not WSAEINPROGRESS.  */
>> -	  err != WSAEWOULDBLOCK
>> +  u_long ioarg;
>>  #else
>> -	  err != EINPROGRESS
>> +  int ioarg;
>>  #endif
>> -	  )
>> +  struct addrinfo hint;
>> +  struct addrinfo *ainfo;
>> +
>> +  memset (&hint, 0, sizeof (hint));
>> +  /* Assume no prefix will be passed, therefore we should use
>> +     AF_UNSPEC.  */
>> +  hint.ai_family = AF_UNSPEC;
>> +  hint.ai_socktype = SOCK_STREAM;
>> +  hint.ai_protocol = IPPROTO_TCP;
>> +
>> +  parsed_hostname parsed = parse_hostname (name, &hint);
>> +
>> +  if (parsed.port_str.empty ())
>> +    error (_("Missing port on hostname '%s'"), name);
>> +
>> +  int r = getaddrinfo (parsed.host_str.c_str (),
>> +		       parsed.port_str.c_str (), &hint, &ainfo);
>> +
>> +  if (r != 0)
>> +    {
>> +      fprintf_unfiltered (gdb_stderr, _("%s: cannot resolve name: %s\n"),
>> +			  name, gai_strerror (r));
>> +      errno = ENOENT;
>> +      return -1;
>> +    }
>> +
>> +  scoped_free_addrinfo free_ainfo (ainfo);
>> +
>> +  const struct addrinfo *cur_ainfo;
>> +  bool got_connection = false;
>> +  unsigned int polls = 0;
>> +
>> +  /* Assume the worst.  */
>> +  scb->fd = -1;
>> +
>> +  while (!got_connection)
>> +    {
>> +      /* A poll of sockets.  This poll will store the sockets that
>> +	 "error'd" with EINPROGRESS, meaning that we will perform a
>> +	 'select' on them.  */
>> +      std::vector<std::unique_ptr<gdb_socket>> socket_poll;
>> +      /* Try to connect.  This function will return a pair of socket
>> +	 and addrinfo.  If the connection succeeded, the socket will
>> +	 have a positive value and the addrinfo will not be NULL.  */
>> +      gdb_connect_info gci = gdb_connect (ainfo, socket_poll);
>> +
>> +      if (gci.socket > -1)
>
> That "> -1" reads as an unusual check, which gives pause.  I'd suggest:
>
>      if (gci.socket != -1)

Done.

Thanks,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]