OpenSSH 8.9p1-1 Connects successfully but then hangs - Killing ssh-agent resolves the issue

Andrey Repin anrdaemon@yandex.ru
Mon Apr 4 07:25:06 GMT 2022


Greetings, Jim Garrison via Cygwin!

Replying to the first post to reduce quoting, but I did read the entire thread.

> My Cygwin ssh client stopped working... It would successfully connect to
> the remote (Debian) host but then hang without displaying the command
> prompt.  See debug output attached, as well as cygcheck output.

> I decided to run setup to see if there was a newer version of openssh.
> In preparation for that I always terminate all Cygwin processes because
> they will interfere with the update.  I killed the ssh-agent process and
> on a whim decided to try connecting again.  This time it worked.

> This would seem to indicate something in ssh-agent is interfering with
> the connection.  There are no credentials loaded into ssh-agent.

I've encountered a similar issue with ssh-pageant myself.
The explanation (as I see it) is this:
At a certain point in its lifetime, the agent gets stuck <somewhere> and ceases
to respond to requests.
When SSH attempts to contact the hung agent, the connection thread responds, but
the internal storage is somehow locked and never returns any usable info on which
the client could meaningfully act. Since neither the agent nor SSH has any
guarding code against slow responses in this place, the entire system hangs
indefinitely.
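
One way to tell a hung agent apart from an empty or unreachable one is to bound
the query with a deadline and look at the exit status. The sketch below assumes
GNU coreutils `timeout` is installed (it exits with 124 when the deadline
expires) and relies on the exit statuses documented for ssh-add: 0 on success,
1 when the command fails (for `-l`, no identities), 2 when the agent cannot be
contacted. The function names and the 5 second default are made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: translate a probe exit status into a readable message.
describe_probe_status() {
  case $1 in
    0)   echo "agent responded and holds at least one identity" ;;
    1)   echo "agent responded but holds no identities" ;;
    2)   echo "agent could not be contacted" ;;
    124) echo "agent did not answer before the deadline (hung)" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical helper: query the agent with a deadline (default 5 seconds).
# Assumes GNU coreutils timeout, which returns 124 when the deadline expires.
probe_agent() {
  status=0
  timeout "${1:-5}" ssh-add -l > /dev/null 2>&1 || status=$?
  describe_probe_status "$status"
  return "$status"
}
```

Calling `probe_agent 2` from the shell that shows the hang prints one of the
messages above and returns the underlying status.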

This is how the problem is observed. The following is pure guesswork (with a
workaround).

I'm observing this issue exclusively on my notebook. My guess is that when it
wakes from hibernation, some internal state is not managed well. The delay in
agent response grows until it becomes intolerable. I've made a workaround like
the following:

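_check_agent() {
  # Source the saved agent settings, if the file exists.
  test -f "$HOME/.ssh/agent" && . "$HOME/.ssh/agent" > /dev/null

  # Query the agent in the background so a hung agent cannot block the script.
  ssh-add -l > /dev/null 2>&1 &
  _pid=$!
  sleep 1

  # Still running after one second: treat the agent as hung.
  if kill -0 "$_pid" 2> /dev/null; then
    echo "$( basename "$0" ): ssh-add: the agent is hung, unable to continue" >&2
    exit 1
  fi

  # ssh-add -l exits non-zero if there are no identities or the agent is unreachable.
  if ! wait "$_pid"; then
    echo "$( basename "$0" ): ssh-add: no identities or unable to contact the agent" >&2
    exit 2
  fi
}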

What it does is:
1. Run a command to list available keys, detached (in the background).
2. Wait a second to let the command complete, if all goes well.
3. Test whether the listing command is still around. If it is, assume a hung
agent and report an error.
4. Also report an error if no keys are registered with the agent or the agent
is dead.
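
The same steps can be sketched as a reusable function that returns a status
instead of calling exit, so it can also be used from an interactive shell or a
sourced file. This is only an illustration, not a drop-in replacement for
_check_agent: the name probe_with_deadline, the choice of 124 as the "hung"
status, and the one-second polling step are all assumptions made here.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background and give up after
# a deadline (in whole seconds).
# Usage: probe_with_deadline SECONDS COMMAND [ARGS...]
# Returns the command's own exit status, or 124 if it was still running
# when the deadline passed (the stuck command is then killed).
probe_with_deadline() {
  deadline=$1
  shift
  "$@" > /dev/null 2>&1 &
  probe_pid=$!
  elapsed=0
  # Poll once per second while the command is still alive.
  while kill -0 "$probe_pid" 2> /dev/null; do
    if [ "$elapsed" -ge "$deadline" ]; then
      # Deadline reached: clean up the stuck command and report a hang.
      kill "$probe_pid" 2> /dev/null || true
      wait "$probe_pid" 2> /dev/null || true
      return 124
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  # The command has finished; hand back its exit status.
  wait "$probe_pid"
}
```

With this in place, `probe_with_deadline 1 ssh-add -l` returns 124 for a hung
agent, and otherwise whatever ssh-add itself returned.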


-- 
With best regards,
Andrey Repin
Monday, April 4, 2022 9:16:49

Sorry for my terrible english...


