sshd slow login and/or 100% cpu

Jason Pyeron jpyeron@pdinc.us
Mon May 3 13:40:14 GMT 2021


My teammates have recently been observing periodic slow login problems. The most recent Cygwin update was for the Git CVE, but I do not think that is related.

Guidance on troubleshooting and resolution most appreciated.

My assumptions: 
===============

BLODA (I cannot influence that) and bad network between AD and Server.

I am willing to do whatever I am allowed to fix this.


Here are my observations:
=========================

1. The users log in via PKI
2. The users' accounts are AD based accounts

-------------------------

3. stopping the sshd service does not kill all the sshd.exe processes
4. killing all the sshd.exe processes (after service stop) and starting the service returns performance to normal
5. normal performance lasts for an indeterminate time, typically between 1 and 24 hours. Once it goes slow, it does not recover on its own.
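
The workaround in items 3 to 5 can be sketched as a small script. This is only a sketch under assumptions: that the service is registered with cygrunsrv under the name sshd (some installs use cygsshd instead) and that pkill from the procps-ng package is installed. The names run, restart_sshd, SERVICE and DRY_RUN are hypothetical, and by default the script only prints what it would do.

```shell
#!/bin/bash
# Hypothetical helper for the stop / kill leftovers / start workaround.
SERVICE="${SERVICE:-sshd}"   # assumption: service name; may be "cygsshd"
DRY_RUN="${DRY_RUN:-1}"      # 1 = print commands only, 0 = execute them

# run CMD...: execute the command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# restart_sshd: stop the service, kill any sshd left behind, start it again.
# Run it from a local console: killing sshd also drops open SSH sessions.
restart_sshd() {
    run cygrunsrv -E "$SERVICE"   # stop the service
    run pkill -x sshd             # kill sshd processes that survived the stop
    run cygrunsrv -S "$SERVICE"   # start the service
}
```

Calling restart_sshd prints the three commands; DRY_RUN=0 restart_sshd from an elevated shell would execute them.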

-------------------------

6. /etc/nsswitch.conf only contains

passwd:   files
group:    files

7. we are running cygserver; /etc/cygserver.conf is empty

-------------------------

8. resolving group information takes 97 seconds (sometimes)

XREDACTED_00012X@XREDACTED_00003X ~
$ id -G XREDACTED_00047X
XREDACTED_00023X 545 555 2 11 15 XREDACTED_00045X XREDACTED_00028X 401408

XREDACTED_00012X@XREDACTED_00003X ~
$ getent group XREDACTED_00023X 545 555 2 11 15 XREDACTED_00045X XREDACTED_00028X 401408
Domain Users:S-1-5-21-XREDACTED_00044X-513:XREDACTED_00023X:
Users:S-1-5-32-545:545:
Remote Desktop Users:S-1-5-32-555:555:
NETWORK:S-1-5-2:2:
Authenticated Users:S-1-5-11:11:
This Organization:S-1-5-15:15:
Service asserted identity:S-1-18-2:XREDACTED_00045X:
XREDACTED_00016X:XREDACTED_00014X:XREDACTED_00028X:
Medium Mandatory Level:S-1-16-8192:401408:

Running id a second time is quick, most of the time. A few hours later this morning in the same bash shell, it was slow again.
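
To catch the moment the lookup turns slow, the id call from item 8 could be timed periodically. The following is a minimal sketch, not something we run today: time_group_lookup and watch_group_lookup are hypothetical names, and the 300 second interval and 5 second threshold are arbitrary defaults. Polling may itself keep any cache warm, so the results are only indicative.

```shell
# time_group_lookup USER: print how many whole seconds `id -G USER` takes.
time_group_lookup() {
    local user="$1" start end
    start=$(date +%s)
    id -G "$user" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# watch_group_lookup USER [INTERVAL] [THRESHOLD]: poll forever and print a
# timestamped line whenever one lookup takes THRESHOLD seconds or longer.
watch_group_lookup() {
    local user="$1" interval="${2:-300}" threshold="${3:-5}" secs
    while true; do
        secs=$(time_group_lookup "$user")
        if [ "$secs" -ge "$threshold" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') slow lookup for $user: ${secs}s"
        fi
        sleep "$interval"
    done
}
```

For example, watch_group_lookup someuser 600 10 >> lookup-times.log would check every ten minutes and record only lookups of ten seconds or more.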

9. tracing through sshd, it seems to stall at 2 calls in uidswap.c (initgroups, getgroups)

diff --git a/openssh-8.5p1-1.x86_64/src/openssh-8.5p1/uidswap.c b/openssh-8.5p1-1.x86_64/src/openssh-8.5p1/uidswap.c
index 40e1215..4538e63 100644
--- a/openssh-8.5p1-1.x86_64/src/openssh-8.5p1/uidswap.c
+++ b/openssh-8.5p1-1.x86_64/src/openssh-8.5p1/uidswap.c
@@ -60,6 +60,8 @@ static int    saved_egroupslen = -1, user_groupslen = -1;
 void
 temporarily_use_uid(struct passwd *pw)
 {
+       debug3_f("entering");
+
        /* Save the current euid, and egroups. */
 #ifdef SAVED_IDS_WORK_WITH_SETEUID
        saved_euid = geteuid();
@@ -83,7 +85,9 @@ temporarily_use_uid(struct passwd *pw)
        privileged = 1;
        temporarily_use_uid_effective = 1;

+       debug3_f("getgroups(0, NULL)");
        saved_egroupslen = getgroups(0, NULL);
+       debug3_f("getgroups(0, NULL)=%d", saved_egroupslen);
        if (saved_egroupslen == -1)
                fatal("getgroups: %.100s", strerror(errno));
        if (saved_egroupslen > 0) {
@@ -97,42 +101,57 @@ temporarily_use_uid(struct passwd *pw)
        }

        /* set and save the user's groups */
+       debug3_f("if (user_groupslen == -1 || user_groups_uid != pw->pw_uid)");
        if (user_groupslen == -1 || user_groups_uid != pw->pw_uid) {
+               debug3_f("if (initgroups(\"%s\", %u) == -1) [SLOW NEXT LINE]", pw->pw_name, pw->pw_gid);
                if (initgroups(pw->pw_name, pw->pw_gid) == -1)
                        fatal("initgroups: %s: %.100s", pw->pw_name,
                            strerror(errno));

+               debug3_f("getgroups(0, NULL) [SLOW NEXT LINE]");
                user_groupslen = getgroups(0, NULL);
+               debug3_f("getgroups(0, NULL)=%d", user_groupslen);
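
The debug3 lines added by the patch only show where sshd is, not how long it stays there. One way to measure that is to timestamp each debug line as it arrives and report the largest gap. This is a sketch under assumptions: stamp_lines and largest_gap are hypothetical helper names, and the resolution is whole seconds.

```shell
# stamp_lines: prefix each line on stdin with the epoch second it arrived.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%s)" "$line"
    done
}

# largest_gap: read "EPOCH text" lines and print the biggest gap in seconds
# together with the line that was logged just before the gap.
largest_gap() {
    awk '
        NR > 1 && ($1 - prev) > max { max = $1 - prev; where = prev_line }
        { prev = $1; prev_line = $0 }
        END { if (NR > 1) printf "%d seconds after: %s\n", max, where }
    '
}
```

A possible invocation, assuming a spare port such as 2222 and a single test login against it: /usr/sbin/sshd -ddd -e -p 2222 2>&1 | stamp_lines | tee sshd-debug.log | largest_gap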

-------------------------

10. I have not yet tried to find the cause of the 100% CPU usage. When sshd is at 100%, login may or may not be slow.

-------------------------

11. redacted cygcheck output attached. I ran cygcheck -s -v -r > cygcheck-20210503-0759.out



Respectfully,

Jason Pyeron

--
Jason Pyeron  | Architect
PD Inc        |
10 w 24th St  |
Baltimore, MD |
 
.mil: jason.j.pyeron.ctr@mail.mil
.com: jpyeron@pdinc.us
tel : 202-741-9397


-------------- next part --------------
A non-text attachment was scrubbed...
Name: cygcheck-20210503-0759.redacted.out
Type: application/octet-stream
Size: 84395 bytes
Desc: not available
URL: <https://cygwin.com/pipermail/cygwin/attachments/20210503/1609d01c/attachment-0001.obj>

