Bug 18340

Summary: Segmentation fault of probed SSHD program
Product: systemtap Reporter: fahadaliarshad
Component: uprobesAssignee: Unassigned <systemtap>
Status: RESOLVED WORKSFORME    
Severity: normal CC: fche
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:

Description fahadaliarshad 2015-04-28 03:15:55 UTC
Hi,

This bug appears to be similar to this (https://sourceware.org/bugzilla/show_bug.cgi?id=12458) but I think elfutils is not the issue.

I compiled the following openssh server versions to be probed by systemtap and all versions are segfaulting when probed by systemtap versions 2.4/0.156, 2.7/0.156 on my 3.13.6-100.fc19:

To make sure that it is not elfutils, I also reproduced the same problem on centos7 3.10.0-123.9.3.el7.x86_64 with systemtap version 2.8/0.158, commit release-2.7-16-gbac8aa5aa94c

When I don't execute the probes the openssh-server executes normally and clients can connect via sftp.

openssh-5.2p1,openssh-5.3p1,openssh-5.4p1

Compilation:
------------
wget http://ftp.openbsd.org/pub/OpenBSD/OpenSSH/portable/
tar zxf openssh-5.3p1.tar.gz
cd openssh-5.3p1/
./configure --prefix=/usr/local/openssh-5.3p1

To include the symbols in binary, edit the Makefile and remove '-s' from the following variable:
STRIP_OPT=

make
make install

Reproduction:
-------------
Edit the sshd_file with the following modifications:
Port 33000
#Subsystem      sftp    /usr/local/openssh-5.3p1/libexec/sftp-server
Subsystem sftp internal-sftp
Match Group sftpuser
        ChrootDirectory /home
        X11Forwarding no
        AllowTcpForwarding no
        ForceCommand internal-sftp

Start the following two stap probes:
------------------------------------
stap -v -e 'probe process("/usr/local/openssh-5.3p1/sbin/sshd").function("*") {printf("[%d] funcname:%s->\n", gettimeofday_us(), pp())} probe process("/usr/local/openssh-5.3p1/sbin/sshd").function("*").return {printf("[%d] funcname:%s<-\n", gettimeofday_us(), pp())}'

Start sshd and connect from client 'sftp -vvvv -oPort=33000 sftpuser@192.168.83.17':

[root@centos7 ~]# /usr/local/openssh-5.3p1/sbin/sshd -f /usr/local/openssh-5.3p1/etc/sshd_config -ddd
debug2: load_server_config: filename /usr/local/openssh-5.3p1/etc/sshd_config
debug2: load_server_config: done config len = 279
debug2: parse_server_config: config /usr/local/openssh-5.3p1/etc/sshd_config len 279
debug3: /usr/local/openssh-5.3p1/etc/sshd_config:14 setting Port 33000
debug3: /usr/local/openssh-5.3p1/etc/sshd_config:22 setting Protocol 2
debug3: /usr/local/openssh-5.3p1/etc/sshd_config:115 setting Subsystem sftp internal-sftp
debug3: checking syntax for 'Match Group sftpuser'
debug1: sshd version OpenSSH_5.3p1
debug3: Not a RSA1 key file /usr/local/openssh-5.3p1/etc/ssh_host_rsa_key.
debug1: read PEM private key done: type RSA
debug1: private host key: #0 type 1 RSA
debug3: Not a RSA1 key file /usr/local/openssh-5.3p1/etc/ssh_host_dsa_key.
debug1: read PEM private key done: type DSA
debug1: private host key: #1 type 2 DSA
debug1: rexec_argv[0]='/usr/local/openssh-5.3p1/sbin/sshd'
debug1: rexec_argv[1]='-f'
debug1: rexec_argv[2]='/usr/local/openssh-5.3p1/etc/sshd_config'
debug1: rexec_argv[3]='-ddd'
debug2: fd 3 setting O_NONBLOCK
debug1: Bind to port 33000 on 0.0.0.0.
Server listening on 0.0.0.0 port 33000.
debug2: fd 4 setting O_NONBLOCK
debug1: Bind to port 33000 on ::.
Server listening on :: port 33000.
debug3: fd 5 is not O_NONBLOCK
debug1: Server will not fork when running in debugging mode.
debug3: send_rexec_state: entering fd = 8 config len 279
debug3: ssh_msg_send: type 0
debug3: send_rexec_state: done
debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8
debug1: inetd sockets after dupping: 3, 3
Segmentation fault



Error seen from client:
=======================
# sftp -vvvv -oPort=33000 sftpuser@192.168.83.17
Connecting to 192.168.83.17...
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to 192.168.83.17 [192.168.83.17] port 33000.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
ssh_exchange_identification: Connection closed by remote host
Couldn't read packet: Connection reset by peer


Systemtap outputs this:
=======================
....
[1430111639655561] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("packet_set_connection@/root/openssh-5.3p1/packet.c:181")->
[1430111639655576] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("cipher_by_name@/root/openssh-5.3p1/cipher.c:133")->
[1430111639655585] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("cipher_by_name@/root/openssh-5.3p1/cipher.c:133").return<-
[1430111639655591] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("cipher_init@/root/openssh-5.3p1/cipher.c:204")->
[1430111639655610] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("cipher_init@/root/openssh-5.3p1/cipher.c:204").return<-
[1430111639655615] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("cipher_init@/root/openssh-5.3p1/cipher.c:204")->
[1430111639655620] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("cipher_init@/root/openssh-5.3p1/cipher.c:204").return<-
[1430111639655624] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("buffer_init@/root/openssh-5.3p1/buffer.c:34")->
[1430111639655630] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("xmalloc@/root/openssh-5.3p1/xmalloc.c:28")->
[1430111639655636] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("xmalloc@/root/openssh-5.3p1/xmalloc.c:28").return<-
[1430111639655640] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("buffer_init@/root/openssh-5.3p1/buffer.c:34").return<-
[1430111639655644] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("buffer_init@/root/openssh-5.3p1/buffer.c:34")->
[1430111639655649] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("xmalloc@/root/openssh-5.3p1/xmalloc.c:28")->
[1430111639655654] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("xmalloc@/root/openssh-5.3p1/xmalloc.c:28").return<-
[1430111639655658] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("buffer_init@/root/openssh-5.3p1/buffer.c:34").return<-
[1430111639655662] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("buffer_init@/root/openssh-5.3p1/buffer.c:34")->
[1430111639655667] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("xmalloc@/root/openssh-5.3p1/xmalloc.c:28")->
[1430111639655672] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("xmalloc@/root/openssh-5.3p1/xmalloc.c:28").return<-
[1430111639655675] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("buffer_init@/root/openssh-5.3p1/buffer.c:34").return<-
[1430111639655679] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("buffer_init@/root/openssh-5.3p1/buffer.c:34")->
[1430111639655684] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("xmalloc@/root/openssh-5.3p1/xmalloc.c:28")->
[1430111639655694] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("xmalloc@/root/openssh-5.3p1/xmalloc.c:28").return<-
[1430111639655697] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("buffer_init@/root/openssh-5.3p1/buffer.c:34").return<-
[1430111639655701] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("packet_set_connection@/root/openssh-5.3p1/packet.c:181").return<-
[1430111639655707] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("packet_set_server@/root/openssh-5.3p1/packet.c:1759")->
[1430111639655712] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("packet_set_server@/root/openssh-5.3p1/packet.c:1759").return<-
[1430111639655717] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("packet_connection_is_on_socket@/root/openssh-5.3p1/packet.c:251")->
[1430111639655722] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("packet_connection_is_on_socket@/root/openssh-5.3p1/packet.c:251").return<-
[1430111639655733] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("get_remote_port@/root/openssh-5.3p1/canohost.c:403")->
[1430111639655739] funcname:process("/usr/local/openssh-5.3p1/sbin/sshd").function("get_port@/root/openssh-5.3p1/canohost.c:383")->
Comment 1 Frank Ch. Eigler 2015-04-28 10:58:17 UTC
We've tracked several kernel uprobes bugs that have symptoms like this, including bug #18171.  Try removing the .return probes from your script to be sure.
Comment 2 fahadaliarshad 2015-04-28 11:18:19 UTC
Removing the .return probe did not help and I am observing the same behavior (Segmentation fault). 

FYI, the following log entry in /var/log/messages is generated when segfault occurs.

Apr 28 11:12:04 centos7 kernel: traps: sshd[27333] general protection ip:7fffffffe080 sp:7fff9bba5cd0 error:0
Comment 3 Frank Ch. Eigler 2015-04-28 11:29:28 UTC
> Removing the .return probe did not help and I am observing the same behavior
> (Segmentation fault). 

OK.  A number of kernel uprobe bugs were fixed after 3.13; would you be able to test with a more recent version?
Comment 4 fahadaliarshad 2015-04-29 12:09:20 UTC
Frank,
Thanks for providing feedback on this.

I was going to build from source a kernel version > 3.13. But before that, I updated my system (yum update) and kernel version updated from "3.10.0-123.9.3.el7.x86_64" to the latest version "3.10.0-229.1.2.el7.x86_64" available by Centos7 repo.

Apparently, this fixed the issue!

I am still not sure what was the root-cause though.

There is no segfault anymore but I do see a "WARNING: function _start return" as below.

[root@centos7 ~]# stap -v -e 'probe process("/usr/local/openssh-5.3p1/sbin/sshd").function("*") {printf("[%d] funcname:%s->\n", gettimeofday_us(), pp())} probe process("/usr/local/openssh-5.3p1/sbin/sshd").function("*").return {printf("[%d] funcname:%s<-\n", gettimeofday_us(), pp())}'
Pass 1: parsed user script and 106 library script(s) using 217156virt/34680res/2992shr/32188data kb, in 220usr/60sys/288real ms.
WARNING: function _start return probe is blacklisted: keyword at <input>:1:131
 source: probe process("/usr/local/openssh-5.3p1/sbin/sshd").function("*") {printf("[%d] funcname:%s->\n", gettimeofday_us(), pp())} probe process("/usr/local/openssh-5.3p1/sbin/sshd").function("*").return {printf("[%d] funcname:%s<-\n", gettimeofday_us(), pp())}
                                                                                                                                           ^
Pass 2: analyzed script: 1982 probe(s), 3 function(s), 1 embed(s), 0 global(s) using 225440virt/43964res/3732shr/40472data kb, in 180usr/20sys/183real ms.
Pass 3: using cached /root/.systemtap/cache/a6/stap_a6c26ef9098b1d31ec5d9d07d982715b_652697.c
Pass 4: using cached /root/.systemtap/cache/a6/stap_a6c26ef9098b1d31ec5d9d07d982715b_652697.ko
Pass 5: starting run.
Comment 5 Frank Ch. Eigler 2015-04-29 12:55:12 UTC
Yes, the rhel7 kernel contains many uprobe fixes.  The _start-related warning is due to bug #16662 (the ABI of the _start function being incompatible with uretprobes).