linux-kernel - Strange hang with gcc 8 of kprobe multiple

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181203211807.17e03384@vmware.local.home>
Date:   Mon, 3 Dec 2018 21:18:07 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Masami Hiramatsu <mhiramat@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...capital.net>
Subject: Strange hang with gcc 8 of kprobe multiple_kprobes test

Hi Masami,

I started testing some of my new code and the system got into a
strange state. Debugging further, I found the cause came from the
kprobe tests. It became stranger to me that I could reproduce it with
older kernels. I went back as far as 4.16 and it triggered. I thought
this very strange because I ran this test on all those kernels in the
past.

After a bit of hair pulling, I figured out what changed. I upgraded to
gcc 8.1 (and I reproduce it with 8.2 as well). I convert back to gcc 7
and the tests pass without issue.

The issue that I notice when the system gets into this strange state is
that I can't log into the box. Nor can I reboot. Basically it's
anything to do with systemd just doesn't work (insert your jokes here
now, and then let's move on).

I was able to narrow down what the exact function was that caused the
issues and it is: update_vsyscall()

gcc 7 looks like this:

ffffffff81004bf0 <update_vsyscall>:
ffffffff81004bf0:       e8 0b cc 9f 00          callq  ffffffff81a01800 <__fentry__>
                        ffffffff81004bf1: R_X86_64_PC32 __fentry__-0x4
ffffffff81004bf5:       48 8b 07                mov    (%rdi),%rax
ffffffff81004bf8:       8b 15 96 5f 34 01       mov    0x1345f96(%rip),%edx        # ffffffff8234ab94 <vclocks_used>
                        ffffffff81004bfa: R_X86_64_PC32 vclocks_used-0x4
ffffffff81004bfe:       83 05 7b 84 6f 01 01    addl   $0x1,0x16f847b(%rip)        # ffffffff826fd080 <vsyscall_gtod_data>
                        ffffffff81004c00: R_X86_64_PC32 vsyscall_gtod_data-0x5
ffffffff81004c05:       8b 48 24                mov    0x24(%rax),%ecx
ffffffff81004c08:       b8 01 00 00 00          mov    $0x1,%eax
ffffffff81004c0d:       d3 e0                   shl    %cl,%eax

And gcc 8 looks like this:

ffffffff81004c90 <update_vsyscall>:
ffffffff81004c90:       e8 6b cb 9f 00          callq  ffffffff81a01800 <__fentry__>
                        ffffffff81004c91: R_X86_64_PC32 __fentry__-0x4
ffffffff81004c95:       48 8b 07                mov    (%rdi),%rax
ffffffff81004c98:       83 05 e1 93 6f 01 01    addl   $0x1,0x16f93e1(%rip)        # ffffffff826fe080 <vsyscall_gtod_data>
                        ffffffff81004c9a: R_X86_64_PC32 vsyscall_gtod_data-0x5
ffffffff81004c9f:       8b 50 24                mov    0x24(%rax),%edx
ffffffff81004ca2:       8b 05 ec 5e 34 01       mov    0x1345eec(%rip),%eax        # ffffffff8234ab94 <vclocks_used>
                        ffffffff81004ca4: R_X86_64_PC32 vclocks_used-0x4

The test adds a kprobe (optimized) at udpate_vsyscall+5. And will
insert a jump on the two instructions after fentry. The difference
between v7 and v8 is that v7 is touching vclocks_used and v8 is
touching vsyscall_gtod_data.

Is there some black magic going on with the vsyscall area with
vsyscall_gtod_data that is causing havoc when a kprobe is added there?

I can dig a little more into this, but I'm currently at my HQ office
with a lot of other objectives that I must get done, and I can't work
on this much more this week.

I included my config (for my virt machine, which I was also able to
trigger it with).

The test that triggers this bug is:

 tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc

It runs the test fine, but other things just start to act up after I
run it.

I notice that when I get into the state, journald and the dbus_daemon
are constantly running. Perhaps the userspace time keeping went bad?

-- Steve

Download attachment ".config" of type "application/octet-stream" (134897 bytes)