linux-kernel - Re: general protection fault in perf_misc

Message-ID: <20200923090336.GD28545@zn.tnic>
Date:   Wed, 23 Sep 2020 11:03:36 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     Nick Desaulniers <ndesaulniers@...gle.com>
Cc:     Josh Poimboeuf <jpoimboe@...hat.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+ce179bc99e64377c24bc@...kaller.appspotmail.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>, Jiri Olsa <jolsa@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Mark Rutland <mark.rutland@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        the arch/x86 maintainers <x86@...nel.org>,
        clang-built-linux <clang-built-linux@...glegroups.com>
Subject: Re: general protection fault in perf_misc_flags

On Tue, Sep 22, 2020 at 11:56:04AM -0700, Nick Desaulniers wrote:
> So I think there's an issue with "deterministically reproducible."
> The syzkaller report has:
> > > Unfortunately, I don't have any reproducer for this issue yet.

Yeah, Dmitry gave two other links of similar reports, the first one
works for me:

https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c

and that one doesn't have a reproducer either. The bytes look familiar
though:

Code: c1 e8 03 42 80 3c 20 00 74 05 e8 79 7a a7 00 49 8b 47 10 48 89 05 f6 d8 ef 09 49 8d 7f 08 48 89 f8 48 c1 e8 03 42 80 3c 00 00 <00> 00 e8 57 7a a7 00 49 8b 47 08 48 89 05 dc d8 ef 09 49 8d 7f 18
All code
========
   0:   c1 e8 03                shr    $0x3,%eax
   3:   42 80 3c 20 00          cmpb   $0x0,(%rax,%r12,1)
   8:   74 05                   je     0xf
   a:   e8 79 7a a7 00          callq  0xa77a88
   f:   49 8b 47 10             mov    0x10(%r15),%rax
  13:   48 89 05 f6 d8 ef 09    mov    %rax,0x9efd8f6(%rip)        # 0x9efd910
  1a:   49 8d 7f 08             lea    0x8(%r15),%rdi
  1e:   48 89 f8                mov    %rdi,%rax
  21:   48 c1 e8 03             shr    $0x3,%rax
  25:   42 80 3c 00 00          cmpb   $0x0,(%rax,%r8,1)
  2a:*  00 00                   add    %al,(%rax)               <-- trapping instruction
  2c:   e8 57 7a a7 00          callq  0xa77a88
  31:   49 8b 47 08             mov    0x8(%r15),%rax
  35:   48 89 05 dc d8 ef 09    mov    %rax,0x9efd8dc(%rip)        # 0x9efd918
  3c:   49 8d 7f 18             lea    0x18(%r15),%rdi

4 zero bytes again. And that .config has KASAN enabled too, so could
the failure be related to KASAN being enabled and messing up the
offsets?
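
For anyone wanting to check the "4 zero bytes" observation mechanically,
here is a minimal sketch, not part of any existing tool. It takes the
"Code:" line quoted above, locates the byte marked with <..> and reports
the run of zero bytes around it. The helper names parse_code_line() and
zero_run_around() are made up for illustration.

```python
import re

# The "Code:" line exactly as quoted in the report above.
CODE_LINE = (
    "Code: c1 e8 03 42 80 3c 20 00 74 05 e8 79 7a a7 00 49 8b 47 10 "
    "48 89 05 f6 d8 ef 09 49 8d 7f 08 48 89 f8 48 c1 e8 03 42 80 3c "
    "00 00 <00> 00 e8 57 7a a7 00 49 8b 47 08 48 89 05 dc d8 ef 09 "
    "49 8d 7f 18"
)


def parse_code_line(line):
    """Return (opcode bytes, index of the byte marked with <..>)."""
    tokens = line.split(":", 1)[1].split()
    data = bytearray()
    trap = None
    for i, tok in enumerate(tokens):
        marked = re.fullmatch(r"<([0-9a-f]{2})>", tok)
        if marked:
            trap = i
            tok = marked.group(1)
        data.append(int(tok, 16))
    return bytes(data), trap


def zero_run_around(data, idx):
    """Return (start, length) of the maximal run of 0x00 containing idx."""
    if data[idx] != 0:
        return idx, 0
    start = idx
    while start > 0 and data[start - 1] == 0:
        start -= 1
    end = idx
    while end + 1 < len(data) and data[end + 1] == 0:
        end += 1
    return start, end - start + 1


data, trap = parse_code_line(CODE_LINE)
start, length = zero_run_around(data, trap)
print(f"trap at {trap:#x}, zero run starts at {start:#x}, length {length}")
```

This only counts consecutive zero bytes in the dump. It cannot tell
which of them were zero in the original text, so the run length is an
upper bound on how many bytes were actually overwritten.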

That is, provided this is the mechanism by which it happens. We still
don't know what wrote those zeroes in there, or when. Not having a
reproducer is nasty, but looking at those reports above, and if I'm
reading this correctly, rIP points to

RIP: 0010:update_pvclock_gtod arch/x86/kvm/x86.c:1743 [inline]

each time and the URL says there are 9 crashes total. And each of them
happened at that rIP. So all we'd need is to set a watchpoint on writes
to that address and dump stuff.

Dmitry, can syzkaller do debugging stuff like that?

> Following my hypothesis about having a bad address calculation; the
> tricky part is I'd need to look through the relocations and try to see
> if any could resolve to the address that was accidentally modified.  I
> suspect objtool could be leveraged for that;

If you can find this at compile time...

> maybe it could check whether each `struct jump_entry`'s `target`
> member referred to either a NOP or a CMP, and error otherwise? (Do we
> have other non-NOP or CMP targets? IDK)

Follow jump_label_transform() - it does verify what it is going to
patch. And while I'm looking at this, I realize that the jump labels
patch 5 bytes but the above zeroes are 4 bytes. In the other opcode
bytes I decoded it is 4 bytes too. So this might not be caused by the
jump labels patching...
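
The size mismatch can be illustrated with a rough sketch under loud
assumptions: a patched jump label site is taken to hold either a 5-byte
NOP or a 5-byte JMP rel32 (opcode 0xe9), and the NOP encoding used here
(0f 1f 44 00 00) is only one of the variants a kernel may pick. The
function name is hypothetical, and the bytes are the ones from offset
0x25 onwards in the dump above.

```python
PATCH_SIZE = 5                       # bytes rewritten per jump label site
NOP5 = bytes.fromhex("0f1f440000")   # assumption: P6-style 5-byte NOP
JMP_REL32 = 0xE9                     # opcode of a 5-byte relative JMP


def looks_like_jump_label_site(window):
    """True if a 5-byte window holds a NOP5 or a JMP rel32."""
    if len(window) != PATCH_SIZE:
        return False
    return window == NOP5 or window[0] == JMP_REL32


# Bytes around the trapping instruction, offsets 0x25..0x30 of the dump.
damaged = bytes.fromhex("42803c00000000e8577aa700")

# Slide a 5-byte window over the damaged region and collect any offsets
# where the contents could be the result of a jump label patch.
hits = [
    i
    for i in range(len(damaged) - PATCH_SIZE + 1)
    if looks_like_jump_label_site(damaged[i:i + PATCH_SIZE])
]
print("candidate jump label sites:", hits)
```

No window in the damaged region matches either pattern under these
assumptions, which fits the observation that 4 zeroed bytes do not look
like the result of a 5-byte patch. It does not rule out other text
patching mechanisms.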

> This hypothesis might also be incorrect, and thus would be chasing a
> red herring...not really sure how else to pursue debugging this.

Yeah, this one is tricky to debug.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette