linux-kernel - Re: general protection fault in perf_misc_flags

Message-ID: <CACT4Y+Y4-vqdv01ebyzhUoggUCUyvbhjut7Wvj=r4dBfyxLeng@mail.gmail.com>
Date:   Wed, 23 Sep 2020 11:24:48 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Nick Desaulniers <ndesaulniers@...gle.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        syzbot <syzbot+ce179bc99e64377c24bc@...kaller.appspotmail.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>, Jiri Olsa <jolsa@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Mark Rutland <mark.rutland@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        clang-built-linux <clang-built-linux@...glegroups.com>
Subject: Re: general protection fault in perf_misc_flags

On Wed, Sep 23, 2020 at 11:03 AM Borislav Petkov <bp@...en8.de> wrote:
>
> On Tue, Sep 22, 2020 at 11:56:04AM -0700, Nick Desaulniers wrote:
> > So I think there's an issue with "deterministically reproducible."
> > The syzkaller report has:
> > > > Unfortunately, I don't have any reproducer for this issue yet.
>
> Yeah, Dmitry gave two other links of similar reports, the first one
> works for me:
>
> https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c
>
> and that one doesn't have a reproducer either. The bytes look familiar
> though:
>
> Code: c1 e8 03 42 80 3c 20 00 74 05 e8 79 7a a7 00 49 8b 47 10 48 89 05 f6 d8 ef 09 49 8d 7f 08 48 89 f8 48 c1 e8 03 42 80 3c 00 00 <00> 00 e8 57 7a a7 00 49 8b 47 08 48 89 05 dc d8 ef 09 49 8d 7f 18
> All code
> ========
>    0:   c1 e8 03                shr    $0x3,%eax
>    3:   42 80 3c 20 00          cmpb   $0x0,(%rax,%r12,1)
>    8:   74 05                   je     0xf
>    a:   e8 79 7a a7 00          callq  0xa77a88
>    f:   49 8b 47 10             mov    0x10(%r15),%rax
>   13:   48 89 05 f6 d8 ef 09    mov    %rax,0x9efd8f6(%rip)        # 0x9efd910
>   1a:   49 8d 7f 08             lea    0x8(%r15),%rdi
>   1e:   48 89 f8                mov    %rdi,%rax
>   21:   48 c1 e8 03             shr    $0x3,%rax
>   25:   42 80 3c 00 00          cmpb   $0x0,(%rax,%r8,1)
>   2a:*  00 00                   add    %al,(%rax)               <-- trapping instruction
>   2c:   e8 57 7a a7 00          callq  0xa77a88
>   31:   49 8b 47 08             mov    0x8(%r15),%rax
>   35:   48 89 05 dc d8 ef 09    mov    %rax,0x9efd8dc(%rip)        # 0x9efd918
>   3c:   49 8d 7f 18             lea    0x18(%r15),%rdi
>
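The `shr $0x3` / `cmpb $0x0,(%rax,%r12,1)` / `je` / `callq` pattern in this dump matches inline generic KASAN instrumentation. The address of the following 8-byte load is converted to a shadow-memory address, and the load goes ahead only if that shadow byte is 0. Otherwise a report function is called. Below is a minimal sketch of that check. It assumes the x86_64 default shadow offset 0xdffffc0000000000 and that %r12 holds this offset. The dump does not show where %r12 was loaded. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86_64 generic-KASAN parameters: 1 shadow byte per 8 bytes. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* "shr $0x3,%rax" followed by adding the offset (held in %r12 via the
 * (%rax,%r12,1) addressing mode in the dump). */
static uint64_t kasan_mem_to_shadow(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* "cmpb $0x0,(shadow); je <do the load>; callq <report>": an aligned
 * 8-byte access is fine only if the whole granule is addressable, which
 * generic KASAN encodes as a shadow byte of 0. Partially addressable
 * (1..7) or poisoned (negative) granules go to the report call. */
static bool kasan_access8_ok(int8_t shadow_byte)
{
	return shadow_byte == 0;
}
```

For a direct-map address such as 0xffff888000000000, this gives the shadow address 0xffffed1000000000.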
> 4 zero bytes again. And that .config has KASAN enabled too, so
> could the failure be related to KASAN being enabled and messing
> up offsets?
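The "4 zero bytes" can be located mechanically by comparing the second KASAN check (offset 0x25 in the dump) with the first one (offset 0x3). This assumes that both checks were originally encoded identically as `42 80 3c 20 00 74 05`, that is, `cmpb $0x0,(%rax,%r12,1); je +5`. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First check in the dump (offset 0x3): cmpb $0x0,(%rax,%r12,1); je +5 */
static const uint8_t expected[] = { 0x42, 0x80, 0x3c, 0x20, 0x00, 0x74, 0x05 };
/* Second check (offset 0x25) as it appears in the crash dump. */
static const uint8_t observed[] = { 0x42, 0x80, 0x3c, 0x00, 0x00, 0x00, 0x00 };

/* Smallest window [*start, *start + *len) covering every byte that differs
 * between a and b. Returns 0 if the buffers are identical. */
static int diff_window(const uint8_t *a, const uint8_t *b, size_t n,
		       size_t *start, size_t *len)
{
	size_t first = n, last = 0;

	for (size_t i = 0; i < n; i++) {
		if (a[i] != b[i]) {
			if (first == n)
				first = i;
			last = i;
		}
	}
	if (first == n)
		return 0;
	*start = first;
	*len = last - first + 1;
	return 1;
}

/* Non-zero if buf[start .. start + len) is entirely 0x00. */
static int all_zero(const uint8_t *buf, size_t start, size_t len)
{
	for (size_t i = start; i < start + len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

With these bytes, the differing window starts at byte 3, is 4 bytes long, and is entirely zero. This window is the run from the SIB byte through the `je`. Zeroing the SIB byte 0x20 is why the decoder shows %r8 instead of %r12 as the index register. The zeroed `je` (`74 05`) is what decodes as the trapping `add %al,(%rax)`.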
>
> That is, provided this is how it happens. We still don't know what
> wrote those zeroes in there, or when. Not having a reproducer is nasty,
> but looking at those reports above, and if I'm reading this correctly,
> rIP points to
>
> RIP: 0010:update_pvclock_gtod arch/x86/kvm/x86.c:1743 [inline]
>
> each time, and the URL says there are 9 crashes total, each of which
> happened at that rIP. So all we'd need is to set a watchpoint on writes
> to that address and dump stuff.
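The watchpoint idea relies on x86 hardware data breakpoints. Inside the kernel, this would go through `register_wide_hw_breakpoint()`, as in samples/hw_breakpoint/data_breakpoint.c in the kernel tree. The sketch below shows only the same mechanism from userspace, using `perf_event_open(2)` with `PERF_TYPE_BREAKPOINT`. It watches a local variable rather than kernel text. It assumes a Linux host where perf events are permitted and returns -1 otherwise. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Userspace stand-in for the address we want to catch being written. */
static volatile int32_t watched;

/* Open a disabled hardware write watchpoint on `watched` for this thread. */
static int open_write_watchpoint(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_BREAKPOINT;
	attr.size = sizeof(attr);
	attr.bp_type = HW_BREAKPOINT_W;      /* trigger on writes only */
	attr.bp_addr = (uint64_t)(uintptr_t)&watched;
	attr.bp_len = HW_BREAKPOINT_LEN_4;   /* watch 4 bytes */
	attr.disabled = 1;                   /* enabled explicitly later */
	attr.exclude_kernel = 1;             /* user-only, no privileges needed */
	attr.exclude_hv = 1;

	/* pid 0 = this thread, cpu -1 = any CPU, no group leader, no flags */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Store to `watched` n times and return how many writes the watchpoint
 * counted, or -1 if the watchpoint could not be set up or read. */
static long count_watched_writes(int n)
{
	uint64_t count = 0;
	int fd = open_write_watchpoint();

	if (fd < 0)
		return -1;
	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	for (int i = 0; i < n; i++)
		watched = i;                 /* each volatile store is one hit */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
		close(fd);
		return -1;
	}
	close(fd);
	return (long)count;
}
```

On a machine that allows it, `count_watched_writes(3)` should count 3 hits. In many containers `perf_event_open` is blocked, and the function returns -1.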
>
> Dmitry, can syzkaller do debugging stuff like that?

syzbot does not have direct support for such things.
It uses CONFIG_DEBUG_AID_FOR_SYZBOT=y:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#no-custom-patches
But that is generally useful only for linux-next, and the clang build
runs on the upstream tree...

Options I see:
1. Add stricter debug checks for code that overwrites code. Then maybe
we can catch it red-handed.
2. Set up a clang instance on linux-next.
3. Run syzkaller locally with custom patches.
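Option 1 could take the form of a debug wrapper around text patching. The wrapper refuses any write whose target bytes do not match what the caller expects, and names the caller when it refuses. Below is a userspace sketch on a plain buffer. `debug_checked_poke()` is a hypothetical name, not an existing kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Overwrite len bytes at addr with new_bytes, but only if they currently
 * hold exactly `expect`. On mismatch, report who tried and refuse. */
static int debug_checked_poke(uint8_t *addr, const uint8_t *expect,
			      const uint8_t *new_bytes, size_t len,
			      const char *caller)
{
	if (memcmp(addr, expect, len) != 0) {
		fprintf(stderr,
			"%s: refusing %zu-byte write at %p: unexpected old bytes\n",
			caller, len, (void *)addr);
		return -1;
	}
	memcpy(addr, new_bytes, len);
	return 0;
}
```

The first write from a NOP to a jump succeeds. Repeating the same write then fails, because the old bytes no longer match the expected NOP.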


> > Following my hypothesis about a bad address calculation, the
> > tricky part is that I'd need to look through the relocations and try
> > to see if any could resolve to the address that was accidentally
> > modified.  I suspect objtool could be leveraged for that;
>
> If you can find this at compile time...
>
> > maybe it could check whether each `struct jump_entry`'s `target`
> > member referred to either a NOP or a CMP, and error otherwise? (Do we
> > have other non-NOP or CMP targets? IDK)
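A check of this kind could classify the 5 bytes at each jump-label site and flag anything unexpected. This sketch accepts only the two forms the x86 jump-label code writes: the common 5-byte NOP `0f 1f 44 00 00` (`nopl 0x0(%rax,%rax,1)`) and `jmp rel32` (opcode 0xe9). Other NOP encodings that some CPUs use, or the CMP case raised above, would need extra cases. Reading the sites out of an object file is left out, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum jl_site { JL_SITE_NOP5, JL_SITE_JMP32, JL_SITE_UNKNOWN };

/* Common x86 5-byte NOP: nopl 0x0(%rax,%rax,1). */
static const uint8_t nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Classify the 5 bytes at a jump-label site; anything that is neither the
 * NOP nor a jmp rel32 would be reported as an error by the checker. */
static enum jl_site classify_jump_site(const uint8_t code[5])
{
	if (memcmp(code, nop5, sizeof(nop5)) == 0)
		return JL_SITE_NOP5;
	if (code[0] == 0xe9)
		return JL_SITE_JMP32;
	return JL_SITE_UNKNOWN;
}
```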
>
> Follow jump_label_transform() - it does verify what it is going to
> patch. And while I'm looking at this, I realize that the jump labels
> patch 5 bytes but the above zeroes are 4 bytes. In the other opcode
> bytes I decoded it is 4 bytes too. So this might not be caused by the
> jump labels patching...
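The size argument can be made concrete. A jump-label jump is a `jmp rel32`: one 0xe9 opcode byte plus a 4-byte little-endian displacement, relative to the end of the instruction. Every jump-label write therefore covers 5 bytes. Below is a minimal encoder, assuming the displacement fits in 32 bits. `encode_jmp32()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define JMP32_INSN_SIZE 5

/* Encode "jmp rel32" located at `from` and jumping to `to`. The
 * displacement is relative to from + 5; the unsigned conversion wraps
 * backward jumps to the correct two's-complement bytes. */
static void encode_jmp32(uint8_t out[JMP32_INSN_SIZE], uint64_t from,
			 uint64_t to)
{
	uint32_t rel = (uint32_t)(to - (from + JMP32_INSN_SIZE));

	out[0] = 0xe9;
	out[1] = (uint8_t)(rel & 0xff);
	out[2] = (uint8_t)((rel >> 8) & 0xff);
	out[3] = (uint8_t)((rel >> 16) & 0xff);
	out[4] = (uint8_t)((rel >> 24) & 0xff);
}
```

For a jump from 0xffffffff81000000 to 0xffffffff81000100, this yields `e9 fb 00 00 00`.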
>
> > This hypothesis might also be incorrect, and thus would be chasing a
> > red herring...not really sure how else to pursue debugging this.
>
> Yeah, this one is tricky to debug.
>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette