Message-ID: <CACT4Y+Y4-vqdv01ebyzhUoggUCUyvbhjut7Wvj=r4dBfyxLeng@mail.gmail.com>
Date: Wed, 23 Sep 2020 11:24:48 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Nick Desaulniers <ndesaulniers@...gle.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
syzbot <syzbot+ce179bc99e64377c24bc@...kaller.appspotmail.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Jiri Olsa <jolsa@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
Mark Rutland <mark.rutland@....com>,
Ingo Molnar <mingo@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
Thomas Gleixner <tglx@...utronix.de>,
"the arch/x86 maintainers" <x86@...nel.org>,
clang-built-linux <clang-built-linux@...glegroups.com>
Subject: Re: general protection fault in perf_misc_flags
On Wed, Sep 23, 2020 at 11:03 AM Borislav Petkov <bp@...en8.de> wrote:
>
> On Tue, Sep 22, 2020 at 11:56:04AM -0700, Nick Desaulniers wrote:
> > So I think there's an issue with "deterministically reproducible."
> > The syzkaller report has:
> > > > Unfortunately, I don't have any reproducer for this issue yet.
>
> Yeah, Dmitry gave two other links of similar reports, the first one
> works for me:
>
> https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c
>
> and that one doesn't have a reproducer either. The bytes look familiar
> though:
>
> Code: c1 e8 03 42 80 3c 20 00 74 05 e8 79 7a a7 00 49 8b 47 10 48 89 05 f6 d8 ef 09 49 8d 7f 08 48 89 f8 48 c1 e8 03 42 80 3c 00 00 <00> 00 e8 57 7a a7 00 49 8b 47 08 48 89 05 dc d8 ef 09 49 8d 7f 18
> All code
> ========
> 0: c1 e8 03 shr $0x3,%eax
> 3: 42 80 3c 20 00 cmpb $0x0,(%rax,%r12,1)
> 8: 74 05 je 0xf
> a: e8 79 7a a7 00 callq 0xa77a88
> f: 49 8b 47 10 mov 0x10(%r15),%rax
> 13: 48 89 05 f6 d8 ef 09 mov %rax,0x9efd8f6(%rip) # 0x9efd910
> 1a: 49 8d 7f 08 lea 0x8(%r15),%rdi
> 1e: 48 89 f8 mov %rdi,%rax
> 21: 48 c1 e8 03 shr $0x3,%rax
> 25: 42 80 3c 00 00 cmpb $0x0,(%rax,%r8,1)
> 2a:* 00 00 add %al,(%rax) <-- trapping instruction
> 2c: e8 57 7a a7 00 callq 0xa77a88
> 31: 49 8b 47 08 mov 0x8(%r15),%rax
> 35: 48 89 05 dc d8 ef 09 mov %rax,0x9efd8dc(%rip) # 0x9efd918
> 3c: 49 8d 7f 18 lea 0x18(%r15),%rdi
>
> 4 zero bytes again. And that .config has kasan stuff enabled too so
> could the failure be related to having kasan stuff enabled and it
> messing up offsets?
>
> That is, provided this is the mechanism how it would happen. We still
> don't know what and when wrote those zeroes in there. Not having a
> reproducer is nasty but looking at those reports above and if I'm
> reading this correctly, rIP points to
>
> RIP: 0010:update_pvclock_gtod arch/x86/kvm/x86.c:1743 [inline]
>
> each time and the URL says there are 9 crashes total. And each has
> happened at that rIP. So all we'd need is to set a watchpoint for when
> that address is being written and dump stuff.
>
> Dmitry, can the syzkaller do debugging stuff like that?
syzbot does not have direct support for such things.
It builds kernels with CONFIG_DEBUG_AID_FOR_SYZBOT=y, so extra debugging
code can be guarded by that option:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#no-custom-patches
But that's generally useful for linux-next only and the clang build is
on the upstream tree...
Options I see:
1. Add stricter debug checks to code that overwrites code. Then maybe
we can catch it red-handed.
2. Set up a clang instance on linux-next.
3. Run syzkaller locally with custom patches.
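
For comparing the reports by hand, the size of the zeroed region can be
read off the "Code:" line mechanically. Below is a rough Python sketch,
assuming the usual oops format where the trapping byte is wrapped in <>.
The helper names are made up for illustration, and the 5-byte patch size
is simply the figure quoted in this thread, not something the script
verifies against the kernel.

```python
# Hypothetical helpers for inspecting an oops "Code:" line; not part of
# syzkaller or the kernel tree.

JUMP_LABEL_PATCH_SIZE = 5  # assumption: the size quoted in this thread


def parse_code_line(line):
    """Return (data, trap_index) for a 'Code: ..' line from an oops."""
    tokens = line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    data = []
    trap = None
    for tok in tokens:
        if tok.startswith("<") and tok.endswith(">"):
            trap = len(data)   # offset of the trapping byte
            tok = tok[1:-1]
        data.append(int(tok, 16))
    return bytes(data), trap


def zero_run_around(data, idx):
    """Return (start, length) of the run of 0x00 bytes containing idx."""
    if idx is None or data[idx] != 0:
        return None
    start = idx
    while start > 0 and data[start - 1] == 0:
        start -= 1
    end = idx
    while end + 1 < len(data) and data[end + 1] == 0:
        end += 1
    return start, end - start + 1


CODE = ("Code: c1 e8 03 42 80 3c 20 00 74 05 e8 79 7a a7 00 49 8b 47 10 "
        "48 89 05 f6 d8 ef 09 49 8d 7f 08 48 89 f8 48 c1 e8 03 42 80 3c "
        "00 00 <00> 00 e8 57 7a a7 00 49 8b 47 08 48 89 05 dc d8 ef 09 "
        "49 8d 7f 18")

data, trap = parse_code_line(CODE)
start, length = zero_run_around(data, trap)
print(f"trap offset {trap:#x}, zero run at {start:#x}, length {length}")
# prints: trap offset 0x2a, zero run at 0x28, length 4
print("matches jump label patch size:", length == JUMP_LABEL_PATCH_SIZE)
```

The run length is only an upper bound on what was clobbered, since a
legitimately zero byte (such as the cmpb immediate) can sit next to the
overwritten ones.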
> > Following my hypothesis about having a bad address calculation; the
> > tricky part is I'd need to look through the relocations and try to see
> > if any could resolve to the address that was accidentally modified. I
> > suspect objtool could be leveraged for that;
>
> If you can find this at compile time...
>
> > maybe it could check whether each `struct jump_entry`'s `target`
> > member referred to either a NOP or a CMP, and error otherwise? (Do we
> > have other non-NOP or CMP targets? IDK)
>
> Follow jump_label_transform() - it does verify what it is going to
> patch. And while I'm looking at this, I realize that the jump labels
> patch 5 bytes but the above zeroes are 4 bytes. In the other opcode
> bytes I decoded it is 4 bytes too. So this might not be caused by the
> jump labels patching...
>
> > This hypothesis might also be incorrect, and thus would be chasing a
> > red herring...not really sure how else to pursue debugging this.
>
> Yeah, this one is tricky to debug.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette