Message-ID: <CACT4Y+Z7__4qeMP-jG07-M+ugL3PxkQ_z83=TB8O9e4=jjV4ug@mail.gmail.com>
Date:   Fri, 22 Dec 2017 09:26:28 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        syzbot 
        <bot+72c44cd8b0e8a1a64b9c03c4396aea93a16465ef@...kaller.appspotmail.com>,
        Ingo Molnar <mingo@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Jiang <dave.jiang@...el.com>,
        Hugh Dickins <hughd@...gle.com>, Jan Kara <jack@...e.cz>,
        Jerome Glisse <jglisse@...hat.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>, tcharding <me@...in.cc>,
        Michal Hocko <mhocko@...e.com>,
        Minchan Kim <minchan@...nel.org>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        syzkaller-bugs@...glegroups.com,
        Matthew Wilcox <willy@...radead.org>,
        Eric Biggers <ebiggers3@...il.com>
Subject: Re: general protection fault in finish_task_switch

On Fri, Dec 22, 2017 at 9:17 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, Dec 21, 2017 at 10:42:04AM -0800, Linus Torvalds wrote:
>> On Wed, Dec 20, 2017 at 8:03 AM, syzbot
>> <bot+72c44cd8b0e8a1a64b9c03c4396aea93a16465ef@...kaller.appspotmail.com>
>> wrote:
>> > Hello,
>> >
>> > syzkaller hit the following crash on
>> > 7dc9f647127d6955ffacaf51cb6a627b31dceec2
>> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> >
>> > kasan: CONFIG_KASAN_INLINE enabled
>> > kasan: GPF could be caused by NULL-ptr deref or user memory access
>> > general protection fault: 0000 [#1] SMP KASAN
>> > Dumping ftrace buffer:
>> >    (ftrace buffer empty)
>> > Modules linked in:
>> > CPU: 0 PID: 4227 Comm: syzkaller244813 Not tainted 4.15.0-rc4-next-20171220+
>> > #77
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > Google 01/01/2011
>> > RIP: __fire_sched_in_preempt_notifiers kernel/sched/core.c:2534 [inline]
>>
>> That line 2534 is the call inside the hlist_for_each_entry() loop:
>>
>>         hlist_for_each_entry(notifier, &curr->preempt_notifiers, link)
>>                 notifier->ops->sched_in(notifier, raw_smp_processor_id());
>>
>> and the Code: line disassembly is
>>
>>    0: ff 11                callq  *(%rcx)
>>    2: 4c 89 f9             mov    %r15,%rcx
>>    5: 48 c1 e9 03          shr    $0x3,%rcx
>>    9: 42 80 3c 31 00       cmpb   $0x0,(%rcx,%r14,1)
>>    e: 0f 85 1b 02 00 00    jne    0x22f
>>   14: 4d 8b 3f             mov    (%r15),%r15
>>   17: 4d 85 ff             test   %r15,%r15
>>   1a: 0f 84 c0 fd ff ff    je     0xfffffffffffffde0
>>   20: 49 8d 7f 10          lea    0x10(%r15),%rdi
>>   24: 48 89 f9             mov    %rdi,%rcx
>>   27: 48 c1 e9 03          shr    $0x3,%rcx
>>   2b:* 42 80 3c 31 00      cmpb   $0x0,(%rcx,%r14,1) <-- trapping instruction
>>   30: 74 ae                je     0xffffffffffffffe0
>>   32: e8 a7 cc 5b 00       callq  0x5bccde
>>   37: eb a7                jmp    0xffffffffffffffe0
>>   39: 4c 89 fe             mov    %r15,%rsi
>>   3c: 4c 89 e7             mov    %r12,%rdi
>>
>> and while the "callq *(%rcx)" might be just the end part of some
>> previous instruction, I think it may be right (there is indeed an
>> indirect call in that function - that very "->sched_in()" call).
>>
>> So I think the oops happens after the indirect call returns.
>>
>> I think the second "callq" is
>>
>>     call    __asan_report_load8_noabort
>>
>> and the actual trapping instruction is loading the KASAN byte state.
>>
>> As far as I can tell, the kasan check is trying to check this part of
>> hlist_for_each_entry():
>>
>>     movq    (%r15), %r15    # notifier_110->link.next,
>>
>> and %r15 is dead000000000100, which is LIST_POISON1.
>>
>> End result: KASAN actually makes these things harder to debug, because
>> it's trying to "validate" the list poison values before they are used,
>> and takes a much more complex and indirect fault in the process,
>> instead of just getting a page-fault on the LIST_POISON1 that would
>> have made it more obvious.
>>
>> Oh well.
>>
>> There is nothing in this that indicates that it's actually related to
>> KASAN, and it _should_ oops even without KASAN enabled.
>>
>> But the reproducer does nothing for me. Of course, I didn't actually
>> run it on linux-next at all, so it is quite possibly related to
>> scheduler work (or the TLB/pagetable work) that just hasn't hit
>> mainstream yet.
>>
>> None of the scheduler people seem to have been on the report, though.
>> Adding some in.
>
> So the only user of that preempt_notifier stuff is KVM, if you don't run
> a guest the notifiers are empty and are in fact disabled with a
> static_key.
>
> We've not touched this part of the scheduler in a fair while. I'll go
> dig out the original report and see if that reproducer does anything for
> me.
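
For context on the LIST_POISON1 value in the quoted analysis: one way a
node's ->next can end up holding that value is that the node was unlinked
with hlist_del() while something still iterates over it or holds a pointer
to it. The report does not say this is what happened here, so the following
is only a minimal userspace sketch, assuming the x86-64 poison values
0xdead000000000100 and 0xdead000000000200 (the second is an assumption for
kernels of that era). The helpers are simplified re-implementations, not the
kernel's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace copies of the kernel hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Assumed x86-64 poison values; never dereferenced in this sketch. */
#define LIST_POISON1 ((struct hlist_node *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct hlist_node **)(uintptr_t)0xdead000000000200ULL)

/* Insert n at the front of the list. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n and poison its pointers so stale users are easy to spot. */
static void hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	assert(pprev != NULL);	/* n must currently be on a list */
	*pprev = next;
	if (next)
		next->pprev = pprev;
	n->next = LIST_POISON1;
	n->pprev = LIST_POISON2;
}

/* True if a traversal that follows n->next would step onto the poison. */
static int next_is_poisoned(const struct hlist_node *n)
{
	return n->next == LIST_POISON1;
}
```

A loop such as hlist_for_each_entry() that still holds a pointer to a node
deleted this way would load LIST_POISON1 as the next entry, which matches
the %r15 value described in the quoted analysis.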


I think this is another manifestation of "KASAN: use-after-free Read
in __schedule":
https://groups.google.com/forum/#!msg/syzkaller-bugs/-8JZhr4W8AY/FpPFh8EqAQAJ
+Eric, who already mailed a fix for it (it is indeed a new bug in the KVM code).

Let's tell syzbot:

#syz dup: KASAN: use-after-free Read in __schedule
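
For readers following the quoted disassembly: the lea/shr $0x3/cmpb sequence
is the inline KASAN shadow lookup, and the arithmetic below sketches why it
takes a general protection fault when the pointer is LIST_POISON1. This is a
rough illustration, assuming the x86-64 KASAN_SHADOW_OFFSET of
0xdffffc0000000000 (presumably what %r14 holds) and 48-bit virtual
addresses. The function names are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 values; see the lead-in. */
#define KASAN_SHADOW_OFFSET   0xdffffc0000000000ULL
#define KASAN_SHADOW_SHIFT    3
#define LIST_POISON1_ADDR     0xdead000000000100ULL

/* Shadow byte address for addr: (addr >> 3) + offset, as in the shr/cmpb pair. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* With 48-bit addressing, bits 63..47 must be all zeros or all ones. */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return top == 0 || top == 0x1ffff;
}

/* Self-check documenting the values for the trapping instruction. */
static void check_poison_example(void)
{
	/* lea 0x10(%r15),%rdi with %r15 == LIST_POISON1 */
	uint64_t checked = LIST_POISON1_ADDR + 0x10;
	uint64_t shadow = kasan_shadow_addr(checked);

	assert(shadow == 0xfbd59c0000000022ULL);
	assert(!is_canonical_48(shadow));
}
```

Under these assumptions the shadow address is non-canonical, so the cmpb
that reads the shadow byte faults with a GPF before KASAN can produce a
report of its own. That fits the "GPF could be caused by NULL-ptr deref or
user memory access" hint in the log.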
