Date:   Fri, 2 Feb 2018 09:57:27 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Eric Biggers <ebiggers3@...il.com>,
        syzbot <syzbot+bacbe5d8791f30c9cee5@...kaller.appspotmail.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Dan Williams <dan.j.williams@...el.com>,
        James Morse <james.morse@....com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Ingo Molnar <mingo@...nel.org>,
        syzkaller-bugs@...glegroups.com
Subject: Re: possible deadlock in get_user_pages_unlocked

On Fri, Feb 2, 2018 at 7:20 AM, Al Viro <viro@...iv.linux.org.uk> wrote:
> On Fri, Feb 02, 2018 at 05:46:26AM +0000, Al Viro wrote:
>> On Thu, Feb 01, 2018 at 09:35:02PM -0800, Eric Biggers wrote:
>>
>> > Try starting up multiple instances of the program; that sometimes helps with
>> > these races that are hard to hit (since you may e.g. have a different number of
>> > CPUs than syzbot used).  If I start up 4 instances I see the lockdep splat after
>> > around 2-5 seconds.
>>
>> 5 instances in parallel, 10 minutes into the run...
>>
>> >  This is on latest Linus tree (4bf772b1467).  Also note the
>> > reproducer uses KVM, so if you're running it in a VM it will only work if you've
>> > enabled nested virtualization on the host (kvm_intel.nested=1).
>>
>> cat /sys/module/kvm_amd/parameters/nested
>> 1
>>
>> on host
>>
>> > Also it appears to go away if I revert ce53053ce378c21 ("kvm: switch
>> > get_user_page_nowait() to get_user_pages_unlocked()").
>>
>> That simply prevents this reproducer hitting get_user_pages_unlocked()
>> instead of grab mmap_sem/get_user_pages/drop mmap_sem.  I.e. does not
>> allow __get_user_pages_locked() to drop/regain ->mmap_sem.
>>
>> The bug may be in the way we call get_user_pages_unlocked() in that
>> commit, but it might easily be a bug in __get_user_pages_locked()
>> exposed by that reproducer somehow.
>
> I think I understand what's going on.  FOLL_NOWAIT handling is a serious
> mess ;-/  I'll probably have something to test tomorrow - I still can't
> reproduce it here, unfortunately.

Hi Al,

syzbot tests for up to 5 minutes. However, if a race is involved, you
may need more time because the crash is probabilistic.
But in my experience, when a crash can't be reproduced easily, it's
usually because some difference in setup prevents the crash from
happening at all.
FWIW, syzbot re-runs each reproducer on a freshly booted, dedicated VM,
and the kernel output it provided is what it got while running the
provided program. So we have reasonably high assurance that this
reproducer worked in at least one setup.
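
Because the crash is probabilistic, one way to give the race more
chances locally is to run several copies of the reproducer side by side
for a bounded time, as Eric suggested earlier in the thread. The
following is only a minimal sketch, not part of syzbot or syzkaller: the
helper name run_parallel and the ./repro path (a locally compiled copy
of the C reproducer) are assumptions made for illustration.

```python
import subprocess
import time


def run_parallel(cmd, instances=4, timeout_s=300.0):
    """Start `instances` copies of `cmd` and wait up to `timeout_s` seconds in total.

    Copies still running at the deadline are killed. Returns one exit code
    per copy (on POSIX, a negative value means the copy died from a signal).
    """
    procs = [subprocess.Popen(cmd) for _ in range(instances)]
    deadline = time.monotonic() + timeout_s
    codes = []
    for proc in procs:
        # All copies share one deadline, so later waits get less time.
        remaining = max(0.0, deadline - time.monotonic())
        try:
            codes.append(proc.wait(timeout=remaining))
        except subprocess.TimeoutExpired:
            proc.kill()
            codes.append(proc.wait())
    return codes


# Hypothetical usage: "./repro" stands in for the compiled reproducer.
# run_parallel(["./repro"], instances=4, timeout_s=300.0)
```

The exit codes say nothing about whether the lockdep splat appeared;
that still has to be checked in the kernel log (for example with dmesg).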

Even if you can't reproduce it locally, you can use the syzbot testing
service; see "syz test" here:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot

We also try to collect known causes of non-working reproducers, so if
you get any hints as to why it does not reproduce for you, we can add
them here:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce
Since kvm/ept are present in the stacks, I suspect that, unfortunately,
it may be due to a different host CPU.
