linux-kernel - Re: kernel BUG at mm/memory.c:LINE!

Message-ID: <CACT4Y+bU9y3sZvynTYdwSj=T3-OvE_C4_6eWiS0i=8HE0wEb9g@mail.gmail.com>
Date:   Tue, 10 Jul 2018 12:02:17 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     syzbot <syzbot+3f84280d52be9b7083cc@...kaller.appspotmail.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Jerome Glisse <jglisse@...hat.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        ldufour@...ux.vnet.ibm.com, LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Michal Hocko <mhocko@...e.com>,
        Minchan Kim <minchan@...nel.org>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Matthew Wilcox <willy@...radead.org>, ying.huang@...el.com
Subject: Re: kernel BUG at mm/memory.c:LINE!

On Tue, Jul 10, 2018 at 12:07 AM, Kirill A. Shutemov
<kirill@...temov.name> wrote:
> On Mon, Jul 09, 2018 at 07:23:15PM +0200, Dmitry Vyukov wrote:
>> On Mon, Jul 9, 2018 at 5:25 PM, Kirill A. Shutemov <kirill@...temov.name> wrote:
>> > On Mon, Jul 09, 2018 at 05:21:55PM +0300, Kirill A. Shutemov wrote:
>> >> > This also happened only once so far:
>> >> > https://syzkaller.appspot.com/bug?extid=3f84280d52be9b7083cc
>> >> > and I can't reproduce it rerunning this program. So it's either a very
>> >> > subtle race, or fd in the middle of netlink address magically matched
>> >> > some fd once, or something else...
>> >>
>> >> Okay, I've got it reproduced. See below.
>> >>
>> >> The problem is that kcov doesn't set vm_ops for the VMA and it makes
>> >> kernel think that the VMA is anonymous.
>> >>
>> >> It's not necessarily the way it was triggered by syzkaller. I just found
>> >> that kcov's ->mmap doesn't set vm_ops. There can be more such cases.
>> >> vma_is_anonymous() is what we need to fix.
>> >>
>> >> ( Although, I found the logic around mmapping the file a second time
>> >>   questionable at best. It seems broken to me. )
>> >>
>> >> It is known that vma_is_anonymous() can produce false positives. I tried
>> >> to fix it once[1], but it backfired[2].
>> >>
>> >> I'll look at this again.
>> >
>> > Below is a patch that seems to work. But it definitely requires more testing.
>> >
>> > Dmitry, could you give it a try in syzkaller?
>>
>> Trying.
>>
>> Not sure what you expect from this. Either way it will be hundreds of
>> crashes before vs hundreds of crashes after ;)
>>
>> But one that started popping up is this, looks like it's somewhere
>> around the code your patch touches:
>>
>> kasan: CONFIG_KASAN_INLINE enabled
>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> general protection fault: 0000 [#1] SMP KASAN
>> CPU: 0 PID: 6711 Comm: syz-executor3 Not tainted 4.18.0-rc4+ #43
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>> RIP: 0010:__get_vma_policy+0x61/0x160 mm/mempolicy.c:1620
>
> Right, my bad. Here's a fixup.
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index d508c7844681..12b2b3c7f51e 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -597,6 +597,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>         memset(&pseudo_vma, 0, sizeof(struct vm_area_struct));
>         pseudo_vma.vm_flags = (VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
>         pseudo_vma.vm_file = file;
> +       pseudo_vma.vm_ops = &anon_vm_ops;
>
>         for (index = start; index < end; index++) {
>                 /*


With this change I don't see anything that stands out, just a typical
mix of crashes like these:

BUG: unable to handle kernel paging request in kfree
INFO: task hung in flush_work
KASAN: slab-out-of-bounds Read in fscache_alloc_cookie
KASAN: use-after-free Read in __queue_work
general protection fault in encode_rpcb_string
lost connection to test machine
no output from test machine
unregister_netdevice: waiting for DEV to become free

So I guess this counts as a +1 for the patch.
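
For anyone following the thread, here is a minimal userspace sketch of the
false positive described above. In 4.18-era kernels vma_is_anonymous()
amounts to checking that vm_ops is NULL, so a driver ->mmap that leaves
vm_ops unset gets its VMA treated as anonymous. The struct layouts, the
"name" field, and the driver_mmap_* helpers are simplified stand-ins made
up for illustration; this is not kernel code and not the patch under test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the real kernel definitions. */
struct vm_operations_struct {
	const char *name;	/* hypothetical field, illustration only */
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
	void *vm_file;
};

/* Models the 4.18-era check: no vm_ops means "anonymous". */
static bool vma_is_anonymous(const struct vm_area_struct *vma)
{
	return vma->vm_ops == NULL;
}

/* Hypothetical driver ->mmap that never sets vm_ops (the kcov-like case). */
static void driver_mmap_no_ops(struct vm_area_struct *vma, void *file)
{
	vma->vm_file = file;	/* file-backed, but vm_ops stays NULL */
}

/* Hypothetical placeholder ops table; it only needs to be non-NULL. */
static const struct vm_operations_struct dummy_vm_ops = { .name = "dummy" };

/* Same driver, but it installs a vm_ops table for the mapping. */
static void driver_mmap_with_ops(struct vm_area_struct *vma, void *file)
{
	vma->vm_file = file;
	vma->vm_ops = &dummy_vm_ops;
}
```

With this model, a VMA set up by driver_mmap_no_ops() has a vm_file yet is
still reported as anonymous, while the one set up by driver_mmap_with_ops()
is not. That is the kind of misclassification the thread is about.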