Date:   Fri, 22 Apr 2022 19:30:07 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     syzbot <syzbot+cf4cf13056f85dec2c40@...kaller.appspotmail.com>
Cc:     akpm@...ux-foundation.org, dhowells@...hat.com, hughd@...gle.com,
        kirill.shutemov@...ux.intel.com, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, syzkaller-bugs@...glegroups.com,
        vbabka@...e.cz, william.kucharski@...cle.com
Subject: Re: [syzbot] kernel BUG in __filemap_get_folio

On Thu, Apr 21, 2022 at 09:21:34PM +0100, Matthew Wilcox wrote:
> I wish I knew which 'index' we were looking up.  I'll try reproducing it
> locally so I can print that out too.

I can't reproduce it locally because the OOM killer says I don't have
enough RAM.  That's with giving 4GB to the VM.  If I give more than 4GB
to the VM, my laptop is insufficiently studly, and the host OOM killer
takes out qemu instead ;-P

> My suspicion is that there's a race where the folio is split during the
> lookup, and the bug is really in mapping_get_entry().  The folio->index
> is weird though; if this was the explanation, I'd expect it to find a
> page at a multiple of 512 or at least a multiple of 64.

I think I have an explanation (from thinking really hard, rather than
testing).  Before we call xas_split(), the tree looks like this:

node (shift=6)
 -> page (index 0)
 -> sibling of 0
 -> sibling of 0
 -> sibling of 0
 -> sibling of 0
 -> sibling of 0
 -> sibling of 0
 -> sibling of 0
 -> page (index 0x200)
 -> sibling of 8
 -> sibling of 8
 -> sibling of 8
 -> sibling of 8
 -> sibling of 8
 -> sibling of 8
 -> sibling of 8

Then we split the page at index 0x200.  Simultaneously, we try to load
the page at index 0x274 (or 0x2b4 or 0x2f4 or ... 0x3f4).  The load picks
up the sibling entry at offset 9 (0x274 >> 6), which says to refer to
the entry at offset 8.  But by the time it gets the entry at offset 8,
the split has replaced the compound page at index 0x200 with a node that
points to pages at indices 0x200-0x23f.
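
To make the arithmetic concrete, here is a toy userspace model of the
shift=6 node above.  It is only a sketch: the struct, the enum and the
function names are invented for illustration and are not the kernel's
XArray API, and the "split" is simulated by a flag rather than a real
concurrent writer.  It also assumes that once the load sees a node in the
canonical slot, it keeps descending using the low six bits of the index.

```c
#include <assert.h>

#define NODE_SHIFT 6                 /* the node in the diagram has shift=6 */
#define SLOTS      64                /* slots per node */

/* Toy entry kinds; the real XArray packs these into tagged pointers. */
enum kind { EMPTY, PAGE, SIBLING, NODE };

struct entry {
    enum kind kind;
    unsigned long val;               /* PAGE/NODE: first index; SIBLING: canonical offset */
};

/* Slot offset inside the shift=6 node for a page index. */
static unsigned int slot_of(unsigned long index)
{
    return (unsigned int)((index >> NODE_SHIFT) & (SLOTS - 1));
}

/* Pre-split layout: 512-page entries at index 0 and index 0x200. */
static void build_tree(struct entry *slots)
{
    for (unsigned int i = 0; i < SLOTS; i++)
        slots[i] = (struct entry){ EMPTY, 0 };
    slots[0] = (struct entry){ PAGE, 0x000 };
    slots[8] = (struct entry){ PAGE, 0x200 };
    for (unsigned int i = 1; i < 8; i++) {
        slots[i]     = (struct entry){ SIBLING, 0 };
        slots[8 + i] = (struct entry){ SIBLING, 8 };
    }
}

/*
 * Two-step load: read the slot, then follow a sibling to its canonical slot.
 * If split_in_window is set, the canonical slot is replaced by a child node
 * between the two reads, standing in for a concurrent xas_split().
 * Returns the index of the page the load ends up with (empty slots are not
 * handled in this toy).
 */
static unsigned long racy_load(struct entry *slots, unsigned long index,
                               int split_in_window)
{
    struct entry e = slots[slot_of(index)];

    if (e.kind == SIBLING) {
        unsigned long canon = e.val;

        if (split_in_window)
            slots[canon] = (struct entry){ NODE, canon << NODE_SHIFT };
        e = slots[canon];
    }
    if (e.kind == NODE)              /* assumed: descend using the low six bits */
        return e.val + (index & (SLOTS - 1));
    return e.val;                    /* head index of the large page */
}
```

Without the split, a load of 0x274 resolves to the large page at 0x200,
which covers it.  With the split in the window, the same load ends up at
0x234, a single page in the 0x200-0x23f range that does not contain 0x274.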

Solving it on the split side is possible, but I think it's easier to
solve on the load side.  I have a patch that seems to work; let's see
what syzbot thinks of it:

#syz test: git://git.infradead.org/users/willy/xarray.git main
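
The patch itself is not quoted in this message.  Purely as an illustration
of what "solving it on the load side" could mean, the sketch below has the
load treat "sibling that resolves to a node" as a sign that it raced with a
split, and hand back a retry marker so the caller restarts the walk.  The
types, the RETRY marker and resolve() are hypothetical names in a toy
model; this is not the code in the tree being tested.

```c
#include <assert.h>

/* Toy entry kinds; RETRY is a made-up marker meaning "restart the lookup". */
enum kind { EMPTY, PAGE, SIBLING, NODE, RETRY };

struct entry {
    enum kind kind;
    unsigned long val;               /* SIBLING: canonical offset; otherwise first index */
};

/*
 * Read one slot on the load side.  A sibling entry should only ever point
 * at a leaf entry, so finding a node behind it means the canonical slot
 * changed under us and the result cannot be trusted.
 */
static struct entry resolve(const struct entry *slots, unsigned int offset)
{
    struct entry e = slots[offset];

    if (e.kind == SIBLING) {
        e = slots[e.val];
        if (e.kind == NODE)
            return (struct entry){ RETRY, 0 };
    }
    return e;
}
```

A caller would loop while resolve() returns RETRY; on the next pass it
reads offset 9 again and sees whatever the split left there.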
