Message-ID: <4551b48c-eba5-42a3-aa53-74d412a126b9@redhat.com>
Date: Wed, 17 Sep 2025 11:04:12 +0200
From: David Hildenbrand <david@...hat.com>
To: Jan Kara <jack@...e.cz>
Cc: Ryan Roberts <ryan.roberts@....com>,
syzbot <syzbot+263f159eb37a1c4c67a4@...kaller.appspotmail.com>,
akpm@...ux-foundation.org, chaitanyas.prakash@....com, davem@...emloft.net,
edumazet@...gle.com, hdanton@...a.com, horms@...nel.org, kuba@...nel.org,
kuniyu@...gle.com, linux-kernel@...r.kernel.org,
linux-sound@...r.kernel.org, netdev@...r.kernel.org, pabeni@...hat.com,
perex@...ex.cz, syzkaller-bugs@...glegroups.com, tiwai@...e.com,
willemb@...gle.com
Subject: Re: [syzbot] [sound?] kernel BUG in filemap_fault (2)

>>
>> if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
>> goto out_retry;
>>
>> /* Did it get truncated? */
>> if (unlikely(folio->mapping != mapping)) {
>> folio_unlock(folio);
>> folio_put(folio);
>> goto retry_find;
>> }
>> VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);
>>
>>
>> I would assume that if !folio_contains(folio, index), either the folio got
>> split in the meantime (filemap_get_folio() returned with a raised reference,
>> though) or the file pagecache contained something wrong.
>
> Right.
>
>> In __filemap_get_folio() we perform the same checks after locking the folio
>> (with FGP_LOCK), and oddly enough it hasn't triggered there yet.
>
> But we don't call __filemap_get_folio() with FGP_LOCK from filemap_fault().
Yes. I should have clarified that we haven't seen the VM_BUG_ON_FOLIO()
trigger on other call paths that set FGP_LOCK, although I would expect the
very same problem could occur there as well.
> The folio locking is handled by lock_folio_maybe_drop_mmap() as you
> mentioned. So this is the first time we do the assert after getting the
> folio AFAICT. So some race with folio split looks plausible. Checking the
> reproducer it does play with mmap(2) and madvise(MADV_REMOVE) over the
> mapped range so the page fault may be racing with
> truncate_inode_partial_folio()->try_folio_split(). But I don't see the race
> there now...
__filemap_get_folio() will grab a reference and verify that the xarray
didn't change. So a concurrent split succeeding would be odd, because
freezing the refcount should fail while we hold that reference. Of
course, some refcounting inconsistency could still trigger something
weird like that.
I can spot that we also manually call
__filemap_get_folio(FGP_CREAT|FGP_FOR_MMAP) on the else path when
filemap_get_folio() failed; maybe that's the problematic bit (and maybe
that's where the readahead logic makes a difference).
--
Cheers
David / dhildenb