Message-ID: <4551b48c-eba5-42a3-aa53-74d412a126b9@redhat.com>
Date: Wed, 17 Sep 2025 11:04:12 +0200
From: David Hildenbrand <david@...hat.com>
To: Jan Kara <jack@...e.cz>
Cc: Ryan Roberts <ryan.roberts@....com>,
syzbot <syzbot+263f159eb37a1c4c67a4@...kaller.appspotmail.com>,
akpm@...ux-foundation.org, chaitanyas.prakash@....com, davem@...emloft.net,
edumazet@...gle.com, hdanton@...a.com, horms@...nel.org, kuba@...nel.org,
kuniyu@...gle.com, linux-kernel@...r.kernel.org,
linux-sound@...r.kernel.org, netdev@...r.kernel.org, pabeni@...hat.com,
perex@...ex.cz, syzkaller-bugs@...glegroups.com, tiwai@...e.com,
willemb@...gle.com
Subject: Re: [syzbot] [sound?] kernel BUG in filemap_fault (2)

>>
>> if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
>> goto out_retry;
>>
>> /* Did it get truncated? */
>> if (unlikely(folio->mapping != mapping)) {
>> folio_unlock(folio);
>> folio_put(folio);
>> goto retry_find;
>> }
>> VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);
>>
>>
>> I would assume that if !folio_contains(folio, index), either the folio got
>> split in the meantime (filemap_get_folio() returned with a raised reference,
>> though) or the file pagecache contained something wrong.
>
> Right.
>
>> In __filemap_get_folio() we perform the same checks after locking the folio
>> (with FGP_LOCK), and oddly enough it hasn't triggered there yet.
>
> But we don't call __filemap_get_folio() with FGP_LOCK from filemap_fault().
Yes. I should have clarified that we haven't seen the VM_BUG_ON_FOLIO()
trigger on other call paths that set FGP_LOCK, although I would expect the
very same problem could occur there as well.
> The folio locking is handled by lock_folio_maybe_drop_mmap() as you
> mentioned. So this is the first time we do the assert after getting the
> folio AFAICT. So some race with folio split looks plausible. Checking the
> reproducer it does play with mmap(2) and madvise(MADV_REMOVE) over the
> mapped range so the page fault may be racing with
> truncate_inode_partial_folio()->try_folio_split(). But I don't see the race
> there now...
__filemap_get_folio() will grab a reference and verify that the xarray
didn't change. So a concurrent split succeeding would be odd, because
freezing the refcount should fail while we hold that reference. Of
course, some refcounting inconsistency could still trigger something
weird like that.
I can spot that we also manually call
__filemap_get_folio(FGP_CREAT|FGP_FOR_MMAP) on the else path when
filemap_get_folio() failed; maybe that's the problematic bit (and maybe
that's where the readahead logic makes a difference).
--
Cheers
David / dhildenb