Message-ID: <s5pl5yhhxyz7dn4r2v6c4ll53ejboe5xa5226ytgg7kjirgmh5@tofyas4lp4uy>
Date: Wed, 17 Sep 2025 10:35:46 +0200
From: Jan Kara <jack@...e.cz>
To: David Hildenbrand <david@...hat.com>
Cc: Jan Kara <jack@...e.cz>, Ryan Roberts <ryan.roberts@....com>,
syzbot <syzbot+263f159eb37a1c4c67a4@...kaller.appspotmail.com>, akpm@...ux-foundation.org, chaitanyas.prakash@....com,
davem@...emloft.net, edumazet@...gle.com, hdanton@...a.com, horms@...nel.org,
kuba@...nel.org, kuniyu@...gle.com, linux-kernel@...r.kernel.org,
linux-sound@...r.kernel.org, netdev@...r.kernel.org, pabeni@...hat.com, perex@...ex.cz,
syzkaller-bugs@...glegroups.com, tiwai@...e.com, willemb@...gle.com
Subject: Re: [syzbot] [sound?] kernel BUG in filemap_fault (2)
On Wed 17-09-25 09:57:19, David Hildenbrand wrote:
> On 16.09.25 15:05, Jan Kara wrote:
> > On Tue 16-09-25 13:50:08, Ryan Roberts wrote:
> > > On 14/09/2025 11:51, syzbot wrote:
> > > > syzbot suspects this issue was fixed by commit:
> > > >
> > > > commit bdb86f6b87633cc020f8225ae09d336da7826724
> > > > Author: Ryan Roberts <ryan.roberts@....com>
> > > > Date: Mon Jun 9 09:27:23 2025 +0000
> > > >
> > > > mm/readahead: honour new_order in page_cache_ra_order()
> > >
> > > I'm not sure what original bug you are claiming this is fixing? Perhaps this?
> > >
> > > https://lore.kernel.org/linux-mm/6852b77e.a70a0220.79d0a.0214.GAE@google.com/
> >
> > I think it was:
> >
> > https://lore.kernel.org/all/684ffc59.a00a0220.279073.0037.GAE@google.com/
> >
> > at least that's what the syzbot email replies to... And it doesn't make a
> > lot of sense but it isn't totally off either. So I'd just let the syzbot
> > bug autoclose after some timeout.
>
> Hm, the issue we ran into was:
>
> VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);
>
> in filemap_fault().
>
> Now, that sounds rather bad, especially given that it was reported upstream.
>
> So we likely should figure out what happened and see whether it really fixed
> it and, if so, why it fixed it (stable backports etc.)?
Ok, ok, fair enough ;)
> Could be that Ryan's patch is just making the problem harder to reproduce, of
> course (which is what I assume right now).
>
> Essentially we do a
>
>         folio = filemap_get_folio(mapping, index);
>
> followed by
>
>         if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
>                 goto out_retry;
>
>         /* Did it get truncated? */
>         if (unlikely(folio->mapping != mapping)) {
>                 folio_unlock(folio);
>                 folio_put(folio);
>                 goto retry_find;
>         }
>         VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);
>
>
> I would assume that if !folio_contains(folio, index), either the folio got
> split in the meantime (even though filemap_get_folio() returned it with a
> raised reference) or the file's page cache contained something wrong.
Right.
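
For context, the assertion boils down to roughly the following check
(paraphrased from include/linux/pagemap.h; the exact form may differ between
kernel versions), so any folio that no longer covers the faulting index,
e.g. because it got split or replaced under us, trips the BUG:

        /* Paraphrased sketch, not a verbatim copy of the kernel source. */
        static inline bool folio_contains(struct folio *folio, pgoff_t index)
        {
                /* True iff index lies in [folio->index, folio->index + nr_pages). */
                return index - folio_index(folio) < folio_nr_pages(folio);
        }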
> In __filemap_get_folio() we perform the same checks after locking the folio
> (with FGP_LOCK), and weirdly enough it hasn't triggered there yet.
But we don't call __filemap_get_folio() with FGP_LOCK from filemap_fault().
The folio locking is handled by lock_folio_maybe_drop_mmap() as you
mentioned, so AFAICT this is the first time we do the assertion after
getting the folio. Some race with a folio split therefore looks plausible.
Checking the reproducer, it does play with mmap(2) and madvise(MADV_REMOVE)
over the mapped range, so the page fault may be racing with
truncate_inode_partial_folio()->try_folio_split(). But I don't see the race
there now...
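
For the record, the racing pattern the reproducer seems to exercise looks
roughly like the sketch below. This is a hypothetical, heavily simplified
illustration (the file path, sizes and thread structure are made up; it is
not the syzkaller program): one thread keeps faulting in a shared file
mapping while another keeps punching the mapped range with
madvise(MADV_REMOVE), which ends up in truncate_inode_partial_folio().
Whether large folios are actually in use here depends on the filesystem and
the THP/large folio settings.

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <pthread.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define LEN (2UL << 20)         /* 2MiB mapping */

        static char *map;

        /* Thread A: keep touching pages so filemap_fault() runs repeatedly. */
        static void *faulter(void *arg)
        {
                (void)arg;
                for (;;)
                        for (size_t off = 0; off < LEN; off += 4096)
                                map[off] = 1;
                return NULL;
        }

        /* Thread B: keep punching holes in the mapped range. */
        static void *puncher(void *arg)
        {
                (void)arg;
                for (;;)
                        madvise(map, LEN, MADV_REMOVE);
                return NULL;
        }

        int main(void)
        {
                pthread_t a, b;
                /* Hypothetical tmpfs file; MADV_REMOVE needs hole punch support. */
                int fd = open("/dev/shm/repro", O_RDWR | O_CREAT, 0600);

                if (fd < 0 || ftruncate(fd, LEN) < 0)
                        return 1;
                map = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
                if (map == MAP_FAILED)
                        return 1;
                memset(map, 1, LEN);    /* populate the page cache */

                pthread_create(&a, NULL, faulter, NULL);
                pthread_create(&b, NULL, puncher, NULL);
                pause();
                return 0;
        }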
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR