Date: Thu, 17 Feb 2022 22:33:34 -0800
From: John Hubbard <jhubbard@...dia.com>
To: Theodore Ts'o <tytso@....edu>
Cc: Lee Jones <lee.jones@...aro.org>, linux-ext4@...r.kernel.org,
    Christoph Hellwig <hch@....de>, Dave Chinner <dchinner@...hat.com>,
    Goldwyn Rodrigues <rgoldwyn@...e.com>,
    "Darrick J . Wong" <darrick.wong@...cle.com>,
    Bob Peterson <rpeterso@...hat.com>,
    Damien Le Moal <damien.lemoal@....com>,
    Andreas Gruenbacher <agruenba@...hat.com>,
    Ritesh Harjani <riteshh@...ux.ibm.com>,
    Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
    Johannes Thumshirn <jth@...nel.org>, linux-xfs@...r.kernel.org,
    linux-fsdevel@...r.kernel.org, cluster-devel@...hat.com,
    linux-kernel@...r.kernel.org
Subject: Re: [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

On 2/17/22 20:08, Theodore Ts'o wrote:
> On Thu, Feb 17, 2022 at 05:06:45PM -0800, John Hubbard wrote:
>> Yes. And looking at the pair of backtraces below, this looks very much
>> like another aspect of the "get_user_pages problem" [1], originally
>> described in Jan Kara's 2018 email [2].
>
> Hmm... I just posted my analysis, which tracks with yours; but I had
> forgotten about Jan's 2018 e-mail on the matter.
>
>> I'm getting close to posting an RFC for the direct IO conversion to
>> FOLL_PIN, but even after that, various parts of the kernel (reclaim,
>> filesystems/block layer) still need to be changed so as to use
>> page_maybe_dma_pinned() to help avoid this problem. There's a bit
>> more than that, actually.
>
> The challenge is that fixing this "the right way" is probably not
> something we can backport into an LTS kernel, whether it's 5.15 or
> 5.10... or 4.19.
>
> The only thing which can probably survive getting backported is
> something like this. It won't make the right thing happen if someone
> is trying to RDMA or call process_vm_writev() into a file-backed
> memory region --- but I'm not sure I care. Certainly if the goal is
> to fix Android kernels, I'm pretty sure they aren't using RDMA, and I
> suspect they are probably masking out the process_vm_writev(2) system
> call (at least, for Android O and newer). So if the goal is just to
> close some Syzbot bugs, what do folks think about this?
>
>                                         - Ted

Hi Ted!

This seems reasonable...-ish. Although this could turn into a pretty
grand game of whack-a-mole, one filesystem at a time. :)

> commit 7711b1fda6f7f04274fa1cba6f092410262b0296
> Author: Theodore Ts'o <tytso@....edu>
> Date:   Thu Feb 17 22:54:03 2022 -0500
>
>     ext4: work around bugs in mm/gup.c that can cause ext4 to BUG()
>
>     [un]pin_user_pages_remote is dirtying pages without properly warning
>     the file system in advance (or faulting in the file data if the page

Just a small thing I'll say once, to get it out of my system. No action
required here, I just want it understood:

Before commit 803e4572d7c5 ("mm/process_vm_access: set FOLL_PIN via
pin_user_pages_remote()"), you would have written that like this:

"process_vm_writev() is dirtying pages without properly warning the
file system in advance..."

Because, there were many callers that were doing this:

    get_user_pages*()

    ...use the pages...

    for_each_page() {
        set_page_dirty*()
        put_page()
    }
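For contrast, the FOLL_PIN version of that pattern (just a sketch of
what the pin_user_pages*() API provides, not the exact
mm/process_vm_access.c code; mm, addr, nr_pages, and pages here are
stand-ins for the caller's real state) folds the dirty step into the
unpin:

    long pinned;

    /* FOLL_PIN is set internally by the pin_user_pages*() family: */
    pinned = pin_user_pages_remote(mm, addr, nr_pages, FOLL_WRITE,
                                   pages, NULL, NULL);
    if (pinned <= 0)
        return pinned ? pinned : -EFAULT;

    /* ...write into the pinned pages... */

    /* Dirties each page under the page lock, then drops the pin: */
    unpin_user_pages_dirty_lock(pages, pinned, true);

That at least centralizes the dirty-and-release step, and it means the
pages can be recognized elsewhere via page_maybe_dma_pinned().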
anyway, moving on smartly...

>     is not yet in the page cache). This was noted by Jan Kara in 2018[1]
>     and more recently has resulted in bug reports by Syzbot in various
>     Android kernels[2].
>
>     Fixing this for real is non-trivial, and will never be backportable
>     into stable kernels. So this is a simple workaround that stops the
>     kernel from BUG()'ing. The changed pages will not be properly written
>     back, but given that the current gup code is missing the "read" in
>     "read-modify-write", the dirty page in the page cache might contain
>     corrupted data anyway.
>
>     [1] https://www.spinics.net/lists/linux-mm/msg142700.html

(Sorry, my earlier response mangled this link. I've manually fixed it
here, and am working with our IT to get the root cause addressed.)

>     [2] https://lore.kernel.org/r/Yg0m6IjcNmfaSokM@google.com
>
>     Reported-by: syzbot+d59332e2db681cf18f0318a06e994ebbb529a8db@...kaller.appspotmail.com
>     Reported-by: Lee Jones <lee.jones@...aro.org>
>     Signed-off-by: Theodore Ts'o <tytso@....edu>
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 01c9e4f743ba..3b2f336a90d1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1993,6 +1993,15 @@ static int ext4_writepage(struct page *page,
>      else
>          len = PAGE_SIZE;
>
> +    /* Should never happen but for buggy gup code */
> +    if (!page_has_buffers(page)) {
> +        ext4_warning_inode(inode,
> +            "page %lu does not have buffers attached", page->index);

I see that ext4_warning_inode() has rate limiting, but it doesn't look
like it's really intended for a per-page rate. It looks like it is
per-superblock (yes?), so I suspect that one instance of this problem,
with many pages involved, could hit the limit.

Often, WARN_ON_ONCE() is used with per-page assertions. That's not
great either, but it might be better here. OTOH, I have minimal
experience with ext4_warning_inode() and maybe it really is just fine
with per-page failure rates.

thanks,
--
John Hubbard
NVIDIA

> +        ClearPageDirty(page);
> +        unlock_page(page);
> +        return 0;
> +    }
> +
>      page_bufs = page_buffers(page);
>      /*
>       * We cannot do block allocation or other extent handling in this
> @@ -2594,6 +2603,14 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>              wait_on_page_writeback(page);
>              BUG_ON(PageWriteback(page));
>
> +            /* Should never happen but for buggy gup code */
> +            if (!page_has_buffers(page)) {
> +                ext4_warning_inode(mpd->inode, "page %lu does not have buffers attached", page->index);
> +                ClearPageDirty(page);
> +                unlock_page(page);
> +                continue;
> +            }
> +
>              if (mpd->map.m_len == 0)
>                  mpd->first_page = page->index;
>              mpd->next_page = page->index + 1;
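As a rough illustration of the page_maybe_dma_pinned() direction I
mentioned above (a hypothetical sketch only, not part of Ted's patch,
and the right placement and policy are still open questions), a
writepage path could conceivably bail out early on potentially-pinned
pages instead of stumbling over missing buffers later:

    /*
     * Hypothetical: if the page may be DMA-pinned, leave it dirty
     * and let writeback retry later. page_maybe_dma_pinned() can
     * return false positives, so this is a heuristic, not a
     * guarantee.
     */
    if (page_maybe_dma_pinned(page)) {
        redirty_page_for_writepage(wbc, page);
        unlock_page(page);
        return 0;
    }

But that is follow-on work for the full FOLL_PIN conversion; for an
LTS-backportable workaround, the page_has_buffers() check above is
about as small as it gets.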