[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <94635da5-ce28-a8fb-84e3-7a9f5240fe6a@google.com>
Date: Wed, 16 Aug 2023 13:27:17 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: "Matthew Wilcox (Oracle)" <willy@...radead.org>
cc: Jens Axboe <axboe@...nel.dk>,
Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Hugh Dickins <hughd@...gle.com>, Hannes Reineke <hare@...e.de>,
linux-mm@...ck.org, linux-block@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] block: Remove special-casing of compound pages
a.k.a "Fix rare user data corruption when using THP" :)
On Mon, 14 Aug 2023, Matthew Wilcox (Oracle) wrote:
> The special casing was originally added in pre-git history; reproducing
> the commit log here:
>
> > commit a318a92567d77
> > Author: Andrew Morton <akpm@...l.org>
> > Date: Sun Sep 21 01:42:22 2003 -0700
> >
> > [PATCH] Speed up direct-io hugetlbpage handling
> >
> > This patch short-circuits all the direct-io page dirtying logic for
> > higher-order pages. Without this, we pointlessly bounce BIOs up to
> > keventd all the time.
>
> In the last twenty years, compound pages have become used for more than
> just hugetlb. Rewrite these functions to operate on folios instead
> of pages and remove the special case for hugetlbfs; I don't think
> it's needed any more (and if it is, we can put it back in as a call
> to folio_test_hugetlb()).
>
> This was found by inspection; as far as I can tell, this bug can lead
> to pages used as the destination of a direct I/O read not being marked
> as dirty. If those pages are then reclaimed by the MM without being
> dirtied for some other reason, they won't be written out. Then when
> they're faulted back in, they will not contain the data they should.
> It'll take a pretty unusual setup to produce this problem with several
> races all going the wrong way.
>
> This problem predates the folio work; it could for example have been
> triggered by mmaping a THP in tmpfs and using that as the target of an
> O_DIRECT read.
>
> Fixes: 800d8c63b2e98 ("shmem: add huge pages support")
No. It's a good catch, but bug looks specific to the folio work to me.
Almost all shmem pages are dirty from birth, even as soon as they are
brought back from swap; so it is not necessary to re-mark them dirty.
The exceptions are pages allocated to holes when faulted: so you did
get me worried as to whether khugepaged could collapse a pmd-ful of
those into a THP without marking the result as dirty.
But no, in v6.5-rc6 the collapse_file() success path has
if (is_shmem)
folio_mark_dirty(folio);
and in v5.10 the same appears as
if (is_shmem)
set_page_dirty(new_page);
(IIRC, that or marking pmd dirty was missed from early shmem THP
support, but fairly soon corrected, and backported to stable then.
I have a faint memory of versions which assembled pmd_dirty from
collected pte_dirtys.)
And the !is_shmem case is for CONFIG_READ_ONLY_THP_FOR_FS: writing
into those pages, by direct IO or whatever, is already prohibited.
It's dem dirty (or not dirty) folios dat's the trouble!
Hugh
> Cc: stable@...r.kernel.org
> Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> ---
> block/bio.c | 46 ++++++++++++++++++++++++----------------------
> 1 file changed, 24 insertions(+), 22 deletions(-)
Powered by blists - more mailing lists