[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YahHZOnT1Uh41XnP@casper.infradead.org>
Date: Thu, 2 Dec 2021 04:11:16 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Jann Horn <jannh@...gle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Jan Kara <jack@...e.cz>,
Kirill Shutemov <kirill@...temov.name>,
Oleg Nesterov <oleg@...hat.com>,
Christoph Hellwig <hch@....de>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Mike Kravetz <mike.kravetz@...cle.com>
Subject: Re: [5.4 PATCH] mm/gup: Do not force a COW break on file-backed
memory
On Thu, Dec 02, 2021 at 04:51:47AM +0100, Jann Horn wrote:
> On Thu, Dec 2, 2021 at 12:18 AM Matthew Wilcox (Oracle)
> <willy@...radead.org> wrote:
> > Commit 17839856fd58 ("gup: document and work around "COW can break either
> > way" issue") forces a COW break, even for read-only GUP. This interacts
> > badly with CONFIG_READ_ONLY_THP_FOR_FS as it tries to write to a read-only
> > PMD and follow_trans_huge_pmd() returns NULL which induces an endless
> > loop as __get_user_pages() interprets that as page-not-present, tries
> > to fault it in and retries the follow_page_mask().
> >
> > The issues fixed by 17839856fd58 don't apply to files. We know which way
> > the COW breaks; the page cache keeps the original and any modifications
> > are private to that process. There's no optimisation that allows a
> > process to reuse a file-backed MAP_PRIVATE page. So we can skip the
> > breaking of the COW for file-backed mappings.
> >
> > This problem only exists in v5.4.y; other stable kernels either predate
> > CONFIG_READ_ONLY_THP_FOR_FS or they include commit a308c71bf1e6 ("mm/gup:
> > Remove enfornced COW mechanism").
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> > ---
> > mm/gup.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 3ef769529548..d55e02411010 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -176,7 +176,8 @@ static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
> > */
> > static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned int flags)
> > {
> > - return is_cow_mapping(vma->vm_flags) && (flags & FOLL_GET);
> > + return is_cow_mapping(vma->vm_flags) && vma_is_anonymous(vma) &&
> > + (flags & FOLL_GET);
> > }
>
> To be fully correct, the check would have to check for PageAnon(), not
> whether the mapping is anonymous, right? Since a private file mapping
> can still contain anonymous pages from a prior CoW?
Oh, right. So parent process maps a file with MAP_PRIVATE, writes to
it, gets an anon page, forks. Child stuffs the page into a pipe,
unmaps page. Parent writes to page again, now child can read() the
modification?
The problem is that we don't even get to seeing the struct page with
the current code paths. And we're looking for a fix for RO THP that's
less intrusive for v5.4 than backporting
09854ba94c6a ("mm: do_wp_page() simplification")
1a0cf26323c8 ("mm/ksm: Remove reuse_ksm_page()")
a308c71bf1e6 ("mm/gup: Remove enfornced COW mechanism")
The other patch we've been kicking around (and works) is:
static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned
int flags)
{
- return is_cow_mapping(vma->vm_flags) && (flags & FOLL_GET);
+ return is_cow_mapping(vma->vm_flags) &&
+ (!(vma->vm_flags & VM_DENYWRITE)) && (flags & FOLL_GET);
}
That limits the change to be only text pages. Generally programs do
not write to their text pages, and they certainly don't write *secrets*
to their text pages; if somebody else can read it, that's probably not
a problem in the same way as writing to a page of heap.
Powered by blists - more mailing lists