lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200917190332.GB133226@xz-x1>
Date:   Thu, 17 Sep 2020 15:03:32 -0400
From:   Peter Xu <peterx@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Jason Gunthorpe <jgg@...pe.ca>, John Hubbard <jhubbard@...dia.com>,
        Leon Romanovsky <leonro@...dia.com>,
        Linux-MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Maya B . Gokhale" <gokhale2@...l.gov>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Marty Mcfadden <mcfadden8@...l.gov>,
        Kirill Shutemov <kirill@...temov.name>,
        Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
        Jan Kara <jack@...e.cz>, Kirill Tkhai <ktkhai@...tuozzo.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification

On Thu, Sep 17, 2020 at 11:26:01AM -0700, Linus Torvalds wrote:
> On Thu, Sep 17, 2020 at 11:14 AM Peter Xu <peterx@...hat.com> wrote:
> >
> > In my humble opinion, the real solution is still to use MADV_DONTFORK properly
> > so we should never share the DMA pages with others when we know the fact.
> 
> Is this all just because somebody does a fork() after doing page pinning?
> 
> If so, I feel this should be trivially fixed in  copy_one_pte().
> That's where we currently do
> 
>         /*
>          * If it's a COW mapping, write protect it both
>          * in the parent and the child
>          */
>         if (is_cow_mapping(vm_flags) && pte_write(pte)) {
>                 ptep_set_wrprotect(src_mm, addr, src_pte);
>                 pte = pte_wrprotect(pte);
>         }
> 
> and I feel that that is where we could just change the code to do a
> COW event for pinned pages (and *not* mark the parent write protected,
> since the parent page now isn't a COW page).
> 
> Because if that's the case that Jason is hitting, then I feel that
> really is the correct fix: make sure that the pinning action is
> meaningful.
> 
> As mentioned, I really think the whole (and only) point of page
> pinning is that it should keep the page locked in the page tables. And
> by "locked" I mean exactly that: not just present, but writable.
> 
> And then the "we never COW a pinned page" comes not from the COW code
> doing magic, but by it simply never becoming non-writable - because
> the page table entry is locked!

Looks reasonable to me.

The fork() should be slightly slower though, since we'll need to copy the data
for all the DMA buffers for each of the child processes, even if we should be
pretty sure those processes won't use these pages at all. But it seems a good
approach anyway if we care about the potential breakages in the userspace so
the breakage is turned into perf degrades, and if any userspace noticed such
perf degrade on fork() then people will probably remember to use MADV_DONTFORK
properly since afaict MADV_DONTFORK can remove this extra overhead..

Another side effect I can think of is that we'll bring some uncertainty to
fork() starting from when page_maybe_dma_pinned() is used, since it's sometimes
bogus (hpage_pincount_available()==false) so some COWs might be triggered
during fork() even when not necessary if we've got some normal pages with too
many refcounts (over GUP_PIN_COUNTING_BIAS).  But assuming that's not a big
deal since it should be extremely rare, or is it?..

-- 
Peter Xu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ