Date:   Fri, 18 Sep 2020 21:28:50 -0300
From:   Jason Gunthorpe <jgg@...pe.ca>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Peter Xu <peterx@...hat.com>, John Hubbard <jhubbard@...dia.com>,
        Leon Romanovsky <leonro@...dia.com>,
        Linux-MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Maya B . Gokhale" <gokhale2@...l.gov>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Marty Mcfadden <mcfadden8@...l.gov>,
        Kirill Shutemov <kirill@...temov.name>,
        Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
        Jan Kara <jack@...e.cz>, Kirill Tkhai <ktkhai@...tuozzo.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification

On Fri, Sep 18, 2020 at 01:59:41PM -0700, Linus Torvalds wrote:

> Honestly, if we had a completely *reliable* sign of "this page is
> pinned", then I think the much nicer option would be to just say
> "pinned pages will not be copied at all". Kind of an implicit
> VM_DONTCOPY.

It would be simpler to implement, but it makes the programming model
really sketchy. For instance O_DIRECT is using FOLL_PIN, so imagine
this program:

        CPU0                                      CPU1

 a = malloc(1024);
                                                b = malloc(1024);
 read(fd, a, 1024); // FD is O_DIRECT
 ...                                            fork()
                                                  *b = ...
 read completes

Here a and b got lucky and both come from the same page due to the
allocator.
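
As a rough illustration of "same page" here, the sketch below checks
whether two pointers fall in the same virtual page. The helper names
page_base() and same_page() are made up for this example and are not
kernel or libc APIs; it only assumes POSIX sysconf() and a power-of-two
page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: base address of the page containing p.
 * Assumes the page size is a power of two. */
static uintptr_t page_base(const void *p)
{
    long page_size = sysconf(_SC_PAGESIZE);

    assert(page_size > 0);
    return (uintptr_t)p & ~((uintptr_t)page_size - 1);
}

/* Hypothetical helper: true if a and b lie in the same virtual page,
 * i.e. a pin taken for one of them also covers the other. */
static bool same_page(const void *a, const void *b)
{
    return page_base(a) == page_base(b);
}
```

Calling same_page(a, b) on the two malloc(1024) results from the
program above may or may not return true; that depends entirely on the
allocator, which is the point: userspace has no control over it.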

In this case the fork() child on CPU1 would be very surprised that
'b' was not mapped into the child.

Similarly, CPU0 would see silent data corruption if the read didn't
deposit data into 'a' - which is a bug we have today. In this race the
COW break of *b might hand the physical page to the child, and *a
won't see the data. For this reason, John is right, fork needs to
eventually do this for O_DIRECT as well.

The copy on fork nicely fixes all of this weird oddball stuff.

Jason
