lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wi=XvCZ9r897LjEb4ZarLzLtKN1p+Fyig+F2fmQDF8GSA@mail.gmail.com>
Date: Sun, 7 Jul 2024 17:08:29 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: David Hildenbrand <david@...hat.com>
Cc: "Jason A. Donenfeld" <Jason@...c4.com>, linux-kernel@...r.kernel.org, patches@...ts.linux.dev, 
	tglx@...utronix.de, linux-crypto@...r.kernel.org, linux-api@...r.kernel.org, 
	x86@...nel.org, Greg Kroah-Hartman <gregkh@...uxfoundation.org>, 
	Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>, "Carlos O'Donell" <carlos@...hat.com>, 
	Florian Weimer <fweimer@...hat.com>, Arnd Bergmann <arnd@...db.de>, Jann Horn <jannh@...gle.com>, 
	Christian Brauner <brauner@...nel.org>, David Hildenbrand <dhildenb@...hat.com>, linux-mm@...ck.org
Subject: Re: [PATCH v21 1/4] mm: add VM_DROPPABLE for designating always
 lazily freeable mappings

On Sun, 7 Jul 2024 at 14:01, David Hildenbrand <david@...hat.com> wrote:
>
> At least MAP_DROPPABLE doesn't quite make sense with hugetlb, but at least
> the other ones do have semantics with hugetlb?

Hmm.

How about we just say that VM_DROPPABLE really is something separate
from MAP_PRIVATE or MAP_SHARED..

And then we make the rule be that VM_DROPPABLE is never dumped and
always dropped on fork, just to make things simpler.

It not only avoids a flag, but it actually makes sense: the pages
aren't stable for dumping anyway, and not copying them on fork() not
only avoids some overhead, but makes it much more reliable and
testable.

IOW, how about taking this approach:

   --- a/include/uapi/linux/mman.h
   +++ b/include/uapi/linux/mman.h
   @@ -17,5 +17,6 @@
    #define MAP_SHARED  0x01            /* Share changes */
    #define MAP_PRIVATE 0x02            /* Changes are private */
    #define MAP_SHARED_VALIDATE 0x03    /* share + validate extension flags */
   +#define MAP_DROPPABLE       0x08    /* 4 is not in MAP_TYPE on parisc? */

    /*

with do_mmap() doing:

   --- a/mm/mmap.c
   +++ b/mm/mmap.c
   @@ -1369,6 +1369,23 @@ unsigned long do_mmap(struct file *file,
                        pgoff = 0;
                        vm_flags |= VM_SHARED | VM_MAYSHARE;
                        break;
   +            case MAP_DROPPABLE:
   +                    /*
   +                     * A locked or stack area makes no sense to
   +                     * be droppable.
   +                     *
   +                     * Also, since droppable pages can just go
   +                     * away at any time, it makes no sense to
   +                     * copy them on fork or dump them.
   +                     */
   +                    if (flags & MAP_LOCKED)
   +                            return -EINVAL;
   +                    if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
   +                            return -EINVAL;
   +
   +                    vm_flags |= VM_DROPPABLE;
   +                    vm_flags |= VM_WIPEONFORK | VM_DONTDUMP;
   +                    fallthrough;
                case MAP_PRIVATE:
                        /*
                         * Set pgoff according to addr for anon_vma.

which looks rather simple.

The only oddity is that parisc thing - every other archiecture has the
MAP_TYPE bits being 0xf, but parisc uses 0x2b (also four bits, but
instead of the low four bits it's 00101011 - strange).

So using 8 as a MAP_TYPE bit for MAP_DROPPABLE works everywhere, and
if we eventually want to do a "signaling" MAP_DROPPABLE we could use
9.

This has the added advantage that if somebody does this on an old
kernel,. they *will* get an error. Because unlike the 'flag' bits in
general, the MAP_TYPE bit space has always been tested.

Hmm?

              Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ