Date:   Sat, 25 Mar 2023 10:26:06 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     "Kirill A. Shutemov" <kirill@...temov.name>,
        Michal Hocko <mhocko@...e.com>,
        Naresh Kamboju <naresh.kamboju@...aro.org>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: WARN_ON in move_normal_pmd

On Sat, Mar 25, 2023 at 10:06 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> So what I'm saying is that *if* we start out with that situation, and
> we have that
>
>     old = 0x1fff000
>     new = 0x1dff000
>     len = 0x201000
>
> we could easily decide "let's just move the whole PMD", and expand the
> move to be
>
>     old = 0x1e00000
>     new = 0x1c00000
>     len = 0x400000
>
> instead. And then instead of moving PTE's around at first, we'd move
> PMD's around *all* the time, and turn this into that "simple case
> (a)".
>
> NOTE! For this to work, there must be no mapping right below 'old' or
> 'new', of course. But during the execve() startup, that should be
> trivially true.
>
> See what I'm saying?

Also note that my comments about "this can be tested with mremap()"
are because the above optimization works and is valid even when old
and new are not originally overlapping, but they overlap after the
expansion.
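
As a rough illustration of the expansion arithmetic in the quoted
example, here is a minimal sketch. It assumes a 2MB PMD (x86-64 with
4KB pages), and 'struct move_range' and 'expand_to_pmd()' are made-up
names for illustration, not kernel code. It also does none of the
"nothing is mapped right below old or new" checking that the real
thing would need:

```c
#include <assert.h>

/* Assumption: one PMD entry maps 2MB (x86-64, 4KB pages) */
#define PMD_SIZE 0x200000ul
#define PMD_MASK (~(PMD_SIZE - 1))

/* Hypothetical description of a move, for illustration only */
struct move_range {
        unsigned long old_addr;
        unsigned long new_addr;
        unsigned long len;
};

/*
 * Expand a move so that it starts and ends on PMD boundaries.
 * The caller is assumed to have verified that nothing else is
 * mapped in the extra area that the expansion covers.
 */
static struct move_range expand_to_pmd(struct move_range r)
{
        unsigned long old_end = r.old_addr + r.len;
        struct move_range out;

        /* old and new must have the same offset within a PMD */
        assert(((r.old_addr ^ r.new_addr) & ~PMD_MASK) == 0);

        /* Round both start addresses down to a PMD boundary */
        out.old_addr = r.old_addr & PMD_MASK;
        out.new_addr = r.new_addr & PMD_MASK;

        /* Round the end up to a PMD boundary */
        out.len = ((old_end + PMD_SIZE - 1) & PMD_MASK) - out.old_addr;
        return out;
}
```

Feeding it old = 0x1fff000, new = 0x1dff000, len = 0x201000 gives the
expanded old = 0x1e00000, new = 0x1c00000, len = 0x400000 from the
quoted mail.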

IOW, imagine that you have a 2MB mapping, but it is not 2MB-aligned
virtually, and you want to move that mapping down by 2MB.

Now, because that 2MB mapping is *not* 2MB-aligned, it actually takes
up *two* PMD entries. But if that mapping is the only thing that
exists in those two PMD entries, and the PMD entry below it is clear
(because there is no mapping right below the new address), then we can
still do that unaligned 2MB mapping move entirely at the PMD level.

So instead of wasting time to move it one page at a time (until it is
2MB aligned), we could just move two PMD entries around.
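
To put numbers on "one page at a time" versus "two PMD entries", here
is a small sketch that counts how many units of a given size a mapping
touches. It assumes 4KB pages and a 2MB PMD (x86-64), and
'entries_spanned()' is a made-up helper, not anything in the kernel:

```c
#include <assert.h>

/*
 * Number of 'unit'-sized, 'unit'-aligned slots touched by the
 * range [start, start + len). Hypothetical helper for illustration.
 */
static unsigned long entries_spanned(unsigned long start,
                                     unsigned long len,
                                     unsigned long unit)
{
        unsigned long first, last;

        assert(len > 0 && unit > 0);

        first = start / unit;
        last = (start + len - 1) / unit;
        return last - first + 1;
}
```

For a 2MB mapping that starts 1MB past a 2MB boundary (say at 129MB),
entries_spanned(start, len, 2MB) is 2, while
entries_spanned(start, len, 4KB) is 512.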

Here's a (UNTESTED! It compiles, but that's it) user test-case for
this situation:

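  #define _GNU_SOURCE
  #include <sys/mman.h>
  #include <string.h>
  #include <stdio.h>

  /* Pick some random 2MB-aligned address that isn't near anything else */
  #define MB (1ul << 20)
  #define VA ((char *)(128 * MB))

  /* 'old' straddles a 2MB boundary; 'new' is exactly 2MB below it */
  #define old (VA+MB)
  #define new (VA-MB)
  #define len (2*MB)

  int main(int argc, char **argv)
  {
        void *addr;

        addr = mmap(old, len,
                PROT_READ | PROT_WRITE,
                MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED,
                -1, 0);
        if (addr == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        /* Touch every page so the page tables are actually populated */
        memset(addr, 0xff, len);
        addr = mremap(old, len, len,
                MREMAP_MAYMOVE | MREMAP_FIXED, new);
        if (addr == MAP_FAILED) {
                perror("mremap");
                return 1;
        }
        return 0;
  }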

and I claim that that mremap() right now ends up doing the whole 2MB
page table move one page at a time, but it *should* be doable as just
two PMD entry moves.

See?

                Linus
