lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 21 Nov 2014 11:06:41 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Steven Rostedt <rostedt@...dmis.org>, Tejun Heo <tj@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Don Zickus <dzickus@...hat.com>, Dave Jones <davej@...hat.com>,
	"the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: frequent lockups in 3.18rc4

On Fri, Nov 21, 2014 at 10:22 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
>  (d) the non-PAE case would look something like this:
>
>     static noinline int vmalloc_fault(unsigned long address)
>     {
>         unsigned index;
>         pgd_t *pgd_dst, pgd_entry;
>
>         /* Make sure we are in vmalloc area: */
>         if (!(address >= VMALLOC_START && address < VMALLOC_END))
>                 return -1;

Side note: I think this is just unnecessary confusion, and generates
big constants for no good reason.

The thing is, the kernel PGD's should always be in sync. In fact, at
PGD allocation time, we just do

     clone_pgd_range(.. KERNEL_PGD_BOUNDARY, KERNEL_PGD_PTRS);

and it might actually be better to structure this to be that exact same thing.

So instead of checking the address, we could just do

        index = pgd_index(address);
        if (index < KERNEL_PGD_BOUNDARY)
                return -1;

which actually matches our initialization sequence much better anyway.
And avoids those random big constants.

Also, it turns out that this:

        if (pgd_present(pgd_dst[index]))

generates a crazy big constant because of bad compiler issues (the
"pgd_present()" thing only checks the low bit, but it does so on
pgd_flags(), which does "native_pgd_val(pgd) & PTE_FLAGS_MASK", so you
have an insane extra "and" with the constant 0xffffc00000000fff, just
to then "and" it again with "1". It doesn't do that with the first
pgd_present() check, oddly enough.

WTF, gcc?

Anyway, even more importantly, because of the whole issue with nesting
page tables, it's probably best to actually avoid all the
"pgd_present()" etc helpers, because those might be hardcoded to 1
etc. So avoid the whole issue by just accessign the raw data.

Simplify, simplify, simplify. The actual code generation for this all
should be maybe 20 instructions.

Here's the simplified end result. Again, this is TOTALLY UNTESTED. I
compiled it and verified that the code generation looks like what I'd
have expected, but that's literally it.

  static noinline int vmalloc_fault(unsigned long address)
  {
        pgd_t *pgd_dst;
        pgdval_t pgd_entry;
        unsigned index = pgd_index(address);

        if (index < KERNEL_PGD_BOUNDARY)
                return -1;

        pgd_entry = init_mm.pgd[index].pgd;
        if (!pgd_entry)
                return -1;

        pgd_dst = __va(PAGE_MASK & read_cr3());
        pgd_dst += index;

        if (pgd_dst->pgd)
                return -1;

        ACCESS_ONCE(pgd_dst->pgd) = pgd_entry;
        return 0;
  }
  NOKPROBE_SYMBOL(vmalloc_fault);

Hmm? Does anybody see anything fundamentally wrong with this?

                     Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ