Message-ID: <CAHk-=wjNes0wn0KUMMY6dOK_sN69z2TiGpDZ2cyzYF07s64bXQ@mail.gmail.com>
Date:   Sun, 10 Mar 2019 14:43:13 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Kees Cook <keescook@...omium.org>
Cc:     Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        "the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: [GIT pull] x86/asm for 5.1

On Sun, Mar 10, 2019 at 4:33 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>
>
> +extern volatile unsigned long cr4_pin;
> +
>  static inline void native_write_cr4(unsigned long val)
>  {
> +       unsigned long warn = 0;
> +
> +again:
> +       val |= cr4_pin;
>         asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
> +       /*
> +        * If the MOV above was used directly as a ROP gadget we can
> +        * notice the lack of pinned bits in "val" and start the function
> +        * from the beginning to gain the cr4_pin bits for sure. Note
> +        * that "val" must be volatile to keep the compiler from
> +        * optimizing away this check.
> +        */
> +       if ((val & cr4_pin) != cr4_pin) {
> +               warn = ~val & cr4_pin;
> +               goto again;
> +       }
> +       WARN_ONCE(warn, "Attempt to unpin cr4 bits: %lx; bypass attack?!\n",
> +                 warn);
>  }

Yeah, no.

No way am I pulling a "security fix" that is that stupid, and that
badly documented. The comments are actively *not* matching the code,
and the actual code is so horribly written that it forces the compiler
to do extra stupid and unnecessary things (reloading that 'cr4_pin'
twice _after_ the thing you're trying to fix, for no actual good
reason - it only needs one reload).

The asm could trivially have just had a memory clobber in it, instead
of the wrongheaded - and incorrectly documented - volatile.
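
To be concrete, the memory clobber version is just this (same
instruction, no volatile on anything):

    asm volatile("mov %0,%%cr4" : : "r" (val) : "memory");

The clobber tells the compiler that memory may have changed across the
asm, so it has to re-read 'cr4_pin' after the write instead of reusing
a cached value.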

And btw, when you do the same thing twice, just do it the same way,
instead of having two different ways of doing it. There was a totally
different model to handling native_write_cr0() having similar issues.

Finally, I suspect this is all subtly buggy anyway, because your
'cr4_pin' thing is a global variable, but the cr4 bits are per-cpu,
and I don't see how this works with the different CPU's setting things
up one after the other, but the first one sets the pinned bits.

So it basically "pins" the bits on CPU's before they are ready, so the
secondary CPU's magically get the bits set _independently_ of running
the actual setup code to set them. That may *work*, but it sure looks
iffy.

In particular, the whole problem with "the compiler might optimize it
away" comes from the fact that you do the "or" before even writing the
value, which shouldn't have been done anyway, but was forced by the
fact that you used a global mask for something that was really per-cpu
state to be set up (ie you *had* to set the bits wrong on the CPU
early, because you tested the wrong global bits afterwards).

If you had made the pinned bits value be percpu, all of the problems
would have gone away, because it wouldn't have had that bogus "val |=
cr4_pin" before.

So I think this code is
 (a) buggy
 (b) actively incorrectly documented
 (c) inconsistent
 (d) badly designed

and when you have code where the _only_ raison d'etre is security, you
had better have some seriously higher requirements than this.

IOW, the code probably should just do

    DECLARE_PER_CPU_READ_MOSTLY(unsigned long, cr4_pin);

    static inline void native_write_cr4(unsigned long val)
    {
        for (;;) {
                unsigned long pin;
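
                /* The "memory" clobber forces the reload of cr4_pin below */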
                asm volatile("mov %0,%%cr4": : "r" (val):"memory");
                pin = __this_cpu_read(cr4_pin);
                if (likely((val & pin) == pin))
                        return;
                WARN_ONCE(1, "Attempt to unpin cr4 bits: %lx; bypass attack?!\n",
                          pin & ~val);
                val |= pin;
        }
    }

which is a lot more straightforward, doesn't have that odd data
pollution between CPU's, and doesn't need to play any odd games.

And no, the above isn't tested, of course. But at least it makes sense
and doesn't have odd volatiles or strange "set global state and
percolate it into percpu stuff by magic". It *might* work.

Would you perhaps want to add a percpu section that is
read-after-boot? Maybe. But honestly, if the attacker can already
modify arbitrary percpu data, you may have bigger issues than cr4.

                  Linus
