Open Source and information security mailing list archives
 
Message-ID: <20161130084828.7jsi6r6pxztj5dmz@pd.tnic>
Date:   Wed, 30 Nov 2016 09:48:28 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Jiri Olsa <jolsa@...hat.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
        Josh Triplett <josh@...htriplett.org>,
        Andi Kleen <andi@...stfloor.org>,
        Jan Stancek <jstancek@...hat.com>
Subject: Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

On Tue, Nov 29, 2016 at 02:59:01PM +0100, Thomas Gleixner wrote:
> The issue is that you obviously start with the assumption that the machine
> has this bug. As a consequence the machine is brute forced into tick
> broadcast mode, which cannot be reverted when you clear that misfeature
> after ACPI init. So in case of !NOHZ and !HIGHRES the periodic tick is
> forced into broadcast mode, which is not what you want.
> 
> As far as I understood the whole magic, this C1E misfeature takes only
> effect _after_ ACPI has been initialized. So instead of setting the bug in
> early boot and therefore forcing the broadcast nonsense, we should only set
> it when ACPI has actually detected it.

Problem is, select_idle_routine() runs a lot earlier than acpi_init() so
there's a window where we don't definitively know yet whether the box is
actually going to enter C1E or not.

  [ I presume the reason why we have to do the proper detection after
    ACPI has been initialized is because the frickelware decides whether
    to do C1E entry or not and then sets those bits in the MSR (or not). ]

If in that window we enter idle and we're on an affected machine and we
*don't* switch to broadcast mode, we risk not waking up from C1E, i.e.,
the main reason this fix was even done.

So, if we "prematurely" switch to broadcast mode on the affected CPUs,
we're ok, it will be detected properly later and we're in broadcast
mode already.
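
To make the ordering argument concrete, here is a toy userspace model in
C. It is only a sketch and not kernel code: struct box, pick_idle_mode(),
may_hang() and model_self_check() are made-up names for illustration, and
the model assumes the firmware's C1E decision cannot be read before ACPI
init. It also assumes that dropping back to the default idle routine
later is possible at all, which is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
enum idle_mode { IDLE_DEFAULT, IDLE_BROADCAST };

struct box {
    bool family_may_have_c1e;   /* CPU family that can have the C1E erratum */
    bool firmware_enables_c1e;  /* what the MSR bits will end up saying */
    bool acpi_initialized;      /* false while we are still in the window */
};

/* Conservative policy: inside the window, assume the worst. */
static enum idle_mode pick_idle_mode(const struct box *b)
{
    if (!b->family_may_have_c1e)
        return IDLE_DEFAULT;
    if (!b->acpi_initialized)
        return IDLE_BROADCAST;  /* cannot know yet, so play it safe */
    /* Assumption: switching back to default idle is possible here. */
    return b->firmware_enables_c1e ? IDLE_BROADCAST : IDLE_DEFAULT;
}

/* True if the box could enter C1E without being in broadcast mode. */
static bool may_hang(const struct box *b, enum idle_mode mode)
{
    return b->family_may_have_c1e && b->firmware_enables_c1e &&
           mode != IDLE_BROADCAST;
}

/* Walk through the cases discussed in this mail. */
static void model_self_check(void)
{
    struct box affected_early = { true, true, false };
    struct box clean_late = { true, false, true };

    /* Affected box inside the window: broadcast mode, no hang risk. */
    assert(pick_idle_mode(&affected_early) == IDLE_BROADCAST);
    assert(!may_hang(&affected_early, pick_idle_mode(&affected_early)));

    /* Same box, but naively left on default idle: hang risk. */
    assert(may_hang(&affected_early, IDLE_DEFAULT));

    /* Unaffected box after ACPI init: default idle is fine. */
    assert(pick_idle_mode(&clean_late) == IDLE_DEFAULT);
}
```

Calling model_self_check() from a main() exercises the three cases: the
conservative choice never risks a hang inside the window, at the cost of
unaffected boxes sitting in broadcast mode until detection has run.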

Now, on those machines which are not affected, where we clear
X86_BUG_AMD_APIC_C1E because they don't enter C1E at all, I was thinking
of maybe doing amd_e400_remove_cpu(), clearing that e400 mask and even
freeing it, so that they can do default_idle().

But you're saying tick_broadcast_enter() is irreversible?

Thanks.

-- 
Regards/Gruss,
    Boris.
