linux-kernel - Re: [BUG] msr-trace.h:42 suspicious rcu_dereference

Message-ID: <alpine.DEB.2.20.1611301010030.3439@nanos>
Date:   Wed, 30 Nov 2016 10:14:42 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Borislav Petkov <bp@...en8.de>
cc:     Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Jiri Olsa <jolsa@...hat.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
        Josh Triplett <josh@...htriplett.org>,
        Andi Kleen <andi@...stfloor.org>,
        Jan Stancek <jstancek@...hat.com>
Subject: Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

On Wed, 30 Nov 2016, Borislav Petkov wrote:
> On Wed, Nov 30, 2016 at 09:54:58AM +0100, Thomas Gleixner wrote:
> > Right, that's the safe bet. But I'm quite sure that the C1E crap only
> > starts to work _after_ ACPI initialization.
> 
> Yap, I think it is an ACPI decision whether to enter C1E or not. And all
> those boxes which are unaffected - they actually are but since the FW
> doesn't enter C1E, they don't hit the bug.
> 
> > tick_force_broadcast() is irreversible
> 
> So if I let the cores in that small window use amd_e400_idle() and
> then when I detect the machine doesn't enter C1E after all, I do
> tick_broadcast_exit() on all cores in amd_e400_c1e_mask, then clear it
> and free it, that would work?
> 
> Or do you see a better solution?

Start with the bug cleared and select amd_e400_idle(), which will then use
default_idle() and not do any of the broadcast crap.

After ACPI gets initialized, check for the C1E misfeature and, if it's
detected, set the CPU BUG and send an IPI to all online CPUs.

I think this is safe because the CPU running the ACPI initialization is not
in idle, and if any of the other CPUs were stuck due to the C1E
enablement, the IPI will kick them out.
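
To make the proposed ordering concrete, here is a rough user-space model of
it. This is only a sketch, not kernel code: every name in it (c1e_bug_set,
model_e400_idle(), model_kick_cpu(), model_post_acpi_check(), NR_CPUS = 4)
is hypothetical, the "IPI" is a plain function call, and skipping the CPU
that runs the check is an assumption based on it not being in idle.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4                       /* hypothetical CPU count for the model */

enum idle_path { IDLE_DEFAULT, IDLE_BROADCAST };

static bool c1e_bug_set;                /* models the CPU BUG flag; starts cleared */
static enum idle_path last_path[NR_CPUS];
static int kicks[NR_CPUS];              /* number of modelled IPIs per CPU */

/*
 * Model of the idle routine: while the bug flag is clear it behaves like
 * the default idle path and does no broadcast work.
 */
static enum idle_path model_e400_idle(int cpu)
{
	last_path[cpu] = c1e_bug_set ? IDLE_BROADCAST : IDLE_DEFAULT;
	return last_path[cpu];
}

/*
 * Model of the IPI: the target CPU leaves idle and re-enters the idle
 * routine, so it re-evaluates the bug flag.
 */
static void model_kick_cpu(int cpu)
{
	kicks[cpu]++;
	model_e400_idle(cpu);
}

/*
 * Runs once after the modelled ACPI init on CPU 'self'. If C1E was
 * detected, set the bug flag first and then kick every other CPU.
 * Returns the number of CPUs kicked.
 */
static int model_post_acpi_check(int self, bool c1e_detected)
{
	int cpu, kicked = 0;

	assert(self >= 0 && self < NR_CPUS);
	if (!c1e_detected)
		return 0;

	c1e_bug_set = true;             /* flag must be visible before the kick */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == self)
			continue;       /* assumed: the checking CPU is not idle */
		model_kick_cpu(cpu);
		kicked++;
	}
	return kicked;
}
```

In this model every CPU takes the default path until the check runs, and
after the kick each other CPU is on the broadcast path the next time it
goes idle. Whether the real hardware and tick broadcast code behave that
way in the window before ACPI init is exactly what would need testing.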

At least worth a try.

Thanks,

	tglx