linux-kernel - Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts


Message-ID: <20150924212541.GA13483@linux.intel.com>
Date:	Thu, 24 Sep 2015 14:25:41 -0700
From:	"Raj, Ashok" <ashok.raj@...el.com>
To:	Borislav Petkov <bp@...en8.de>
Cc:	"Luck, Tony" <tony.luck@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
	Ashok Raj <ashok.raj@...el.com>
Subject: Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core
 parts

Hi Boris

I should have expanded on it.

On Thu, Sep 24, 2015 at 11:07:33PM +0200, Borislav Petkov wrote:
> 
> How are you ever going to call into those from an offlined CPU?!
> 
> And that's easy:
> 
> 	if (!cpu_online(cpu))
> 		return;
> 

The last patch of that series had 2 changes.

1. Allow offline CPUs to participate in the rendezvous, so that on the odd
chance the offline CPUs have collected any errors we can still report them.
(We changed mce_start/mce_end to use cpu_present_mask instead of just the
online map.)

Without this change, if I inject a broadcast MCE today it ends up hanging,
since the offline CPUs also increment mce_callin. The count will always
exceed the number of CPUs in cpu_online_mask by the number of CPUs that are
logically offlined.
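
To make the counting problem concrete, here is a toy userspace model. It is
only a sketch, not the kernel code: struct cpu_state and all three function
names are made up for illustration, and the rendezvous is assumed to be an
exact match between the number of CPUs that checked in and the number of
CPUs the chosen mask expects.

```c
#include <stdbool.h>

/* Toy model only: one entry per CPU, no atomics, no real MCE handling. */
struct cpu_state {
	bool present;	/* CPU exists in the system */
	bool online;	/* CPU is logically online */
};

/* A broadcast MCE reaches every present CPU, so each one checks in. */
int count_callin(const struct cpu_state *cpus, int n)
{
	int callin = 0;

	for (int i = 0; i < n; i++)
		if (cpus[i].present)
			callin++;
	return callin;
}

/* Number of CPUs the rendezvous waits for under the chosen mask. */
int count_expected(const struct cpu_state *cpus, int n, bool use_present_mask)
{
	int expected = 0;

	for (int i = 0; i < n; i++)
		if (use_present_mask ? cpus[i].present : cpus[i].online)
			expected++;
	return expected;
}

/* Assumption: the rendezvous is modelled as an exact match of both counts. */
bool rendezvous_matches(const struct cpu_state *cpus, int n,
			bool use_present_mask)
{
	return count_callin(cpus, n) ==
	       count_expected(cpus, n, use_present_mask);
}
```

With four present CPUs of which two are offline, count_callin() gives 4 while
the online-mask expectation is 2, so the counts never line up. Counting
against the present mask makes them match.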

Consider, for example, the case where the 2 threads of a core are offline and
the MLC picks up an error. Other CPUs in the socket can't access those banks.
The only way is to let those CPUs read and report their own banks, as they
are core scoped. In upcoming CPUs we have some banks that can be thread
scoped as well.

It's understood that the OS doesn't execute any code on those CPUs. But SMI
can still run on them, and could collect errors that can be logged.

2. If the CPU is offline, we copy its errors to the mce_log buffer, and then
copy those out from the rendezvous master during mce_reign().

If we were to replace this mce_log_add() with gen_pool_add(), then I would
have to call mce_gen_pool_add() from the offline CPU. This would end up
calling RCU functions.
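
The stage-then-drain idea from point 2 can be sketched in userspace as
follows. This is a single-threaded illustration under loud assumptions:
struct mce_record, struct stage_buf, stage_add() and stage_drain() are
hypothetical names, not the kernel's mce_log or genpool interfaces, and a
real version would need atomic operations because several CPUs write
concurrently.

```c
#include <stdbool.h>
#include <stddef.h>

#define STAGE_LEN 4	/* hypothetical capacity, chosen for the example */

/* Hypothetical, cut-down error record. */
struct mce_record {
	int cpu;
	int bank;
	unsigned long long status;
};

/* Fixed-size staging area: no allocation is needed on the writer side. */
struct stage_buf {
	struct mce_record entry[STAGE_LEN];
	size_t next;	/* index of the first free slot */
};

/* Writer side (the offline CPU in this model): plain copy into a free slot. */
bool stage_add(struct stage_buf *b, const struct mce_record *r)
{
	if (b->next >= STAGE_LEN)
		return false;	/* buffer full, record is dropped */
	b->entry[b->next++] = *r;
	return true;
}

/* Reader side (the rendezvous master in this model): copy out and reset. */
size_t stage_drain(struct stage_buf *b, struct mce_record out[STAGE_LEN])
{
	size_t n = b->next;

	for (size_t i = 0; i < n; i++)
		out[i] = b->entry[i];
	b->next = 0;
	return n;
}
```

The point of the split is that the writer only does a bounded copy, and
anything heavier can be done later by whoever drains the buffer.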

We don't want to leave any errors reported by the offline CPU unlogged.
It is rare, but we are still interested in capturing those errors if they
were to happen.

Does this help?

Cheers,
Ashok

> > Also the function doesn't seem safe to be called in NMI context. Although
> 
> That's why it is a lockless buffer - we added it *exactly* because we didn't
> want to call printk in an NMI context. So please expand...
> 
