lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120625091734.GA25839@gmail.com>
Date:	Mon, 25 Jun 2012 11:17:34 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	ShuoX Liu <shuox.liu@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Borislav Petkov <bp@...en8.de>,
	Yanmin Zhang <yanmin_zhang@...ux.intel.com>,
	"Luck, Tony" <tony.luck@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"andi@...stfloor.org" <andi@...stfloor.org>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v7 2/2] x86 mce: use new printk recursion disabling
 interface


* ShuoX Liu <shuox.liu@...el.com> wrote:

> From: ShuoX Liu <shuox.liu@...el.com>
> 
> On x86 machines, some times MCE happens just when kernel calls printk
> to output some log info to serial console, while usually MCE module in
> kernel is used to print out some hardware error information, such like
> bad cache or bad memory bank. That causes printk recursion and printk
> would omit MCE printk output.
> 
> We hit it when running MTBF testing on Android ATOM mobiles.
> 
> Here in print_mce, we choose to disable printk recursion to make sure
> MCE logs printed out.
> 
> Signed-off-by: Yanmin Zhang <yanmin_zhang@...ux.intel.com>
> Signed-off-by: ShuoX Liu <shuox.liu@...el.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2afcbd2..6056e94 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -242,6 +242,7 @@ static void print_mce(struct mce *m)
>  {
>  	int ret = 0;
>  
> +	printk_recursion_check_disable();
>  	pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
>  	       m->extcpu, m->mcgstatus, m->bank, m->status);
>  
> @@ -275,10 +276,13 @@ static void print_mce(struct mce *m)
>  	 * (if the CPU has an implementation for that)
>  	 */
>  	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> -	if (ret == NOTIFY_STOP)
> +	if (ret == NOTIFY_STOP) {
> +		printk_recursion_check_enable();
>  		return;
> +	}
>  
>  	pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
> +	printk_recursion_check_enable();

Ok, this looks useful and it solves a real problem, but I'd 
prefer a better interface: instead of exposing the guts of 
printk to drivers in an unsafe manner (and allowing them to keep 
printk in an unsafe state indefinitely), shouldn't we instead 
introduce printk_emergency() (and variants) that just disable 
the recursion check, do the printk and then enable them?

That way drivers cannot possibly leave the recursion check 
disabled permanently and it would also be much more obvious 
*which* actual printk sites are affected by this exception.

In theory this could be achieved via a new, super-high-prio 
printk level: KERN_CRASH or so, which the core printk code could 
check - without introducing a new side-facility.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ