linux-kernel - Re: [BUGFIX 2/2] x86, mce, inject: Make injected mce valid only during faked handler call

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090922152347.GB31702@elte.hu>
Date:	Tue, 22 Sep 2009 17:23:47 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Huang Ying <ying.huang@...el.com>
Cc:	"H. Peter Anvin" <hpa@...or.com>, Andi Kleen <ak@...ux.intel.com>,
	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [BUGFIX 2/2] x86, mce, inject: Make injected mce valid only
	during faked handler call


* Huang Ying <ying.huang@...el.com> wrote:

> In current implementation, injected mce is valid from MCE is injected 
> to MCE is processed by faked handler call. It is possible for it to be 
> consumed by real machine_check_poll. This may confuse real system 
> error and mce test suite.
> 
> To fix this, this patch introduces another flag MCJ_VALID to indacate 
> the MCE entry is valid for injector but not for handler, while 
> mce.finished is used to indicate the MCE entry is valid for handler. 
> mce.finished is enabled only during faked MCE handler call and 
> protected by IRQ disabling. This make it impossible for real 
> machine_check_poll to consume it.
> 
> Signed-off-by: Huang Ying <ying.huang@...el.com>

here's a fixed up changelog for that, please use it for subsequent 
submissions:

 In the current implementation, injected MCE is valid from the point
 the MCE is injected to the point the MCE is processed by the faked
 handler call.

 This has an undesired side-effect: it is possible for it to be
 consumed by real machine_check_poll. This may confuse a real system
 error and may confuse the mce test suite.

 To fix this, this patch introduces another flag MCJ_VALID to
 indicate that the MCE entry is valid for injector but not for the
 handler. Another flag, mce.finished is used to indicate the MCE
 entry is valid for the handler.

 mce.finished is enabled only during faked MCE handler call and
 protected by IRQ disabling. This make it impossible for real
 machine_check_poll to consume it.


> ---
>  arch/x86/include/asm/mce.h              |   17 +++++++++++------
>  arch/x86/kernel/cpu/mcheck/mce-inject.c |   23 ++++++++++++++++-------
>  2 files changed, 27 insertions(+), 13 deletions(-)
> 
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -38,13 +38,18 @@
>  #define MCM_ADDR_MEM	 3	/* memory address */
>  #define MCM_ADDR_GENERIC 7	/* generic */
>  
> -#define MCJ_CTX_MASK		3
> +#define _MCJ_NMI_BROADCAST	2    /* do NMI broadcasting */
> +#define _MCJ_EXCEPTION		3    /* raise as exception */
> +#define _MCJ_VALID		4    /* entry is valid for injector */
> +
> +#define MCJ_CTX_MASK		0x03
>  #define MCJ_CTX(flags)		((flags) & MCJ_CTX_MASK)
> -#define MCJ_CTX_RANDOM		0    /* inject context: random */
> -#define MCJ_CTX_PROCESS		1    /* inject context: process */
> -#define MCJ_CTX_IRQ		2    /* inject context: IRQ */
> -#define MCJ_NMI_BROADCAST	4    /* do NMI broadcasting */
> -#define MCJ_EXCEPTION		8    /* raise as exception */
> +#define MCJ_CTX_RANDOM		0x00    /* inject context: random */
> +#define MCJ_CTX_PROCESS		0x01    /* inject context: process */
> +#define MCJ_CTX_IRQ		0x02    /* inject context: IRQ */
> +#define MCJ_NMI_BROADCAST	(1 << _MCJ_NMI_BROADCAST)
> +#define MCJ_EXCEPTION		(1 << _MCJ_EXCEPTION)
> +#define MCJ_VALID		(1 << _MCJ_VALID)

This is ugly and fragile. MCJ_VALID and _MCJ_VALID are both integers and 
just a single unremarkable character of a typo away from doing the wrong 
thing.

Please use the standard kernel convention to distinguish to name bits 
and masks:

  MCJ_VALID_BIT
  MCJ_VALID_MASK

That makes it far less likely to typo them - and it also makes the end 
result far more readable.

Furthermore, please use better names for these constants.

For example MJC_VALID is a misnomer: it indicates that the message is 
'valid'. While in reality it wants to say that the message is 
'artificial' and should not be handled by a real #MC event.

Also, what does MCJ mean? It has absolutely zero first-glance meaning, 
which is not good.

>  /* Fields are zero when not available */
>  struct mce {
> --- a/arch/x86/kernel/cpu/mcheck/mce-inject.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c
> @@ -32,16 +32,16 @@ static void inject_mce(struct mce *m)
>  
>  	/* Make sure noone reads partially written injectm */
>  	i->finished = 0;
> +	clear_bit(_MCJ_VALID, (unsigned long *)&i->inject_flags);

Most of arch/x86/kernel/cpu/mcheck/mce-inject.c is rather ugly. 
mce->inject_flag is u8 due to a silly ABI and (unsigned long *)
casts now litter the code.

What we want instead is a cleaner design for MCE injection.

A cleaner, more workable approach would be to create clean function call 
interfaces for low level hardware that we extract MCE information from. 
Incidentally, such a clean abstraction is mostly what is needed for 
adding generic events as well - so the two go hand in hand.

So for example, right now machine_check_poll() does the following, in a 
nutshell:

 - reads the global status
 - in a loop:
   - reads the bank status
   - [optional] reads misc
   - [optional] reads addr
   - logs the event
   - writes status

the current injector logic is incomplete because it does not allow the 
proper simulation of all aspects of this loop - for example multiple, 
different events pending are not supported.

The reason is a bug in its design: it has been shoe-horned into the 
limited concept of write()-ing to /dev/mcelog and building something 
from there - while in reality an MCE message is not demux-able in the 
multi-message scenario.

A cleaner structure of this code would be:

 - reads the global status
 - in a loop:
   - reads the bank status
   - [optional] reads misc
   - [optional] reads addr
   - lowlevel_mce_event(gstatus, status, misc, addr);
   - writes status

And then hooking up the injector with lowlevel_mce_event(), not by 
trying to shoehorn it into at the MSR level via mce_rdmsr().

I.e. first minimally extract raw hardware info - then pass that to a mid 
level function. The injector can then interface into this mid level 
function.

Note, there wouldnt be need for the 'injectm' mess either - which caused 
this bug to begin with. The injector injects into the mid level function 
- which from that point on does not care whether it's an injected 
message or not. (_maybe_ pass along a status flag that directs the 
mid-level function to for example not panic the system - but otherwise 
dont do this kind of messy injectm driven global state logic.)

A _real_ injector would work at the hardware level anyway, or could 
perhaps be done as a KVM extension to simulate the MCE MSR environment 
much more fully - there's no way to simulate all aspects of MCEs and 
doing it at the rdmsr level here with simulated state from an injected 
struct mce is pretty limiting and incomplete.

Ok?

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/