lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170412091442.dwonfr4dwyta7nvx@pd.tnic>
Date:   Wed, 12 Apr 2017 11:14:42 +0200
From:   Borislav Petkov <bp@...e.de>
To:     Vishal Verma <vishal.l.verma@...el.com>
Cc:     linux-kernel@...r.kernel.org, linux-nvdimm@...ts.01.org,
        x86@...nel.org, Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Tony Luck <tony.luck@...el.com>,
        Dan Williams <dan.j.williams@...el.com>
Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from
 'atomic'

On Tue, Apr 11, 2017 at 04:44:57PM -0600, Vishal Verma wrote:
> The NFIT MCE handler callback (for handling media errors on NVDIMMs)
> takes a mutex to add the location of a memory error to a list. But since
> the notifier call chain for machine checks (x86_mce_decoder_chain) is
> atomic, we get a lockdep splat like:
> 
>   BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
>   in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
>   [..]
>   Call Trace:
>    dump_stack+0x86/0xc3
>    ___might_sleep+0x178/0x240
>    __might_sleep+0x4a/0x80
>    mutex_lock_nested+0x43/0x3f0
>    ? __lock_acquire+0xcbc/0x1290
>    nfit_handle_mce+0x33/0x180 [nfit]
>    notifier_call_chain+0x4a/0x70
>    atomic_notifier_call_chain+0x6e/0x110
>    ? atomic_notifier_call_chain+0x5/0x110
>    mce_gen_pool_process+0x41/0x70
> 
> Commit 648ed94038c030245a06e4be59744fd5cdc18c40
>       x86/mce: Provide a lockless memory pool to save error records
> Changes the mce notifier callbacks to be run in a process context, and
> this can allow us to use the 'blocking' type notifier, where we can take
> mutexes etc. in the call chain functions.
> 
> Reported-by: Ross Zwisler <ross.zwisler@...ux.intel.com>
> Cc: Borislav Petkov <bp@...e.de>
> Cc: Tony Luck <tony.luck@...el.com>
> Cc: Dan Williams <dan.j.williams@...el.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@...el.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce-genpool.c  | 2 +-
>  arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 +-
>  arch/x86/kernel/cpu/mcheck/mce.c          | 8 ++++----
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> While this patch almost solves the problem, I think it is not quite right.
> The x86_mce_decoder_chain is also called from print_mce for fatal machine
> checks, and that is, afaict, still from an atomic context. One thing Tony
> suggested was splitting the notifier chain into two distinct chains, one
> for regular logging and recoverable actions that allows blocking, the
> other from the panic path.

Well, if Mohammad won't come to the mountain...

So the NFIT handler has:

        /* We only care about memory errors */
        if (!(mce->status & MCACOD))
                return NOTIFY_DONE;

what severity are we talking here? Errors which can be reported on the
panic path, i.e., in atomic context or only AO/AR ones which don't raise
an #MC exception?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ