linux-kernel - Re: [PATCH 08/10] NOTIFIER: Take over TIF_MCE_NOTIFY and implement task return notifier

Message-ID: <4DF6CC58.8050601@jp.fujitsu.com>
Date:	Tue, 14 Jun 2011 11:50:00 +0900
From:	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
To:	Tony Luck <tony.luck@...el.com>
CC:	Avi Kivity <avi@...hat.com>, Borislav Petkov <bp@...64.org>,
	Ingo Molnar <mingo@...e.hu>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Huang, Ying" <ying.huang@...el.com>
Subject: Re: [PATCH 08/10] NOTIFIER: Take over TIF_MCE_NOTIFY and implement
 task return notifier

(2011/06/14 2:13), Tony Luck wrote:
> On Mon, Jun 13, 2011 at 9:31 AM, Avi Kivity <avi@...hat.com> wrote:
>> I don't think a user_return_notifier is needed here.  You don't just want to
>> do things before a userspace return, you also want to do them soon.  A user
>> return notifier might not run for a very long time if a context switch
>> occurs to a thread that spends a lot of time in the kernel (perhaps a
>> realtime thread).
>>
>> So I think the best choice here is MCE -> irq_work -> realtime kernel thread
>> (or work queue)
> 
> In the AO (action optional) case (e.g. patrol scrubber), there isn't much rush.
> We'd like to process things "soon" (before someone hits the corrupt location)
> but we don't need to take extraordinary efforts to make "soon" happen.
> 
> In the AR (action required - instruction or data fetch from a corrupted
> memory location) case, our main priority is making sure that we don't continue
> the task that hit the error, because we don't want to hit it again (as Boris
> said, on Intel CPUs this is very disruptive to the system, as every CPU is
> sent the machine check signal and the code has to read a large number
> of slow "msr" registers to figure out what happened). If we can guarantee
> that we won't run this task, then the time pressure is greatly reduced.
> 
> So if we can do:
> 
>   MCE -> irq_work -> make-task-not-runnable -> thread-or-work-queue
> 
> in a reliable way, then that would meet the needs.  PeterZ didn't like the
> idea of setting TASK_STOPPED or _UNINTERRUPTIBLE in NMI
> context in the MC handler - but I think he was okay with it inside the
> irq_work handler.
> 
> -Tony

I've made a couple of small patches to clean things up. Could you take a look?

These are based on my cleanup patch set:
https://lkml.org/lkml/2011/6/7/677

Thanks,
H.Seto

 arch/x86/kernel/cpu/mcheck/mce.c |   52 ++++++++++++++++++++-----------------
 1 files changed, 28 insertions(+), 24 deletions(-)

Hidetoshi Seto (2):
      x86, mce: introduce mce_memory_failure_process
      x86, mce: rework use of TIF_MCE_NOTIFY

