Message-ID: <4DF6CC58.8050601@jp.fujitsu.com>
Date: Tue, 14 Jun 2011 11:50:00 +0900
From: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
To: Tony Luck <tony.luck@...el.com>
CC: Avi Kivity <avi@...hat.com>, Borislav Petkov <bp@...64.org>,
Ingo Molnar <mingo@...e.hu>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Huang, Ying" <ying.huang@...el.com>
Subject: Re: [PATCH 08/10] NOTIFIER: Take over TIF_MCE_NOTIFY and implement
task return notifier

(2011/06/14 2:13), Tony Luck wrote:
> On Mon, Jun 13, 2011 at 9:31 AM, Avi Kivity <avi@...hat.com> wrote:
>> I don't think a user_return_notifier is needed here. You don't just want to
>> do things before a userspace return, you also want to do them soon. A user
>> return notifier might take a very long time to run, if a context switch
>> occurs to a thread that spends a lot of time in the kernel (perhaps a
>> realtime thread).
>>
>> So I think the best choice here is MCE -> irq_work -> realtime kernel thread
>> (or work queue)
>
> In the AO (action optional) case (e.g. patrol scrubber) - there isn't much rush.
> We'd like to process things "soon" (before someone hits the corrupt location)
> but we don't need to take extraordinary efforts to make "soon" happen.
>
> In the AR (action required - instruction or data fetch from a corrupted
> memory location) our main priority is making sure that we don't continue
> the task that hit the error - because we don't want to hit it again (as Boris
> said, on Intel cpus this is very disruptive to the system as every cpu is
> sent the machine check signal - and the code has to read a large number
> of slow "msr" registers to figure out what happened). If we can guarantee
> that we won't run this task - then the time pressure is greatly reduced.
>
> So if we can do:
>
> MCE -> irq_work -> make-task-not-runnable -> thread-or-work-queue
>
> in a reliable way, then that would meet the needs. PeterZ didn't like the
> idea of setting TASK_STOPPED or _UNINTERRUPTIBLE in NMI
> context in the MC handler - but I think he was okay with it inside the
> irq_work handler.
>
> -Tony

I've made a couple of small patches to clarify things. Could you take a look?
They are based on my cleanup patch set:

https://lkml.org/lkml/2011/6/7/677

Thanks,
H.Seto

 arch/x86/kernel/cpu/mcheck/mce.c | 52 ++++++++++++++++++++-----------------
 1 files changed, 28 insertions(+), 24 deletions(-)

Hidetoshi Seto (2):
      x86, mce: introduce mce_memory_failure_process
      x86, mce: rework use of TIF_MCE_NOTIFY
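
For reference, the staged deferral Tony describes above
(MCE -> irq_work -> make-task-not-runnable -> thread-or-work-queue) can be
modelled in plain user-space C. This is only a rough sketch of the ordering
constraint, not code from the patches: every name below (sim_task,
sim_mce_event, sim_mce_handler, sim_irq_work_run, sim_workqueue_run and the
SIM_* states) is hypothetical, and no real kernel API is used. The point it
tries to show is that the NMI-like stage only records the event, the
irq_work-like stage is the first place the task state is changed, and the
slow handling happens last.

```c
/* User-space model of the three stages. All names are hypothetical and
 * stand in for kernel contexts; this is not the kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sim_state { SIM_RUNNABLE, SIM_STOPPED, SIM_RESOLVED };

struct sim_task {
    int pid;
    enum sim_state state;
};

struct sim_mce_event {
    struct sim_task *task;   /* task that touched the corrupted location */
    unsigned long pfn;       /* page frame reported by the machine check */
    bool irq_work_pending;   /* set in "NMI" context, consumed in "IRQ" context */
    bool work_queued;        /* set in "IRQ" context, consumed in process context */
};

/* Stage 1, "NMI" context: record the event and raise the deferred work.
 * The task state is deliberately left untouched here. */
void sim_mce_handler(struct sim_mce_event *ev, struct sim_task *t,
                     unsigned long pfn)
{
    ev->task = t;
    ev->pfn = pfn;
    ev->irq_work_pending = true;
}

/* Stage 2, "irq_work" context: park the task so it cannot hit the bad
 * location again, then hand the slow part to process context. */
void sim_irq_work_run(struct sim_mce_event *ev)
{
    if (!ev->irq_work_pending)
        return;
    ev->irq_work_pending = false;
    assert(ev->task != NULL);
    ev->task->state = SIM_STOPPED;
    ev->work_queued = true;
}

/* Stage 3, process context: do the slow handling for the recorded pfn.
 * Returns the pfn that was handled, or 0 if nothing was queued. */
unsigned long sim_workqueue_run(struct sim_mce_event *ev)
{
    if (!ev->work_queued)
        return 0;
    ev->work_queued = false;
    ev->task->state = SIM_RESOLVED;  /* stand-in for the real recovery action */
    return ev->pfn;
}
```

Calling the three functions in order on one event moves the task from
SIM_RUNNABLE to SIM_STOPPED only in the second stage, which mirrors the
constraint that the task state should not be changed from NMI context.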