Message-Id: <FDBACF11-D9F6-4DE5-A0D4-800903A243B7@gmail.com>
Date: Tue, 27 May 2014 22:09:54 -0700
From: Tony Luck <tony.luck@...il.com>
To: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Cc: "iskra@....anl.gov" <iskra@....anl.gov>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Andi Kleen <andi@...stfloor.org>, Borislav Petkov <bp@...e.de>,
"gong.chen@...ux.jf.intel.com" <gong.chen@...ux.jf.intel.com>
Subject: Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread
I'm exploring options to see what writers of threaded applications might
want or need. I'm very doubtful that they would really want "broadcast to
all threads". What if there are hundreds or thousands of threads? We send
the signals from the context of the thread that hit the error, and that
might take a while. Meanwhile, any of those threads that were already
scheduled on other CPUs are back running again. So there are big races
even if we broadcast.
> On May 27, 2014, at 17:15, Naoya Horiguchi <n-horiguchi@...jp.nec.com> wrote:
>
> On Tue, May 27, 2014 at 03:53:55PM -0700, Tony Luck wrote:
>>> - make sure that every thread in a recovery aware application should have
>>> a SIGBUS handler, inside which
>>> * code for SIGBUS(BUS_MCEERR_AR) is enabled for every thread
>>> * code for SIGBUS(BUS_MCEERR_AO) is enabled only for a dedicated thread
>>
>> But how does the kernel know which is the special thread that
>> should see the "AO" signal? Broadcasting the signal to all
>> threads seems to be just as likely to cause problems to
>> an application as the h/w broadcasting MCE to all processors.
>
> I thought that the kernel doesn't have to know which thread is the
> special one if the AO signal is broadcast to all threads, because
> in that case the special thread always gets the AO signal.
>
> The reported problem happens only when the application sets the
> PF_MCE_EARLY flag, and such an application is surely recovery aware, so
> we can assume that its coders implement a SIGBUS handler for all threads.
> Then all threads other than the special one can intentionally ignore the
> AO signal. This avoids the default behavior for SIGBUS ("kill all
> threads", as Kamil said in the previous email).
>
> And I hope that the downside of signal broadcasting is smaller than that
> of MCE broadcasting, because the range of broadcasting is limited to a
> process group, not to the whole system.
>
> # I don't intend to rule out other possibilities like adding another prctl
> # flag, so if you have a patch, that would be great.
>
> Thanks,
> Naoya Horiguchi
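
For readers following the thread, the handler scheme proposed above (AR
handled in every thread, AO handled only in one dedicated thread) might
look roughly like the sketch below. This is an illustration under stated
assumptions, not code from the patch: the names classify_sigbus,
is_ao_thread, last_reported_ao_addr, install_sigbus_handler and
enable_early_kill are hypothetical, and the actual recovery step for an AR
error is application specific and is left out. The prctl(PR_MCE_KILL, ...)
call is the documented way to request the early-kill policy for the
calling thread.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/prctl.h>

/* Fallbacks for old headers; the values match the Linux siginfo ABI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification of a SIGBUS, for illustration only. */
enum mce_action { MCE_IGNORE, MCE_RECOVER_LATER, MCE_RECOVER_NOW, MCE_FATAL };

/* Assumption: the application sets this to 1 in its one dedicated thread. */
_Thread_local int is_ao_thread = 0;

/* Address of the most recent AO report, stored by the dedicated thread. */
static _Atomic uintptr_t last_ao_addr;

enum mce_action classify_sigbus(int si_code, int dedicated)
{
    switch (si_code) {
    case BUS_MCEERR_AR:  /* action required: this thread consumed the error */
        return MCE_RECOVER_NOW;
    case BUS_MCEERR_AO:  /* action optional: only the dedicated thread acts */
        return dedicated ? MCE_RECOVER_LATER : MCE_IGNORE;
    default:             /* ordinary SIGBUS, unrelated to memory failure */
        return MCE_FATAL;
    }
}

uintptr_t last_reported_ao_addr(void)
{
    return atomic_load(&last_ao_addr);
}

static void sigbus_handler(int sig, siginfo_t *si, void *ucontext)
{
    (void)sig;
    (void)ucontext;

    switch (classify_sigbus(si->si_code, is_ao_thread)) {
    case MCE_IGNORE:
        return;  /* swallowing the signal avoids the default "kill all threads" */
    case MCE_RECOVER_LATER:
        atomic_store(&last_ao_addr, (uintptr_t)si->si_addr);
        return;  /* the page has not been consumed yet; clean up outside the handler */
    default:
        break;
    }

    /*
     * AR or unrelated SIGBUS. A real application would repair or abandon
     * the data at si->si_addr here; simply returning would re-execute the
     * faulting access. This sketch falls back to the default action.
     */
    signal(SIGBUS, SIG_DFL);
    raise(SIGBUS);
}

/* Signal dispositions are process wide, so one call covers every thread. */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Request early kill for the calling thread; returns -1 on failure. */
int enable_early_kill(void)
{
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

An application would call install_sigbus_handler() once at startup, and
the dedicated thread would set is_ao_thread = 1 and call
enable_early_kill(). Note that this only shows the handler side; it does
not address the question raised in the reply of which thread the kernel
should deliver the AO signal to, or the races involved in broadcasting.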