[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210128175735.GB2120@zn.tnic>
Date: Thu, 28 Jan 2021 18:57:35 +0100
From: Borislav Petkov <bp@...en8.de>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: x86@...nel.org, Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Darren Hart <dvhart@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH v5] x86/mce: Avoid infinite loop for copy from user
recovery
On Tue, Jan 26, 2021 at 02:36:05PM -0800, Luck, Tony wrote:
> In some cases Linux might context switch to something else. Perhaps
> this task even gets picked up by another CPU to run the task work
> queued functions. But I imagine that the context switch should act
> as a barrier ... shouldn't it?
I'm given to understand that the #MC from user is likely to schedule and
a context switch has a barrier character.
> After a few cycles of the test injection to user mode, I saw an
> overflow in the machine check bank. As if it hadn't been cleared
> from the previous iteration ...
This sounds weird. As if something else is happening which we haven't
thought of yet...
> When the tests were failing, code was on top of v5.11-rc3. Latest
> experiments moved to -rc5. There's just a tracing fix from
> PeterZ between rc3 and rc5 to mce/core.c:
>
> 737495361d44 ("x86/mce: Remove explicit/superfluous tracing")
>
> which doesn't appear to be a candidate for the problems I saw.
Doesn't look like it.
> This is the bit that changed during my detour using atomic_t mce_count.
> I added the local variable to capture value from atomic_inc_return(), then
> used it later, instead of a bunch of atomic_read() calls.
>
> I kept it this way because "if (count == 1)" is marginally easier to read
> than "if (current->mce_count++ == 0)"
Right.
So still no explanation why it would fail before. ;-\
Crazy idea: if you still can reproduce on -rc3, you could bisect: i.e.,
if you apply the patch on -rc3 and it explodes and if you apply the same
patch on -rc5 and it works, then that could be a start... Yeah, don't
have a better idea here. :-\
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Powered by blists - more mailing lists