Message-ID: <CALCETrVLwzhQVMbu2c=NCzAfEBWa8=SOpfu5gPoxpuTBcNngRA@mail.gmail.com>
Date: Fri, 14 Nov 2014 17:21:40 -0800
From: Andy Lutomirski <luto@...capital.net>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: "H. Peter Anvin" <hpa@...or.com>,
Steven Rostedt <rostedt@...dmis.org>,
Andi Kleen <andi@...stfloor.org>,
Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Request for help: what did I do wrong with idtentry?
On Fri, Nov 14, 2014 at 4:52 PM, Luck, Tony <tony.luck@...el.com> wrote:
>> causes Tony's MCE stress test to fail, presumably when some CPU either
>> becomes permanently non-interruptible or otherwise wanders off into
>> the weeds.
>
> It might be that recent "improvements" I made to my test harness have
> messed things up. I trimmed one delay (between injection and consumption),
> but it turns out the other delay in the code never gets executed (because we
> take a SIGBUS on consumption and then longjmp). So my test, which used
> to pause a bit between iterations, was running almost back-to-back
> consumption and injection of the next error.
Hmm.

Am I right that the timeout code in mce.c is overly aggressive, too?
>
> This meant the serial console was a huge bottleneck (especially as my
> development BIOS is also kicking its own debug junk onto the same port).
> Some of the errors pointed obliquely at console.
>
> I've slowed things back down to where they used to be, and things are
> ticking along nicely (with 0.6 second delay between iterations). Just
> passed the 2800 mark and still going. I'm leaving it running over the
> weekend - if it makes it into the 50k level I'm willing to call it good.
>
Phew :)
FWIW, I've confirmed that my code survives int3 from userspace, int3
from normal kernel code, and int3 from kernel with user gs. I'm not
completely thrilled with what it does to double_fault, though. If we
somehow get a double fault caused by an interrupt hitting userspace
with a bad kernel_stack, then we'll end up page faulting in the
double_fault prologue. I'm not convinced that this is worth worrying
about. It would be easy enough to fix, though, even if it would
further uglify the code.
--Andy