linux-kernel - Re: Request for help: what did I do wrong with idtentry?


Message-ID: <CALCETrVLwzhQVMbu2c=NCzAfEBWa8=SOpfu5gPoxpuTBcNngRA@mail.gmail.com>
Date:	Fri, 14 Nov 2014 17:21:40 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	"Luck, Tony" <tony.luck@...el.com>
Cc:	"H. Peter Anvin" <hpa@...or.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Andi Kleen <andi@...stfloor.org>,
	Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Request for help: what did I do wrong with idtentry?

On Fri, Nov 14, 2014 at 4:52 PM, Luck, Tony <tony.luck@...el.com> wrote:
>> causes Tony's MCE stress test to fail, presumably when some CPU either
>> becomes permanently non-interruptible or otherwise wanders off into
>> the weeds.
>
> It might be that recent "improvements" I made to my test harness have
> messed things up.  I trimmed one delay (between injection and consumption),
> but it turns out the other delay in the code never gets executed (because we
> take a SIGBUS on consumption and then longjmp).  So my test that used
> to pause a bit between iterations was running almost back-to-back
> consumption and injection of the next error.
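
The control flow described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the actual test harness:
raise(SIGBUS) stands in for consuming an injected error, and the names
run_iterations, consume_error, post_consume_delay and harness_stats are
hypothetical. The point is that any delay placed after the consuming access
is dead code once the SIGBUS handler jumps back to the recovery point.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical counters, used only to make the skipped delay observable. */
struct harness_stats {
    int recoveries;       /* times we came back via siglongjmp */
    int delays_executed;  /* times the post-consumption delay ran */
};

static sigjmp_buf recover_point;

/* SIGBUS handler: abandon the faulting path and jump to the recovery point. */
static void sigbus_handler(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Stand-in for touching poisoned memory; a real harness would take a
 * SIGBUS from the kernel here instead of raising one itself. */
static void consume_error(void)
{
    raise(SIGBUS);
}

/* Stand-in for the second delay; a real harness would sleep here. */
static void post_consume_delay(volatile struct harness_stats *st)
{
    st->delays_executed++;
}

static struct harness_stats run_iterations(int n)
{
    volatile struct harness_stats st = { 0, 0 };
    struct harness_stats out;
    struct sigaction sa;
    volatile int i;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigbus_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, NULL);

    for (i = 0; i < n; i++) {
        /* Save the signal mask too, so SIGBUS is unblocked after the jump. */
        if (sigsetjmp(recover_point, 1) != 0) {
            st.recoveries++;
            continue;               /* straight on to the next iteration */
        }
        consume_error();            /* never returns normally */
        post_consume_delay(&st);    /* unreachable: skipped by the jump */
    }

    out.recoveries = st.recoveries;
    out.delays_executed = st.delays_executed;
    return out;
}
```

In this sketch, run_iterations(5) recovers five times and never executes the
delay, so the iterations end up running back to back unless a pause is added
on the recovery path as well.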

Hmm.

Am I right that the timeout code in mce.c is overly aggressive, too?

>
> This meant the serial console was a huge bottleneck (especially as my
> development BIOS is also kicking its own debug junk onto the same port).
> Some of the errors pointed obliquely at console.
>
> I've slowed things back down to where they used to be, and things are
> ticking along nicely (with 0.6 second delay between iterations).  Just
> passed the 2800 mark and still going.  I'm leaving it running over the
> weekend - if it makes it into the 50k level I'm willing to call it good.
>

Phew :)

FWIW, I've confirmed that my code survives int3 from userspace, int3
from normal kernel code, and int3 from kernel with user gs.  I'm not
completely thrilled with what it does to double_fault, though.  If we
somehow get a double fault caused by an interrupt hitting userspace
with a bad kernel_stack, then we'll end up page faulting in the
double_fault prologue.  I'm not convinced that this is worth worrying
about.  It would be easy enough to fix, though, even if it would
further uglify the code.

--Andy