linux-kernel - Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

Message-ID: <CALCETrU6x4yVQy57KGm-Rk==ZYNqHorytAs6WvgKNVoT+Dk9Uw@mail.gmail.com>
Date:	Mon, 17 Nov 2014 14:26:54 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	"Luck, Tony" <tony.luck@...el.com>
Cc:	Borislav Petkov <bp@...en8.de>, Andi Kleen <andi@...stfloor.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

On Mon, Nov 17, 2014 at 1:55 PM, Luck, Tony <tony.luck@...el.com> wrote:
>>> However, I'd like to be very sure this thing doesn't introduce any
>>> regressions to the MCA code. So even if Tony's testing passes, I'd like
>>> to be very conservative here and stress it more than usual. Because once
>>> this thing hits upstream and stuff starts breaking, it'll be a serious
>>> PITA reverting it.
>
> The test I left running on Friday was just running the stack-switch asm
> patch, without any mce.c changes.  It died at 16000 iterations with the
> mce synchronization issue.

I still wonder whether the timeout code is the real culprit.  My patch
will slow down entry into do_machine_check by tens of cycles, several
cachelines, and possibly a couple of TLB misses.  Given that the
timing seemed marginal to me, it's possible (albeit not that likely)
that it pushed the time needed for synchronization into the range of
unreliability.

Any chance you can retry it at some point with that USEC_PER_SEC thing
changed to NSEC_PER_SEC and SPINUNIT set to something closer to 10
than 100?
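
To make the suggested experiment concrete, here is a rough user-space model of the arithmetic, not the kernel's code. It assumes the timeout value is given in microseconds, converted to a nanosecond budget, and decremented by one spin unit per poll. The helper names `budget_ns` and `polls_before_timeout` are hypothetical; only the constant names come from the discussion above.

```python
# Rough model of a spin-wait timeout budget (illustrative only).
# Assumption: the configured timeout is in microseconds and is converted
# to nanoseconds; each poll of the synchronization state costs one spin
# unit of that budget.

USEC_PER_SEC = 1_000_000        # microseconds per second
NSEC_PER_SEC = 1_000_000_000    # nanoseconds per second
NSEC_PER_USEC = 1_000           # nanoseconds per microsecond


def budget_ns(timeout_us: int) -> int:
    """Convert a timeout expressed in microseconds to a nanosecond budget."""
    return timeout_us * NSEC_PER_USEC


def polls_before_timeout(total_ns: int, spin_unit_ns: int) -> int:
    """Number of polls that fit in the budget before it is exhausted.

    Equivalent to subtracting spin_unit_ns once per poll until less than
    one spin unit remains.
    """
    if spin_unit_ns <= 0:
        raise ValueError("spin unit must be positive")
    return total_ns // spin_unit_ns


# Current settings as described: a USEC_PER_SEC timeout, spin unit of 100.
current = polls_before_timeout(budget_ns(USEC_PER_SEC), 100)

# Suggested experiment: NSEC_PER_SEC as the timeout, spin unit of 10.
suggested = polls_before_timeout(budget_ns(NSEC_PER_SEC), 10)

print(f"current:   {current} polls")
print(f"suggested: {suggested} polls")
print(f"ratio:     {suggested // current}x")
```

Under these assumptions the change makes the overall budget 1000 times longer and polls ten times more finely, so a passing run with the larger budget would point at the timeout rather than at the stack switch itself.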

--Andy

>
> This morning I started a new test with all the mce changes (no TIF_MCE_NOTIFY,
> just process the recovery in the tail of do_machine_check()).
>
> It just passed the 18000 point, and it's still going.  In addition I've been throwing
> the odd "make -j144" kernel build at the machine so we check out the non-idle
> paths too.
>
> -Tony



-- 
Andy Lutomirski
AMA Capital Management, LLC