Date:	Mon, 17 Nov 2014 14:26:54 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	"Luck, Tony" <tony.luck@...el.com>
Cc:	Borislav Petkov <bp@...en8.de>, Andi Kleen <andi@...stfloor.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

On Mon, Nov 17, 2014 at 1:55 PM, Luck, Tony <tony.luck@...el.com> wrote:
>>> However, I'd like to be very sure this thing doesn't introduce any
>>> regressions to the MCA code. So even if Tony's testing passes, I'd like
>>> to be very conservative here and stress it more than usual. Because once
>>> this thing hits upstream and stuff starts breaking, it'll be a serious
>>> PITA reverting it.
>
> The test I left running on Friday was just running the stack-switch asm
> patch, without any mce.c changes.  It died at 16000 iterations with the
> mce synchronization issue.

I still wonder whether the timeout code is the real culprit.  My patch
will slow down entry into do_machine_check by tens of cycles, several
cachelines, and possibly a couple of TLB misses.  Given that the
timing seemed marginal to me, it's possible (albeit not that likely)
that it pushed the time needed for synchronization into the range of
unreliability.

Any chance you can retry it at some point with that USEC_PER_SEC thing
changed to NSEC_PER_SEC and SPINUNIT set to something closer to 10
than 100?
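
To make the concern concrete, here is a minimal userspace sketch of a
countdown-style rendezvous timeout.  It is not the code in mce.c: the
function name, the parameters and the numbers are all hypothetical, and the
"delay" is only modelled by adding to a counter.  It just shows how a
marginal budget can turn a small extra entry latency into a timeout, and why
a larger budget with a finer spin unit would hide that.

```c
#include <stdint.h>

/*
 * Hypothetical model, not kernel code: one CPU polls for another CPU that
 * shows up arrival_ns after the wait begins.  Each poll charges
 * spin_unit_ns against the remaining budget, the way a countdown timeout
 * would.
 *
 * Returns 1 if the other CPU arrives before the budget is exhausted,
 * 0 if the wait times out first.
 */
int rendezvous_succeeds(int64_t budget_ns, int64_t spin_unit_ns,
                        int64_t arrival_ns)
{
    int64_t elapsed_ns = 0;

    while (elapsed_ns < arrival_ns) {   /* other CPU not here yet */
        if (budget_ns < spin_unit_ns)
            return 0;                   /* budget exhausted: timeout */
        budget_ns -= spin_unit_ns;
        elapsed_ns += spin_unit_ns;     /* stands in for one spin delay */
    }
    return 1;
}
```

With a 1000 ns budget and a 100 ns spin unit, an arrival at 900 ns succeeds
and an arrival at 1050 ns times out, so a slowdown of that size is enough to
flip the result.  With the budget raised to 1000000 ns and the spin unit
lowered to 10 ns, the same late arrival succeeds.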

--Andy

>
> This morning I started a new test with all the mce changes (no TIF_MCE_NOTIFY,
> just process the recovery in the tail of do_machine_check()).
>
> It just passed the 18000 point, and it's still going.  In addition I've been throwing
> the odd "make -j144" kernel build at the machine so we check out the non-idle
> paths too.
>
> -Tony



-- 
Andy Lutomirski
AMA Capital Management, LLC