linux-kernel - Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrWPzO1vz+Rcjs78B+wWWi4eu_g__-k9iCHH1G1Mc6af+A@mail.gmail.com>
Date:	Mon, 17 Nov 2014 16:05:00 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	"Luck, Tony" <tony.luck@...el.com>
Cc:	Borislav Petkov <bp@...en8.de>, Andi Kleen <andi@...stfloor.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

On Mon, Nov 17, 2014 at 3:16 PM, Luck, Tony <tony.luck@...el.com> wrote:
>> I still wonder whether the timeout code is the real culprit.  My patch
>> will slow down entry into do_machine_check by tens of cycles, several
>> cachelines, and possibly a couple of TLB misses.  Given that the
>> timing seemed marginal to me, it's possible (albeit not that likely)
>> that it pushed the time needed for synchronization into the range of
>> unreliability.
>
> I think we have very conservative timeouts.  Here are the significant bits from mce
>
> First SPINUNIT ... how many nanoseconds to spin "ndelay(SPINUNIT)"  before pounding on an atomic op to see if we are done:
> #define SPINUNIT 100    /* 100ns */
>
> Initialization of our timeout - comment above this claims we are aiming for "one second"
>
>         cfg->monarch_timeout = USEC_PER_SEC;
>
> Inside mce_start() - set up our timeout. Math says this will be 1e9
>
>         u64 timeout = (u64)mca_cfg.monarch_timeout * NSEC_PER_USEC;

Aha.  I missed the extra multiplication.

>
> Now the actual loop:
>         while (atomic_read(&mce_callin) != cpus) {
>                 if (mce_timed_out(&timeout)) {
>                         atomic_set(&global_nwo, 0);
>                         return -1;
>                 }
>                 ndelay(SPINUNIT);
>         }

And I missed the ndelay.

*sigh* reading comprehension wasn't so good there.

>From a debugging POV, it would be kind of nice to know which of the
four mce_timed_out calls is reporting the failure.  IIUC, the first
one would mean that a cpu didn't make it in to do_machine_check at
all, whereas the second one would mean that a cpu got lost after
entering do_machine_check.  Also, the second one presumably knows how
far it got (i.e. mce_executing and order).

It could also be interesting to tweak mce_panic to not actually panic
the machine but to try to return and stop the test instead.  Then real
debugging could be possible :)

I still wish I could test this.  The mce-inject code is completely
useless here, since it calls do_machine_check directly instead of
going through the MCE code.

>
> And the inside of mce_timed_out() ... we decrement our 1e9 by 100 for each call:
>
>         if ((s64)*t < SPINUNIT) {
>                 if (mca_cfg.tolerant <= 1)
>                         mce_panic("Timeout synchronizing machine check over CPUs",
>                                   NULL, NULL);
>                 cpu_missing = 1;
>                 return 1;
>         }
>         *t -= SPINUNIT;
>
> Not sure what the worst case is for waiting for other processors to show up. Perhaps it is
> when the other cpu is in some deep C-state and we have to wait for it to power the core
> back on and resync clock.  Maybe it is when a cpu has just started a "wbinvd" instruction
> with a cache entirely full of dirty lines that belong on some other processor so have to
> be written back across QPI (may when QPI links are congested with 10gbit network traffic?
>
> But a full second seems like a lot ... you adding a handful of cache and TLB misses shouldn't
> push us over the edge.

Agreed.

>
> -Tony
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/