[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrWPzO1vz+Rcjs78B+wWWi4eu_g__-k9iCHH1G1Mc6af+A@mail.gmail.com>
Date: Mon, 17 Nov 2014 16:05:00 -0800
From: Andy Lutomirski <luto@...capital.net>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Borislav Petkov <bp@...en8.de>, Andi Kleen <andi@...stfloor.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
X86 ML <x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace
On Mon, Nov 17, 2014 at 3:16 PM, Luck, Tony <tony.luck@...el.com> wrote:
>> I still wonder whether the timeout code is the real culprit. My patch
>> will slow down entry into do_machine_check by tens of cycles, several
>> cachelines, and possibly a couple of TLB misses. Given that the
>> timing seemed marginal to me, it's possible (albeit not that likely)
>> that it pushed the time needed for synchronization into the range of
>> unreliability.
>
> I think we have very conservative timeouts. Here are the significant bits from mce
>
> First SPINUNIT ... how many nanoseconds to spin "ndelay(SPINUNIT)" before pounding on an atomic op to see if we are done:
> #define SPINUNIT 100 /* 100ns */
>
> Initialization of our timeout - comment above this claims we are aiming for "one second"
>
> cfg->monarch_timeout = USEC_PER_SEC;
>
> Inside mce_start() - set up our timeout. Math says this will be 1e9
>
> u64 timeout = (u64)mca_cfg.monarch_timeout * NSEC_PER_USEC;
Aha. I missed the extra multiplication.
>
> Now the actual loop:
> while (atomic_read(&mce_callin) != cpus) {
> if (mce_timed_out(&timeout)) {
> atomic_set(&global_nwo, 0);
> return -1;
> }
> ndelay(SPINUNIT);
> }
And I missed the ndelay.
*sigh* reading comprehension wasn't so good there.
>From a debugging POV, it would be kind of nice to know which of the
four mce_timed_out calls is reporting the failure. IIUC, the first
one would mean that a cpu didn't make it in to do_machine_check at
all, whereas the second one would mean that a cpu got lost after
entering do_machine_check. Also, the second one presumably knows how
far it got (i.e. mce_executing and order).
It could also be interesting to tweak mce_panic to not actually panic
the machine but to try to return and stop the test instead. Then real
debugging could be possible :)
I still wish I could test this. The mce-inject code is completely
useless here, since it calls do_machine_check directly instead of
going through the MCE code.
>
> And the inside of mce_timed_out() ... we decrement our 1e9 by 100 for each call:
>
> if ((s64)*t < SPINUNIT) {
> if (mca_cfg.tolerant <= 1)
> mce_panic("Timeout synchronizing machine check over CPUs",
> NULL, NULL);
> cpu_missing = 1;
> return 1;
> }
> *t -= SPINUNIT;
>
> Not sure what the worst case is for waiting for other processors to show up. Perhaps it is
> when the other cpu is in some deep C-state and we have to wait for it to power the core
> back on and resync clock. Maybe it is when a cpu has just started a "wbinvd" instruction
> with a cache entirely full of dirty lines that belong on some other processor so have to
> be written back across QPI (may when QPI links are congested with 10gbit network traffic?
>
> But a full second seems like a lot ... you adding a handful of cache and TLB misses shouldn't
> push us over the edge.
Agreed.
>
> -Tony
>
>
--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists