Message-ID: <1AE640813FDE7649BE1B193DEA596E8802712130@SHSMSX101.ccr.corp.intel.com>
Date: Thu, 30 Apr 2015 08:05:12 +0000
From: "Zheng, Lv" <lv.zheng@...el.com>
To: Borislav Petkov <bp@...en8.de>
CC: linux-edac <linux-edac@...r.kernel.org>,
Jiri Kosina <jkosina@...e.cz>, Borislav Petkov <bp@...e.de>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
"Len Brown" <lenb@...nel.org>, "Luck, Tony" <tony.luck@...el.com>,
Tomasz Nowicki <tomasz.nowicki@...aro.org>,
"Chen, Gong" <gong.chen@...ux.intel.com>,
Wolfram Sang <wsa@...-dreams.de>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
Hi,
> From: Borislav Petkov [mailto:bp@...en8.de]
> Sent: Wednesday, April 29, 2015 4:14 PM
> Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
>
> On Wed, Apr 29, 2015 at 12:49:59AM +0000, Zheng, Lv wrote:
> > > > We absolutely want to use atomic_add_unless() because we get to save us
> > > > the expensive
> > > >
> > > > LOCK; CMPXCHG
> > > >
> > > > if the value was already 1. Which is exactly what this patch is trying
> > > > to avoid - a thundering herd of cores CMPXCHGing a global variable.
> > >
> > > IMO, on most architectures, the "cmp" part should work just like what you've done with "if".
> > > And on some architectures, if the "xchg" doesn't happen, the "cmp" part won't even cause a pipeline hazard.
>
> Even if CMPXCHG is being split into several microops, they all still
> need to flow down the pipe and require resources and tracking. And you
> only know at retire time what the CMP result is and can "discard" the
> XCHG part. Provided the uarch is smart enough to do that.
>
> This is probably why CMPXCHG needs 5,6,7,10,22,... cycles depending on
> uarch and vendor, if I can trust Agner Fog's tables. And I bet those
> numbers are best-case only and in real-life they probably tend to fall
> out even worse.
>
> CMP needs only 1. On almost every uarch and vendor. And even that cycle
> probably gets hidden with a good branch predictor.
Is there any similar data for LL and SC (MIPS)?
>
> > If you mean the LOCK prefix, I understand now.
>
> And that makes several times worse: 22, 40, 80, ... cycles.
I'm OK with that then, as long as the code stays readable.
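
For reference, here is how I read the "check before CMPXCHG" idea, as a minimal userspace sketch. It uses C11 atomics rather than the kernel's atomic_add_unless(), and the names reader_active, try_enter_single_reader() and leave_single_reader() are made up for illustration; they are not from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: 0 = nobody is reading, 1 = one reader is active. */
static atomic_int reader_active;

/*
 * Try to become the single reader.
 *
 * The plain load filters out the contended case first, so the locked
 * compare-and-exchange is only issued when the flag was observed as 0.
 * A core that sees 1 returns without writing to the shared cache line.
 */
static bool try_enter_single_reader(void)
{
	int expected = 0;

	if (atomic_load_explicit(&reader_active, memory_order_relaxed) != 0)
		return false;	/* already taken: skip the LOCK; CMPXCHG */

	/* Can still fail if another core won the race after our load. */
	return atomic_compare_exchange_strong(&reader_active, &expected, 1);
}

/* Release the flag so that a later caller can become the reader. */
static void leave_single_reader(void)
{
	atomic_store(&reader_active, 0);
}
```

Calling try_enter_single_reader() twice in a row returns true and then false; after leave_single_reader() it returns true again.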
Thanks and best regards
-Lv
>
> --
> Regards/Gruss,
> Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --