Message-ID: <alpine.DEB.2.20.1701171059150.3495@nanos>
Date: Tue, 17 Jan 2017 11:05:46 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: Stephane Eranian <eranian@...gle.com>
cc: zhouchengming <zhouchengming1@...wei.com>,
LKML <linux-kernel@...r.kernel.org>, x86 <x86@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
"H. Peter Anvin" <hpa@...or.com>,
"ak@...ux.intel.com" <ak@...ux.intel.com>,
"Liang, Kan" <kan.liang@...el.com>,
David Carrillo Cisneros <davidcc@...gle.com>,
dave.hansen@...ux.intel.com, qiaonuohan@...wei.com,
guohanjun@...wei.com
Subject: Re: [PATCH] fix race caused by hyperthreads when online an offline cpu
On Mon, 16 Jan 2017, Stephane Eranian wrote:
> On Mon, Jan 16, 2017 at 1:53 AM, zhouchengming
> <zhouchengming1@...wei.com> wrote:
> > On 2017/1/16 17:05, Thomas Gleixner wrote:
> >>
> >> On Mon, 16 Jan 2017, Zhou Chengming wrote:
> >>
> >> Can you please stop sending the same patch over and over every other day?
> >>
> >> Granted, things get forgotten, but sending a polite reminder after a week
> >> is definitely enough.
> >>
> >> Maintainers are not machines responding within a split second on every mail
> >> they get. And that patch is not so substantial that it justifies that kind
> >> of spam.
> >>
> >
> > Very sorry for the noise. We are just not sure this is the right fix because
> > it's hard to reproduce.
> >
> I believe this is the right fix. I tried it and instrumented the code to
> verify the thread_id assignment. The problem is easy to reproduce.
>
> $ echo 0 >/sys/devices/system/cpu/cpu2/online
> $ echo 1 >/sys/devices/system/cpu/cpu2/online
>
> Normally on a Haswell desktop part, CPU2 gets thread_id 0 on boot and CPU6
> gets thread_id 1. If you offline CPU2 and bring it back online, it gets
> thread_id 1, and thus both siblings point to the same exclusive state. The
> fix is, indeed, to check whether the sibling is already assigned 1, and if
> so to keep 0 for the CPU being onlined.
Right. So it's a simple, static, fully reproducible problem and not a race of
some sort. I'll amend the changelog ....
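For illustration only, here is a user-space toy model of the thread_id
assignment described above. It is a sketch, not the kernel code: the struct,
the function names (online_buggy, online_fixed, take_offline,
replug_collides) and the CPU2/CPU6 pairing are assumptions made up for this
example, and the "buggy" rule is just one plausible reading of the behaviour
reported in the thread.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS 8

/* Hypothetical per-CPU bookkeeping; not the kernel's data structures. */
struct cpu_state {
	int online;
	int sibling;    /* index of the hyperthread sibling */
	int thread_id;  /* 0 or 1: which slot of the shared state this CPU uses */
};

static struct cpu_state cpus[NR_CPUS];

/* Assumed buggy rule: if the sibling is already up, always take slot 1. */
static void online_buggy(int cpu)
{
	struct cpu_state *c = &cpus[cpu];
	struct cpu_state *s = &cpus[c->sibling];

	c->thread_id = s->online ? 1 : 0;
	c->online = 1;
}

/* Fixed rule: take slot 1 only if the online sibling does not hold it. */
static void online_fixed(int cpu)
{
	struct cpu_state *c = &cpus[cpu];
	struct cpu_state *s = &cpus[c->sibling];

	c->thread_id = (s->online && s->thread_id == 0) ? 1 : 0;
	c->online = 1;
}

static void take_offline(int cpu)
{
	cpus[cpu].online = 0;
}

/*
 * Boot CPU2 then CPU6, offline CPU2, bring it back with the given rule.
 * Returns 1 if both siblings end up with the same thread_id, else 0.
 */
static int replug_collides(void (*bring_online)(int))
{
	memset(cpus, 0, sizeof(cpus));
	cpus[2].sibling = 6;
	cpus[6].sibling = 2;

	bring_online(2);
	bring_online(6);
	/* Both rules agree at boot: CPU2 gets 0, CPU6 gets 1. */
	assert(cpus[2].thread_id == 0 && cpus[6].thread_id == 1);

	take_offline(2);
	bring_online(2);
	return cpus[2].thread_id == cpus[6].thread_id;
}
```

Calling replug_collides(online_buggy) reports a collision after the
offline/online cycle, while replug_collides(online_fixed) does not. The
model is fully deterministic, which matches the point that this is not a
race.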
Btw, this code has the hardcoded assumption of two threads per core. So
anything which has more than two threads per core is broken vs. that
exclusive access. No idea whether that matters in practice, but I just noticed.
Thanks,
tglx