Message-ID: <ZDRJfiOaS2bOxiT5@yury-laptop>
Date:   Mon, 10 Apr 2023 10:38:06 -0700
From:   Yury Norov <yury.norov@...il.com>
To:     "yebin (H)" <yebin10@...wei.com>
Cc:     Ye Bin <yebin@...weicloud.com>, dennis@...nel.org, tj@...nel.org,
        cl@...ux.com, linux-mm@...ck.org,
        andriy.shevchenko@...ux.intel.com, linux@...musvillemoes.dk,
        linux-kernel@...r.kernel.org, dchinner@...hat.com
Subject: Re: [PATCH 2/2] lib/percpu_counter: fix dying cpu compare race

On Tue, Apr 04, 2023 at 02:54:25PM +0800, yebin (H) wrote:
> 
> 
> On 2023/4/4 10:50, Yury Norov wrote:
> > On Tue, Apr 04, 2023 at 09:42:06AM +0800, Ye Bin wrote:
> > > From: Ye Bin <yebin10@...wei.com>
> > > 
> > > In commit 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race") a race
> > > condition between a cpu dying and percpu_counter_sum() iterating online CPUs
> > > was identified.
> > > Actually, the same race condition exists between a CPU dying and
> > > __percpu_counter_compare(), which uses 'num_online_cpus()' for its quick check.
> > > But 'num_online_cpus()' is decreased before 'percpu_counter_cpu_dead()' is called,
> > > so the quick check may return an incorrect result.
> > > To solve this, the count of dying CPUs also needs to be added when doing the
> > > quick check in __percpu_counter_compare().
> > Not sure I completely understood the race you are describing. All CPU
> > accounting is protected with percpu_counters_lock. Is it a real race
> > that you've faced, or hypothetical? If it's real, can you share stack
> > traces?
> > > Signed-off-by: Ye Bin <yebin10@...wei.com>
> > > ---
> > >   lib/percpu_counter.c | 11 ++++++++++-
> > >   1 file changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> > > index 5004463c4f9f..399840cb0012 100644
> > > --- a/lib/percpu_counter.c
> > > +++ b/lib/percpu_counter.c
> > > @@ -227,6 +227,15 @@ static int percpu_counter_cpu_dead(unsigned int cpu)
> > >   	return 0;
> > >   }
> > > +static __always_inline unsigned int num_count_cpus(void)
> > This doesn't look like a good name. Maybe num_offline_cpus?
> > 
> > > +{
> > > +#ifdef CONFIG_HOTPLUG_CPU
> > > +	return (num_online_cpus() + num_dying_cpus());
> >                 ^                                    ^
> >           'return' is not a function. Parentheses are not needed
> > 
> > Generally speaking, a sequence of atomic operations is not an atomic
> > operation, so the above doesn't look correct. I don't think that it
> > would be possible to implement raceless accounting based on 2 separate
> > counters.
> Yes, there is indeed a concurrency issue with doing so here. But I saw that
> the process first sets up dying_mask and then reduces the number of online
> CPUs. The total may be larger than the actual value, and we may fall back to
> the slow path. But this won't cause any problems.

This sounds like an implementation detail. If it changes in the
future, your accounting will break.

If you think this behavior is consistent and will be preserved in the
future, then it must be properly commented in your patch.
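
To make the ordering dependence concrete, here is a minimal userspace
sketch, not kernel code. It assumes C11 <stdatomic.h>; the names
online_cpus, dying_cpus, count_cpus() and the two midpoint_*() helpers
are hypothetical stand-ins, and each helper replays a single interleaving
sequentially instead of racing real threads.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the online and dying CPU counts. */
static atomic_uint online_cpus;
static atomic_uint dying_cpus;

/* Two separate atomic loads: each one is atomic, the pair is not. */
static unsigned int count_cpus(void)
{
	return atomic_load(&online_cpus) + atomic_load(&dying_cpus);
}

/*
 * Take one CPU down in "mark dying, then drop online" order and
 * return what a reader would see between the two updates.
 */
static unsigned int midpoint_dying_first(unsigned int ncpus)
{
	unsigned int seen;

	atomic_store(&online_cpus, ncpus);
	atomic_store(&dying_cpus, 0);

	atomic_fetch_add(&dying_cpus, 1);
	seen = count_cpus();	/* reader runs here: sum is too large */
	atomic_fetch_sub(&online_cpus, 1);

	return seen;
}

/* Same transition with the two updates in the opposite order. */
static unsigned int midpoint_online_first(unsigned int ncpus)
{
	unsigned int seen;

	atomic_store(&online_cpus, ncpus);
	atomic_store(&dying_cpus, 0);

	atomic_fetch_sub(&online_cpus, 1);
	seen = count_cpus();	/* reader runs here: sum is too small */
	atomic_fetch_add(&dying_cpus, 1);

	return seen;
}
```

With four CPUs, the first ordering lets a reader see a sum that is one too
large, which per the discussion above only costs a fallback to the slow
path. The second ordering lets a reader see a sum that is one too small.
Which of the two a reader can observe depends entirely on the order of the
updates, hence the request for a comment.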

Thanks,
Yury
