Message-ID: <ZBUHyXkdzViG2VmT@destitution>
Date:   Sat, 18 Mar 2023 11:37:29 +1100
From:   Dave Chinner <david@...morbit.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     linux-kernel@...r.kernel.org, linux-xfs@...r.kernel.org,
        linux-mm@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        yebin10@...wei.com
Subject: Re: [PATCH 2/4] pcpcntrs: fix dying cpu summation race

On Thu, Mar 16, 2023 at 07:36:18AM +0800, Hillf Danton wrote:
> On 15 Mar 2023 19:49:36 +1100 Dave Chinner <dchinner@...hat.com>
> > @@ -141,11 +141,20 @@ static s64 __percpu_counter_sum_mask(struct percpu_counter *fbc,
> >  
> >  /*
> >   * Add up all the per-cpu counts, return the result.  This is a more accurate
> > - * but much slower version of percpu_counter_read_positive()
> > + * but much slower version of percpu_counter_read_positive().
> > + *
> > + * We use the cpu mask of (cpu_online_mask | cpu_dying_mask) to capture sums
> > + * from CPUs that are in the process of being taken offline. Dying cpus have
> > + * been removed from the online mask, but may not have had the hotplug dead
> > + * notifier called to fold the percpu count back into the global counter sum.
> > + * By including dying CPUs in the iteration mask, we avoid this race condition
> > + * so __percpu_counter_sum() just does the right thing when CPUs are being taken
> > + * offline.
> >   */
> >  s64 __percpu_counter_sum(struct percpu_counter *fbc)
> >  {
> > -	return __percpu_counter_sum_mask(fbc, cpu_online_mask);
> > +
> > +	return __percpu_counter_sum_mask(fbc, cpu_dying_mask);
> >  }
> >  EXPORT_SYMBOL(__percpu_counter_sum);
> >  
> > -- 
> > 2.39.2
> 
> Hm... the window of the race between a dying cpu and the sum of percpu counter
> spotted in commit f689054aace2 is still open after a text-book log message.
> 
> 	cpu 0			cpu 2
> 	---			---
> 	percpu_counter_sum() 	percpu_counter_cpu_dead()
> 
> 	raw_spin_lock_irqsave(&fbc->lock, flags);
> 	ret = fbc->count;
> 	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
> 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
> 		ret += *pcount;
> 	}
> 	raw_spin_unlock_irqrestore(&fbc->lock, flags);
> 
> 				raw_spin_lock(&fbc->lock);
> 				pcount = per_cpu_ptr(fbc->counters, cpu);
> 				fbc->count += *pcount;
> 				*pcount = 0;
> 				raw_spin_unlock(&fbc->lock);

There is no race condition updating fbc->count here - I explained
this in the cover letter. i.e. percpu_counter_sum() accumulates into
a private local variable and does not change fbc->count. Therefore we
only need/want to fold the dying cpu percpu count into fbc->count in
the CPU_DEAD callback.
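
To illustrate the point, here is a minimal single-threaded userspace
sketch, not the kernel code. The names model_counter, model_sum(),
model_cpu_dying() and model_cpu_dead() are hypothetical stand-ins, the
masks are plain bool arrays, and locking is left out on the assumption
that the real sum and the dead callback serialise on fbc->lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical userspace stand-in for struct percpu_counter. */
struct model_counter {
	int64_t count;             /* global count, like fbc->count */
	int32_t counters[NR_CPUS]; /* per-cpu deltas, like fbc->counters */
	bool online[NR_CPUS];      /* models cpu_online_mask */
	bool dying[NR_CPUS];       /* models cpu_dying_mask */
};

/*
 * Sum over (online | dying) into a private variable.
 * The counter is const here: c->count is never written.
 */
static int64_t model_sum(const struct model_counter *c)
{
	int64_t ret = c->count;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (c->online[cpu] || c->dying[cpu])
			ret += c->counters[cpu];
	}
	return ret;
}

/* The cpu leaves the online mask but its per-cpu count is not folded yet. */
static void model_cpu_dying(struct model_counter *c, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	c->online[cpu] = false;
	c->dying[cpu] = true;
}

/*
 * Models the dead callback: the only place the per-cpu count is folded
 * into the global count. Clearing the dying bit here is a simplification
 * of this model.
 */
static void model_cpu_dead(struct model_counter *c, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	c->count += c->counters[cpu];
	c->counters[cpu] = 0;
	c->dying[cpu] = false;
}
```

In this model the value returned by model_sum() is the same before
model_cpu_dying(), between model_cpu_dying() and model_cpu_dead(), and
after model_cpu_dead(), because the per-cpu count is always counted
exactly once: either in its per-cpu slot or in the global count.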

-Dave.
-- 
Dave Chinner
david@...morbit.com
