Message-ID: <20180314154312.GZ2943022@devbig577.frc2.facebook.com>
Date:   Wed, 14 Mar 2018 08:43:12 -0700
From:   Tejun Heo <tj@...nel.org>
To:     David Chen <tuxoko@...il.com>
Cc:     Shaohua Li <shli@...com>, lkml <linux-kernel@...r.kernel.org>
Subject: Re: blk_mq_freeze_queue hang and possible race in percpu-refcount

Hello, David.

On Tue, Mar 13, 2018 at 03:50:47PM -0700, David Chen wrote:
> ====
> CPU A                           CPU B
> -----                           -----
> percpu_ref_kill()               percpu_ref_tryget_live()
> {
>                                 if (__ref_is_percpu())
>   set __PERCPU_REF_DEAD;
>   __percpu_ref_switch_mode();
>    ^ sum up current percpu_count
>                                 this_cpu_inc(*percpu_count); <- this increment got leaked.
> 
> ====
> 
> So if CPU B later does percpu_ref_put(), it will cause ref->count
> to drop to -1,
> thus causing the above hung task issue.
> 
> Do you think this theory is correct, or am I missing something?
> Please tell me what do you think.

Switching to atomic mode does something like the following.

1. Mark the refcnt so that __ref_is_percpu() is false.

2. Wait for an RCU grace period so that everyone, including any
   percpu_ref_tryget_live() which has seen __ref_is_percpu() return
   true, is done with its operation.

3. Now that it knows nobody is operating on the assumption that the
   counter is in percpu mode, it adds up all the percpu counters.
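
To make the ordering concrete, here is a rough userspace sketch of the
three steps, replaying the interleaving from your diagram.  This is a toy
model, not the kernel code: struct toy_ref and every toy_*() function are
made-up names, the "grace period" is modeled as a simple count of
in-flight readers rather than real RCU, and the real implementation
differs in many details.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Toy model only; none of these names are kernel API. */
struct toy_ref {
	bool percpu_mode;               /* stand-in for __ref_is_percpu() */
	long percpu_count[TOY_NR_CPUS]; /* per-CPU deltas */
	long atomic_count;              /* holds the initial reference */
	int readers_in_flight;          /* stand-in for RCU read-side sections */
};

static void toy_ref_init(struct toy_ref *ref)
{
	*ref = (struct toy_ref){ .percpu_mode = true, .atomic_count = 1 };
}

/* First half of a get: enter the read-side section and sample the mode. */
static bool toy_get_begin(struct toy_ref *ref)
{
	ref->readers_in_flight++;
	return ref->percpu_mode;
}

/* Second half: increment according to the sampled mode, then leave. */
static void toy_get_finish(struct toy_ref *ref, int cpu, bool saw_percpu)
{
	if (saw_percpu)
		ref->percpu_count[cpu]++;
	else
		ref->atomic_count++;
	ref->readers_in_flight--;
}

static void toy_put(struct toy_ref *ref, int cpu)
{
	if (ref->percpu_mode)
		ref->percpu_count[cpu]--;
	else
		ref->atomic_count--;
}

/* Step 1: mark the ref so that new users no longer see percpu mode. */
static void toy_switch_begin(struct toy_ref *ref)
{
	ref->percpu_mode = false;
}

/* Steps 2 and 3: only after all earlier readers are done, sum up. */
static void toy_switch_finish(struct toy_ref *ref)
{
	/* Models "the grace period has elapsed"; real code waits instead. */
	assert(ref->readers_in_flight == 0);

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		ref->atomic_count += ref->percpu_count[cpu];
		ref->percpu_count[cpu] = 0;
	}
}

/* Replays the reported interleaving; returns the final atomic count. */
static long toy_demo(void)
{
	struct toy_ref ref;
	bool saw_percpu;

	toy_ref_init(&ref);
	saw_percpu = toy_get_begin(&ref);     /* CPU B: sees percpu mode */
	toy_switch_begin(&ref);               /* CPU A: step 1 */
	toy_get_finish(&ref, 1, saw_percpu);  /* CPU B: percpu increment */
	toy_switch_finish(&ref);              /* CPU A: steps 2 and 3 */
	toy_put(&ref, 1);                     /* CPU B: put, now atomic */
	return ref.atomic_count;
}
```

In this model toy_demo() returns 1, i.e. only the initial reference is
left: CPU B's percpu increment lands before the summation because the
summation is not allowed to start while a reader that saw percpu mode is
still in flight.  Summing right after step 1, without waiting, is the
case where the increment would be missed.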

So, provided there aren't some silly bugs, what you described
shouldn't happen.  Can you force the refcnt into atomic mode w/
PERCPU_REF_INIT_ATOMIC and see whether the problem persists?

Thanks.

-- 
tejun
