Date:   Wed, 14 Mar 2018 10:42:28 -0700
From:   David Chen <tuxoko@...il.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Shaohua Li <shli@...com>, lkml <linux-kernel@...r.kernel.org>
Subject: Re: blk_mq_freeze_queue hang and possible race in percpu-refcount

Hi Tejun,

Thanks, I see I missed the RCU part.
I'll try forcing atomic mode with PERCPU_REF_INIT_ATOMIC.
So far, though, I haven't been able to reproduce the hang.

Thanks,
David


2018-03-14 8:43 GMT-07:00 Tejun Heo <tj@...nel.org>:
> Hello, David.
>
> On Tue, Mar 13, 2018 at 03:50:47PM -0700, David Chen wrote:
>> ====
>> CPU A                           CPU B
>> -----                           -----
>> percpu_ref_kill()               percpu_ref_tryget_live()
>> {
>>                                 if (__ref_is_percpu())
>>   set __PERCPU_REF_DEAD;
>>   __percpu_ref_switch_mode();
>>    ^ sum up current percpu_count
>>                                 this_cpu_inc(*percpu_count);
>>                                  ^ this increment gets leaked
>>
>> ====
>>
>> So if CPU B later does percpu_ref_put(), it will cause ref->count
>> to drop to -1,
>> thus causing the above hung task issue.
>>
>> Do you think this theory is correct, or am I missing something?
>> Please tell me what you think.
>
> Switching to atomic mode does something like the following.
>
> 1. Mark the refcnt so that __ref_is_percpu() is false.
>
> 2. Wait for RCU grace period so that everyone including
>    percpu_ref_tryget_live() which has seen true __ref_is_percpu() is
>    done with its operation.
>
> 3. Now that it knows nobody is operating on the assumption that the
>    counter is in percpu mode, it adds up all the percpu counters.
>
> So, provided there aren't some silly bugs, what you described
> shouldn't happen.  Can you force the refcnt into atomic mode w/
> PERCPU_REF_INIT_ATOMIC and see whether the problem persists?
>
> Thanks.
>
> --
> tejun
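
The three steps in the quoted reply can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is single-threaded, every name in it (toy_ref, toy_reader, run_scenario and the helpers) is hypothetical, and the RCU grace period is approximated by letting every in-flight reader finish. It is not the kernel's percpu-refcount code. It compares the ordering described in the reply with the ordering from the race theory, where the percpu counters are summed without waiting.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Toy model of a percpu refcount (hypothetical, not struct percpu_ref). */
struct toy_ref {
    long percpu_count[TOY_NR_CPUS]; /* per-CPU deltas, used in percpu mode */
    long atomic_count;              /* shared counter, used in atomic mode */
    bool percpu_mode;               /* stand-in for __ref_is_percpu() */
};

/* A "get" split into two halves so the switch can be interleaved with it. */
struct toy_reader {
    int cpu;
    bool saw_percpu; /* mode observed by the first half */
    bool in_flight;  /* mode checked, increment not yet done */
};

static void toy_ref_init(struct toy_ref *ref)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        ref->percpu_count[cpu] = 0;
    ref->atomic_count = 1; /* the owner's initial reference */
    ref->percpu_mode = true;
}

/* First half of a get: sample the mode. */
static void reader_check_mode(struct toy_reader *r, const struct toy_ref *ref,
                              int cpu)
{
    r->cpu = cpu;
    r->saw_percpu = ref->percpu_mode;
    r->in_flight = true;
}

/* Second half of a get: increment according to the mode sampled earlier. */
static void reader_finish_get(struct toy_reader *r, struct toy_ref *ref)
{
    assert(r->in_flight);
    if (r->saw_percpu)
        ref->percpu_count[r->cpu]++;
    else
        ref->atomic_count++;
    r->in_flight = false;
}

/* Step 1: make the mode check fail for all new readers. */
static void switch_mark_atomic(struct toy_ref *ref)
{
    ref->percpu_mode = false;
}

/* Step 2: model of a grace period; every in-flight reader completes. */
static void switch_wait_grace_period(struct toy_ref *ref,
                                     struct toy_reader *readers, int nr_readers)
{
    for (int i = 0; i < nr_readers; i++)
        if (readers[i].in_flight)
            reader_finish_get(&readers[i], ref);
}

/* Step 3: fold the per-CPU deltas into the shared counter. */
static void switch_sum_percpu(struct toy_ref *ref)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        ref->atomic_count += ref->percpu_count[cpu];
        ref->percpu_count[cpu] = 0;
    }
}

/*
 * One reader on CPU 1 samples the mode just before the switch starts.
 * Returns the shared counter once the switch and the reader are both done.
 */
long run_scenario(bool wait_for_grace_period)
{
    struct toy_ref ref;
    struct toy_reader reader;

    toy_ref_init(&ref);
    reader_check_mode(&reader, &ref, 1);
    switch_mark_atomic(&ref);
    if (wait_for_grace_period)
        switch_wait_grace_period(&ref, &reader, 1);
    switch_sum_percpu(&ref);
    if (reader.in_flight)
        reader_finish_get(&reader, &ref); /* lands in a slot nobody sums */
    return ref.atomic_count;
}
```

In this model run_scenario(true) returns 2 (the initial reference plus CPU B's), while run_scenario(false) returns 1 because the late increment stays in a per-CPU slot that has already been summed. Two subsequent puts would then take the shared counter to -1, which is the symptom from the original report. The point of step 2 is that the real code never takes the second path.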
