lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 12 Oct 2021 09:09:03 +0800
From:   "yukuai (C)" <yukuai3@...wei.com>
To:     Michal Koutný <mkoutny@...e.com>
CC:     <tj@...nel.org>, <axboe@...nel.dk>, <cgroups@...r.kernel.org>,
        <linux-block@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <yi.zhang@...wei.com>
Subject: Re: [PATCH] blk-cgroup: check blkcg policy is enabled in
 blkg_create()

On 2021/10/11 23:23, Michal Koutný wrote:
> Hello.
> 
> On Fri, Oct 08, 2021 at 03:27:20PM +0800, Yu Kuai <yukuai3@...wei.com> wrote:
>> This is because blkg_alloc() is called from blkg_conf_prep() without
>> holding 'q->queue_lock', and elevator is exited before blkg_create():
>   
> IIUC the problematic interleaving is this one (I've noticed `blkg->pd[i]
> = NULL` to thread 2 call trace):

The new blkg will not add to blkg_list untill pd_init_fn() is done in
blkg_create(), thus blkcg_deactivate_policy() can't access this blkg.
> 
>> thread 1                            thread 2
>> blkg_conf_prep
>>   spin_lock_irq(&q->queue_lock);
>>   blkg_lookup_check -> return NULL
>>   spin_unlock_irq(&q->queue_lock);
>>
>>   blkg_alloc
>>    blkcg_policy_enabled -> true
>>    pd = ->pd_alloc_fn
>>                                     blk_mq_exit_sched
>>                                      bfq_exit_queue
>>                                       blkcg_deactivate_policy
>>                                        spin_lock_irq(&q->queue_lock);
>>                                        __clear_bit(pol->plid, q->blkcg_pols);
>>
>                                          pol->pd_free_fn(blkg->pd[i]);
>                                          blkg->pd[i] = NULL;
>>
>>                                        spin_unlock_irq(&q->queue_lock);
>>                                      q->elevator = NULL;
>      blkg->pd[i] = pd
>>    spin_lock_irq(&q->queue_lock);
>>     blkg_create
>>      if (blkg->pd[i])
>>       ->pd_init_fn -> q->elevator is NULL
>>    spin_unlock_irq(&q->queue_lock);
> 
> In high-level terms, is this a race between (blk)io controller attribute
> write and a device scheduler (elevator) switch?
> If so, I'd add it to the commit message.
> 
>> Fix the problem by checking that policy is still enabled in
>> blkg_create().
> 
> Is this sufficient wrt some other q->elevator users later?
> 
>> @@ -252,6 +266,9 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
>>   		goto err_free_blkg;
>>   	}
>>   
> 
> I'd add a comment here like:
> 
>> Re-check policies are still enabled, since the caller blkg_conf_prep()
>> temporarily drops q->queue_lock and we can race with
>> blk_mq_exit_sched() removing policies.

Thanks for your advice.

Best regards,
Kuai
> 
>> +	if (new_blkg)
>> +		blkg_check_pd(q, new_blkg);
>> +
> 
> Thanks,
> Michal
> .
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ