linux-kernel - Re: [PATCH v2 1/2] blk-iocost: add refcounting for iocg

Message-ID: <875eb43e-202d-5b81-0bff-ef0434358d99@huaweicloud.com>
Date:   Mon, 9 Jan 2023 09:32:46 +0800
From:   Yu Kuai <yukuai1@...weicloud.com>
To:     Tejun Heo <tj@...nel.org>, Yu Kuai <yukuai1@...weicloud.com>
Cc:     hch@...radead.org, josef@...icpanda.com, axboe@...nel.dk,
        cgroups@...r.kernel.org, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
        "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v2 1/2] blk-iocost: add refcounting for iocg

Hi,

On 2023/01/07 4:18, Tejun Heo wrote:
> On Fri, Jan 06, 2023 at 09:08:45AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/01/06 2:32, Tejun Heo wrote:
>>> On Thu, Jan 05, 2023 at 09:14:07AM +0800, Yu Kuai wrote:
>>>> 1) is related to blkg, while 2) is not, hence refcnting from blkg can't
>>>> fix the problem. refcnting from blkcg_policy_data should be ok, but I
>>>> see that bfq already has the similar refcnting, while other policy
>>>> doesn't require such refcnting.
>>>
>>> Hmm... taking a step back, wouldn't this be solved by moving the first part
>>> of ioc_pd_free() to pd_offline_fn()? The ordering is strictly defined there,
>>> right?
>>>
>>
>> Moving first part to pd_offline_fn() has some requirements, like what I
>> did in the other thread:
>>
>> iocg can be activated again after pd_offline_fn(), which is possible
>> because bio can be dispatched when cgroup is removed. I tried to avoid
>> that by:
>>
>> 1) dispatch all throttled bio in ioc_pd_offline()
>> 2) don't throttle bio after ioc_pd_offline()
>>
>> However, you already disagreed with that. 😔
> 
> Okay, I was completely wrong while I was replying to your original patch.
> Should have looked at the code closer, my apologies.
> 
> What I missed is that pd_offline doesn't happen when the cgroup goes
> offline. Please take a look at the following two commits:
> 
>   59b57717fff8 ("blkcg: delay blkg destruction until after writeback has finished")
>   d866dbf61787 ("blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it")
> 

These two commits were applied three years ago. I haven't checked the
details yet, but they don't seem to guarantee that no io will be handled
by rq_qos_throttle() after pd_offline_fn(), because I just reproduced
this while working on another problem:

f02be9002c48 ("block, bfq: fix null pointer dereference in bfq_bio_bfqg()")

A user thread can issue async io, which can be throttled by blk-throttle
(not writeback); the thread can then exit and the cgroup can be removed
before that io is dispatched to rq_qos_throttle().
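
To make the ordering concrete, here is a toy Python sketch of the
sequence described above. It is not kernel code: the Cgroup class, the
simulate() helper and the event strings are made up for illustration,
and it only models the order of events, not any locking or refcounting.

```python
# Toy model of the event ordering only; none of these names are kernel APIs.
from collections import deque


class Cgroup:
    def __init__(self, name):
        self.name = name
        self.offline = False  # set once the modelled pd_offline_fn() has run


def simulate():
    events = []
    cg = Cgroup("test")
    throttled = deque()  # bios held back by the modelled blk-throttle

    # 1) A user thread submits async io and blk-throttle holds it back.
    throttled.append(("bio0", cg))
    events.append("bio0 queued in blk-throttle")

    # 2) The thread exits and the cgroup is removed, so the policy goes offline.
    cg.offline = True
    events.append("pd_offline_fn")

    # 3) blk-throttle later dispatches the held bio, which only now
    #    reaches the modelled rq_qos_throttle() stage.
    while throttled:
        bio, owner = throttled.popleft()
        state = "after offline" if owner.offline else "while online"
        events.append(f"{bio} reaches rq_qos_throttle {state}")
    return events


if __name__ == "__main__":
    for event in simulate():
        print(event)
```

The point of the sketch is only that step 3 can happen after step 2
without any writeback being involved.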

> After the above two commits, ->pd_offline_fn() is called only after all
> possible writebacks are complete, so it shouldn't allow mass escapes to
> root. With writebacks out of the picture, it might be that there can be no
> further IOs once ->pd_offline_fn() is called too as there can be no tasks
> left in it and no dirty pages, but best to confirm that.
> 
> So, yeah, the original approach you took should work although I'm not sure
> the patches that you added to make offline blkg to bypass are necessary
> (that also contributed to my assumption that there will be more IOs on those
> blkg's). Have you seen more IOs coming down the pipeline after offline? If
> so, can you dump some backtraces and see where they're coming from?

So far I'm sure such IOs can come from blk-throttle. I haven't confirmed
it yet, but I suspect io_uring can do this as well.

Thanks,
Kuai