Message-ID: <Y7hTHZQYsCX6EHIN@slm.duckdns.org>
Date: Fri, 6 Jan 2023 06:58:05 -1000
From: Tejun Heo <tj@...nel.org>
To: Jan Kara <jack@...e.cz>
Cc: Michal Koutný <mkoutny@...e.com>,
Jinke Han <hanjinke.666@...edance.com>, josef@...icpanda.com,
axboe@...nel.dk, cgroups@...r.kernel.org,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
yinxin.x@...edance.com
Subject: Re: [PATCH v3] blk-throtl: Introduce sync and async queues for
blk-throtl
Hello,
On Fri, Jan 06, 2023 at 04:38:13PM +0100, Jan Kara wrote:
> Generally, problems like this are taken care of by IO schedulers. E.g. BFQ
> has quite a lot of logic exactly to reduce problems like this. Sync and
> async queues are one part of this logic inside BFQ (but there's more).
With modern SSDs, even deadline's overhead is too high, and a lot (though
clearly not all) of what the IO schedulers do is no longer necessary. I
don't see a good way back to elevators.
> But given current architecture of the block layer IO schedulers are below
> throttling frameworks such as blk-throtl so they have no chance of
> influencing problems like this. So we are bound to reinvent the scheduling
> logic IO schedulers are already doing. That being said I don't have a good
> solution for this or architecture suggestion. Because implementing various
> throttling frameworks within IO schedulers is cumbersome (complex
> interactions) and generally the performance is too slow for some use cases.
> We've been there (that's why there's cgroup support in BFQ) and really
> the current architecture is much easier to reason about.
Another layering problem with controlling from elevators is that they sit
after request allocation, by which point the issuer has already moved on. We
used to have per-cgroup rq pools but ripped them out, so it's pretty easy to
cause severe priority inversions by depleting the shared request pool. The
fact that throttling takes place after the issuing task has returned from
the issue path also makes propagating the throttling operation upwards more
challenging.
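To make the inversion concrete, here is a toy model (plain Python, not
kernel code; `RequestPool` and the sizes are made up for illustration) of a
shared request pool where throttling happens only after allocation:

```python
class RequestPool:
    """Fixed-size pool standing in for the shared blk-mq request pool."""

    def __init__(self, size):
        self.free = size

    def alloc(self):
        """Return True on success; a real issuer would block on failure."""
        if self.free == 0:
            return False
        self.free -= 1
        return True


pool = RequestPool(4)

# A throttled low-priority cgroup allocates requests first. Because
# throttling is applied only after allocation, these requests stay
# pinned in the pool for the whole throttling window.
low_prio_held = sum(pool.alloc() for _ in range(4))

# A high-priority issuer now finds the pool empty and must wait behind
# the throttled cgroup: a priority inversion.
high_prio_ok = pool.alloc()
print(low_prio_held, high_prio_ok)  # 4 False
```

Per-cgroup rq pools avoided exactly this by giving each cgroup its own
`free` budget, at the cost of extra complexity.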
At least in terms of cgroup control, the new bio based behavior is a lot
better. In the fb fleet, iocost is deployed on most (virtually all) of the
machines and we don't see issues with severe priority inversions.
Cross-cgroup behavior is controlled pretty well. Inside each cgroup, sync
writes aren't prioritized, but nobody seems to be troubled by that.
My bet is that inversion issues are a lot more severe with blk-throttle
because it's not work-conserving and doesn't do things like issue-as-root or
other measures to alleviate the issues that can arise from inversions.
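The work-conserving distinction can be sketched with another toy model (the
dispatch functions and the 10-tick budget are invented for illustration): a
work-conserving scheduler dispatches whenever anything is queued, while a
rate limiter in the blk-throttle mold may leave the device idle even though
a bio that someone else depends on is sitting in the queue.

```python
def dispatch_work_conserving(queue, now, _state):
    # Always dispatch if there is work; the device never idles while
    # the queue is non-empty.
    return queue.pop(0) if queue else None


def dispatch_throttled(queue, now, state):
    # Enforce at most one dispatch per 10 time units, regardless of
    # whether the device is otherwise idle (non-work-conserving).
    if queue and now - state["last"] >= 10:
        state["last"] = now
        return queue.pop(0)
    return None


q1, q2 = ["bio"] * 3, ["bio"] * 3
state = {"last": -10}
wc = sum(dispatch_work_conserving(q1, t, None) is not None for t in range(5))
th = sum(dispatch_throttled(q2, t, state) is not None for t in range(5))
print(wc, th)  # 3 1: the throttled queue still holds bios at t=5
```

If a throttled bio completes a transaction that a high-priority task is
blocked on, that task waits out the full throttling window; issue-as-root
style measures sidestep this by letting such bios bypass the limit.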
Jinke, is the case you described in the original email what you actually saw
in production or a simplified test case for demonstration? If the latter,
can you describe actual problems seen in production?
Thanks.
--
tejun