Message-ID: <Y7hTHZQYsCX6EHIN@slm.duckdns.org>
Date:   Fri, 6 Jan 2023 06:58:05 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Jan Kara <jack@...e.cz>
Cc:     Michal Koutný <mkoutny@...e.com>,
        Jinke Han <hanjinke.666@...edance.com>, josef@...icpanda.com,
        axboe@...nel.dk, cgroups@...r.kernel.org,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        yinxin.x@...edance.com
Subject: Re: [PATCH v3] blk-throtl: Introduce sync and async queues for blk-throtl

Hello,

On Fri, Jan 06, 2023 at 04:38:13PM +0100, Jan Kara wrote:
> Generally, problems like this are taken care of by IO schedulers. E.g. BFQ
> has quite a lot of logic exactly to reduce problems like this. Sync and
> async queues are one part of this logic inside BFQ (but there's more).

With modern SSDs, even deadline's overhead is too high, and a lot (but
clearly not all) of what the IO schedulers do is no longer necessary. I
don't see a good way back to elevators.

> But given the current architecture of the block layer, IO schedulers are
> below throttling frameworks such as blk-throtl, so they have no chance of
> influencing problems like this. So we are bound to reinvent the scheduling
> logic IO schedulers are already doing. That being said, I don't have a good
> solution for this or an architecture suggestion, because implementing
> various throttling frameworks within IO schedulers is cumbersome (complex
> interactions) and generally the performance is too poor for some use cases.
> We've been there (that's why there's cgroup support in BFQ) and really
> the current architecture is much easier to reason about.

Another layering problem with controlling from elevators is that they sit
after request allocation, by which point the issuer has already moved on. We
used to have per-cgroup rq pools but ripped that out, so it's pretty easy to
cause severe priority inversions by depleting the shared request pool, and
the fact that throttling takes place after the issuing task has returned from
the issue path makes propagating the throttling operation upwards more
challenging too.
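To make the pool-depletion inversion concrete, here is a toy user-space
sketch (plain C, not kernel code; the pool size and names are invented for
illustration): a throttled low-priority issuer parks requests on every slot
of a shared fixed-size pool, and a high-priority issuer then stalls on
allocation even though its own budget is untouched.

	#include <stdio.h>

	#define POOL_SIZE 4	/* stand-in for the shared rq pool */

	static int pool_free = POOL_SIZE;

	/* returns 1 on success, 0 when the pool is exhausted */
	static int alloc_request(const char *who)
	{
		if (pool_free == 0) {
			printf("%s: pool empty, would sleep on allocation\n", who);
			return 0;
		}
		pool_free--;
		printf("%s: got a request (%d left)\n", who, pool_free);
		return 1;
	}

	int main(void)
	{
		int i;

		/*
		 * A heavily throttled low-prio cgroup grabs every slot;
		 * its requests are held below the throttler, so they
		 * complete slowly and the slots stay occupied.
		 */
		for (i = 0; i < POOL_SIZE; i++)
			alloc_request("low-prio");

		/*
		 * A high-prio issuer now stalls on allocation even though
		 * its own budget is untouched: the priority inversion.
		 */
		alloc_request("high-prio");
		return 0;
	}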

At least in terms of cgroup control, the new bio-based behavior is a lot
better. In the FB fleet, iocost is deployed on most (virtually all) of the
machines and we don't see severe priority inversion issues. Cross-cgroup
control behaves pretty well. Inside each cgroup, sync writes aren't
prioritized, but nobody seems to be troubled by that.

My bet is that inversion issues are a lot more severe with blk-throttle
because it's not work-conserving and doesn't take measures like issue-as-root
to alleviate the issues which can arise from inversions.
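For illustration, here is a toy user-space contrast of the two dispatch
policies (plain C; the struct and function names are invented and this is
not how blk-throttle is actually implemented): a hard limiter refuses to
dispatch once the budget is spent even if the device is idle, while a
work-conserving policy spends the idle capacity anyway.

	#include <stdbool.h>
	#include <stdio.h>

	struct grp {
		int budget;	/* IOs still allowed this period (hard limit) */
		int queued;	/* IOs waiting to be dispatched */
	};

	/* blk-throttle style: never exceed the limit, even on an idle device */
	static bool dispatch_throttled(struct grp *g, bool device_idle)
	{
		(void)device_idle;	/* idle capacity is ignored */
		if (g->queued > 0 && g->budget > 0) {
			g->queued--;
			g->budget--;
			return true;
		}
		return false;
	}

	/* work-conserving style: spend idle capacity regardless of budget */
	static bool dispatch_conserving(struct grp *g, bool device_idle)
	{
		if (g->queued > 0 && (g->budget > 0 || device_idle)) {
			g->queued--;
			if (g->budget > 0)
				g->budget--;
			return true;
		}
		return false;
	}

	int main(void)
	{
		struct grp a = { .budget = 0, .queued = 3 };
		struct grp b = a;

		printf("throttled on idle device:  %s\n",
		       dispatch_throttled(&a, true) ? "dispatched" : "stalled");
		printf("conserving on idle device: %s\n",
		       dispatch_conserving(&b, true) ? "dispatched" : "stalled");
		return 0;
	}

The "stalled" case in the first policy is exactly where a queued IO, and
whatever is waiting behind it, can get stuck under an inversion; a
work-conserving controller can let it through when there's spare capacity.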

Jinke, is the case you described in the original email what you actually saw
in production or a simplified test case for demonstration? If the latter,
can you describe actual problems seen in production?

Thanks.

-- 
tejun
