Message-ID: <8c25e591-34d6-7c42-b3a8-dcde86643fe7@bytedance.com>
Date:   Thu, 22 Dec 2022 23:38:35 +0800
From:   hanjinke <hanjinke.666@...edance.com>
To:     Michal Koutný <mkoutny@...e.com>
Cc:     tj@...nel.org, josef@...icpanda.com, axboe@...nel.dk,
        cgroups@...r.kernel.org, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, yinxin.x@...edance.com
Subject: Re: [External] Re: [PATCH v2] blk-throtl: Introduce sync and async
 queues for blk-throtl



On 2022/12/22 9:39 PM, Michal Koutný wrote:
> Hello Jinke.
> 
> On Wed, Dec 21, 2022 at 06:42:46PM +0800, Jinke Han <hanjinke.666@...edance.com> wrote:
>> In our test, fio writes a 100g file in sequential 4k blocksize in
>> a container with a low bps limit configured (wbps=10M). More than 1200
>> ios were throttled in the blk-throtl queue and the average throttle time
>> of each io was 140s. At the same time, saving a small file with vim
>> was blocked for almost 140s. Since vim issues an fsync, the sync ios
>> of that fsync are blocked behind a huge amount of buffered write ios
>> queued ahead of them. This is also a priority inversion problem within
>> one cgroup. In database scenarios, things get really bad with
>> blk-throtl enabled, as fsync is called very often.
> 
> I'm trying to make sense of the numbers:
> - at 10 MB/s, it's 0.4 ms per 4k block
> - there are 1.2k throttled bios that gives waiting time of roughly 0.5s
>    ~ 0.4ms * 1200
> - you say that you observe 280 times longer throttling time,
> - that'd mean there should be 340k queued bios
>    - or cumulative dispatch of ~1400 MB of data
> 
Hi,

Here is the iostat output for sdb while the test was running:
Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdb              0.00   11.00      0.00      8.01     0.00     0.00   0.00   0.00    0.00    7.18   0.08     0.00   745.45   3.27   3.60

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdb              0.00    8.00      0.00      9.14     0.00     0.00   0.00   0.00    0.00    7.38   0.06     0.00  1170.00   2.62   2.10

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdb              0.00   16.00      0.00     12.02     0.00    12.00   0.00  42.86    0.00    7.25   0.12     0.00   769.25   2.06   3.30

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdb              0.00   11.00      0.00     10.91     0.00     1.00   0.00   8.33    0.00    6.82   0.07     0.00  1015.64   2.36   2.60

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdb              0.00   11.00      0.00      9.14     0.00     1.00   0.00   8.33    0.00    6.27   0.07     0.00   850.91   2.55   2.80

I used bcc to trace the time a bio spends from submit_bio to
blk_mq_submit_bio and found the average time was nearly 140s (tracing
the fsync duration with bcc gave the same result).
The iostat output above shows the average size of each io is nearly
1 MB, so a rough estimate of the number of queued bios is
140s * 10MB/s / 1MB ≈ 1400.
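
For reference, the tracing was along these lines (a minimal bcc sketch,
not the exact script; the kprobe names and the assumption that both
functions take the struct bio * as their first argument are guesses for
a recent kernel):

#!/usr/bin/env python3
# Minimal bcc sketch: measure how long a bio spends between submit_bio()
# and blk_mq_submit_bio(), which is roughly the time it waits in the
# blk-throtl queue.
from bcc import BPF
import time

prog = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u64, u64);      // bio pointer -> entry timestamp (ns)
BPF_HISTOGRAM(dist);            // log2 histogram of queueing time (ms)

int trace_submit_bio(struct pt_regs *ctx) {
    u64 bio = PT_REGS_PARM1(ctx);   // assumes struct bio * is arg 1
    u64 ts = bpf_ktime_get_ns();
    start.update(&bio, &ts);
    return 0;
}

int trace_blk_mq_submit_bio(struct pt_regs *ctx) {
    u64 bio = PT_REGS_PARM1(ctx);
    u64 *tsp = start.lookup(&bio);
    if (tsp == 0)
        return 0;
    u64 delta_ms = (bpf_ktime_get_ns() - *tsp) / 1000000;
    dist.increment(bpf_log2l(delta_ms));
    start.delete(&bio);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="submit_bio", fn_name="trace_submit_bio")
b.attach_kprobe(event="blk_mq_submit_bio", fn_name="trace_blk_mq_submit_bio")

print("Tracing bio queueing time, hit Ctrl-C to print the histogram.")
try:
    time.sleep(99999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("msecs")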


> So what are the queued quantities? Are there more than 1200 bios or are
> they bigger than the 4k you mention?
> 
"fio writes a 100g file in sequential 4k blocksize"
Bios may be more than 1M as ext4 will merged continuously logic blocks 
when physical block also continuously.
> Thanks for clarification.
> 
> (I acknowledge the possible problem with a large population of async
> writes delaying scarce sync writes.)
> 
> Michal

Even if 0.4 ms were what iostat observed, estimating the throttle time
of a bio as 0.4ms * 1200 would not work, because that 0.4 ms would be
the duration of a request from allocation to completion, not the time
spent waiting in the throttle queue.

If the average size of a bio is 1 MB, dispatching one bio takes
1MB / 10MB/s = 100ms. The queue is FIFO, so the average throttle time
is roughly 100ms * 1400 = 140s.
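
As a quick back-of-the-envelope check (plain Python; the ~1 MB average
bio size is read off the wareq-sz column in the iostat output above):

# Rough check of the numbers above, nothing kernel-specific.
wbps    = 10 * 1024 * 1024    # configured write limit: 10 MB/s
avg_bio = 1 * 1024 * 1024     # ~1 MB per merged bio (from wareq-sz)
per_bio = avg_bio / wbps      # ~0.1 s to dispatch one bio
queued  = 140 / per_bio       # ~1400 bios in the FIFO for a 140 s wait
print(f"dispatch per bio: {per_bio * 1000:.0f} ms, queued bios: {queued:.0f}")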

Thanks.
