Date:   Fri, 10 Dec 2021 19:04:47 -0700
From:   Jens Axboe <axboe@...nel.dk>
To:     Dexuan Cui <decui@...rosoft.com>,
        "'ming.lei@...hat.com'" <ming.lei@...hat.com>,
        'Christoph Hellwig' <hch@....de>,
        "'linux-block@...r.kernel.org'" <linux-block@...r.kernel.org>
Cc:     Long Li <longli@...rosoft.com>,
        "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>
Subject: Re: Random high CPU utilization in blk-mq with the none scheduler

On 12/10/21 6:29 PM, Dexuan Cui wrote:
>> From: Dexuan Cui
>> Sent: Thursday, December 9, 2021 7:30 PM
>>
>> Hi all,
>> I found a random high CPU utilization issue with some database benchmark
>> program running on a 192-CPU virtual machine (VM). Originally the issue
>> was found with RHEL 8.4 and Ubuntu 20.04, and further tests show that the
>> issue also reproduces with the latest upstream stable kernel v5.15.7, but
>> *not* with v5.16-rc1. It looks like someone resolved the issue in v5.16-rc1
>> recently?
> 
> I did git-bisect on the linux-block tree's for-5.16/block branch and this patch
> resolves the random high CPU utilization issue (I'm not sure how):
> 	dc5fc361d891 ("block: attempt direct issue of plug list")
> 	https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-5.16/block&id=dc5fc361d891e089dfd9c0a975dc78041036b906
> 
> Do you think if it's easy to backport it to earlier versions like 5.10?
> It looks like there are a lot of prerequisite patches.

It's more likely the real fix is avoiding the repeated plug list scan,
which I guess makes sense. That is this commit:

commit d38a9c04c0d5637a828269dccb9703d42d40d42b
Author: Jens Axboe <axboe@...nel.dk>
Date:   Thu Oct 14 07:24:07 2021 -0600

    block: only check previous entry for plug merge attempt

If that's the case, try 5.15.x again and do:

echo 2 > /sys/block/<dev>/queue/nomerges

for each drive you are using in the IO test, and see if that gets
rid of the excess CPU usage.
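
With several drives in the test, the per-device write above can be wrapped in a small loop. The following is only a sketch: the `set_nomerges` helper and the `SYSFS_BLOCK` override are hypothetical names introduced here for illustration, and the device names in the usage note are placeholders. Per the kernel's queue sysfs documentation, `nomerges` accepts 0 (all merges enabled, the default), 1 (only simple one-hit merges) and 2 (no merge attempts at all).

```shell
#!/bin/sh
# Hypothetical helper (not part of any kernel tooling): write a nomerges
# value for each named block device.
# SYSFS_BLOCK is an assumed override for dry runs against a scratch
# directory; it defaults to the real sysfs location.
set_nomerges() {
    value=$1
    shift
    for dev in "$@"; do
        knob="${SYSFS_BLOCK:-/sys/block}/$dev/queue/nomerges"
        if [ -w "$knob" ]; then
            echo "$value" > "$knob"
            # Read the value back so the change is visible in the test log.
            echo "$dev: nomerges=$(cat "$knob")"
        else
            echo "$dev: cannot write $knob (missing device, or not root?)" >&2
        fi
    done
}
```

Run as root, something like `set_nomerges 2 sdb sdc` would cover the drives in the IO test (substitute the real device names), and `set_nomerges 0 sdb sdc` puts the default back afterwards.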

-- 
Jens Axboe
