Message-ID: <20220817163059.kigrvdfmxfswmhls@quack3>
Date:   Wed, 17 Aug 2022 18:30:59 +0200
From:   Jan Kara <jack@...e.cz>
To:     Chris Murphy <lists@...orremedies.com>
Cc:     Jan Kara <jack@...e.cz>,
        Holger Hoffstätte <holger@...lied-asynchrony.com>,
        Nikolay Borisov <nborisov@...e.com>,
        Jens Axboe <axboe@...nel.dk>,
        Paolo Valente <paolo.valente@...aro.org>,
        Linux-RAID <linux-raid@...r.kernel.org>,
        linux-block <linux-block@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Josef Bacik <josef@...icpanda.com>
Subject: Re: stalling IO regression since linux 5.12, through 5.18

On Wed 17-08-22 11:09:26, Chris Murphy wrote:
> 
> 
> On Wed, Aug 17, 2022, at 7:49 AM, Jan Kara wrote:
> 
> >
> > Another thing worth trying is to compile the kernel without
> > CONFIG_BFQ_GROUP_IOSCHED. That will essentially disable cgroup support in
> > BFQ so we will see whether the problem may be cgroup related or not.
> 
> The problem happens with a 5.12.0 kernel built without
> CONFIG_BFQ_GROUP_IOSCHED.

Thanks for testing! Just to answer your previous question: This is
different from cgroup.disable=io because BFQ takes different code paths. So
this makes it even less likely this is some obscure BFQ bug. Why BFQ could
be different here from mq-deadline is that it artificially reduces device
queue depth (it sets shallow_depth when allocating new tags) and maybe that
triggers some bug in request tag allocation.
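
To illustrate the shallow-depth idea, here is a toy model in Python. It is
only a sketch: the TagPool class and its methods are hypothetical names, and
the kernel's real tag allocator (sbitmap) works differently in detail. The
point it shows is that a caller passing a shallow depth can be refused a tag
while the pool still has free tags above that limit.

```python
class TagPool:
    """Toy fixed-size tag pool (hypothetical; not the kernel's sbitmap code)."""

    def __init__(self, depth):
        self.depth = depth      # total number of tags the device offers
        self.in_use = set()     # tags currently handed out

    def alloc(self, shallow_depth=None):
        # With a shallow depth, only tags below that limit may be handed out,
        # which artificially reduces the usable queue depth for this caller.
        if shallow_depth is None:
            limit = self.depth
        else:
            limit = min(shallow_depth, self.depth)
        for tag in range(limit):
            if tag not in self.in_use:
                self.in_use.add(tag)
                return tag
        return None  # no tag within the limit; a real caller would wait

    def free(self, tag):
        self.in_use.discard(tag)
```

In this model a scheduler that always allocates with a small shallow_depth
waits for a tag far more often than one that uses the full depth, so it
exercises the wait-and-wake path of tag allocation much more heavily.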

BTW, are you sure the first problematic kernel is 5.12? Because support for
shared tagsets was added to megaraid_sas driver in 5.11 (5.11-rc3 in
particular - commit 81e7eb5bf08f3 ("Revert "Revert "scsi: megaraid_sas:
Added support for shared host tagset for cpuhotplug"")) and that is one
candidate I'd expect to start to trigger issues. BTW that may be an
interesting thing to try: Can you boot with the
"megaraid_sas.host_tagset_enable=0" kernel option and see whether the
issue reproduces?
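
After rebooting, it may help to confirm that the option actually took effect
before rerunning the workload. The following is a minimal sketch, assuming
the parameter is readable under /sys/module/megaraid_sas/parameters/ and
that the option was written on the kernel command line without spaces around
"=". The helper names are hypothetical.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return a module parameter's value as a string, or None if unreadable."""
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, or the parameter is not exposed in sysfs.
        return None


def cmdline_has(option, cmdline_path="/proc/cmdline"):
    """Return True if the exact option word appears on the kernel command line."""
    try:
        words = Path(cmdline_path).read_text().split()
    except OSError:
        return False
    return option in words


if __name__ == "__main__":
    print("on cmdline:", cmdline_has("megaraid_sas.host_tagset_enable=0"))
    print("sysfs value:", read_module_param("megaraid_sas", "host_tagset_enable"))
```

If the sysfs value cannot be read, the result is None rather than an error,
so the check can also be run on machines without the megaraid_sas module.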

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
