Message-ID: <YkRfrjgNpD+S2WpN@T590>
Date:   Wed, 30 Mar 2022 21:48:30 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     James Bottomley <jejb@...ux.ibm.com>
Cc:     John Garry <john.garry@...wei.com>,
        Andrea Righi <andrea.righi@...onical.com>,
        Martin Wilck <martin.wilck@...e.com>,
        Bart Van Assche <bvanassche@....org>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's
 budget map on queue depth change"

On Wed, Mar 30, 2022 at 09:31:35AM -0400, James Bottomley wrote:
> On Wed, 2022-03-30 at 13:59 +0100, John Garry wrote:
> > On 30/03/2022 12:21, Andrea Righi wrote:
> > > On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
> > > > On 30/03/2022 11:11, Andrea Righi wrote:
> > > > > Hello,
> > > > > 
> > > > > after this commit I'm experiencing some filesystem corruptions
> > > > > at boot on a power9 box with an aacraid controller.
> > > > > 
> > > > > At the moment I'm running a 5.15.30 kernel; when the filesystem
> > > > > is mounted at boot I see the following errors in the console:
> > 
> > About "scsi: core: Reallocate device's budget map on queue depth
> > change" being added to a stable kernel, I am not sure if this was
> > really a fix or just a memory optimisation.
> 
> I can see how it becomes the problem: it frees and allocates a new
> bitmap across a queue freeze, but bits in the old one might still be in
> use.  This isn't a problem except when they return and we now possibly
> see a tag greater than we think we can allocate coming back. 
> Presumably we don't check this and we end up doing a write to
> unallocated memory.
> 
> I think if you want to reallocate on queue depth reduction, you might
> have to drain the queue as well as freeze it.

After the queue is frozen, there can't be any in-flight request/SCSI
command, so the sbitmap is all zeroes at that point and is safe to
reallocate.
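
To make the freeze argument concrete, here is a minimal userspace sketch
of it. Everything in it (struct toy_budget_map and the toy_* helpers) is
hypothetical and only models the idea; it is not the kernel's sbitmap or
blk-mq code. It assumes one budget token per in-flight command and that
a frozen queue hands out no new tokens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy model, not the kernel's struct sbitmap. */
struct toy_budget_map {
	unsigned char *bits;    /* one byte per token, 1 = in use */
	unsigned int depth;     /* number of tokens in the map */
	unsigned int in_flight; /* tokens currently held */
	bool frozen;            /* true: no new tokens are handed out */
};

static int toy_init(struct toy_budget_map *m, unsigned int depth)
{
	m->bits = calloc(depth, 1);
	if (!m->bits)
		return -1;
	m->depth = depth;
	m->in_flight = 0;
	m->frozen = false;
	return 0;
}

/* Take the lowest free token; returns its index, or -1 if none. */
static int toy_get(struct toy_budget_map *m)
{
	if (m->frozen)
		return -1;
	for (unsigned int i = 0; i < m->depth; i++) {
		if (!m->bits[i]) {
			m->bits[i] = 1;
			m->in_flight++;
			return (int)i;
		}
	}
	return -1;
}

/* Return a token when its command completes. */
static void toy_put(struct toy_budget_map *m, int token)
{
	assert(token >= 0 && (unsigned int)token < m->depth);
	assert(m->bits[token]);
	m->bits[token] = 0;
	m->in_flight--;
}

/* Stop handing out tokens. The real freeze also waits for in-flight
 * requests to finish; this model leaves that to the caller. */
static void toy_freeze(struct toy_budget_map *m)
{
	m->frozen = true;
}

/* Swap in a new map. Refused unless frozen with nothing in flight,
 * i.e. only when every bit in the old map is already zero. */
static int toy_realloc(struct toy_budget_map *m, unsigned int new_depth)
{
	unsigned char *nb;

	if (!m->frozen || m->in_flight != 0)
		return -1;
	nb = calloc(new_depth, 1);
	if (!nb)
		return -1;
	free(m->bits);
	m->bits = nb;
	m->depth = new_depth;
	return 0;
}
```

In this model, toy_realloc() fails while a token is still held and
succeeds once toy_put() has returned it, which is the state the freeze
is meant to guarantee before the map is replaced.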

The problem is aacraid-specific, since the driver has a hard limit
of 256 on queue depth; see aac_change_queue_depth().


Thanks,
Ming
