linux-kernel - Re: filesystem corruption with "scsi: core: Reallocate device's budget map on queue depth change"


Message-ID: <ba090f1b-a767-46a1-5728-82d9c587ef3c@opensource.wdc.com>
Date:   Thu, 31 Mar 2022 07:30:35 +0900
From:   Damien Le Moal <damien.lemoal@...nsource.wdc.com>
To:     Ming Lei <ming.lei@...hat.com>,
        James Bottomley <jejb@...ux.ibm.com>
Cc:     John Garry <john.garry@...wei.com>,
        Andrea Righi <andrea.righi@...onical.com>,
        Martin Wilck <martin.wilck@...e.com>,
        Bart Van Assche <bvanassche@....org>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: filesystem corruption with "scsi: core: Reallocate device's
 budget map on queue depth change"

On 3/30/22 22:48, Ming Lei wrote:
> On Wed, Mar 30, 2022 at 09:31:35AM -0400, James Bottomley wrote:
>> On Wed, 2022-03-30 at 13:59 +0100, John Garry wrote:
>>> On 30/03/2022 12:21, Andrea Righi wrote:
>>>> On Wed, Mar 30, 2022 at 11:38:02AM +0100, John Garry wrote:
>>>>> On 30/03/2022 11:11, Andrea Righi wrote:
>>>>>> Hello,
>>>>>>
>>>>>> after this commit I'm experiencing some filesystem corruptions
>>>>>> at boot on a power9 box with an aacraid controller.
>>>>>>
>>>>>> At the moment I'm running a 5.15.30 kernel; when the filesystem
>>>>>> is mounted at boot I see the following errors in the console:
>>>
>>> About "scsi: core: Reallocate device's budget map on queue depth
>>> change" being added to a stable kernel, I am not sure if this was
>>> really a fix or just a memory optimisation.
>>
>> I can see how it becomes the problem: it frees and allocates a new
>> bitmap across a queue freeze, but bits in the old one might still be in
>> use.  This isn't a problem except when they return and we now possibly
>> see a tag greater than we think we can allocate coming back. 
>> Presumably we don't check this and we end up doing a write to
>> unallocated memory.
>>
>> I think if you want to reallocate on queue depth reduction, you might
>> have to drain the queue as well as freeze it.
> 
> After the queue is frozen, there can't be any in-flight request/SCSI
> command, so the sbitmap is zeroed at that time and is safe to reallocate.
> 
> The problem is aacraid-specific, since the driver has a hard limit
> of 256 on queue depth; see aac_change_queue_depth().

256 is the SCSI hard limit per device. Any SAS drive has the same limit
by default, since there is no way to know the maximum queue depth of a
SCSI disk. So what is special about aacraid?

> 
> 
> Thanks,
> Ming
> 


-- 
Damien Le Moal
Western Digital Research
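
Editorial sketch (not part of the original message): the two positions
quoted above can be illustrated with a toy model. James describes a token
from the old map coming back after the map has shrunk; Ming's point is
that a frozen queue has nothing in flight, so the map is empty when it is
reallocated. The code below is not the kernel's sbitmap implementation.
The class BudgetMap, its methods, the require_idle flag, and the new
depth of 64 are all hypothetical names and values chosen for
illustration; only the depth of 256 comes from the thread.

```python
class BudgetMap:
    """Toy stand-in for a per-device budget bitmap (NOT the kernel sbitmap)."""

    def __init__(self, depth):
        self.bits = [False] * depth

    @property
    def depth(self):
        return len(self.bits)

    def get(self):
        """Claim the lowest free token, or return None if the map is full."""
        for token, busy in enumerate(self.bits):
            if not busy:
                self.bits[token] = True
                return token
        return None

    def put(self, token):
        """Release a token, rejecting one the current map could not have issued."""
        if not 0 <= token < self.depth:
            raise IndexError(f"token {token} outside current depth {self.depth}")
        self.bits[token] = False

    def in_flight(self):
        """Number of tokens currently claimed."""
        return sum(self.bits)

    def resize(self, new_depth, require_idle=True):
        """Replace the map; by default only when nothing is in flight."""
        if require_idle and self.in_flight():
            raise RuntimeError("tokens still in flight; drain before resizing")
        self.bits = [False] * new_depth


def stale_token_after_shrink():
    """Model the hazard: shrink the map while a high-numbered token is held."""
    budget = BudgetMap(256)                 # 256 is the depth named in the thread
    tokens = [budget.get() for _ in range(200)]
    stale = tokens[-1]                      # token 199 is still held
    budget.resize(64, require_idle=False)   # hypothetical unsafe reallocation
    try:
        budget.put(stale)                   # returns after the shrink
    except IndexError:
        return True                         # the bounds check caught it
    return False
```

With the default require_idle=True, resize() refuses to run while any
token is held, which models the precondition Ming relies on. With the
check bypassed, the stale token is caught only because put() validates
its range; without such a check the release would touch memory outside
the new map, which is the failure James suspects.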