linux-kernel - Re: [PATCH] scsi: sd: add runtime pm to open / release

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <31f1ec62-7047-a34b-fdcb-5ea2a2104292@interlog.com>
Date:   Wed, 29 Jul 2020 14:10:04 -0400
From:   Douglas Gilbert <dgilbert@...erlog.com>
To:     Alan Stern <stern@...land.harvard.edu>,
        Martin Kepplinger <martin.kepplinger@...i.sm>
Cc:     Bart Van Assche <bvanassche@....org>, jejb@...ux.ibm.com,
        Can Guo <cang@...eaurora.org>, martin.petersen@...cle.com,
        linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
        kernel@...i.sm
Subject: Re: [PATCH] scsi: sd: add runtime pm to open / release

On 2020-07-29 10:32 a.m., Alan Stern wrote:
> On Wed, Jul 29, 2020 at 04:12:22PM +0200, Martin Kepplinger wrote:
>> On 28.07.20 22:02, Alan Stern wrote:
>>> On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger wrote:
>>>> Hi Alan,
>>>>
>>>> Any API cleanup is of course welcome. I just wanted to remind you that
>>>> the underlying problem: broken block device runtime pm. Your initial
>>>> proposed fix "almost" did it and mounting works but during file access,
>>>> it still just looks like a runtime_resume is missing somewhere.
>>>
>>> Well, I have tested that proposed fix several times, and on my system
>>> it's working perfectly.  When I stop accessing a drive it autosuspends,
>>> and when I access it again it gets resumed and works -- as you would
>>> expect.
>>
>> that's weird. when I mount, everything looks good, "sda1". But as soon
>> as I cd to the mountpoint and do "ls" (on another SD card "ls" works but
>> actual file reading leads to the exact same errors), I get:
>>
>> [   77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
>> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
>> [   77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
>> [   77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
>> [   77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 60
>> 40 00 00 01 00
> 
> This error report comes from the SCSI layer, not the block layer.

SCSI's first 11 byte command! I'm guessing the first byte is being
repeated and it's actually:
     28 00 00 00 60 40 00 00 01 00  [READ(10)]

That should be fixed. It should be something like: "...CDB in hex: 28 00 ...".

Doug Gilbert

>> [   77.474678] blk_update_request: I/O error, dev sda, sector 24640 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   77.485836] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   77.491628] blk_update_request: I/O error, dev sda, sector 24641 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   77.502275] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   77.508051] blk_update_request: I/O error, dev sda, sector 24642 op
>> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [   77.518651] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> (...)
>> [   77.947653] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   77.953434] FAT-fs (sda1): Directory bread(block 16448) failed
>> [   77.959333] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   77.965118] FAT-fs (sda1): Directory bread(block 16449) failed
>> [   77.971014] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [   77.976802] FAT-fs (sda1): Directory bread(block 16450) failed
>> [   77.982698] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> (...)
>> [   78.384929] FAT-fs (sda1): Filesystem has been set read-only
>> [  103.070973] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [  103.076751] print_req_error: 118 callbacks suppressed
>> [  103.076760] blk_update_request: I/O error, dev sda, sector 9748 op
>> 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
>> [  103.087428] Buffer I/O error on dev sda1, logical block 1556, lost
>> async page write
>> [  103.095309] sd 0:0:0:0: [sda] tag#0 device offline or changed
>> [  103.101123] blk_update_request: I/O error, dev sda, sector 17162 op
>> 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
>> [  103.111883] Buffer I/O error on dev sda1, logical block 8970, lost
>> async page write
> 
> I can't tell why you're getting that error.  In one of my tests the
> device returned the same kind of error status (Sense Key = 6, ASC =
> 0x28) but the operation was then retried successfully.  Perhaps the
> problem lies in the device you are testing.
> 
>>>> As we need to have that working at some point, I might look into it, but
>>>> someone who has experience in the block layer can surely do it more
>>>> efficiently.
>>>
>>> I suspect that any problems you still face are caused by something else.
>>>
>>
>> I then formatted sda1 to ext2 (on the runtime suspend system testing
>> your patch) and that seems to have worked!
>>
>> Again accessing the mountpoint then yield the very same "device offline
>> or changed" errors.
>>
>> What kind of device are you testing? You should be easily able to
>> reproduce this using an "sd" device.
> 
> I tested two devices: a SanDisk Cruzer USB flash drive and a
> g-mass-storage gadget running under dummy-hcd.  They each showed up as
> /dev/sdb on my system.
> 
> I haven't tried testing with an SD card.  If you have any specific
> sequence of commands you would like me to run, let me know.
> 
>> The problems must lie in the different other drivers we use I guess.
> 
> Or the devices.  Have you tried testing with a USB flash drive?
> 
> Alan Stern
>