Message-ID: <45cdf1c2-9056-4ac2-8e4d-4f07996a9267@kernel.org>
Date: Wed, 7 Aug 2024 11:26:46 -0700
From: Damien Le Moal <dlemoal@...nel.org>
To: Christian Heusel <christian@...sel.eu>, Igor Pylypiv
<ipylypiv@...gle.com>, Niklas Cassel <cassel@...nel.org>,
linux-ide@...r.kernel.org
Cc: Hannes Reinecke <hare@...e.de>, regressions@...ts.linux.dev,
stable@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [REGRESSION][BISECTED][STABLE] hdparm errors since 28ab9769117c

On 2024/08/07 10:23, Christian Heusel wrote:
> Hello Igor, hello Niklas,
>
> on my NAS I am encountering the following issue since v6.6.44 (LTS),
> when executing the hdparm command for my WD-WCC7K4NLX884 drives to get
> the active or standby state:
>
> $ hdparm -C /dev/sda
> /dev/sda:
> SG_IO: bad/missing sense data, sb[]: f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> drive state is: unknown
>
>
> While the expected output is the following:
>
> $ hdparm -C /dev/sda
> /dev/sda:
> drive state is: active/idle
>
> I did a bisection within the stable series and found the following
> commit to be the first bad one:
>
> 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
>
> According to kernel.dance the same commit was also backported to the
> v6.10.3 and v6.1.103 stable kernels and I could not find any commit or
> pending patch with a "Fixes:" tag for the offending commit.
>
> So far I have not been able to test with the mainline kernel as this is
> a remote device which I couldn't rescue in case of a boot failure. Also
> just for transparency it does have the out of tree ZFS module loaded,
> but AFAIU this shouldn't be an issue here, as the commit seems clearly
> related to the error. If needed I can test with an untainted mainline
> kernel on Friday when I'm near the device.
>
> I have attached the output of hdparm -I below and would be happy to
> provide further debug information or test patches.

I can confirm this with 6.11-rc2. The problem is actually in the hdparm code, which
assumes that the sense data is in descriptor format without ever checking the
D_SENSE bit to verify that. Commit 28ab9769117c exposes this issue because, as
its title explains, it (correctly) honors D_SENSE instead of always generating
sense data in descriptor format.
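
To illustrate the format mismatch, here is a minimal Python sketch (not
hdparm's actual code) of a parser that looks at the sense data response code
before decoding, so that it handles both fixed and descriptor format. The
function names and the dictionary keys are hypothetical, and the byte offsets
are my reading of the SAT layout for ATA pass-through sense data, so treat
them as assumptions to check against the spec.

```python
# Sketch only: decode ATA pass-through sense data in either format.
# Names are hypothetical; offsets are assumed from the SAT layout.

def parse_ata_passthru_sense(sb: bytes) -> dict:
    """Return the ATA registers carried in a SCSI sense buffer."""
    if len(sb) < 8:
        raise ValueError("sense buffer too short")

    code = sb[0] & 0x7F  # mask off bit 7 (VALID in fixed format)

    if code in (0x72, 0x73):
        # Descriptor format: walk the descriptors that start at byte 8.
        end = min(len(sb), 8 + sb[7])
        pos = 8
        while pos + 2 <= end:
            dtype, dlen = sb[pos], sb[pos + 1]
            # 0x09 is the ATA Status Return descriptor (length 0x0c).
            if dtype == 0x09 and dlen >= 0x0C and pos + 14 <= end:
                d = sb[pos:pos + 14]
                return {"format": "descriptor", "error": d[3],
                        "count": d[5], "device": d[12], "status": d[13]}
            pos += 2 + dlen
        raise ValueError("no ATA Status Return descriptor found")

    if code in (0x70, 0x71):
        # Fixed format: the registers sit in the INFORMATION field.
        if len(sb) < 14:
            raise ValueError("fixed format sense data too short")
        return {"format": "fixed", "error": sb[3], "status": sb[4],
                "device": sb[5], "count": sb[6]}

    raise ValueError("unknown sense response code 0x%02x" % code)


def power_mode(count: int) -> str:
    """Map the CHECK POWER MODE count value to a state name."""
    return {0x00: "standby", 0x80: "idle", 0xFF: "active/idle"}.get(
        count, "unknown")


# Sense buffer from the report above (fixed format, response code 0xf0).
reported = bytes.fromhex(
    "f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00"
    " 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00")
regs = parse_ata_passthru_sense(reported)
print(regs["format"], power_mode(regs["count"]))
```

Under these assumptions, the buffer from the report decodes as fixed format
with a count of 0xff, which is the value hdparm reports as active/idle. A
parser that only accepts response code 0x72 rejects the same buffer.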

Hmm... This is annoying. The kernel was fixed to be spec compliant, but that
breaks old/non-compliant applications... We should definitely fix the hdparm code,
but I think we still need to revert 28ab9769117c...

Niklas, Igor, thoughts?

>
> Cheers,
> Christian
>
> ---
>
> #regzbot introduced: 28ab9769117c
> #regzbot title: ata: libata-scsi: Sense data errors breaking hdparm with WD drives
>
> ---
>
> $ pacman -Q hdparm
> hdparm 9.65-2
>
> $ hdparm -I /dev/sda
>
> /dev/sda:
>
> ATA device, with non-removable media
> Model Number: WDC WD40EFRX-68N32N0
> Serial Number: WD-WCC7K4NLX884
> Firmware Revision: 82.00A82
> Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
> Standards:
> Used: unknown (minor revision code 0x006d)
> Supported: 10 9 8 7 6 5
> Likely used: 10
> Configuration:
>   Logical         max     current
>   cylinders       16383   0
>   heads           16      0
>   sectors/track   63      0
>   --
> LBA user addressable sectors: 268435455
> LBA48 user addressable sectors: 7814037168
> Logical Sector size: 512 bytes
> Physical Sector size: 4096 bytes
> Logical Sector-0 offset: 0 bytes
> device size with M = 1024*1024: 3815447 MBytes
> device size with M = 1000*1000: 4000787 MBytes (4000 GB)
> cache/buffer size = unknown
> Form Factor: 3.5 inch
> Nominal Media Rotation Rate: 5400
> Capabilities:
> LBA, IORDY(can be disabled)
> Queue depth: 32
> Standby timer values: spec'd by Standard, with device specific minimum
> R/W multiple sector transfer: Max = 16 Current = 16
> DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
> Cycle time: min=120ns recommended=120ns
> PIO: pio0 pio1 pio2 pio3 pio4
> Cycle time: no flow control=120ns IORDY flow control=120ns
> Commands/features:
> Enabled Supported:
> * SMART feature set
> Security Mode feature set
> * Power Management feature set
> * Write cache
> * Look-ahead
> * Host Protected Area feature set
> * WRITE_BUFFER command
> * READ_BUFFER command
> * NOP cmd
> * DOWNLOAD_MICROCODE
> Power-Up In Standby feature set
> * SET_FEATURES required to spinup after power up
> SET_MAX security extension
> * 48-bit Address feature set
> * Device Configuration Overlay feature set
> * Mandatory FLUSH_CACHE
> * FLUSH_CACHE_EXT
> * SMART error logging
> * SMART self-test
> * General Purpose Logging feature set
> * 64-bit World wide name
> * IDLE_IMMEDIATE with UNLOAD
> * WRITE_UNCORRECTABLE_EXT command
> * {READ,WRITE}_DMA_EXT_GPL commands
> * Segmented DOWNLOAD_MICROCODE
> * Gen1 signaling speed (1.5Gb/s)
> * Gen2 signaling speed (3.0Gb/s)
> * Gen3 signaling speed (6.0Gb/s)
> * Native Command Queueing (NCQ)
> * Host-initiated interface power management
> * Phy event counters
> * Idle-Unload when NCQ is active
> * NCQ priority information
> * READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
> * DMA Setup Auto-Activate optimization
> * Device-initiated interface power management
> * Software settings preservation
> * SMART Command Transport (SCT) feature set
> * SCT Write Same (AC2)
> * SCT Error Recovery Control (AC3)
> * SCT Features Control (AC4)
> * SCT Data Tables (AC5)
> unknown 206[12] (vendor specific)
> unknown 206[13] (vendor specific)
> * DOWNLOAD MICROCODE DMA command
> * WRITE BUFFER DMA command
> * READ BUFFER DMA command
> Security:
> Master password revision code = 65534
> supported
> not enabled
> not locked
> frozen
> not expired: security count
> supported: enhanced erase
> 504min for SECURITY ERASE UNIT. 504min for ENHANCED SECURITY ERASE UNIT.
> Logical Unit WWN Device Identifier: 50014ee2647735a1
> NAA : 5
> IEEE OUI : 0014ee
> Unique ID : 2647735a1
> Checksum: correct
--
Damien Le Moal
Western Digital Research