[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20191225180824.bql2o5whougii4ch@pali>
Date: Wed, 25 Dec 2019 19:08:24 +0100
From: Pali Rohár <pali.rohar@...il.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: Christoph Hellwig <hch@....de>, Tejun Heo <tj@...nel.org>,
linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Broken sata_nv since 4.19
On Wednesday 25 December 2019 08:00:05 Jens Axboe wrote:
> On 12/25/19 7:01 AM, Pali Rohár wrote:
> > Hi Jens, can you please try to look at this problem?
> >
> > On Wednesday 25 December 2019 12:05:31 Pali Rohár wrote:
> >> On Tuesday 24 December 2019 18:35:29 Pali Rohár wrote:
> >>> Hello!
> >>>
> >>> I upgraded machine with NVIDIA SATA controller (nforce4 chipse) from
> >>> Debian Stretch to Debian Buster and SATA disks started to have problems.
> >>> I booted back to Debian Stretch kernel version (having userspace
> >>> untouched in Buster) and everything was like before, so problem is 100%
> >>> kernel related. Problematic is APM support (it does not work at all),
> >>> HPA support (kernel show warnings at boot time) and whole booting is
> >>> delayed by 10 seconds. Also broken is disk speed test.
> >>>
> >>> SATA controller is using sata_nv.ko kernel driver and in lspci is
> >>> identified as:
> >>>
> >>> 00:07.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0054] (rev f3)
> >>> 00:08.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0055] (rev f3)
> >>>
> >>> Debian Stretch has kernel version (which is working fine):
> >>>
> >>> 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64
> >>>
> >>> Debian Buster has kernel version (which is problematic):
> >>>
> >>> 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64
> >>>
> >>> So kernel regression happened somewhere between 4.9 and 4.19 versions.
> >>>
> >>> APM on Stretch:
> >>>
> >>> $ sudo hdparm -B /dev/sda
> >>>
> >>> /dev/sda:
> >>> APM_level = not supported
> >>>
> >>> $ sudo hdparm -B /dev/sdb
> >>>
> >>> /dev/sdb:
> >>> APM_level = off
> >>>
> >>> APM on Buster:
> >>>
> >>> $ sudo hdparm -B /dev/sda
> >>>
> >>> /dev/sda:
> >>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> APM_level = not supported
> >>>
> >>>
> >>> $ sudo hdparm -B /dev/sdb
> >>>
> >>> /dev/sdb:
> >>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> APM_level = not supported
> >>>
> >>> /dev/sda does not support APM, but /dev/sdb supports. I do not
> >>> understand what above SG_IO error means, but because it works fine on
> >>> older kernel version, it is not hardware problem.
> >>>
> >>> Disk speed test on Stretch:
> >>>
> >>> $ sudo hdparm -Tt /dev/sda
> >>>
> >>> /dev/sda:
> >>> Timing cached reads: 118 MB in 2.00 seconds = 58.91 MB/sec
> >>> Timing buffered disk reads: 116 MB in 3.09 seconds = 37.54 MB/sec
> >>>
> >>> $ sudo hdparm -Tt /dev/sdb
> >>>
> >>> /dev/sdb:
> >>> Timing cached reads: 1242 MB in 2.00 seconds = 620.93 MB/sec
> >>> Timing buffered disk reads: 388 MB in 3.00 seconds = 129.31 MB/sec
> >>>
> >>> Disk speed test on Buster:
> >>>
> >>> $ sudo hdparm -Tt /dev/sda
> >>>
> >>> /dev/sda:
> >>> read() hit EOF - device too small
> >>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> Timing buffered disk reads: read() hit EOF - device too small
> >>>
> >>> $ sudo hdparm -Tt /dev/sdb
> >>>
> >>> /dev/sdb:
> >>> read() hit EOF - device too small
> >>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> Timing buffered disk reads: read() hit EOF - device too small
> >>>
> >>> As can be seen disk speed test is completely broken on new kernel
> >>> version and hdparm returns same error as for APM.
> >>>
> >>> dmesg output on Stretch:
> >>>
> >>> [ 1.716970] sata_nv 0000:00:07.0: version 3.5
> >>> [ 1.717309] sata_nv 0000:00:07.0: Using ADMA mode
> >>> [ 1.717358] sata_nv 0000:00:07.0: Using MSI
> >>> [ 1.717810] scsi host0: sata_nv
> >>> [ 1.717954] scsi host1: sata_nv
> >>> [ 1.718016] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
> >>> [ 1.718024] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
> >>> [ 1.718308] sata_nv 0000:00:08.0: Using ADMA mode
> >>> [ 1.718345] sata_nv 0000:00:08.0: Using MSI
> >>> [ 1.718757] scsi host2: sata_nv
> >>> [ 1.718886] scsi host3: sata_nv
> >>> [ 2.192111] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 2.194691] ata1.00: HPA detected: current 976771055, native 976773168
> >>> [ 2.194701] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
> >>> [ 2.194709] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 31/32)
> >>> [ 2.199241] ata1.00: configured for UDMA/133
> >>> [ 2.199501] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >>> [ 2.708030] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 2.710442] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> >>> [ 2.710455] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
> >>> [ 2.715066] ata2.00: configured for UDMA/133
> >>> [ 2.715333] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >>>
> >>> dmesg output on Buster:
> >>>
> >>> [ 2.079293] sata_nv 0000:00:07.0: version 3.5
> >>> [ 2.133503] sata_nv 0000:00:07.0: Using ADMA mode
> >>> [ 2.137138] sata_nv 0000:00:07.0: Using MSI
> >>> [ 2.142043] scsi host0: sata_nv
> >>> [ 2.174745] scsi host2: sata_nv
> >>> [ 2.178329] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
> >>> [ 2.181675] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
> >>> [ 2.188680] sata_nv 0000:00:08.0: Using ADMA mode
> >>> [ 2.215676] sata_nv 0000:00:08.0: Using MSI
> >>> [ 2.219649] scsi host4: sata_nv
> >>> [ 2.226626] scsi host5: sata_nv
> >>> [ 2.657732] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 7.773692] ata1.00: qc timeout (cmd 0x27)
> >>> [ 7.773738] ata1.00: failed to read native max address (err_mask=0x4)
> >>> [ 7.773785] ata1.00: HPA support seems broken, skipping HPA handling
> >>> [ 8.245678] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 8.248009] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
> >>> [ 8.248065] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 32)
> >>> [ 8.252593] ata1.00: configured for UDMA/133
> >>> [ 8.252964] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >>> [ 8.725693] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 13.917688] ata2.00: qc timeout (cmd 0x27)
> >>> [ 13.920096] ata2.00: failed to read native max address (err_mask=0x4)
> >>> [ 13.922491] ata2.00: HPA support seems broken, skipping HPA handling
> >>> [ 14.393683] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 14.398360] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> >>> [ 14.400813] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
> >>> [ 14.407722] ata2.00: configured for UDMA/133
> >>> [ 14.412939] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >>>
> >>> As can be seen new kernel has problems with handling of both SATA
> >>> controllers and disks HPA area. Plus before kernel prints
> >>> "qc timeout (cmd 0x27)" there is nothing on output, seems that kernel
> >>> waits until 5s timeout occur and it slow down booting by 10s.
> >>>
> >>> Do you have any idea what is happening there? What those SG_IO errors
> >>> or dmesg errors means?
> >>>
> >>> I'm CCing all people who touched sata_nv.c file between 4.9 and 4.19
> >>> versions, so maybe somebody would know anything about this problem.
> >>>
> >>> If you need more information or other outputs, please let me know and I
> >>> can provide it.
> >>
> >> Now I tested also versions 4.11, 4.12, 4.13, 4.14, 4.15, 4.16, 4.17 and
> >> 4.18. And problem appeared only in 4.18 (all previous versions work
> >> fine) In 4.18 dmesg is:
> >>
> >> [ 8.596039] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >> [ 8.598489] ata2: illegal qc_active transition (100000000->100000001)
> >> [ 13.792086] ata2.00: qc timeout (cmd 0x27)
> >> [ 13.792122] ata2.00: failed to read native max address (err_mask=0x4)
> >> [ 13.792167] ata2.00: HPA support seems broken, skipping HPA handling
> >> [ 14.264041] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >> [ 14.268756] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> >> [ 14.271230] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
> >> [ 14.278161] ata2.00: configured for UDMA/133
> >> [ 14.283427] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >>
> >> (There is another line "illegal qc_active transition" which is not
> >> present in 4.19)
> >>
> >> So this problem must have been introduced during 4.18 release cycle as
> >> 4.17 version is working fine.
> >
> > I tried to git bisect this problem between 4.17 and 4.18. I used
> > following command to filter relevant libata.ko and sata_nv.ko modules:
> >
> > $ git bisect start v4.18 v4.17 -- drivers/ata/libata* drivers/ata/sata_nv.c
> >
> > And here are results:
> >
> > sata_nv.ko and libata.ko compiled from commit 804689 (libata: Fix
> > command retry decision) for 4.17 kernel are working fine.
> >
> > sata_nv.ko and libata.ko compiled from commit e3ed89 (libata: bump
> > ->qc_active to a 64-bit type) for 4.18 kernel are broken.
> >
> > So problem seems to be somehow related with introduction of hardware
> > tags done by Jens Axboe.
>
> Can you try with this patch:
>
> https://lore.kernel.org/linux-ide/20191213080408.27032-1-s.hauer@pengutronix.de/T/#u
>
Great! This patch fixes above problem with sata_nv.ko. Thank you for
quick response.
--
Pali Rohár
pali.rohar@...il.com
Download attachment "signature.asc" of type "application/pgp-signature" (196 bytes)
Powered by blists - more mailing lists