lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160121063039.3803.66.stgit@localhost.localdomain>
Date:	Wed, 20 Jan 2016 22:35:15 -0800
From:	Alexander Duyck <aduyck@...antis.com>
To:	jbottomley@...n.com, hare@...e.de, linux-scsi@...r.kernel.org
Cc:	alexander.duyck@...il.com, martin.petersen@...cle.com,
	linux-kernel@...r.kernel.org, shane.seymour@....com,
	jthumshirn@...e.de
Subject: [PATCH 0/2] scsi: Fix endless loop of ATA hard resets due to VPD
 reads

Recent changes to the kernel pulled in during the merge window have
resulted in my system generating an endless loop of the following type of
errors:

[  318.965756] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[  318.968457] ata14.00: configured for UDMA/66
[  318.970656] ata14: EH complete
[  318.984366] ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[  318.986854] ata14.00: irq_stat 0x40000001
[  318.989138] ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 22 dma 16640 in
         Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
[  318.995986] ata14: hard resetting link

I bisected the issue and found the patch responsible for the issue was
commit 09e2b0b14690 "scsi: rescan VPD attributes".  This commit contained
several issues.

First, the commit had changed the behavior in terms of what devices we
called scsi_attach_vpd() for.  As a result we were calling it for devices
that didn't support a scsi_level of 6, SCSI 3, so VPD accesses could
result in errors.

Second, the commit as well as a follow-on patch for it contained a number
of RCU errors.  Specifically the code was structured such that we had
accesses outside of RCU locked regions, and repeated use of the RCU
protected pointer without using the proper accessors.  As such it was
possible to get into a serious corruption situation should a pointer be
updated.

Ultimately neither of these bugs were my root cause.  It turns out the
Marvel Console SCSI device in my system needed to have a flag set to
disable VPD access in order to keep things from looping through the error
repeatedly.  In order to resolve it I had to add the kernel parameter
"scsi_mod.dev_flags=Marvell:Console:0x4000000".  This allowed my system to
boot without any errors, however the first two issues described above are
still relevent so I thought I would provide the patches since I had already
written them up.

---

Alexander Duyck (2):
      scsi: Do not attach VPD to devices that don't support it
      scsi: Fix RCU handling for VPD pages


 drivers/scsi/scsi.c        |   55 ++++++++++++++++++++++++--------------------
 drivers/scsi/scsi_lib.c    |   12 +++++-----
 drivers/scsi/scsi_scan.c   |    3 +-
 drivers/scsi/scsi_sysfs.c  |   14 ++++++-----
 include/scsi/scsi_device.h |   14 +++++++----
 5 files changed, 54 insertions(+), 44 deletions(-)

--

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ