Message-ID: <3573548.kp1edD77Gq@merkaba>
Date:   Wed, 14 Mar 2018 12:01:21 +0100
From:   Martin Steigerwald <martin@...htvoll.de>
To:     Hans de Goede <hdegoede@...hat.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        Tejun Heo <tj@...nel.org>
Subject: Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts

Hans de Goede - 11.03.18, 15:37:
> Hi Martin,
> 
> On 11-03-18 09:20, Martin Steigerwald wrote:
> > Hello.
> > 
> > Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
> > with SMART checks occasionally failing like this:
> > 
> > smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending checks
> > udisksd[24408]: Error performing housekeeping for drive
> > /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error updating
> > SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense
> > data returned:#0120000: 0e 09 0c 00  00 00 ff 00  00 00 00 00  00 00 50
> > 00    ..............P.#0120010: 00 00 00 00  00 00 00 00  00 00 00 00  00
> > 00 00 00    ................#012 (g-io-error-quark, 0) merkaba
> > udisksd[24408]: Error performing housekeeping for drive
> > /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error updating
> > SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense
> > data returned:#0120000: 01 00 1d 00  00 00 0e 09  0c 00 00 00  ff 00 00
> > 00    ................#0120010: 00 00 00 00  50 00 00 00  00 00 00 00 
> > 00 00 00 00    ....P...........#012 (g-io-error-quark, 0)
> > 
> > (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad T520)
> > 
> > However when I then check manually with smartctl -a | -x | -H the device
> > reports SMART data just fine.
> > 
> > As smartd correctly detects that the device is in sleep mode, this may be
> > a userspace issue in udisksd.
> > 
> > Also at some boot attempts the boot hangs with a message like "could not
> > connect to lvmetad, scanning manually for devices". I use BTRFS RAID 1
> > on two LVs (each on one of the SSDs), a configuration that requires a
> > manual adaptation of the InitRAMFS in order to boot (basically vgchange -ay
> > before btrfs device scan).
> > 
> > I wonder whether that has to do with the new SATA LPM policy stuff, but as
> > I had issues with
> > 
> >   3 => Medium power with Device Initiated PM enabled
> > 
> > (machine did not boot, which could also have been caused by me
> > accidentally removing all TCP/IP network support in the kernel with that
> > setting)
> > 
> > I set it back to
> > 
> > CONFIG_SATA_MOBILE_LPM_POLICY=0
> > 
> > (firmware settings)
> 
> Right, so with that setting the LPM policy changes are effectively
> disabled and cannot explain your SMART issues.
> 
> Still I would like to zoom in on this part of your bug report, because
> for Fedora 28 we are planning to ship with CONFIG_SATA_MOBILE_LPM_POLICY=3
> and AFAIK Ubuntu has similar plans.
> 
> I suspect that the issue you were seeing with
> CONFIG_SATA_MOBILE_LPM_POLICY=3 was with the Crucial disk? I've attached
> a patch for you to test, which disables LPM for your model of Crucial SSD (but
> keeps it on for the Intel disk). If you can confirm that with that patch you
> can run with CONFIG_SATA_MOBILE_LPM_POLICY=3 without issues, that would be
> great.

With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system successfully 
booted three times in a row. So feel free to add tested-by.

Let's see whether the blk_mq_terminate_expired or the smartd/udisks error
messages reappear with rc5. I still think they are a different issue.
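
For anyone repeating this test, it may help to confirm which LPM policy each
SATA host actually ended up with at runtime, rather than relying on the
kernel config alone. The following is only a minimal sketch, assuming the
usual libata sysfs attribute link_power_management_policy under
/sys/class/scsi_host/hostN/. The function name read_lpm_policies and its
sysfs_root parameter are made up for illustration and are not part of any
existing tool.

```python
from pathlib import Path


def read_lpm_policies(sysfs_root="/sys/class/scsi_host"):
    """Return {host_name: policy_string} for hosts that expose an LPM policy.

    Assumption: each SATA host shows its current policy in a file named
    link_power_management_policy below sysfs_root. Hosts without that
    file (or with an unreadable one) are skipped.
    """
    policies = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return policies
    for host in sorted(root.iterdir()):
        policy_file = host / "link_power_management_policy"
        try:
            policies[host.name] = policy_file.read_text().strip()
        except OSError:
            # Attribute missing or not readable for this host; skip it.
            continue
    return policies


if __name__ == "__main__":
    for host, policy in read_lpm_policies().items():
        print(f"{host}: {policy}")
```

With CONFIG_SATA_MOBILE_LPM_POLICY=3 one would expect the affected hosts to
report the medium-power-with-DIPM policy, and a disk covered by a quirk
should then not be put into low-power link states, but the exact strings
printed depend on the kernel version.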

Thanks,
-- 
Martin
