Date:   Tue, 11 Sep 2018 12:51:32 +0200
From:   "Rafael J. Wysocki" <rjw@...ysocki.net>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Takashi Iwai <tiwai@...e.de>
Cc:     "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        James Wang <jnwang@...e.com>,
        Borislav Petkov <bpetkov@...e.de>,
        linux-kernel@...r.kernel.org, Pingfan Liu <kernelfans@...il.com>
Subject: Re: [REGRESSION] Errors at reboot after 722e5f2b1eec

On Tuesday, September 11, 2018 11:33:24 AM CEST Greg Kroah-Hartman wrote:
> On Tue, Sep 11, 2018 at 10:17:44AM +0200, Takashi Iwai wrote:
> > [ seems like my previous post didn't go out properly; if you have
> >   already received it, please discard this one ]
> 
> Sorry, I got it, it's just in my large queue :(
> 
> > Hi Rafael, Greg,
> > 
> > James Wang reported on SUSE bugzilla that his machine spews many
> > AMD-Vi errors at reboot like:
> > 
> > [  154.907879] systemd-shutdown[1]: Detaching loop devices.
> > [  154.954583] kvm: exiting hardware virtualization
> > [  154.999953] usb 5-2: USB disconnect, device number 2
> > [  155.025278] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.081360] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.136778] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.191772] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.247055] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.302614] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.358996] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.392155] usb 4-2: new full-speed USB device number 2 using ohci-pci
> > [  155.413752] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.413762] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.560307] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.616039] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.667843] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.719497] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.772697] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.823919] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.875490] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.927258] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.979318] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  156.031813] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  156.084293] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  156.272157] reboot: Restarting system
> > [  156.290316] reboot: machine restart
> > 
> > James bisected it and found that it was introduced by commit
> > 722e5f2b1eec ("driver core: Partially revert "driver core: correct
> > device's shutdown order"").  Reverting the commit fixes the problem.

Well, has anyone tried to understand why this is so?

It looks like the probe-time reordering of the devices_kset list worked around
some init-time dependency issue, but we can't reorder devices_kset at that
point, because doing so breaks parent-child ordering in general.

> > He also mentioned an Uncorrectable Machine Check Exception seen at
> > shutdown, which doesn't appear after the revert.  (Though it's not
> > certain whether it's really relevant.)
> > 
> > The errors are clearly related to the USB device (a KVM device,
> > IIRC), and they are not seen if the USB device is disconnected.
> > 
> > We first experienced this with the SLE15 kernel (4.12 with backports),
> > but later the same issue was confirmed on 4.18.y and 4.19-rc2.  It's
> > also confirmed that the revert works on the upstream kernels.
> > 
> > Is this on your radar?
> 
> Ugh, no, I haven't heard of this before, Rafael?
> 
> So the revert fixes some machines, but others need the
> patch; this isn't going to be fun :(

We need to understand what's going on on the machines that stopped working
and fix them.

Calling devices_kset_move_last() from really_probe() is clearly incorrect
and restoring it would be a mistake IMO.

BTW, there is a series of patches from Pingfan Liu:

https://patchwork.kernel.org/project/linux-pm/list/?series=9535

that may help in principle; is there any chance to try them on the affected
systems?

Thanks,
Rafael
