Message-ID: <20200709163455.GA23821@mellanox.com>
Date:   Thu, 9 Jul 2020 13:34:55 -0300
From:   Jason Gunthorpe <jgg@...lanox.com>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     Christoph Hellwig <hch@...radead.org>,
        linux-nvdimm <linux-nvdimm@...ts.01.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Doug Ledford <dledford@...hat.com>,
        Pavel Machek <pavel@....cz>, Len Brown <len.brown@...el.com>,
        Linux ACPI <linux-acpi@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 11/12] PM, libnvdimm: Add 'mem-quiet' state and
 callback for firmware activation

On Thu, Jul 09, 2020 at 09:10:06AM -0700, Dan Williams wrote:
> On Thu, Jul 9, 2020 at 8:39 AM Jason Gunthorpe <jgg@...lanox.com> wrote:
> >
> > On Thu, Jul 09, 2020 at 04:00:51PM +0100, Christoph Hellwig wrote:
> > > On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote:
> > > > The runtime firmware activation capability of Intel NVDIMM devices
> > > > requires memory transactions to be disabled for 100s of microseconds.
> > > > This timeout is large enough to cause in-flight DMA to fail and other
> > > > application detectable timeouts. Arrange for firmware activation to be
> > > > executed while the system is "quiesced", all processes and device-DMA
> > > > frozen.
> > > >
> > > > It is already required that invoking device ->freeze() callbacks is
> > > > sufficient to cease DMA. A device that continues memory writes outside
> > > > of user-direction violates expectations of the PM core to be to
> > > > establish a coherent hibernation image.
> > > >
> > > > That said, RDMA devices are an example of a device that access memory
> > > > outside of user process direction.
> >
> > Are you saying freeze doesn't work for some RDMA drivers? That would
> > be a driver bug, I think.
> 
> Right, it's more my hunch than a known bug at this point, but in my
> experience with testing server class hardware when I've reported a
> power management bugs I've sometimes got the incredulous response "who
> suspends / hibernates servers!?". I can drop that comment.
> 
> Are there protocol timeouts that might need to be adjusted for a 100s
> of microseconds blip in memory controller response?

Survivability depends a lot on HW support: the HW has to suspend, not
discard, the DMAs that it needs to issue. Most likely things are as you
say, and HW doesn't support a safe short-term suspend. The usual use of
PM stuff here is to make the machine ready for kexec.

Jason
