Date:   Wed, 5 Jun 2019 22:01:27 +0000
From:   Trond Myklebust <trondmy@...merspace.com>
To:     "jonathanh@...dia.com" <jonathanh@...dia.com>,
        "Anna.Schumaker@...app.com" <Anna.Schumaker@...app.com>
CC:     "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [REGRESSION v5.2-rc] SUNRPC: Declare RPC timers as
 TIMER_DEFERRABLE (431235818bc3)

On Wed, 2019-06-05 at 09:40 +0100, Jon Hunter wrote:
> Hi Trond,
> 
> I have been noticing intermittent failures with a system suspend test on
> some of our machines that have an NFS-mounted root file system. Bisecting
> this issue points to your commit 431235818bc3 ("SUNRPC: Declare RPC
> timers as TIMER_DEFERRABLE"), and reverting this on top of v5.2-rc3 does
> appear to resolve the problem.
> 
> The cause of the suspend failure appears to be a long delay sometimes
> observed when resuming from suspend, which causes our test to time out.
> For example, in a failing case I see something like the following ...
> 
> [   69.667385] PM: suspend entry (deep)
> [   69.675642] Filesystems sync: 0.000 seconds
> [   69.684983] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [   69.697880] OOM killer disabled.
> [   69.705670] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   69.719043] printk: Suspending console(s) (use no_console_suspend to debug)
> [   69.758911] Disabling non-boot CPUs ...
> [   69.761875] IRQ 17: no longer affine to CPU3
> [   69.762609] Entering suspend state LP1
> [   69.762636] Enabling non-boot CPUs ...
> [   69.763600] CPU1 is up
> [   69.764517] CPU2 is up
> [   69.765532] CPU3 is up
> [   69.845832] mmc1: queuing unknown CIS tuple 0x80 (50 bytes)
> [   69.854223] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
> [   69.857238] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
> [   69.892700] mmc1: queuing unknown CIS tuple 0x02 (1 bytes)
> [   70.407286] OOM killer enabled.
> [   70.414674] Restarting tasks ... done.
> [   70.423232] PM: suspend exit
> [   73.533252] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
> [  105.461852] nfs: server 192.168.99.1 not responding, still trying
> [  105.462347] nfs: server 192.168.99.1 not responding, still trying
> [  105.484809] nfs: server 192.168.99.1 OK
> [  105.486454] nfs: server 192.168.99.1 OK
> 
> 
> So it would appear that making these timers deferrable is having an impact
> when resuming from suspend. Do you have any thoughts on this?
> 

I'd be OK with just reverting this patch if it is causing a performance
issue.

Anna?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com
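
The resume delay in the quoted log can be measured rather than read off by
eye. The following is a minimal sketch, not part of the original report: it
assumes kernel log lines in the "[ seconds.micros] message" form shown above,
and the helper name resume_to_nfs_ok and the SAMPLE excerpt are hypothetical
names chosen for illustration. It reports the gap between "PM: suspend exit"
and the first subsequent "nfs: server ... OK" line.

```python
import re

# Matches a kernel log line such as "[   70.423232] PM: suspend exit".
# search() is used so a leading "> " quote prefix does not matter.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def resume_to_nfs_ok(log_text):
    """Return seconds from 'PM: suspend exit' to the next 'nfs: server ... OK'.

    Returns None if either marker is missing. (Hypothetical helper.)
    """
    resume_ts = None
    for raw in log_text.splitlines():
        m = LINE_RE.search(raw)
        if not m:
            continue  # wrapped continuation lines carry no timestamp
        ts, msg = float(m.group(1)), m.group(2).strip()
        if msg.startswith("PM: suspend exit"):
            resume_ts = ts
        elif (resume_ts is not None
              and msg.startswith("nfs: server")
              and msg.endswith("OK")):
            return ts - resume_ts
    return None


# Excerpt copied from the log quoted in this message.
SAMPLE = """\
[   70.423232] PM: suspend exit
[   73.533252] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[  105.461852] nfs: server 192.168.99.1 not responding, still trying
[  105.484809] nfs: server 192.168.99.1 OK
"""

if __name__ == "__main__":
    delay = resume_to_nfs_ok(SAMPLE)
    print(f"resume-to-NFS-OK delay: {delay:.3f} s")
```

Run against the failing case above, this yields a delay of roughly 35
seconds, most of it after the Ethernet link is already up. Comparing the same
figure with and without the commit reverted would be one way to quantify the
regression.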

