Message-ID: <c54db63b-0d5d-2012-162a-cb08cf32245a@nvidia.com>
Date:   Wed, 5 Jun 2019 09:40:30 +0100
From:   Jon Hunter <jonathanh@...dia.com>
To:     Trond Myklebust <trondmy@...merspace.com>
CC:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-tegra <linux-tegra@...r.kernel.org>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>
Subject: [REGRESSION v5.2-rc] SUNRPC: Declare RPC timers as TIMER_DEFERRABLE
 (431235818bc3)

Hi Trond,

I have been noticing intermittent failures with a system suspend test on
some of our machines that have an NFS-mounted root file system. Bisecting
this issue points to your commit 431235818bc3 ("SUNRPC: Declare RPC
timers as TIMER_DEFERRABLE"), and reverting this on top of v5.2-rc3 does
appear to resolve the problem.

The cause of the suspend failure appears to be a long delay that is
sometimes observed when resuming from suspend, which causes our test to
time out. For example, in a failing case I see something like the
following ...

[   69.667385] PM: suspend entry (deep)

[   69.675642] Filesystems sync: 0.000 seconds

[   69.684983] Freezing user space processes ... (elapsed 0.001 seconds) done.

[   69.697880] OOM killer disabled.

[   69.705670] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.

[   69.719043] printk: Suspending console(s) (use no_console_suspend to debug)

[   69.758911] Disabling non-boot CPUs ...

[   69.761875] IRQ 17: no longer affine to CPU3

[   69.762609] Entering suspend state LP1

[   69.762636] Enabling non-boot CPUs ...

[   69.763600] CPU1 is up

[   69.764517] CPU2 is up

[   69.765532] CPU3 is up

[   69.845832] mmc1: queuing unknown CIS tuple 0x80 (50 bytes)

[   69.854223] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)

[   69.857238] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)

[   69.892700] mmc1: queuing unknown CIS tuple 0x02 (1 bytes)

[   70.407286] OOM killer enabled.

[   70.414674] Restarting tasks ... done.

[   70.423232] PM: suspend exit

[   73.533252] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1

[  105.461852] nfs: server 192.168.99.1 not responding, still trying

[  105.462347] nfs: server 192.168.99.1 not responding, still trying

[  105.484809] nfs: server 192.168.99.1 OK

[  105.486454] nfs: server 192.168.99.1 OK
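
For reference, the gap between "PM: suspend exit" and the first
"nfs: server ... OK" message in the log above is roughly 35 seconds.
The following is a minimal sketch of how that gap could be computed from
a saved kernel log. The function name resume_to_nfs_ok and the choice of
start and end markers are assumptions made for illustration; they are
not part of the suspend test described in this message.

```python
import re

# Matches a kernel log line such as "[   70.423232] PM: suspend exit".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def resume_to_nfs_ok(log_text):
    """Return the seconds between 'PM: suspend exit' and the first later
    'nfs: server ... OK' line, or None if either marker is missing.

    Hypothetical helper: the marker strings are taken from the log
    excerpt above, not from any kernel interface.
    """
    exit_ts = None
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip blank lines and anything without a timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if msg.startswith("PM: suspend exit"):
            exit_ts = ts
        elif (exit_ts is not None
              and msg.startswith("nfs: server")
              and msg.endswith(" OK")):
            return ts - exit_ts
    return None


# Lines copied from the failing case above.
SAMPLE = """\
[   70.423232] PM: suspend exit
[   73.533252] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[  105.461852] nfs: server 192.168.99.1 not responding, still trying
[  105.484809] nfs: server 192.168.99.1 OK
"""

if __name__ == "__main__":
    delay = resume_to_nfs_ok(SAMPLE)
    print(f"resume-to-NFS-OK delay: {delay:.3f} s")
```

Running the same function over logs from passing and failing runs would
give a simple way to compare the delay with and without the commit
reverted.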


So it would appear that making these timers deferrable is having an impact
when resuming from suspend. Do you have any thoughts on this?

Thanks
Jon