Message-ID: <20120529152416.GC3441@fieldses.org>
Date:	Tue, 29 May 2012 11:24:16 -0400
From:	"J. Bruce Fields" <bfields@...ldses.org>
To:	Michael Tokarev <mjt@....msk.ru>
Cc:	linux-nfs@...r.kernel.org,
	Linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: 3.0+ NFS issues

On Fri, May 25, 2012 at 10:53:11AM +0400, Michael Tokarev wrote:
> I updated my NFS server machine to kernel 3.0, and
> noticed that its main usage became, well, problematic.
> 
> While trying to dig deeper, I also found a few other
> interesting issues, which are mentioned below.
> 
> But first things first: NFS.
> 
> i686pae kernel, lots of RAM, Atom-based (Cedar Trail)
> machine with the usual rtl8169 NIC.  3.0 or 3.2 kernel
> (I will try the current 3.4 but I don't have much hope
> there).  NFSv4.
> 
> When a client machine (also running a 3.0 kernel) does some reading,
> the process often stalls somewhere in the read syscall,
> or, more rarely, during close, for up to two MINUTES.  During
> this time, the client (kernel) reports "NFS server <foo>
> does not respond" several times, and finally "NFS server
> <foo> ok", the client process comes unstuck from read(2),
> and is able to perform a few more reads until the whole
> thing repeats.

You say 2.6.32 was OK; have you tried anything else between?
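
(If rebooting the server repeatedly is an option, a kernel bisection
between the known-good and first-bad releases would narrow this down;
roughly something like the following, with the tags standing in for
whatever you actually ran:

    git bisect start v3.0 v2.6.32   # v3.0 bad, v2.6.32 good
    # build, boot, test the server, then mark the result and repeat:
    git bisect good                 # or: git bisect bad
)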

And you're holding the client constant while varying only the server
version, right?

Is your network otherwise working?  (E.g. does transferring a bunch of
data from server to client using some other protocol work reliably?)

Is there anything happening on the network during these stalls?  (You
can watch the network with wireshark, for example.)
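
(A capture on the server during one of the stalls would be enough; the
interface and client names below are placeholders:

    tcpdump -i eth0 -s 0 -w nfs-stall.pcap host <client> and port 2049

The resulting pcap can then be opened in wireshark.)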

Does NFSv3 behave the same way?
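
(E.g. by remounting with something like the following, where the export
path is a placeholder:

    mount -t nfs -o vers=3 <server>:/export /mnt
)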

--b.

> 
> While this happens, i.e. while the client process is stuck,
> any access to the mounted filesystem results in an
> uninterruptible sleep - all processes, including the
> initial client, are unkillable.  This is very annoying,
> but I was never able to fix this, including in the real
> case when the server becomes unreachable - the "intr" mount
> option which is supposed to help here makes no difference.
> This is a first and long-standing issue, but it is not really
> related to the issue at hand.
> 
> On the server side things are more interesting.  During
> the client stall, all NFSD threads are busy running,
> using all available CPU.  The machine has 2 cores, and
> if I reduce the number of NFSD threads to 1, one core will
> be used by the single NFSD at 100%.  With the default 8
> NFSDs, both cores are used, by 4 NFSD threads each.
> This is according to top(1), which shows proportional
> and equal CPU activity for each NFSD thread.
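
(For reference, the nfsd thread count can be changed at runtime with
e.g.:

    rpc.nfsd 1        # or: echo 1 > /proc/fs/nfsd/threads
)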
> 
> But.  And this is where things become more interesting.
> perf top shows almost no activity with any number of NFSD
> threads but one.  For example, this is a typical perf top
> display with 2 NFSD threads:
> 
>      samples  pcnt function                  DSO
>      1558.00  6.4% svc_recv                  [sunrpc]
>      1084.00  5.2% __schedule                [kernel]
>      1065.00  5.1% select_task_rq_fair       [kernel]
>       692.00  3.3% try_to_wake_up            [kernel]
> 
> 
> Yes, there's some 6.4% of time spent in svc_recv, but that is
> FAR from the 200% CPU usage reported by top(1).  Now, when
> I reduce the number of NFSD threads to 1, the perf top output
> becomes dramatically different:
> 
>      samples  pcnt function                    DSO
>      5996.00 59.2% svc_recv                    [sunrpc]
>       750.00  7.4% cache_check                 [sunrpc]
>       519.00  5.1% svc_tcp_has_wspace          [sunrpc]
>       465.00  4.6% local_bh_enable_ip          [kernel.kallsyms]
>       301.00  3.0% svc_defer                   [sunrpc]
>       294.00  2.9% page_address                [kernel.kallsyms]
> 
> I.e., svc_recv time usage jumps from 6% to 60% - 10 times
> the difference.
> 
> So, apparently, the `perf top' tool does not show the truth.
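
(A system-wide call-graph profile might show where the time actually goes
better than the sampled top view; e.g., run on the server while a client
read is stalled:

    perf record -a -g -- sleep 30
    perf report
)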
> 
> Trying to catch this high CPU usage with Sysrq+T does not
> show anything interesting; the NFSD thread is running (and
> is the only thread shown in the "Runnable tasks" output):
> 
> 
> May 25 10:48:37 gnome vmunix: [ 1709.483771] nfsd            R running      0  5853      2 0x00000000
> May 25 10:48:37 gnome vmunix: [ 1709.483796]  f4477030 00000046 00000000 5c8e8929 0000018e f4c700f0 f4043a70 f545b140
> May 25 10:48:37 gnome vmunix: [ 1709.483841]  050a8544 0000018e c1469140 c1469140 f4043a70 00000086 00000000 c1048575
> May 25 10:48:37 gnome vmunix: [ 1709.483887]  c101ba11 c12fac18 f31bf800 f8f7bd30 0000003a c12fb156 0000003a 00000000
> May 25 10:48:37 gnome vmunix: [ 1709.483931] Call Trace:
> May 25 10:48:37 gnome vmunix: [ 1709.483950]  [<c1048575>] ? irq_exit+0x35/0x90
> May 25 10:48:37 gnome vmunix: [ 1709.483973]  [<c101ba11>] ? smp_apic_timer_interrupt+0x51/0x90
> May 25 10:48:37 gnome vmunix: [ 1709.483997]  [<c12fac18>] ? _raw_spin_lock_bh+0x8/0x30
> May 25 10:48:37 gnome vmunix: [ 1709.484029]  [<f8f7bd30>] ? svc_xprt_enqueue+0x40/0x1e0 [sunrpc]
> May 25 10:48:37 gnome vmunix: [ 1709.484053]  [<c12fb156>] ? apic_timer_interrupt+0x2a/0x30
> May 25 10:48:37 gnome vmunix: [ 1709.484084]  [<f8f7007b>] ? svc_destroy+0xdb/0x110 [sunrpc]
> May 25 10:48:37 gnome vmunix: [ 1709.484107]  [<c103d3e3>] ? __cond_resched+0x13/0x30
> May 25 10:48:37 gnome vmunix: [ 1709.484129]  [<c12f92b5>] ? _cond_resched+0x25/0x30
> May 25 10:48:37 gnome vmunix: [ 1709.484159]  [<f8f7cb5d>] ? svc_recv+0x1cd/0x7e0 [sunrpc]
> May 25 10:48:37 gnome vmunix: [ 1709.484182]  [<c101ba11>] ? smp_apic_timer_interrupt+0x51/0x90
> May 25 10:48:37 gnome vmunix: [ 1709.484206]  [<c103a3d0>] ? try_to_wake_up+0x200/0x200
> May 25 10:48:37 gnome vmunix: [ 1709.484233]  [<f90ea08c>] ? nfsd+0x8c/0x120 [nfsd]
> May 25 10:48:37 gnome vmunix: [ 1709.484255]  [<c102d240>] ? complete+0x40/0x60
> May 25 10:48:37 gnome vmunix: [ 1709.484279]  [<f90ea000>] ? 0xf90e9fff
> May 25 10:48:37 gnome vmunix: [ 1709.484299]  [<c105ea99>] ? kthread+0x69/0x70
> May 25 10:48:37 gnome vmunix: [ 1709.484321]  [<c105ea30>] ? kthread_worker_fn+0x150/0x150
> May 25 10:48:37 gnome vmunix: [ 1709.484345]  [<c1301876>] ? kernel_thread_helper+0x6/0xd
> 
> May 25 10:49:37 gnome vmunix: [ 1769.982791] nfsd            R running      0  5853      2 0x00000000
> May 25 10:49:37 gnome vmunix: [ 1769.982816]  f5449f80 00000000 00000000 7636b581 0000019c 00000003 7603dcb8 0000019c
> May 25 10:49:37 gnome vmunix: [ 1769.982859]  7603dcb8 0000019c 00000046 00000000 f4043a70 00000086 00000000 c1048575
> May 25 10:49:37 gnome vmunix: [ 1769.982903]  f31bf800 c12fac18 f31bf800 f8f7bd30 00000047 f31bf800 f418c2c0 f8f7c03c
> May 25 10:49:37 gnome vmunix: [ 1769.982948] Call Trace:
> May 25 10:49:37 gnome vmunix: [ 1769.982966]  [<c1048575>] ? irq_exit+0x35/0x90
> May 25 10:49:37 gnome vmunix: [ 1769.982987]  [<c12fac18>] ? _raw_spin_lock_bh+0x8/0x30
> May 25 10:49:37 gnome vmunix: [ 1769.983019]  [<f8f7bd30>] ? svc_xprt_enqueue+0x40/0x1e0 [sunrpc]
> May 25 10:49:37 gnome vmunix: [ 1769.983050]  [<f8f7c03c>] ? svc_xprt_release+0x1c/0xd0 [sunrpc]
> May 25 10:49:37 gnome vmunix: [ 1769.983081]  [<f8f7c0ad>] ? svc_xprt_release+0x8d/0xd0 [sunrpc]
> May 25 10:49:37 gnome vmunix: [ 1769.983112]  [<f8f7cc03>] ? svc_recv+0x273/0x7e0 [sunrpc]
> May 25 10:49:37 gnome vmunix: [ 1769.983143]  [<f8f7c300>] ? svc_age_temp_xprts+0x110/0x190 [sunrpc]
> May 25 10:49:37 gnome vmunix: [ 1769.983166]  [<c103a3d0>] ? try_to_wake_up+0x200/0x200
> May 25 10:49:37 gnome vmunix: [ 1769.983193]  [<f90ea08c>] ? nfsd+0x8c/0x120 [nfsd]
> May 25 10:49:37 gnome vmunix: [ 1769.983215]  [<c102d240>] ? complete+0x40/0x60
> May 25 10:49:37 gnome vmunix: [ 1769.983238]  [<f90ea000>] ? 0xf90e9fff
> May 25 10:49:37 gnome vmunix: [ 1769.983258]  [<c105ea99>] ? kthread+0x69/0x70
> May 25 10:49:37 gnome vmunix: [ 1769.983278]  [<c105ea30>] ? kthread_worker_fn+0x150/0x150
> May 25 10:49:37 gnome vmunix: [ 1769.983301]  [<c1301876>] ? kernel_thread_helper+0x6/0xd
> 
> This is 3.2; on 3.0 it was in a different place.
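
(For reference, the same task dump can be triggered non-interactively with
something like:

    echo t > /proc/sysrq-trigger

assuming sysrq is enabled via /proc/sys/kernel/sysrq.)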
> 
> This problem is persistent, always reproducible, happens
> on reads from the server (I haven't seen it with writes),
> and completely goes away when reverting to the kernel I
> previously used on this machine - 2.6.32.  But unfortunately
> 2.6.32 does not quite work on the new hardware (I replaced
> the motherboard), due to the lack of drivers for several pieces.
> (This is a different story; I think NFSD should work in 3.0
> regardless of the 2.6.32 issues :)
> 
> Anything I can do to make NFS work again?
> 
> Thanks,
> 
> /mjt