lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFLxGvxm43MaOqv1JQ8vxkn4s50uHZ8Js6A-Ec+O_sjvFszehQ@mail.gmail.com>
Date:	Thu, 22 May 2014 12:50:34 +0200
From:	Richard Weinberger <richard.weinberger@...il.com>
To:	Linux NFS mailing list <linux-nfs@...r.kernel.org>
Cc:	trond.myklebust@...marydata.com, NeilBrown <neilb@...e.de>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: NFS issue while copying huge file

On Wed, May 21, 2014 at 11:33 PM, Richard Weinberger
<richard.weinberger@...il.com> wrote:
> Hi!
>
> My backup server runs CentOS6 with kernel 2.6.32-431.17.1.el6.x86_64.
> Many servers copy theirs backups to it. This setups works perfectly
> fine since ages.
> Today I've configured a new server which runs Kernel 3.12.20 to store
> it's backups to the NFS server too.
>
> The backup file has around 400GiB, after 100GiB the client aborts the
> copy job with EIO.
> dmesg contains:
> ---cut---
> RPC: AUTH_GSS upcall timed out.
> Please check user daemon is running.
> nfs: server 10.42.0.1 not responding, timed out
> nfs: server 10.42.0.1 not responding, timed out
> nfs: server 10.42.0.1 not responding, timed out
> nfs: server 10.42.0.1 not responding, timed out
> ---cut---
> Some times the client reports also a hung task too.
>
> I'm using vers=4 and tcp.
> (Tried also vers=3 with udp and other combinations)
>
> And now the really strange part, on the NFS server the load increases up to 10.
> I see lots of nfsd kernel threads consuming 100% CPU.
> *sometimes* even the oom killer triggers on the NFS server.
> Like:
> ---cut---
> top invoked oom-killer: gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
> top cpuset=/ mems_allowed=0
> Pid: 1907, comm: top Not tainted 2.6.32-431.17.1.el6.x86_64 #1
> Call Trace:
>  [<ffffffff810d0211>] ? cpuset_print_task_mems_allowed+0x91/0xb0
>  [<ffffffff811225c0>] ? dump_header+0x90/0x1b0
>  [<ffffffff8122761c>] ? security_real_capable_noaudit+0x3c/0x70
>  [<ffffffff81122a42>] ? oom_kill_process+0x82/0x2a0
>  [<ffffffff8112293e>] ? select_bad_process+0x9e/0x120
>  [<ffffffff81122e80>] ? out_of_memory+0x220/0x3c0
>  [<ffffffff8112f79f>] ? __alloc_pages_nodemask+0x89f/0x8d0
>  [<ffffffff8116e082>] ? kmem_getpages+0x62/0x170
>  [<ffffffff8116ec9a>] ? fallback_alloc+0x1ba/0x270
>  [<ffffffff8116e6ef>] ? cache_grow+0x2cf/0x320
>  [<ffffffff8116ea19>] ? ____cache_alloc_node+0x99/0x160
>  [<ffffffff8116f99b>] ? kmem_cache_alloc+0x11b/0x190
>  [<ffffffff811f2910>] ? proc_alloc_inode+0x20/0xb0
>  [<ffffffff811a5747>] ? alloc_inode+0x27/0xa0
>  [<ffffffff811a683b>] ? new_inode+0x2b/0xb0
>  [<ffffffff811f45b4>] ? proc_pid_make_inode+0x14/0xb0
>  [<ffffffff811f5f11>] ? proc_pident_instantiate+0x21/0xc0
>  [<ffffffff811f610e>] ? proc_pident_lookup+0x8e/0xd0
>  [<ffffffff811f61aa>] ? proc_tgid_base_lookup+0x1a/0x20
>  [<ffffffff811986f5>] ? do_lookup+0x1a5/0x230
>  [<ffffffff81199014>] ? __link_path_walk+0x794/0xff0
>  [<ffffffff81199b2a>] ? path_walk+0x6a/0xe0
>  [<ffffffff81199d3b>] ? filename_lookup+0x6b/0xc0
>  [<ffffffff81226076>] ? security_file_alloc+0x16/0x20
>  [<ffffffff8119b214>] ? do_filp_open+0x104/0xd20
>  [<ffffffff81060aa3>] ? perf_event_task_sched_out+0x33/0x70
>  [<ffffffff812824f5>] ? _atomic_dec_and_lock+0x55/0x80
>  [<ffffffff811a7ea2>] ? alloc_fd+0x92/0x160
>  [<ffffffff811859e9>] ? do_sys_open+0x69/0x140
>  [<ffffffff8100c715>] ? math_state_restore+0x45/0x60
>  [<ffffffff81185b00>] ? sys_open+0x20/0x30
>  [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> CPU    1: hi:    0, btch:   1 usd:   0
> Node 0 DMA32 per-cpu:
> CPU    0: hi:  186, btch:  31 usd:   0
> CPU    1: hi:  186, btch:  31 usd:  30
> active_anon:2091 inactive_anon:56 isolated_anon:0
>  active_file:208587 inactive_file:215722 isolated_file:32
>  unevictable:0 dirty:213512 writeback:481 unstable:0
>  free:13225 slab_reclaimable:14428 slab_unreclaimable:10804
>  mapped:299 shmem:63 pagetables:444 bounce:0
> Node 0 DMA free:8220kB min:336kB low:420kB high:504kB active_anon:0kB
> inactive_anon:0kB active_file:0kB inactive_file:7128kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB present:15276kB mlocked:0kB
> dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:332kB
> slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
> bounce:0kB writeback_tmp:0kB pages_scanned:42272 all_unreclaimable?
> yes
> lowmem_reserve[]: 0 1976 1976 1976
> Node 0 DMA32 free:44680kB min:44716kB low:55892kB high:67072kB
> active_anon:8364kB inactive_anon:224kB active_file:834348kB
> inactive_file:855676kB unevictable:0kB isolated(anon):0kB
> isolated(file):256kB present:2024312kB mlocked:0kB dirty:854048kB
> writeback:1924kB mapped:1196kB shmem:252kB slab_reclaimable:57380kB
> slab_unreclaimable:43216kB kernel_stack:1384kB pagetables:1776kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:9952
> all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 5*4kB 3*8kB 3*16kB 4*32kB 1*64kB 2*128kB 0*256kB 1*512kB
> 3*1024kB 2*2048kB 0*4096kB = 8220kB
> Node 0 DMA32: 346*4kB 106*8kB 225*16kB 120*32kB 55*64kB 14*128kB
> 10*256kB 5*512kB 2*1024kB 1*2048kB 5*4096kB = 44680kB
> 424434 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap  = 0kB
> Total swap = 0kB
> 522224 pages RAM
> 48677 pages reserved
> 426546 pages shared
> 31791 pages non-shared
> [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
> [  472]     0   472     2771      274   1     -17         -1000 udevd
> [ 1272]     0  1272     6910      148   1     -17         -1000 auditd
> [ 1312]    32  1312     4744      137   1       0             0 rpcbind
> [ 1338]    29  1338     5837      196   0       0             0 rpc.statd
> [ 1350]     0  1350     1173      101   0       0             0 mdadm
> [ 1423]     0  1423     6996      254   1       0             0 rpc.mountd
> [ 1459]     0  1459     6291      148   0       0             0 rpc.idmapd
> [ 1471]     0  1471   126650      252   0     -17         -1000 tgtd
> [ 1473]     0  1473     3722       90   0     -17         -1000 tgtd
> [ 1557]     0  1557    16652      276   1     -17         -1000 sshd
> [ 1565]    38  1565     7680      318   0       0             0 ntpd
> [ 1606]     0  1606    29324      235   1       0             0 crond
> [ 1621]     0  1621     1016       74   0       0             0 mingetty
> [ 1623]     0  1623     1016       74   1       0             0 mingetty
> [ 1625]     0  1625     1016       73   0       0             0 mingetty
> [ 1627]     0  1627     1016       73   0       0             0 mingetty
> [ 1629]     0  1629     1016       73   1       0             0 mingetty
> [ 1631]     0  1631     1016       74   1       0             0 mingetty
> [ 1638]     0  1638     2770      264   0     -17         -1000 udevd
> [ 1639]     0  1639     2770      264   0     -17         -1000 udevd
> [ 1889]     0  1889    25090      389   0       0             0 sshd
> [ 1891]   500  1891    25090      368   1       0             0 sshd
> [ 1892]   500  1892    27076      184   1       0             0 bash
> [ 1907]   500  1907     3757      219   0       0             0 top
> Out of memory: Kill process 1312 (rpcbind) score 1 or sacrifice child
> Killed process 1312, UID 32, (rpcbind) total-vm:18976kB,
> anon-rss:248kB, file-rss:300kB
> ---cut---
>
> or
>
> ---cut---
> collectd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0,
> oom_score_adj=0
> collectd cpuset=/ mems_allowed=0
> Pid: 1596, comm: collectd Not tainted 2.6.32-431.17.1.el6.x86_64 #1
> Call Trace:
>  [<ffffffff810d0211>] ? cpuset_print_task_mems_allowed+0x91/0xb0
>  [<ffffffff811225c0>] ? dump_header+0x90/0x1b0
>  [<ffffffff8122761c>] ? security_real_capable_noaudit+0x3c/0x70
>  [<ffffffff81122a42>] ? oom_kill_process+0x82/0x2a0
>  [<ffffffff81122981>] ? select_bad_process+0xe1/0x120
>  [<ffffffff81122e80>] ? out_of_memory+0x220/0x3c0
>  [<ffffffff81122438>] ? try_set_zonelist_oom+0xa8/0xe0
>  [<ffffffff8112f79f>] ? __alloc_pages_nodemask+0x89f/0x8d0
>  [<ffffffff8116769a>] ? alloc_pages_current+0xaa/0x110
>  [<ffffffff8111f9b7>] ? __page_cache_alloc+0x87/0x90
>  [<ffffffff8111f39e>] ? find_get_page+0x1e/0xa0
>  [<ffffffff81120957>] ? filemap_fault+0x1a7/0x500
>  [<ffffffff811bea76>] ? __set_page_dirty_buffers+0x46/0xc0
>  [<ffffffff81149c84>] ? __do_fault+0x54/0x530
>  [<ffffffff8114a257>] ? handle_pte_fault+0xf7/0xb00
>  [<ffffffffa0072360>] ? ext4_get_block+0x0/0x120 [ext4]
>  [<ffffffff8114ae8a>] ? handle_mm_fault+0x22a/0x300
>  [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
>  [<ffffffff811da055>] ? fcntl_setlk+0x75/0x320
>  [<ffffffff8152d73e>] ? do_page_fault+0x3e/0xa0
>  [<ffffffff8152aaf5>] ? page_fault+0x25/0x30
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> CPU    1: hi:    0, btch:   1 usd:   0
> Node 0 DMA32 per-cpu:
> CPU    0: hi:  186, btch:  31 usd: 177
> CPU    1: hi:  186, btch:  31 usd:  40
> active_anon:3664 inactive_anon:56 isolated_anon:0
>  active_file:204351 inactive_file:217741 isolated_file:64
>  unevictable:8 dirty:215292 writeback:777 unstable:0
>  free:13232 slab_reclaimable:14527 slab_unreclaimable:10875
>  mapped:631 shmem:63 pagetables:595 bounce:0
> Node 0 DMA free:8220kB min:336kB low:420kB high:504kB active_anon:0kB
> inactive_anon:0kB active_file:284kB inactive_file:6844kB
> unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15276kB
> mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:332kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
> pages_scanned:59200 all_unreclaimable? yes
> lowmem_reserve[]: 0 1976 1976 1976
> Node 0 DMA32 free:44708kB min:44716kB low:55892kB high:67072kB
> active_anon:14656kB inactive_anon:224kB active_file:817120kB
> inactive_file:864120kB unevictable:32kB isolated(anon):0kB
> isolated(file):256kB present:2024312kB mlocked:32kB dirty:861168kB
> writeback:3108kB mapped:2524kB shmem:252kB slab_reclaimable:57776kB
> slab_unreclaimable:43500kB kernel_stack:1552kB pagetables:2380kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:359776
> all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 5*4kB 3*8kB 3*16kB 4*32kB 1*64kB 2*128kB 2*256kB 2*512kB
> 2*1024kB 2*2048kB 0*4096kB = 8220kB
> Node 0 DMA32: 177*4kB 378*8kB 85*16kB 148*32kB 11*64kB 1*128kB 1*256kB
> 2*512kB 2*1024kB 1*2048kB 7*4096kB = 44708kB
> 422228 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap  = 0kB
> Total swap = 0kB
> 522224 pages RAM
> 48677 pages reserved
> 426430 pages shared
> 33618 pages non-shared
> [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
> [  472]     0   472     2771      278   1     -17         -1000 udevd
> [ 1272]     0  1272     6910      175   1     -17         -1000 auditd
> [ 1289]     0  1289    63855      312   0       0             0 rsyslogd
> [ 1312]    32  1312     4744      173   0       0             0 rpcbind
> [ 1338]    29  1338     5837      261   0       0             0 rpc.statd
> [ 1350]     0  1350     1173      135   0       0             0 mdadm
> [ 1423]     0  1423     6996      324   0       0             0 rpc.mountd
> [ 1459]     0  1459     6291      170   0       0             0 rpc.idmapd
> [ 1471]     0  1471   126650      294   0     -17         -1000 tgtd
> [ 1473]     0  1473     3722      109   0     -17         -1000 tgtd
> [ 1557]     0  1557    16652      279   1     -17         -1000 sshd
> [ 1565]    38  1565     7680      383   0       0             0 ntpd
> [ 1582]     0  1582     1530       79   1       0             0 collectdmon
> [ 1583]     0  1583   271094     1445   1       0             0 collectd
> [ 1606]     0  1606    29324      288   0       0             0 crond
> [ 1621]     0  1621     1016      140   0       0             0 mingetty
> [ 1623]     0  1623     1016      140   1       0             0 mingetty
> [ 1625]     0  1625     1016      139   0       0             0 mingetty
> [ 1627]     0  1627     1016      139   0       0             0 mingetty
> [ 1629]     0  1629     1016      139   1       0             0 mingetty
> [ 1631]     0  1631     1016      140   1       0             0 mingetty
> [ 1638]     0  1638     2770      266   0     -17         -1000 udevd
> [ 1639]     0  1639     2770      266   0     -17         -1000 udevd
> [ 1640]     0  1640    25090      467   0       0             0 sshd
> [ 1642]   500  1642    25090      384   0       0             0 sshd
> [ 1643]   500  1643    27076      366   1       0             0 bash
> [ 1659]   500  1659    36362      279   1       0             0 su
> [ 1660]     0  1660    27107      424   1       0             0 bash
> [ 1705]     0  1705     3757      329   0       0             0 top
> Out of memory: Kill process 1289 (rsyslogd) score 1 or sacrifice child
> Killed process 1289, UID 0, (rsyslogd) total-vm:255420kB,
> anon-rss:580kB, file-rss:668kB
> ---cut---
>
> If I use the CentOS6 kernel (2.6.32.x) on the NFS client too,
> everything works fine.
> No timeout, no oom killer and the load on the NFS server is all the
> time below 1.
>
> Looks like 3.12.x can DoS NFS servers. ;-)
Just gave 3.15-rc6 a try, it happens here too.

If it helps and the issue is not known I can try to find the exact release.

-- 
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ