linux-kernel - Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20070417223738.8f49a39f.akpm@linux-foundation.org>
Date:	Tue, 17 Apr 2007 22:37:38 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Florin Iucha <florin@...ha.net>,
	Trond Myklebust <Trond.Myklebust@...app.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Adrian Bunk <bunk@...sta.de>,
	OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

On Tue, 17 Apr 2007 22:14:02 -0700 (PDT) Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> 
> 
> On Tue, 17 Apr 2007, Florin Iucha wrote:
> > 
> > Already did.  Traces from vanilla kernel at
> >    http://iucha.net/nfs/21-rc7/big-copy
> 
> Well, there's a pdflush in io_schedule_timeout/congestion_wait, and 
> there's a nfsv4-scv in svc_recv/nfs_callback_sv, and a lot of processes 
> either just in schedule_timeout or similar "normal" waiting (pollwait 
> etc).
> 
> [ The call traces could be prettier, but sadly, even if you enable frame 
>   pointers, the x86-64 kernel is too stupid to follow them. So you kind of 
>   just have to ignore the noise) ]
> 
> The triggering process looks like it might be that "cp", it is in the 
> __wait_on_bit/sync_page/wait_on_page_bit/wait_on_page_writeback_range/ 
> filemap_fdatawait.
> 
> Is this a trace from the "big copy" hang, or from a gnome splashscreen 
> hang? It *looks* like it's a big copy. Yes/no?
> 
> Anyway, looks like the cp did a "utimes()" system call, which triggers 
> "nfs_setattr(), which in turn triggers the filemap_fdatawait() and some 
> kind of endless wait. Nothing stands out from the traces, in other words. 
> Doesn't look like a locking thing, for example - we're not stuck on some 
> inode semaphore, we're literally waiting for the page to be written out.
> 

That's the unpatched kernel.  I think the trace with Trond's patchset
is http://iucha.net/nfs/21-rc7-nfs1/big-copy

[  243.594565] cp            S 00000038b55df454     0  2547   2526 (NOTLB)
[  243.594570]  ffff810078339988 0000000000000086 0000000000000000 00000001000087a9
[  243.594574]  ffff810078339908 ffffffff80162c03 000000000000000a ffff81007f612fe0
[  243.594579]  ffffffff8054c3e0 000000000000063f ffff81007f6131b8 0000000000000292
[  243.594583] Call Trace:
[  243.594587]  [<ffffffff80162c03>] _spin_lock_irqsave+0x11/0x18
[  243.594591]  [<ffffffff8011c0c5>] __mod_timer+0xa9/0xbb
[  243.594595]  [<ffffffff80161283>] schedule_timeout+0x8d/0xb4
[  243.594599]  [<ffffffff8018b8ac>] process_timeout+0x0/0xb
[  243.594603]  [<ffffffff80160bcf>] io_schedule_timeout+0x28/0x33
[  243.594607]  [<ffffffff801a9400>] congestion_wait_interruptible+0x86/0xa3
[  243.594611]  [<ffffffff8019489f>] autoremove_wake_function+0x0/0x38
[  243.594615]  [<ffffffff80418952>] rpc_save_sigmask+0x2f/0x31
[  243.594619]  [<ffffffff80236cfa>] nfs_writepage_setup+0xc8/0x482
[  243.594624]  [<ffffffff80237507>] nfs_updatepage+0x101/0x143
[  243.594628]  [<ffffffff8022da30>] nfs_commit_write+0x2e/0x41
[  243.594633]  [<ffffffff8010fb96>] generic_file_buffered_write+0x530/0x773
[  243.594639]  [<ffffffff8015fae7>] copy_user_generic_string+0x17/0x40
[  243.594643]  [<ffffffff8010cb51>] file_read_actor+0xaa/0x130
[  243.594648]  [<ffffffff80115ee8>] __generic_file_aio_write_nolock+0x38b/0x3fe
[  243.594652]  [<ffffffff8013824c>] debug_mutex_free_waiter+0x5b/0x5f
[  243.594657]  [<ffffffff80120d73>] generic_file_aio_write+0x64/0xc0
[  243.594662]  [<ffffffff8022e128>] nfs_file_write+0xee/0x15a
[  243.594666]  [<ffffffff801175c8>] do_sync_write+0xe2/0x126
[  243.594671]  [<ffffffff8019489f>] autoremove_wake_function+0x0/0x38
[  243.594676]  [<ffffffff80162ab5>] _spin_unlock+0x9/0xb
[  243.594680]  [<ffffffff80116294>] vfs_write+0xae/0x137
[  243.594684]  [<ffffffff80116bea>] sys_write+0x47/0x70
[  243.594688]  [<ffffffff8015ad5e>] system_call+0x7e/0x83

At a guess, bdi_write_congested() is failing to return false.  (nfs_update_request()
got inlined in nfs_writepage_setup(), even though it was defined afterwards?  gcc
got smarter)

We have a stuck pdflush, presumably waiting for dirty+writeback memory to subside:

[  243.594761] pdflush       D 00000038b5643b16     0  2552     11 (L-TLB)
[  243.594766]  ffff81000c617d70 0000000000000046 0000000000000000 00000001000087a9
[  243.594770]  ffff81000c617cf0 ffffffff80162c03 000000000000000a ffff81007e5ff510
[  243.594775]  ffff810002f4a080 0000000000000ae5 ffff81007e5ff6e8 0000000100000282
[  243.594779] Call Trace:
[  243.594783]  [<ffffffff80162c03>] _spin_lock_irqsave+0x11/0x18
[  243.594787]  [<ffffffff8011c0c5>] __mod_timer+0xa9/0xbb
[  243.594792]  [<ffffffff801946f2>] keventd_create_kthread+0x0/0x79
[  243.594795]  [<ffffffff80161283>] schedule_timeout+0x8d/0xb4
[  243.594799]  [<ffffffff8018b8ac>] process_timeout+0x0/0xb
[  243.594803]  [<ffffffff80160bcf>] io_schedule_timeout+0x28/0x33
[  243.594806]  [<ffffffff801a935e>] congestion_wait+0x6b/0x87
[  243.594810]  [<ffffffff8019489f>] autoremove_wake_function+0x0/0x38
[  243.594814]  [<ffffffff8014da11>] writeback_inodes+0xe1/0xea
[  243.594818]  [<ffffffff80153381>] pdflush+0x0/0x1e3
[  243.594821]  [<ffffffff801a732c>] wb_kupdate+0xbb/0x113
[  243.594825]  [<ffffffff80153381>] pdflush+0x0/0x1e3
[  243.594828]  [<ffffffff801534b9>] pdflush+0x138/0x1e3
[  243.594831]  [<ffffffff801a7271>] wb_kupdate+0x0/0x113
[  243.594835]  [<ffffffff801315d6>] kthread+0xd8/0x10b
[  243.594839]  [<ffffffff801270d2>] schedule_tail+0x45/0xa5
[  243.594843]  [<ffffffff8015bb78>] child_rip+0xa/0x12
[  243.594847]  [<ffffffff801946f2>] keventd_create_kthread+0x0/0x79
[  243.594850]  [<ffffffff801471f5>] worker_thread+0x0/0x14b
[  243.594854]  [<ffffffff801314fe>] kthread+0x0/0x10b
[  243.594858]  [<ffffffff8015bb6e>] child_rip+0x0/0x12

at a guess I'd say we have a ton of memory under writeback, but the completions
aren't coming back.

Florin, can we please see /proc/meminfo as well?

Also the result of `echo m > /proc/sysrq-trigger'

Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/