Date:	Sun, 22 Jun 2008 21:56:17 +0100
From:	"Daniel J Blueman" <daniel.blueman@...il.com>
To:	"Arjan van de Ven" <arjan@...radead.org>
Cc:	"Mel Gorman" <mel@....ul.ie>,
	"Christoph Lameter" <clameter@....com>,
	"Linus Torvalds" <torvalds@...ux-foundation.org>,
	"Alexander Beregalov" <a.beregalov@...il.com>,
	"Linux Kernel" <linux-kernel@...r.kernel.org>, dgc@....com
Subject: Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...

On Sun, Jun 22, 2008 at 7:21 PM, Arjan van de Ven <arjan@...radead.org> wrote:
> On Sun, 22 Jun 2008 19:10:11 +0100
> Mel Gorman <mel@....ul.ie> wrote:
>
>> On (22/06/08 10:58), Daniel J Blueman didst pronounce:
>> > I'm seeing a similar issue [2] to what was recently reported [1] by
>> > Alexander, but with another workload involving XFS and memory
>> > pressure.
>> >
>>
>> Is NFS involved or is this XFS only? It looks like XFS-only but no
>> harm in being sure.
>>
>> I'm beginning to wonder if this is a problem where a lot of dirty
>> inodes are being written back in this path and we stall while that
>> happens. I'm still not getting why we are triggering this now and did
>> not before 2.6.26-rc1 or why it bisects to the zonelist
>> modifications. Diffing the reclaim and allocation paths between
>> 2.6.25 and 2.6.26-rc1 has not yielded any candidates for me yet that
>> would explain this.
>
> this sort of thing can easily be exposed with the latencytop tool...
> it will at least tell you WHAT the system is blocking on.
> (not so much the why, the tool isn't smart enough to automatically spit
> out kernel patches yet)

Good plan. I reproduced this without NFS mounted, so it's pure XFS. I
wasn't able to capture a latencytop trace for that same process (i.e.
PID 5480), but it may have contributed to the figures below.
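
For completeness, roughly how these were captured (from memory, so
treat the exact knobs and paths as assumptions): latencytop needs
CONFIG_LATENCYTOP=y, and the instrumentation is switched on via the
kernel.latencytop sysctl; the tool then aggregates /proc/latency_stats,
with per-process counters under /proc/<pid>/latency:

$ sysctl -w kernel.latencytop=1
$ latencytop                 # interactive, system-wide view
$ cat /proc/5480/latency     # raw counters for one process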

A fair amount of kernel debugging was enabled, which hurts the latency
figures somewhat (despite this being a 3.6GHz Penryn).

Daniel

--- [1]

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
<snip>
 1  1    156  14424     12 2329672    0    0     0 110755  177 3820 57  6 36  0
 2  1    156  14736     12 2348152    0    0    24 25172  204 26018 35 21 43  1
 5  0    156  24252     12 2363800    0    0 59656    31  545 25292 35 14 28 23
 4  0    156  14696     12 2317784    0    0  3824     0   38 23083 95  6  0  0
 4  0    156  14440     12 2319304    0    0  4672     0   72 3372 93  3  3  2
 2  0    156  14428     12 2318484    0    0     0     4   27  731 52  0 49  0
 2  0    156  14480     12 2308512    0    0     0    12   32 36629 39 13 49  0
 2  0    156  14572     12 2301220    0    0  3904 12316  117 10760 58  7 26 11

--- [2]

Cause                                                Maximum     Percentage
down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags   271.1 msec       0.4 %
down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_f  206.1 msec       1.3 %
down xfs_buf_lock xfs_getsb xfs_trans_getsb xfs_tr  160.4 msec       0.5 %
sync_page sync_page_killable __lock_page_killable   155.3 msec      26.1 %
down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_f  101.6 msec       0.6 %
down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_f   99.1 msec       0.7 %
sync_page __lock_page find_lock_page filemap_fault   84.0 msec       1.5 %
down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_f   70.8 msec       0.4 %
sync_buffer __wait_on_buffer __block_prepare_write   66.4 msec       0.1 %
    ...o_sync_readv_writev do_readv_writev vfs_writev

Process AutopanoPro (4697)                 Total: 4693.4 msec
down xfs_buf_lock xfs_getsb xfs_trans_getsb xfs_tr  126.2 msec       2.7 %
    ...trans_commit xfs_create xfs_vn_mknod xfs_vn_create vfs_create
    do_filp_open do_sys_open
sync_page sync_page_killable __lock_page_killable    89.3 msec      15.3 %
    ..._read __xfs_file_read xfs_file_aio_read do_sync_read vfs_read
    sys_read system_call_after
down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_f   88.5 msec       2.2 %
    ...fs_da_do_buf xfs_da_read_buf xfs_dir2_leaf_lookup_int
    xfs_dir2_leaf_lookup xfs_dir_looku
Scheduler: waiting for cpu                           49.4 msec      77.7 %
congestion_wait __alloc_pages_internal __alloc_pag   19.8 msec       1.4 %
    ...e_fault error_exit
congestion_wait __alloc_pages_internal __alloc_pag   19.4 msec       0.4 %
    ...ead ondemand_readahead page_cache_async_readahead
    generic_file_aio_read xfs_read __xfs_file_read xfs_file_aio_read
    do_sync_read vfs_read

--- [3]

Cause                                                Maximum     Percentage
sync_page sync_page_killable __lock_page_killable   103.9 msec      36.5 %
Scheduler: waiting for cpu                           37.0 msec      49.5 %
sync_page __lock_page find_lock_page filemap_fault   28.6 msec       1.2 %
congestion_wait __alloc_pages_internal __alloc_pag   19.5 msec       1.3 %
congestion_wait __alloc_pages_internal __alloc_pag   19.2 msec       0.4 %
congestion_wait __alloc_pages_internal __alloc_pag   19.2 msec       0.4 %
do_page_fault error_exit d do_sync_read vfs_read     11.6 msec       0.5 %
sys_mmap system_call_after_swapgs                     7.8 msec       0.2 %
do_select core_sys_select sys_select system_call_a    4.9 msec       8.1 %

Process AutopanoPro (4697)                 Total: 977.5 msec
sync_page sync_page_killable __lock_page_killable    97.7 msec      76.6 %
    ..._read __xfs_file_read xfs_file_aio_read do_sync_read vfs_read
    sys_read system_call_after
sync_page __lock_page find_lock_page filemap_fault   28.6 msec       5.6 %
    ...lt do_page_fault error_exit
congestion_wait __alloc_pages_internal __alloc_pag   19.5 msec       2.0 %
    ...e_fault error_exit
congestion_wait __alloc_pages_internal __alloc_pag   19.2 msec       2.0 %
    ...k_write_begin xfs_vm_write_begin generic_file_buffered_write
    xfs_write __xfs_file_write ..._write vfs_write
Scheduler: waiting for cpu                           16.5 msec       8.7 %
do_page_fault error_exit                             11.6 msec       2.3 %

--- [4]

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.26-rc7-211c #2
-------------------------------------------------------
AutopanoPro/5480 is trying to acquire lock:
 (iprune_mutex){--..}, at: [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280

but task is already holding lock:
 (&(&ip->i_iolock)->mr_lock){----}, at: [<ffffffff803a459f>]
xfs_ilock+0xbf/0x110

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&ip->i_iolock)->mr_lock){----}:
       [<ffffffff802769ad>] __lock_acquire+0xbdd/0x1020
       [<ffffffff80276e55>] lock_acquire+0x65/0x90
       [<ffffffff8026b41b>] down_write_nested+0x4b/0x90
       [<ffffffff803a45df>] xfs_ilock+0xff/0x110
       [<ffffffff803a47af>] xfs_ireclaim+0x3f/0x90
       [<ffffffff803c9169>] xfs_finish_reclaim+0x59/0x220
       [<ffffffff803cc3b5>] xfs_reclaim+0x185/0x190
       [<ffffffff803d9211>] xfs_fs_clear_inode+0xe1/0x130
       [<ffffffff802d25c7>] clear_inode+0x87/0x120
       [<ffffffff802d290a>] dispose_list+0x2a/0x100
       [<ffffffff802d2c06>] shrink_icache_memory+0x226/0x280
       [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
       [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
       [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
       [<ffffffff802981db>] __alloc_pages+0xb/0x10
       [<ffffffff802a312a>] handle_mm_fault+0x46a/0x6d0
       [<ffffffff8060510a>] do_page_fault+0x3ca/0x830
       [<ffffffff80602add>] error_exit+0x0/0xa9
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (iprune_mutex){--..}:
       [<ffffffff80276817>] __lock_acquire+0xa47/0x1020
       [<ffffffff80276e55>] lock_acquire+0x65/0x90
       [<ffffffff8060059a>] mutex_lock_nested+0xba/0x2b0
       [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280
       [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
       [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
       [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
       [<ffffffff802981db>] __alloc_pages+0xb/0x10
       [<ffffffff8029a6b6>] __do_page_cache_readahead+0x136/0x230
       [<ffffffff8029aa08>] ondemand_readahead+0x128/0x1f0
       [<ffffffff8029ab45>] page_cache_async_readahead+0x75/0xa0
       [<ffffffff80293a8a>] generic_file_aio_read+0x28a/0x610
       [<ffffffff803d78c4>] xfs_read+0x124/0x270
       [<ffffffff803d4416>] __xfs_file_read+0x46/0x50
       [<ffffffff803d4451>] xfs_file_aio_read+0x11/0x20
       [<ffffffff802bc1b1>] do_sync_read+0xf1/0x130
       [<ffffffff802bca74>] vfs_read+0xc4/0x160
       [<ffffffff802bcf10>] sys_read+0x50/0x90
       [<ffffffff8022639b>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

2 locks held by AutopanoPro/5480:
 #0:  (&(&ip->i_iolock)->mr_lock){----}, at: [<ffffffff803a459f>]
xfs_ilock+0xbf/0x110
 #1:  (shrinker_rwsem){----}, at: [<ffffffff8029d8d2>] shrink_slab+0x32/0x180

stack backtrace:
Pid: 5480, comm: AutopanoPro Not tainted 2.6.26-rc7-211c #2

Call Trace:
 [<ffffffff80274293>] print_circular_bug_tail+0x83/0x90
 [<ffffffff80276817>] __lock_acquire+0xa47/0x1020
 [<ffffffff8027588f>] ? trace_hardirqs_on+0xbf/0x150
 [<ffffffff80276e55>] lock_acquire+0x65/0x90
 [<ffffffff802d2a5d>] ? shrink_icache_memory+0x7d/0x280
 [<ffffffff8060059a>] mutex_lock_nested+0xba/0x2b0
 [<ffffffff802d2a5d>] ? shrink_icache_memory+0x7d/0x280
 [<ffffffff80602915>] ? _spin_unlock_irqrestore+0x55/0x60
 [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280
 [<ffffffff8029d8d2>] ? shrink_slab+0x32/0x180
 [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
 [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
 [<ffffffff8029c570>] ? isolate_pages_global+0x0/0x40
 [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
 [<ffffffff802981db>] __alloc_pages+0xb/0x10
 [<ffffffff8029a6b6>] __do_page_cache_readahead+0x136/0x230
 [<ffffffff8029aa08>] ondemand_readahead+0x128/0x1f0
 [<ffffffff8029ab45>] page_cache_async_readahead+0x75/0xa0
 [<ffffffff80293a8a>] generic_file_aio_read+0x28a/0x610
 [<ffffffff803d78c4>] xfs_read+0x124/0x270
 [<ffffffff803d4416>] __xfs_file_read+0x46/0x50
 [<ffffffff803d4451>] xfs_file_aio_read+0x11/0x20
 [<ffffffff802bc1b1>] do_sync_read+0xf1/0x130
 [<ffffffff802673f0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8022ce57>] ? native_sched_clock+0x57/0x80
 [<ffffffff802bda68>] ? fget_light+0xe8/0x100
 [<ffffffff80272dea>] ? lock_release_holdtime+0x5a/0x80
 [<ffffffff802bca74>] vfs_read+0xc4/0x160
 [<ffffffff802bcf10>] sys_read+0x50/0x90
 [<ffffffff8022639b>] system_call_after_swapgs+0x7b/0x80
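
For anyone not fluent in these reports: the inversion above reduces to
a classic AB-BA pattern. The reclaim path (-> #1) takes iprune_mutex
and then the inode's i_iolock, while this task (-> #0) already holds
i_iolock and re-enters reclaim, which wants iprune_mutex. A minimal
userspace sketch of that pattern, with plain pthread mutexes standing
in for the kernel locks (illustrative only, not kernel code):

#include <pthread.h>
#include <stdio.h>

/* Stand-ins for the two locks in the lockdep report. */
static pthread_mutex_t iprune_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t i_iolock     = PTHREAD_MUTEX_INITIALIZER;

/* Models -> #1: shrink_icache_memory() ... xfs_ilock():
 * iprune_mutex is taken first, then the inode iolock. */
static void *reclaim_path(void *unused)
{
	pthread_mutex_lock(&iprune_mutex);
	pthread_mutex_lock(&i_iolock);
	puts("reclaim: iprune_mutex -> i_iolock");
	pthread_mutex_unlock(&i_iolock);
	pthread_mutex_unlock(&iprune_mutex);
	return NULL;
}

/* Models -> #0: xfs_read() holds the iolock, then the page
 * allocation recurses into reclaim, which wants iprune_mutex. */
static void *read_path(void *unused)
{
	pthread_mutex_lock(&i_iolock);
	pthread_mutex_lock(&iprune_mutex);
	puts("read: i_iolock -> iprune_mutex");
	pthread_mutex_unlock(&iprune_mutex);
	pthread_mutex_unlock(&i_iolock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, reclaim_path, NULL);
	pthread_create(&t2, NULL, read_path, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

Build with "cc -pthread" and run it in a loop: it eventually wedges
with each thread holding one lock and waiting for the other -- which
is exactly why lockdep complains before the deadlock actually bites.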
-- 
Daniel J Blueman
