Message-ID: <4B910D8C.30301@ge.com>
Date: Fri, 05 Mar 2010 14:56:28 +0100
From: Enrik Berkhan <Enrik.Berkhan@...com>
To: linux-ext4@...r.kernel.org
Subject: Re: possible ext4 related deadlock
Hi,
Enrik Berkhan wrote:
> Currently we're experiencing some process hangs that seem to be
> ext4-related. (Kernel 2.6.28.10-Blackfin, i.e. with Analog Devices
> patches including some memory management changes for NOMMU.)
>
> The situation is as follows:
>
> We have two threads writing to an ext4 filesystem. After several hours,
> and across about 20 systems, one hang occurs where (reconstructed from
> Alt-SysRq-W output):
>
> 1. pdflush waits in start_this_handle
> 2. kjournald2 waits in jbd2_journal_commit_transaction
> 3. thread 1 waits in start_this_handle
> 4. thread 2 waits in
> ext4_da_write_begin
> (start_this_handle succeeded)
> grab_cache_page_write_begin
> __alloc_pages_internal
> try_to_free_pages
> do_try_to_free_pages
> congestion_wait
>
> Actually, thread 2 shouldn't be completely blocked, because
> congestion_wait has a timeout if I understand the code correctly.
> Unfortunately, I pressed Alt-SysRq-W only once when I had a chance to
> reproduce the problem on a test system with console access.
Meanwhile, I have found that thread 2 actually isn't completely blocked
but loops in __alloc_pages_internal():
- get_page_from_freelist() doesn't return a page;
- try_to_free_pages() comes back with did_some_progress == 0;
- later, do_retry == 1 and the loop restarts with "goto rebalance".
I have implemented a workaround that limits the "goto rebalance"
retries to 3 iterations and triggers a page allocation failure otherwise.
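In sketch form, the allocator change looks roughly like this (simplified
from 2.6.28's __alloc_pages_internal(); the retry counter and its name
are only an illustration of my patch, the labels and calls are
mainline's):

	int rebalance_retries = 0;	/* added by the workaround */
	...
rebalance:
	...
	did_some_progress = try_to_free_pages(zonelist, order, gfp_mask);
	...
	if (do_retry) {
		if (++rebalance_retries >= 3)
			/* Give up and report a page allocation
			 * failure instead of looping forever. */
			goto nopage;
		congestion_wait(WRITE, HZ/50);
		goto rebalance;
	}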
I have also changed ext4_{da_,}write_begin() to retry
grab_cache_page_write_begin() 3 times, by calling ext4_journal_stop()
and schedule_timeout(1) before jumping back with "goto retry".
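The corresponding change in ext4_da_write_begin(), again in sketch form
(simplified; based on the 2.6.28 code structure, with locals such as
needed_blocks, index and flags coming from the surrounding function):

	int retries = 0;
	...
retry:
	handle = ext4_journal_start(inode, needed_blocks);
	if (IS_ERR(handle)) {
		ret = PTR_ERR(handle);
		goto out;
	}
	page = grab_cache_page_write_begin(mapping, index, flags);
	if (!page) {
		/* Release the handle so kjournald2 can commit the
		 * running transaction while we back off. */
		ext4_journal_stop(handle);
		if (++retries < 3) {
			printk(KERN_INFO "ext4_da_write_begin: retrying "
			       "grab_cache_page_write_begin()\n");
			/* Without set_current_state() this is
			 * effectively just a yield, which seems to be
			 * enough here. */
			schedule_timeout(1);
			goto retry;
		}
		ret = -ENOMEM;
		goto out;
	}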
With this change, the system recovers from the situation on the first
grab_cache_page_write_begin() retry. I saw that happen 17 times during a
test last night.
Interestingly, the show_mem() output triggered in
__alloc_pages_internal() on page allocation failure shows plenty of free
memory; see this example:
> Mar 5 02:26:16 bfp53 [21804.947985] manage_daq: page allocation failure. order:0, mode:0x120052
> Mar 5 02:26:16 bfp53 [21804.954607] Hardware Trace:
> Mar 5 02:26:16 bfp53 [21804.957377] 0 Target : <0x000049ac> { _dump_stack + 0x0 }
> Mar 5 02:26:16 bfp53 [21804.963139] Source : <0x00037618> { ___alloc_pages_internal + 0x1cc } CALL pcrel
> Mar 5 02:26:16 bfp53 [21804.970996] 1 Target : <0x00037618> { ___alloc_pages_internal + 0x1cc }
> Mar 5 02:26:16 bfp53 [21804.977966] Source : <0x0000fd2a> { _printk + 0x16 } RTS
> Mar 5 02:26:16 bfp53 [21804.983735] 2 Target : <0x0000fd26> { _printk + 0x12 }
> Mar 5 02:26:16 bfp53 [21804.989226] Source : <0x0001058e> { _vprintk + 0x116 } RTS
> Mar 5 02:26:16 bfp53 [21804.995165] 3 Target : <0x00010586> { _vprintk + 0x10e }
> Mar 5 02:26:16 bfp53 [21805.005705] Source : <0xffa00c76> /* kernel dynamic memory */
> Mar 5 02:26:16 bfp53 [21805.013596] 4 Target : <0xffa00c14> /* kernel dynamic memory */
> Mar 5 02:26:16 bfp53 [21805.022158] Source : <0xffa00a5c> /* kernel dynamic memory */
> Mar 5 02:26:16 bfp53 [21805.030932] 5 Target : <0xffa00a5c> /* kernel dynamic memory */
> Mar 5 02:26:16 bfp53 [21805.040640] Source : <0xffa00a32> /* kernel dynamic memory */
> Mar 5 02:26:16 bfp53 [21805.048552] 6 Target : <0xffa00a04> /* kernel dynamic memory */
> Mar 5 02:26:16 bfp53 [21805.057211] Source : <0xffa00c10> /* kernel dynamic memory */
> Mar 5 02:26:16 bfp53 [21805.065086] 7 Target : <0xffa00c0e> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.072947] Source : <0xffa00efe> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.081598] 8 Target : <0xffa00efa> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.090706] Source : <0xffa003d2> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.098692] 9 Target : <0xffa003ca> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.104880] Source : <0x00014024> { _irq_exit + 0x50 } RTS
> Mar 5 02:26:17 bfp53 [21805.110793] 10 Target : <0x0001401e> { _irq_exit + 0x4a }
> Mar 5 02:26:17 bfp53 [21805.116454] Source : <0x0001402c> { _irq_exit + 0x58 } IF CC JUMP
> Mar 5 02:26:17 bfp53 [21805.123013] 11 Target : <0x0001402a> { _irq_exit + 0x56 }
> Mar 5 02:26:17 bfp53 [21805.128674] Source : <0x00009d8c> { _idle_cpu + 0x1c } RTS
> Mar 5 02:26:17 bfp53 [21805.134643] 12 Target : <0x00009d70> { _idle_cpu + 0x0 }
> Mar 5 02:26:17 bfp53 [21805.140192] Source : <0x00014026> { _irq_exit + 0x52 } CALL pcrel
> Mar 5 02:26:17 bfp53 [21805.146756] 13 Target : <0x00014026> { _irq_exit + 0x52 }
> Mar 5 02:26:17 bfp53 [21805.152471] Source : <0x0001401c> { _irq_exit + 0x48 } IF !CC JUMP
> Mar 5 02:26:17 bfp53 [21805.159064] 14 Target : <0x00014006> { _irq_exit + 0x32 }
> Mar 5 02:26:17 bfp53 [21805.164783] Source : <0x00014048> { _irq_exit + 0x74 } JUMP.S
> Mar 5 02:26:17 bfp53 [21805.170932] 15 Target : <0x00014048> { _irq_exit + 0x74 }
> Mar 5 02:26:17 bfp53 [21805.176592] Source : <0x00013d6e> { ___do_softirq + 0xa2 } RTS
> Mar 5 02:26:17 bfp53 [21805.182864] Stack info:
> Mar 5 02:26:17 bfp53 [21805.191811] SP: [0x02ab9b60] <0x02ab9b60> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.198495] FP: (0x02ab9b64)
> Mar 5 02:26:17 bfp53 [21805.201403] Memory from 0x02ab9b60 to 02aba000
> Mar 5 02:26:17 bfp53 [21805.205835] 02ab9b60:[02ab9bbc](02ab9bbc)<0003761c> 03bde680 00000000 00120052 03bde880 00000000
> Mar 5 02:26:17 bfp53 [21805.215086] 02ab9b80: 00120052 00000002 00000042 02ab8000 02ab8000 00000000 00120052 00000000
> Mar 5 02:26:17 bfp53 [21805.224375] 02ab9ba0: 02ab8000 00000010 00000002 00000000 00000000 00000000 00000000 (02ab9bf0)
> Mar 5 02:26:17 bfp53 [21805.233690] 02ab9bc0:<00033300><02f4e3dc> 04a85000 <02f4e344> 00000000 00000080 00004a85 ffffff7f
> Mar 5 02:26:17 bfp53 [21805.243044] 02ab9be0: 02ab9c04 <00a8f76e> 00000000 00000000 (02ab9c44)<00a8892c> 00000000 031423d8
> Mar 5 02:26:17 bfp53 [21805.252333] 02ab9c00: 00000000 00000004 031423d8 000005cc 04a85000 000005cc 00000000 000005cc
> Mar 5 02:26:17 bfp53 [21805.261648] 02ab9c20: 04a84a34 00000000 000005cc 000005cc 04a86000 00000000 00004a85 00000003
> Mar 5 02:26:17 bfp53 [21805.271002] 02ab9c40: 00000000 (02ab9ca4)<00032b08> 02ab8000 00001000 <02f4e3dc> 000005cc 00000000
> Mar 5 02:26:17 bfp53 [21805.280292] 02ab9c60: 04a85000 00001000 0306a260 <02f4e3dc> 04a85000 00000000 00001000 00000000
> Mar 5 02:26:17 bfp53 [21805.289608] 02ab9c80: 02ab9ca0 02ab9c9c 00000000 00a9fc74 000005cc 00000000 00000000 00000000
> Mar 5 02:26:17 bfp53 [21805.298962] 02ab9ca0: 00265100 (02ab9cf0)<00033a6c> 0306a260 <02f4e3dc><02f4e344> 02ab9ce0 00000000
> Mar 5 02:26:17 bfp53 [21805.308252] 02ab9cc0: 04a84a34 00000000 0306a260 02ab9ce0 04a84a34 00000000 00000000 00a9fc74
> Mar 5 02:26:17 bfp53 [21805.317566] 02ab9ce0: 02ab9e94 00000001 000005cc 00001c24 (02ab9d60)<00033e8a><02f4e344> 000021f0
> Mar 5 02:26:17 bfp53 [21805.326919] 02ab9d00: 0306a260 00000000 02ab9e94 00000000 04a84a34 02ab9e10 02ab9dfc <000fda3a>
> Mar 5 02:26:17 bfp53 [21805.336211] 02ab9d20: 04a84a34 00000000 02ab9e58 000021f0 00000000 00000000 <02ab9d70><02f4e3dc>
> Mar 5 02:26:17 bfp53 [21805.345529] 02ab9d40: 00000000 00000001 ffffffff 00000000 00000000 00000000 00000000 000021f0
> Mar 5 02:26:17 bfp53 [21805.354882] 02ab9d60:<02ab9da0><0003440a> 02ab9e10 <02f4e344> 0306a260 02f4e3b0 02ab9e94 00000000
> Mar 5 02:26:17 bfp53 [21805.364168] 02ab9d80: 04a84a34 02ab9e10 02ab9e94 00000001 02ab9e58 <0001410e> 02ab9dbc <02f4e3dc>
> Mar 5 02:26:17 bfp53 [21805.373488] 02ab9da0: 02ab9dd8 <00a81c28><02f4e344> 0306a260 02f4e2cc 00000000 04a84a34 00000001
> Mar 5 02:26:17 bfp53 [21805.382834] 02ab9dc0: 02ab9e94 00000000 00040000 00000001 04a84a34 00000000 02ab9e9c <00042f34>
> Mar 5 02:26:17 bfp53 [21805.392127] 02ab9de0: 0306a260 02ab9eec 00e00fb8 02ab9e10 02ab9e94 000021f0 00dda938 02ab9e10
> Mar 5 02:26:17 bfp53 [21805.401446] 02ab9e00:<000fdc24> 00bdcd40 04a84a34 00000000 <000fdc5c> 003cf20e 00000000 00000001
> Mar 5 02:26:17 bfp53 [21805.410791] 02ab9e20: ffffffff 0306a260 00000000 00000000 00000000 00000000 03bde680 00000000
> Mar 5 02:26:17 bfp53 [21805.420082] 02ab9e40: 00000000 001f8c3c 03bde680 00020280 02ab9e50 02ab9e50 04a84a34 00000000
> Mar 5 02:26:17 bfp53 [21805.429405] 02ab9e60:<00013d28> 001f18fc 02ab8000 001f1888 000021f0 02ab9e94 <00013d68> 001f18fc
> Mar 5 02:26:17 bfp53 [21805.438755] 02ab9e80: 00000000 00000000 02ab9ea4 <00008f04> 001f63f4 02b00004 000021f0 02ab9ec4
> Mar 5 02:26:17 bfp53 [21805.448041] 02ab9ea0:<00043514> 0306a260 00000004 02ab9eec 02b00004 00000000 00000000 02ab9ee0
> Mar 5 02:26:17 bfp53 [21805.457364] 02ab9ec0: 02ab9eec 02ab9ef8 <000437f8> 0306a260 02b00004 000021f0 02b00004 00000141
> Mar 5 02:26:17 bfp53 [21805.466714] 02ab9ee0: 00000000 ffffe000 02ab9eec 04a84a34 00000000 00000001 00000000 ffa008d0
> Mar 5 02:26:17 bfp53 [21805.475999] 02ab9f00: 000437c4 00000000 ffffe000 030cf8dc 00e09388 000005a8 00000000 00dda938
> Mar 5 02:26:17 bfp53 [21805.485322] 02ab9f20: 02b00004 003cf20e 00008000 00000000 00000000 02aba000 003cf20e 003cf20e
> Mar 5 02:26:17 bfp53 [21805.494675] 02ab9f40:<00dcdb62> ffa00e46 02001025 003de2bb 003e03f1 003de2ba 003e03ee 00000000
> Mar 5 02:26:17 bfp53 [21805.503958] 02ab9f60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> Mar 5 02:26:17 bfp53 [21805.513281] 02ab9f80: 00000000 00000000 00000000 00000000 00000000 039a0510 039a0508 0360af3c
> Mar 5 02:26:17 bfp53 [21805.522631] 02ab9fa0: 00000020 00000000 00000000 00000000 003f7838 02ab3c7c 02ab3c88 00df09b0
> Mar 5 02:26:17 bfp53 [21805.531916] 02ab9fc0: 00000017 00e00fb8 00df13f8 003cf204 00000004 00000000 000021f0 02b00004
> Mar 5 02:26:17 bfp53 [21805.541236] 02ab9fe0: 00dda938 00000000 000021f0 02b00004 00000017 00000017 00000004 00000006
> Mar 5 02:26:17 bfp53 [21805.550731] Return addresses in stack:
> Mar 5 02:26:17 bfp53 [21805.554584] frame 1 : <0x0003761c> { ___alloc_pages_internal + 0x1d0 }
> Mar 5 02:26:17 bfp53 [21805.561534] frame 2 : <0x00033300> { _grab_cache_page_write_begin + 0x50 }
> Mar 5 02:26:17 bfp53 [21805.584838] address : <0x02f4e3dc> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.592680] address : <0x02f4e344> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.599028] address : <0x00a8f76e> { :ext4:_ext4_journal_start_sb + 0x26 }
> Mar 5 02:26:17 bfp53 [21805.606329] frame 3 : <0x00a8892c> { :ext4:_ext4_da_write_begin + 0xac }
> Mar 5 02:26:17 bfp53 [21805.613392] frame 4 : <0x00032b08> { _generic_perform_write + 0x98 }
> Mar 5 02:26:17 bfp53 [21805.622262] address : <0x02f4e3dc> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.630250] address : <0x02f4e3dc> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.636537] frame 5 : <0x00033a6c> { _generic_file_buffered_write + 0x50 }
> Mar 5 02:26:17 bfp53 [21805.645740] address : <0x02f4e3dc> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.653708] address : <0x02f4e344> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.659928] frame 6 : <0x00033e8a> { ___generic_file_aio_write_nolock + 0x1ba }
> Mar 5 02:26:17 bfp53 [21805.669659] address : <0x02f4e344> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.675988] address : <0x000fda3a> { _sock_recvmsg + 0x96 }
> Mar 5 02:26:17 bfp53 [21805.684148] address : <0x02ab9d70> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.692186] address : <0x02f4e3dc> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.700105] address : <0x02ab9da0> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.706305] address : <0x0003440a> { _generic_file_aio_write + 0x52 }
> Mar 5 02:26:17 bfp53 [21805.715083] address : <0x02f4e344> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.721392] address : <0x0001410e> { _irq_enter + 0xa }
> Mar 5 02:26:17 bfp53 [21805.728914] address : <0x02f4e3dc> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.735283] address : <0x00a81c28> { :ext4:_ext4_file_write + 0x40 }
> Mar 5 02:26:17 bfp53 [21805.744015] address : <0x02f4e344> /* kernel dynamic memory */
> Mar 5 02:26:17 bfp53 [21805.750296] address : <0x00042f34> { _do_sync_write + 0x90 }
> Mar 5 02:26:17 bfp53 [21805.756313] address : <0x000fdc24> { _sys_recvfrom + 0x64 }
> Mar 5 02:26:17 bfp53 [21805.762262] address : <0x000fdc5c> { _sys_recvfrom + 0x9c }
> Mar 5 02:26:17 bfp53 [21805.768263] address : <0x00013d28> { ___do_softirq + 0x5c }
> Mar 5 02:26:17 bfp53 [21805.774254] address : <0x00013d68> { ___do_softirq + 0x9c }
> Mar 5 02:26:17 bfp53 [21805.780216] address : <0x00008f04> { _bfin_handle_irq + 0x1c }
> Mar 5 02:26:17 bfp53 [21805.786449] address : <0x00043514> { _vfs_write + 0x60 }
> Mar 5 02:26:17 bfp53 [21805.792159] address : <0x000437f8> { _sys_write + 0x34 }
> Mar 5 02:26:17 bfp53 [21805.800542] address : <0x00dcdb62> [ /opt/nge/lib/libngus.so.0.0.0 + 0xdb62 ]
> Mar 5 02:26:17 bfp53 [21805.808171] Mem-Info:
> Mar 5 02:26:17 bfp53 [21805.810372] DMA per-cpu:
> Mar 5 02:26:17 bfp53 [21805.812915] CPU 0: hi: 18, btch: 3 usd: 15
> Mar 5 02:26:17 bfp53 [21805.817749] Active_anon:0 active_file:0 inactive_anon:0
> Mar 5 02:26:17 bfp53 [21805.817768] inactive_file:620 dirty:565 writeback:0 unstable:0
> Mar 5 02:26:17 bfp53 [21805.817786] free:7162 slab:1112 mapped:0 pagetables:0 bounce:0
> Mar 5 02:26:17 bfp53 [21805.835000] DMA free:28648kB min:984kB low:1228kB high:1476kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:2480kB present:60956kB pages_scanned:0 all_unreclaimable? no
> Mar 5 02:26:17 bfp53 [21805.851771] lowmem_reserve[]: 0 0 0
> Mar 5 02:26:17 bfp53 [21805.855236] DMA: 318*4kB 396*8kB 299*16kB 205*32kB 115*64kB 29*128kB 5*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 28648kB
> Mar 5 02:26:17 bfp53 [21805.868869] 620 total pagecache pages
> Mar 5 02:26:17 bfp53 [21805.876098] 15359 pages RAM
> Mar 5 02:26:17 bfp53 [21805.878803] 673 pages reserved
> Mar 5 02:26:17 bfp53 [21805.881839] 589 pages shared
> Mar 5 02:26:17 bfp53 [21805.884771] 6850 pages non-shared
> Mar 5 02:26:17 bfp53 [21805.888129] ext4_da_write_begin: retrying grab_cache_page_write_begin()
I will attach my workaround patch for reference, too.
Can anybody explain this behaviour and maybe direct me to the root cause?
Of course, this now looks more like a page allocation problem than an
ext4 one.
Thanks,
Enrik
[Attachment: "ext4-oom-endless-loop-workaround.diff" (text/x-diff, 2510 bytes)]