lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 23 Mar 2010 16:11:45 +0530
From:	Kailas Joshi <kailas.joshi@...il.com>
To:	Jan Kara <jack@...e.cz>
Cc:	tytso@....edu, linux-ext4@...r.kernel.org,
	Jiaying Zhang <jiayingz@...gle.com>
Subject: Re: Help on Implementation of EXT3 type Ordered Mode in EXT4

Thanks Jan. This has definitely given me some pointers to work upon.

I have Lock Debugging enables but that didn't give any warnings.
However, when I did echo "w" >/proc/sysrq-trigger after system lockup,
I got the stack trace for locked up process.

Following are the stack traces of the processes (I suspect) resulting
in total system lockup -
-----------------------------------------------------------------------------------------------------------------------------------------
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073] jbd2/sdb1-8   D
00000046     0  5913      2 0x00000000
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  c4473b90
00000046 00000001 00000046 ce9e4a00 00000000 c4473b70 00000046
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  00000000
c061de60 c061de60 c061de60 c01513bd 00000000 ce9e4a94 ce9e4a00
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  ce9e4b94
c1407e60 c01513ed 00000001 c13430bc 00000296 cfac9278 c4473b90
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073] Call Trace:
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01513bd>] ?
prepare_to_wait_exclusive+0x1d/0x60
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01513ed>] ?
prepare_to_wait_exclusive+0x4d/0x60
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c04004f5>]
io_schedule+0x35/0x50
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a6f75>]
sync_page+0x35/0x40
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0400820>]
__wait_on_bit_lock+0x40/0x80
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a6f40>] ?
sync_page+0x0/0x40
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a6f1d>]
__lock_page+0x4d/0x60
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0151250>] ?
wake_bit_function+0x0/0x60
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01ad987>]
write_cache_pages+0x437/0x5d0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0237930>] ?
__mpage_da_writepage+0x0/0x170
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01ad310>] ?
mapping_tagged+0x0/0x70
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01ad310>] ?
mapping_tagged+0x0/0x70
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c02387ec>]
ext4_da_writepages+0x2ec/0x7a0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c02c514a>] ?
number+0x25a/0x270
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0328ada>] ?
vt_console_print+0x1da/0x2a0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c040238d>] ?
_spin_unlock+0x1d/0x20
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0328ada>] ?
vt_console_print+0x1da/0x2a0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0155aeb>] ?
up+0x2b/0x40
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01375e7>] ?
release_console_sem+0x197/0x1d0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0238500>] ?
ext4_da_writepages+0x0/0x7a0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01adb6d>]
do_writepages+0x1d/0x30
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a76d6>]
__filemap_fdatawrite_range+0x66/0x80
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a81e6>]
filemap_fdatawrite+0x26/0x30
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a821c>]
filemap_write_and_wait+0x2c/0x50
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c023228a>]
ext4_sync_alloc_da_blocks+0x5a/0x90
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0244c0c>]
alloc_on_commit_callback+0x6c/0xc0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c02693a5>]
jbd2_journal_commit_transaction+0x335/0x1ae0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c012c10c>] ?
finish_task_switch+0x6c/0xe0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0143225>] ?
lock_timer_base+0x25/0x50
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c04025ad>] ?
_spin_lock_irqsave+0x4d/0x60
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0143287>] ?
try_to_del_timer_sync+0x37/0xb0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c014336a>] ?
del_timer_sync+0x6a/0x80
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0143300>] ?
del_timer_sync+0x0/0x80
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c02703d6>]
kjournald2+0xb6/0x380
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0151210>] ?
autoremove_wake_function+0x0/0x40
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0270320>] ?
kjournald2+0x0/0x380
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0151144>]
kthread+0x74/0x80
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01510d0>] ?
kthread+0x0/0x80
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0103a07>]
kernel_thread_helper+0x7/0x10
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073] flush-8:16    D
c47f1cc0     0  5916      2 0x00000000
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  c47f1cd4
00000046 00000002 c47f1cc0 c14031c4 00000000 c47f1cb4 00000046
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  00000000
c061de60 c061de60 c061de60 c06191c4 c1407e70 ce9e6f94 ce9e6f00
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  ce9e7094
c1407e60 000095a3 00000000 00000000 00000000 00000000 00000000
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073] Call Trace:
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c04004f5>]
io_schedule+0x35/0x50
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a6f75>]
sync_page+0x35/0x40
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0400820>]
__wait_on_bit_lock+0x40/0x80
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a6f40>] ?
sync_page+0x0/0x40
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a6f1d>]
__lock_page+0x4d/0x60
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0151250>] ?
wake_bit_function+0x0/0x60
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0238c06>]
ext4_da_writepages+0x706/0x7a0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a6c30>] ?
find_get_pages_tag+0x0/0x120
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0162127>] ?
lock_release_non_nested+0x187/0x2b0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f51d5>] ?
writeback_inodes_wb+0x245/0x3b0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f4925>] ?
writeback_single_inode+0x95/0x260
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f51d5>] ?
writeback_inodes_wb+0x245/0x3b0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f4925>] ?
writeback_single_inode+0x95/0x260
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0238500>] ?
ext4_da_writepages+0x0/0x7a0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01adb6d>]
do_writepages+0x1d/0x30
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f4930>]
writeback_single_inode+0xa0/0x260
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f5216>]
writeback_inodes_wb+0x286/0x3b0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f543f>]
wb_writeback+0xff/0x1a0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f5606>] ?
wb_do_writeback+0x86/0x1e0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f573b>]
wb_do_writeback+0x1bb/0x1e0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f55a2>] ?
wb_do_writeback+0x22/0x1e0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01f5792>]
bdi_writeback_task+0x32/0xa0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01bca5e>]
bdi_start_fn+0x5e/0xb0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01bca00>] ?
bdi_start_fn+0x0/0xb0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0151144>]
kthread+0x74/0x80
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01510d0>] ?
kthread+0x0/0x80
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0103a07>]
kernel_thread_helper+0x7/0x10
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073] write_test    D
c01272a0     0  5966      1 0x00000005
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  c2149c50
00200046 00200046 c01272a0 00000001 c4f09910 00200296 c4f09910
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  c2149c14
c061de60 c061de60 c061de60 c2149c34 c01272a0 ce9e2594 ce9e2500
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  ce9e2694
c1607e60 00200246 c0267d9c 00000001 c4f09814 c4f09800 c4f09814
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073] Call Trace:
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01272a0>] ?
__wake_up+0x40/0x50
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01272a0>] ?
__wake_up+0x40/0x50
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0267d9c>] ?
start_this_handle+0x36c/0x580
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0267da1>]
start_this_handle+0x371/0x580
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0160bac>] ?
lockdep_init_map+0x3c/0x500
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0151210>] ?
autoremove_wake_function+0x0/0x40
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0268055>]
jbd2_journal_start+0xa5/0xd0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0248663>]
ext4_journal_start_sb+0x53/0xa0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0245db3>] ?
__ext4_journal_stop+0x43/0x70
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0238ef4>]
ext4_da_write_begin+0x254/0x3a0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0239470>] ?
ext4_da_get_block_prep+0x0/0x360
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a785e>]
generic_file_buffered_write+0xde/0x260
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a7e36>]
__generic_file_aio_write+0x276/0x510
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0400e04>] ?
mutex_lock_nested+0x1e4/0x270
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01a8128>]
generic_file_aio_write+0x58/0xc0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c022fa0f>]
ext4_file_write+0x3f/0xd0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0147ed5>] ?
ptrace_stop+0xa5/0xf0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01d85bd>]
do_sync_write+0xcd/0x110
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c014840a>] ?
ptrace_notify+0x9a/0xb0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0151210>] ?
autoremove_wake_function+0x0/0x40
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c028291f>] ?
security_file_permission+0xf/0x20
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01d877c>] ?
rw_verify_area+0x6c/0xe0
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01d8de6>]
vfs_write+0x96/0x190
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01d84f0>] ?
do_sync_write+0x0/0x110
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c01d950d>]
sys_write+0x3d/0x70
Mar 23 15:24:40 kailas-desktop kernel: [  532.237073]  [<c0102f55>]
syscall_call+0x7/0xb
-----------------------------------------------------------------------------------------------------------------------------------------
I am also attaching complete trace for your reference.

write_test is my testing executable that performs many sequential
writes to a file.
This stack trace confirms that lock_page() in write_cache_pages() is
resulting into this lockup.

I have few questions here.
I guess process named jbd2/sdb1-8 is kjournald thread. But what is
flush-8:16 process? Is it the kernel thread for periodically writing
dirty pages to disk?

Is it the case that these threads are running concurrently at certain
time and are trying to get lock on same pages resulting into deadlock?
If yes, what can be the reason? Am I doing mistake in calling
filemap_write_and_wait() in journal_commit flow?


Please reply.

Thanks & Regards,
Kailas

On 22 March 2010 22:22, Jan Kara <jack@...e.cz> wrote:
>
>  Hi,
>
> On Fri 19-03-10 08:53:08, Kailas Joshi wrote:
> > I am facing some problems while implementing alloc_on_commit.
> > While performing exhaustive write operations(for example using postmark),
> > system locks up after some time.
> > It runs fine for (simple)non-exhaustive write operations.
> >
> > I am using filemap_write_and_wait() in journal commit callback for
> > performing synchronous block allocation. It uses special journal handle
> > which enables use of early reservations.
> > Is it right to use this function here? If no, is there any other alternative
> > that should be used in this scenario?
> >
> > I am using following strategy -
> > 1) ext4_da_get_block_prep() marks delayed-allocation buffers with BH_DA
> > after reserving space for them.
>  We have a BH_Delay flag for this already. OK, probably you need a
> temporary flag which you can clear in ext4_da_write_begin. I'd find
> counting number of BH_Delay buffers before and after block_write_begin
> call nicer...
>
> > 2) ext4_da_write_begin() counts the number of buffers marked with BH_DA and
> > reserves credits for block allocation.
> > 3) journal_stop() accumulates the unused credits of a handle in the
> > transaction.
> > 4) journal_start() when called with nblocks=0, creates a special handle with
> > the credits accumulated by all previous handles(by step 2).
>  This is a hack. I'd rather create a separate JBD2 function for this.
>
> > 5) journal_commit() creates special handle for block allocation(as in step
> > 4) and calls filemap_write_and_wait() to perform block allocation.
> >
> > I am also sending the patch(for kernel 2.6.32.4) for my implementation (also
> > available at
> > http://www.cse.iitb.ac.in/~kailasjoshi/files/alloc_on_commit.patch).
> >
> > Being new to filesystem development, I am not able to identify the problem.
> > I will be very greatful if someone can help me out.
>  Probably you are hitting some lock inversion problem. I suggest you
> compile the kernel with lockdep enabled (in Kernel hacking -> Lock debugging
> -> Prove lock correctness or something like that) and see whether it issues
> some warnings. If not, you can get backtraces of the locked up processes
> by pressing Alt-Sysrq-w (or echo "w" >/proc/sysrq-trigger).
>
>                                                                        Honza
> --
> Jan Kara <jack@...e.cz>
> SUSE Labs, CR

Download attachment "syslockup.log" of type "application/octet-stream" (29089 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ