lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220218004124.GC20620@X58A-UD3R>
Date:   Fri, 18 Feb 2022 09:41:24 +0900
From:   Byungchul Park <byungchul.park@....com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     torvalds@...ux-foundation.org, damien.lemoal@...nsource.wdc.com,
        linux-ide@...r.kernel.org, adilger.kernel@...ger.ca,
        linux-ext4@...r.kernel.org, mingo@...hat.com,
        linux-kernel@...r.kernel.org, peterz@...radead.org,
        will@...nel.org, tglx@...utronix.de, rostedt@...dmis.org,
        joel@...lfernandes.org, sashal@...nel.org, daniel.vetter@...ll.ch,
        chris@...is-wilson.co.uk, duyuyang@...il.com,
        johannes.berg@...el.com, tj@...nel.org, tytso@....edu,
        david@...morbit.com, amir73il@...il.com, bfields@...ldses.org,
        gregkh@...uxfoundation.org, kernel-team@....com,
        linux-mm@...ck.org, akpm@...ux-foundation.org, mhocko@...nel.org,
        minchan@...nel.org, hannes@...xchg.org, vdavydov.dev@...il.com,
        sj@...nel.org, jglisse@...hat.com, dennis@...nel.org, cl@...ux.com,
        penberg@...nel.org, rientjes@...gle.com, vbabka@...e.cz,
        ngupta@...are.org, linux-block@...r.kernel.org, axboe@...nel.dk,
        paolo.valente@...aro.org, josef@...icpanda.com,
        linux-fsdevel@...r.kernel.org, viro@...iv.linux.org.uk,
        jack@...e.cz, jack@...e.com, jlayton@...nel.org,
        dan.j.williams@...el.com, hch@...radead.org, djwong@...nel.org,
        dri-devel@...ts.freedesktop.org, airlied@...ux.ie,
        rodrigosiqueiramelo@...il.com, melissa.srw@...il.com,
        hamohammed.sa@...il.com
Subject: Re: Report 1 in ext4 and journal based on v5.17-rc1

On Thu, Feb 17, 2022 at 01:27:42PM +0000, Matthew Wilcox wrote:
> On Thu, Feb 17, 2022 at 08:10:03PM +0900, Byungchul Park wrote:
> > [    7.009608] ===================================================
> > [    7.009613] DEPT: Circular dependency has been detected.
> > [    7.009614] 5.17.0-rc1-00014-g8a599299c0cb-dirty #30 Tainted: G        W
> > [    7.009616] ---------------------------------------------------
> > [    7.009617] summary
> > [    7.009618] ---------------------------------------------------
> > [    7.009618] *** DEADLOCK ***
> > [    7.009618]
> > [    7.009619] context A
> > [    7.009619]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
> 
> Why is the context unknown here?  I don't see a way to debug this
> without knowing where we acquired the bit wait lock.

ideal view
------------------

context X			context A

request event E to context A	...
   write REQUESTEVENT		if (can see REQUESTEVENT written)
...				   notice the request from X [S]
wait for the event [W]		...
   write barrier
   write WAITSTART		
   actual wait			if (can see REQUESTEVENT written)
				   consider it's on the way to the event
...				
				...
				finally the event [E]

Dept works with the above view wrt. wait and event. [S] point varies
depending on ways to make context A notice the request so that the event
context A can start. Of course, by making more effort on identifying
each way in the kernel, we can figure out the exact point.

Here, we can also check WAITSTART with a read barrier instead of
REQUESTEVENT in context A conservatively. That's how Dept works.

Dept's view
------------------

context X			context A

request event E to context A	...
   write REQUESTEVENT		if (can see REQUESTEVENT written)
...				   notice the request from X [S]
wait for the event [W]		...
   write barrier
   write WAITSTART		read barrier
   actual wait			if (can see WAITSTART written)
				   consider it's on the way to the event
...				
				...
				finally the event [E]
				   consider all waits in context A so far
				   that could see WAITSTART written

Thanks,
Byungchul

> > [    7.009621]     [W] down_write(&ei->i_data_sem:0)
> > [    7.009623]     [E] event(&(bit_wait_table + i)->dmap:0)
> > [    7.009624]
> > [    7.009625] context B
> > [    7.009625]     [S] down_read(&ei->i_data_sem:0)
> > [    7.009626]     [W] wait(&(bit_wait_table + i)->dmap:0)
> > [    7.009627]     [E] up_read(&ei->i_data_sem:0)
> > [    7.009628]
> > [    7.009629] [S]: start of the event context
> > [    7.009629] [W]: the wait blocked
> > [    7.009630] [E]: the event not reachable
> > [    7.009631] ---------------------------------------------------
> > [    7.009631] context A's detail
> > [    7.009632] ---------------------------------------------------
> > [    7.009632] context A
> > [    7.009633]     [S] (unknown)(&(bit_wait_table + i)->dmap:0)
> > [    7.009634]     [W] down_write(&ei->i_data_sem:0)
> > [    7.009635]     [E] event(&(bit_wait_table + i)->dmap:0)
> > [    7.009636]
> > [    7.009636] [S] (unknown)(&(bit_wait_table + i)->dmap:0):
> > [    7.009638] (N/A)
> > [    7.009638]
> > [    7.009639] [W] down_write(&ei->i_data_sem:0):
> > [    7.009639] ext4_truncate (fs/ext4/inode.c:4187) 
> > [    7.009645] stacktrace:
> > [    7.009646] down_write (kernel/locking/rwsem.c:1514) 
> > [    7.009648] ext4_truncate (fs/ext4/inode.c:4187) 
> > [    7.009650] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> > [    7.009652] generic_perform_write (mm/filemap.c:3784) 
> > [    7.009654] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> > [    7.009657] ext4_file_write_iter (fs/ext4/file.c:677) 
> > [    7.009659] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> > [    7.009662] vfs_write (fs/read_write.c:590) 
> > [    7.009663] ksys_write (fs/read_write.c:644) 
> > [    7.009664] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    7.009667] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    7.009669]
> > [    7.009670] [E] event(&(bit_wait_table + i)->dmap:0):
> > [    7.009671] __wake_up_common (kernel/sched/wait.c:108) 
> > [    7.009673] stacktrace:
> > [    7.009674] dept_event (kernel/dependency/dept.c:2337) 
> > [    7.009677] __wake_up_common (kernel/sched/wait.c:109) 
> > [    7.009678] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> > [    7.009679] __wake_up_bit (kernel/sched/wait_bit.c:127) 
> > [    7.009681] ext4_orphan_del (fs/ext4/orphan.c:282) 
> > [    7.009683] ext4_truncate (fs/ext4/inode.c:4212) 
> > [    7.009685] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> > [    7.009687] generic_perform_write (mm/filemap.c:3784) 
> > [    7.009688] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> > [    7.009690] ext4_file_write_iter (fs/ext4/file.c:677) 
> > [    7.009692] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> > [    7.009694] vfs_write (fs/read_write.c:590) 
> > [    7.009695] ksys_write (fs/read_write.c:644) 
> > [    7.009696] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    7.009698] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    7.009700] ---------------------------------------------------
> > [    7.009700] context B's detail
> > [    7.009701] ---------------------------------------------------
> > [    7.009702] context B
> > [    7.009702]     [S] down_read(&ei->i_data_sem:0)
> > [    7.009703]     [W] wait(&(bit_wait_table + i)->dmap:0)
> > [    7.009704]     [E] up_read(&ei->i_data_sem:0)
> > [    7.009705]
> > [    7.009706] [S] down_read(&ei->i_data_sem:0):
> > [    7.009707] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562) 
> > [    7.009709] stacktrace:
> > [    7.009709] down_read (kernel/locking/rwsem.c:1461) 
> > [    7.009711] ext4_map_blocks (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 fs/ext4/ext4.h:1918 fs/ext4/inode.c:562) 
> > [    7.009712] ext4_getblk (fs/ext4/inode.c:851) 
> > [    7.009714] ext4_bread (fs/ext4/inode.c:903) 
> > [    7.009715] __ext4_read_dirblock (fs/ext4/namei.c:117) 
> > [    7.009718] dx_probe (fs/ext4/namei.c:789) 
> > [    7.009720] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
> > [    7.009722] __ext4_find_entry (fs/ext4/namei.c:1571) 
> > [    7.009723] ext4_lookup (fs/ext4/namei.c:1770) 
> > [    7.009725] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310) 
> > [    7.009727] path_openat (fs/namei.c:3401 fs/namei.c:3605) 
> > [    7.009729] do_filp_open (fs/namei.c:3637) 
> > [    7.009731] do_sys_openat2 (fs/open.c:1215) 
> > [    7.009732] do_sys_open (fs/open.c:1231) 
> > [    7.009734] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    7.009736] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    7.009738]
> > [    7.009738] [W] wait(&(bit_wait_table + i)->dmap:0):
> > [    7.009739] prepare_to_wait (kernel/sched/wait.c:275) 
> > [    7.009741] stacktrace:
> > [    7.009741] __schedule (kernel/sched/sched.h:1318 kernel/sched/sched.h:1616 kernel/sched/core.c:6213) 
> > [    7.009743] schedule (kernel/sched/core.c:6373 (discriminator 1)) 
> > [    7.009744] io_schedule (./arch/x86/include/asm/current.h:15 kernel/sched/core.c:8392 kernel/sched/core.c:8418) 
> > [    7.009745] bit_wait_io (./arch/x86/include/asm/current.h:15 kernel/sched/wait_bit.c:210) 
> > [    7.009746] __wait_on_bit (kernel/sched/wait_bit.c:49) 
> > [    7.009748] out_of_line_wait_on_bit (kernel/sched/wait_bit.c:65) 
> > [    7.009749] ext4_read_bh (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 ./include/linux/buffer_head.h:120 fs/ext4/super.c:201) 
> > [    7.009752] __read_extent_tree_block (fs/ext4/extents.c:545) 
> > [    7.009754] ext4_find_extent (fs/ext4/extents.c:928) 
> > [    7.009756] ext4_ext_map_blocks (fs/ext4/extents.c:4099) 
> > [    7.009757] ext4_map_blocks (fs/ext4/inode.c:563) 
> > [    7.009759] ext4_getblk (fs/ext4/inode.c:851) 
> > [    7.009760] ext4_bread (fs/ext4/inode.c:903) 
> > [    7.009762] __ext4_read_dirblock (fs/ext4/namei.c:117) 
> > [    7.009764] dx_probe (fs/ext4/namei.c:789) 
> > [    7.009765] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
> > [    7.009767]
> > [    7.009768] [E] up_read(&ei->i_data_sem:0):
> > [    7.009769] ext4_map_blocks (fs/ext4/inode.c:593) 
> > [    7.009771] stacktrace:
> > [    7.009771] up_read (kernel/locking/rwsem.c:1556) 
> > [    7.009774] ext4_map_blocks (fs/ext4/inode.c:593) 
> > [    7.009775] ext4_getblk (fs/ext4/inode.c:851) 
> > [    7.009777] ext4_bread (fs/ext4/inode.c:903) 
> > [    7.009778] __ext4_read_dirblock (fs/ext4/namei.c:117) 
> > [    7.009780] dx_probe (fs/ext4/namei.c:789) 
> > [    7.009782] ext4_dx_find_entry (fs/ext4/namei.c:1721) 
> > [    7.009784] __ext4_find_entry (fs/ext4/namei.c:1571) 
> > [    7.009786] ext4_lookup (fs/ext4/namei.c:1770) 
> > [    7.009788] lookup_open (./include/linux/dcache.h:361 fs/namei.c:3310) 
> > [    7.009789] path_openat (fs/namei.c:3401 fs/namei.c:3605) 
> > [    7.009791] do_filp_open (fs/namei.c:3637) 
> > [    7.009792] do_sys_openat2 (fs/open.c:1215) 
> > [    7.009794] do_sys_open (fs/open.c:1231) 
> > [    7.009795] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    7.009797] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    7.009799] ---------------------------------------------------
> > [    7.009800] information that might be helpful
> > [    7.009800] ---------------------------------------------------
> > [    7.009801] CPU: 0 PID: 611 Comm: rs:main Q:Reg Tainted: G        W         5.17.0-rc1-00014-g8a599299c0cb-dirty #30
> > [    7.009804] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [    7.009805] Call Trace:
> > [    7.009806]  <TASK>
> > [    7.009807] dump_stack_lvl (lib/dump_stack.c:107) 
> > [    7.009809] print_circle (./arch/x86/include/asm/atomic.h:108 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:157 kernel/dependency/dept.c:762) 
> > [    7.009812] ? print_circle (kernel/dependency/dept.c:1086) 
> > [    7.009814] cb_check_dl (kernel/dependency/dept.c:1104) 
> > [    7.009815] bfs (kernel/dependency/dept.c:860) 
> > [    7.009818] add_dep (kernel/dependency/dept.c:1423) 
> > [    7.009820] do_event.isra.25 (kernel/dependency/dept.c:1650) 
> > [    7.009822] ? __wake_up_common (kernel/sched/wait.c:108) 
> > [    7.009824] dept_event (kernel/dependency/dept.c:2337) 
> > [    7.009826] __wake_up_common (kernel/sched/wait.c:109) 
> > [    7.009828] __wake_up_common_lock (./include/linux/spinlock.h:428 (discriminator 1) kernel/sched/wait.c:141 (discriminator 1)) 
> > [    7.009830] __wake_up_bit (kernel/sched/wait_bit.c:127) 
> > [    7.009832] ext4_orphan_del (fs/ext4/orphan.c:282) 
> > [    7.009835] ? dept_ecxt_exit (./arch/x86/include/asm/current.h:15 kernel/dependency/dept.c:241 kernel/dependency/dept.c:999 kernel/dependency/dept.c:1043 kernel/dependency/dept.c:2478) 
> > [    7.009837] ext4_truncate (fs/ext4/inode.c:4212) 
> > [    7.009839] ext4_da_write_begin (./include/linux/fs.h:827 fs/ext4/truncate.h:23 fs/ext4/inode.c:2963) 
> > [    7.009842] generic_perform_write (mm/filemap.c:3784) 
> > [    7.009845] ext4_buffered_write_iter (fs/ext4/file.c:269) 
> > [    7.009848] ext4_file_write_iter (fs/ext4/file.c:677) 
> > [    7.009851] new_sync_write (fs/read_write.c:504 (discriminator 1)) 
> > [    7.009854] vfs_write (fs/read_write.c:590) 
> > [    7.009856] ksys_write (fs/read_write.c:644) 
> > [    7.009857] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:65) 
> > [    7.009860] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) 
> > [    7.009862] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
> > [    7.009865] RIP: 0033:0x7f3b160b335d
> > [ 7.009867] Code: e1 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce fa ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 17 fb ff ff 48 89 d0 48 83 c4 08 48 3d 01
> > All code
> > ========
> >    0:	e1 20                	loope  0x22
> >    2:	00 00                	add    %al,(%rax)
> >    4:	75 10                	jne    0x16
> >    6:	b8 01 00 00 00       	mov    $0x1,%eax
> >    b:	0f 05                	syscall 
> >    d:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
> >   13:	73 31                	jae    0x46
> >   15:	c3                   	retq   
> >   16:	48 83 ec 08          	sub    $0x8,%rsp
> >   1a:	e8 ce fa ff ff       	callq  0xfffffffffffffaed
> >   1f:	48 89 04 24          	mov    %rax,(%rsp)
> >   23:	b8 01 00 00 00       	mov    $0x1,%eax
> >   28:	0f 05                	syscall 
> >   2a:*	48 8b 3c 24          	mov    (%rsp),%rdi		<-- trapping instruction
> >   2e:	48 89 c2             	mov    %rax,%rdx
> >   31:	e8 17 fb ff ff       	callq  0xfffffffffffffb4d
> >   36:	48 89 d0             	mov    %rdx,%rax
> >   39:	48 83 c4 08          	add    $0x8,%rsp
> >   3d:	48                   	rex.W
> >   3e:	3d                   	.byte 0x3d
> >   3f:	01                   	.byte 0x1
> > 
> > Code starting with the faulting instruction
> > ===========================================
> >    0:	48 8b 3c 24          	mov    (%rsp),%rdi
> >    4:	48 89 c2             	mov    %rax,%rdx
> >    7:	e8 17 fb ff ff       	callq  0xfffffffffffffb23
> >    c:	48 89 d0             	mov    %rdx,%rax
> >    f:	48 83 c4 08          	add    $0x8,%rsp
> >   13:	48                   	rex.W
> >   14:	3d                   	.byte 0x3d
> >   15:	01                   	.byte 0x1
> > [    7.009869] RSP: 002b:00007f3b1340f180 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> > [    7.009871] RAX: ffffffffffffffda RBX: 00007f3b040010a0 RCX: 00007f3b160b335d
> > [    7.009873] RDX: 0000000000000300 RSI: 00007f3b040010a0 RDI: 0000000000000001
> > [    7.009874] RBP: 0000000000000000 R08: fffffffffffffa15 R09: fffffffffffffa05
> > [    7.009875] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f3b04000df0
> > [    7.009876] R13: 00007f3b1340f1a0 R14: 0000000000000220 R15: 0000000000000300
> > [    7.009879]  </TASK>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ