linux-ext4 - Re: v4.19-rc2: ext4 direct I/O recursive locking complaint

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <e8b36186-51b3-7688-eb4d-43f36e3f1045@acm.org>
Date:   Mon, 17 Sep 2018 10:27:22 -0700
From:   Bart Van Assche <bvanassche@....org>
To:     "Theodore Y. Ts'o" <tytso@....edu>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: v4.19-rc2: ext4 direct I/O recursive locking complaint

On 9/15/18 3:44 PM, Theodore Y. Ts'o wrote:
> Do you know which test triggered the lockdep failure?

Hi Ted,

This morning when I reran two specific blktests as follows:
./check -q srp/002
./check -q srp/003

a different complaint appeared:

======================================================
WARNING: possible circular locking dependency detected
4.19.0-rc2-dbg+ #1 Not tainted
------------------------------------------------------
kworker/1:2/189 is trying to acquire lock:
00000000fcf597b9 (&sb->s_type->i_mutex_key#16){+.+.}, at: 
__generic_file_fsync+0x6a/0xf0

but task is already holding lock:
000000008a4b0074 ((work_completion)(&dio->complete_work)){+.+.}, at: 
process_one_work+0x3f6/0xa70

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((work_completion)(&dio->complete_work)){+.+.}:
        worker_thread+0x65/0x5a0
        kthread+0x1c5/0x1f0
        ret_from_fork+0x24/0x30

-> #1 ((wq_completion)"dio/%s"sb->s_id){+.+.}:
        drain_workqueue+0xe5/0x210
        destroy_workqueue+0x1c/0x320
        sb_init_dio_done_wq+0x65/0x80
        do_blockdev_direct_IO+0x3b8/0x3d00
        ext4_direct_IO+0x5d9/0xbd0
        generic_file_direct_write+0x10f/0x210
        __generic_file_write_iter+0x117/0x2a0
        ext4_file_write_iter+0x394/0x670
        aio_write+0x235/0x330
        io_submit_one+0x510/0xeb0
        __x64_sys_io_submit+0x122/0x340
        do_syscall_64+0x6c/0x210
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (&sb->s_type->i_mutex_key#16){+.+.}:
        down_write+0x39/0x70
        __generic_file_fsync+0x6a/0xf0
        ext4_sync_file+0x3c4/0x770
        dio_complete+0x2f6/0x360
        process_one_work+0x4bc/0xa70
        worker_thread+0x65/0x5a0
        kthread+0x1c5/0x1f0
        ret_from_fork+0x24/0x30

other info that might help us debug this:

Chain exists of:
   &sb->s_type->i_mutex_key#16 --> (wq_completion)"dio/%s"sb->s_id --> 
(work_completion)(&dio->complete_work)

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock((work_completion)(&dio->complete_work));
                                lock((wq_completion)"dio/%s"sb->s_id);
 
lock((work_completion)(&dio->complete_work));
   lock(&sb->s_type->i_mutex_key#16);

  *** DEADLOCK ***

2 locks held by kworker/1:2/189:
  #0: 00000000ae279ea2 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: 
process_one_work+0x3f6/0xa70
  #1: 000000008a4b0074 ((work_completion)(&dio->complete_work)){+.+.}, 
at: process_one_work+0x3f6/0xa70

stack backtrace:
CPU: 1 PID: 189 Comm: kworker/1:2 Not tainted 4.19.0-rc2-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 
04/01/2014
Workqueue: dio/dm-0 dio_aio_complete_work
Call Trace:
  dump_stack+0x7c/0xbb
  print_circular_bug.isra.32+0x1fc/0x209
  __lock_acquire+0x1a54/0x1b20
  lock_acquire+0xbd/0x200
  down_write+0x39/0x70
  __generic_file_fsync+0x6a/0xf0
  ext4_sync_file+0x3c4/0x770
  dio_complete+0x2f6/0x360
  process_one_work+0x4bc/0xa70
  worker_thread+0x65/0x5a0
  kthread+0x1c5/0x1f0
  ret_from_fork+0x24/0x30

Bart.