lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 21 Jul 2015 00:05:56 -0400
From:	Ming Lei <ming.lei@...onical.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Michal Hocko <mhocko@...e.cz>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>, xfs@....sgi.com
Subject: Re: [regression 4.2-rc3] loop: xfstests xfs/073 deadlocked in low
 memory conditions

On Mon, Jul 20, 2015 at 9:59 PM, Dave Chinner <david@...morbit.com> wrote:
> Hi Ming,
>
> With the recent merge of the loop device changes, I'm now seeing
> XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
>
> The deadlocked is as follows:
>
> kloopd1: loop_queue_read_work
>         xfs_file_iter_read
>         lock XFS inode XFS_IOLOCK_SHARED (on image file)
>         page cache read (GFP_KERNEL)
>         radix tree alloc
>         memory reclaim
>         reclaim XFS inodes
>         log force to unpin inodes
>         <wait for log IO completion>
>
> xfs-cil/loop1: <does log force IO work>
>         xlog_cil_push
>         xlog_write
>         <loop issuing log writes>
>                 xlog_state_get_iclog_space()
>                 <blocks due to all log buffers under write io>
>                 <waits for IO completion>
>
> kloopd1: loop_queue_write_work
>         xfs_file_write_iter
>         lock XFS inode XFS_IOLOCK_EXCL (on image file)
>         <wait for inode to be unlocked>
>
> [The full stack traces are below].
>
> i.e. the kloopd, with it's split read and write work queues, has
> introduced a dependency through memory reclaim. i.e. that writes
> need to be able to progress for reads make progress.

This kind of change just makes READ vs READ OR WRITE submitted
to fs concurrently, and the use case should have been simulated from
user space on one regular XFS file too?

>
> The problem, fundamentally, is that mpage_readpages() does a
> GFP_KERNEL allocation, rather than paying attention to the inode's
> mapping gfp mask, which is set to GFP_NOFS.

That looks the root cause, and I guess the issue is just triggered
after commit aa4d86163e4(block: loop: switch to VFS ITER_BVEC)
which changes splice to bvec iterator.

>
> The didn't used to happen, because the loop device used to issue
> reads through the splice path and that does:
>
>         error = add_to_page_cache_lru(page, mapping, index,
>                         GFP_KERNEL & mapping_gfp_mask(mapping));
>
> i.e. it pays attention to the allocation context placed on the
> inode and so is doing GFP_NOFS allocations here and avoiding the
> recursion problem.
>
> [ CC'd Michal Hocko and the mm list because it's a clear exaple of
> why ignoring the mapping gfp mask on any page cache allocation is
> a landmine waiting to be tripped over. ]
>
> Cheers,
>
> Dave.
>
> [81248.855166] kworker/u3:0    D ffff88003a8fadc8 11304   930      2 0x00000000
> [81248.855166] Workqueue: kloopd1 loop_queue_read_work
> [81248.855166]  ffff88003a8fadc8 ffff880003e48000 ffff8800332fdd00 ffff88003a8fadb8
> [81248.855166]  ffff88003a8fc000 ffff88003a8faf18 ffff8800332fdd00 ffff88003a8faf10
> [81248.855166]  ffffffffffffffff ffff88003a8fade8 ffffffff81db608e ffff88003fc153c0
> [81248.855166] Call Trace:
> [81248.855166]  [<ffffffff81db608e>] schedule+0x3e/0x90
> [81248.855166]  [<ffffffff81dba38f>] schedule_timeout+0x1cf/0x240
> [81248.855166]  [<ffffffff810cce16>] ? try_to_wake_up+0x1f6/0x330
> [81248.855166]  [<ffffffff81db7534>] wait_for_completion+0xb4/0x120
> [81248.855166]  [<ffffffff810cd010>] ? wake_up_q+0x70/0x70
> [81248.855166]  [<ffffffff810bc2d9>] flush_work+0xf9/0x170
> [81248.855166]  [<ffffffff810ba0d0>] ? destroy_worker+0x90/0x90
> [81248.855166]  [<ffffffff81513a2c>] xlog_cil_force_lsn+0xcc/0x250
> [81248.855166]  [<ffffffff815126f2>] _xfs_log_force_lsn+0x82/0x2f0
> [81248.855166]  [<ffffffff8151298e>] xfs_log_force_lsn+0x2e/0xa0
> [81248.855166]  [<ffffffff81501e49>] ? xfs_iunpin_wait+0x19/0x20
> [81248.855166]  [<ffffffff814fe686>] __xfs_iunpin_wait+0xb6/0x170
> [81248.855166]  [<ffffffff810e0f90>] ? autoremove_wake_function+0x40/0x40
> [81248.855166]  [<ffffffff81501e49>] xfs_iunpin_wait+0x19/0x20
> [81248.855166]  [<ffffffff814f6df2>] xfs_reclaim_inode+0x72/0x350
> [81248.855166]  [<ffffffff814f72e2>] xfs_reclaim_inodes_ag+0x212/0x350
> [81248.855166]  [<ffffffff81dbb07e>] ? _raw_spin_unlock+0xe/0x20
> [81248.855166]  [<ffffffff81dbb07e>] ? _raw_spin_unlock+0xe/0x20
> [81248.855166]  [<ffffffff814f74b3>] xfs_reclaim_inodes_nr+0x33/0x40
> [81248.855166]  [<ffffffff81507f19>] xfs_fs_free_cached_objects+0x19/0x20
> [81248.855166]  [<ffffffff811cf791>] super_cache_scan+0x191/0x1a0
> [81248.855166]  [<ffffffff811912d4>] shrink_slab.part.62.constprop.80+0x1b4/0x380
> [81248.855166]  [<ffffffff81194720>] shrink_zone+0x90/0xa0
> [81248.855166]  [<ffffffff811948b0>] do_try_to_free_pages+0x180/0x2c0
> [81248.855166]  [<ffffffff81194aaa>] try_to_free_pages+0xba/0x160
> [81248.855166]  [<ffffffff81189509>] __alloc_pages_nodemask+0x499/0x840
> [81248.855166]  [<ffffffff811c3d2f>] new_slab+0x6f/0x2c0
> [81248.855166]  [<ffffffff811c5baa>] __slab_alloc.constprop.75+0x3fa/0x580
> [81248.855166]  [<ffffffff817be438>] ? __radix_tree_preload+0x48/0xc0
> [81248.855166]  [<ffffffff817be438>] ? __radix_tree_preload+0x48/0xc0
> [81248.855166]  [<ffffffff811c620e>] kmem_cache_alloc+0x12e/0x160
> [81248.855166]  [<ffffffff817be438>] __radix_tree_preload+0x48/0xc0
> [81248.855166]  [<ffffffff817be519>] radix_tree_maybe_preload+0x19/0x20
> [81248.855166]  [<ffffffff81181b69>] __add_to_page_cache_locked+0x39/0x1f0
> [81248.855166]  [<ffffffff81181d68>] add_to_page_cache_lru+0x28/0x80
> [81248.855166]  [<ffffffff81209fc9>] mpage_readpages+0xb9/0x130
> [81248.855166]  [<ffffffff814e68e0>] ? __xfs_get_blocks+0x840/0x840
> [81248.855166]  [<ffffffff814e68e0>] ? __xfs_get_blocks+0x840/0x840
> [81248.855166]  [<ffffffff814e409d>] xfs_vm_readpages+0x1d/0x20
> [81248.855166]  [<ffffffff8118cfc9>] __do_page_cache_readahead+0x1b9/0x250
> [81248.855166]  [<ffffffff8118d13f>] ondemand_readahead+0xdf/0x270
> [81248.855166]  [<ffffffff81180fc6>] ? find_get_entry+0x66/0xb0
> [81248.855166]  [<ffffffff8118d362>] page_cache_async_readahead+0x92/0xc0
> [81248.855166]  [<ffffffff81182c3c>] generic_file_read_iter+0x41c/0x650
> [81248.855166]  [<ffffffff81db9664>] ? down_read+0x24/0x40
> [81248.855166]  [<ffffffff814f1d66>] xfs_file_read_iter+0xe6/0x2d0
> [81248.855166]  [<ffffffff811cba22>] vfs_iter_read+0x62/0xa0
> [81248.855166]  [<ffffffff81a98fcd>] loop_handle_cmd.isra.28+0x6dd/0xaa0
> [81248.855166]  [<ffffffff810cce16>] ? try_to_wake_up+0x1f6/0x330
> [81248.855166]  [<ffffffff81a99472>] loop_queue_read_work+0x12/0x20
> [81248.855166]  [<ffffffff810bd02e>] process_one_work+0x14e/0x410
> [81248.855166]  [<ffffffff810bd63e>] worker_thread+0x4e/0x460
> [81248.855166]  [<ffffffff810bd5f0>] ? rescuer_thread+0x300/0x300
> [81248.855166]  [<ffffffff810c288c>] kthread+0xec/0x110
> [81248.855166]  [<ffffffff810c27a0>] ? kthread_create_on_node+0x1b0/0x1b0
> [81248.855166]  [<ffffffff81dbba5f>] ret_from_fork+0x3f/0x70
> [81248.855166]  [<ffffffff810c27a0>] ? kthread_create_on_node+0x1b0/0x1b0
>
> [81248.855166] kworker/0:0     D ffff88002ed8bb88 13896   948      2 0x00000000
> [81248.855166] Workqueue: xfs-cil/loop1 xlog_cil_push_work
> [81248.855166]  ffff88002ed8bb88 ffffffff82368540 ffff880003e4be00 ffff88001a467cf0
> [81248.855166]  ffff88002ed8c000 ffff880005123cf8 ffff88001a467cf0 ffff880003e4be00
> [81248.855166]  ffff880005123cc0 ffff88002ed8bba8 ffffffff81db608e ffff88002ed8bba8
> [81248.855166] Call Trace:
> [81248.855166]  [<ffffffff81db608e>] schedule+0x3e/0x90
> [81248.855166]  [<ffffffff81511266>] xlog_state_get_iclog_space+0xe6/0x330
> [81248.855166]  [<ffffffff810cd010>] ? wake_up_q+0x70/0x70
> [81248.855166]  [<ffffffff81511666>] xlog_write+0x1b6/0x870
> [81248.855166]  [<ffffffff8151316d>] xlog_cil_push+0x24d/0x400
> [81248.855166]  [<ffffffff81513335>] xlog_cil_push_work+0x15/0x20
> [81248.855166]  [<ffffffff810bd02e>] process_one_work+0x14e/0x410
> [81248.855166]  [<ffffffff810bd63e>] worker_thread+0x4e/0x460
> [81248.855166]  [<ffffffff810bd5f0>] ? rescuer_thread+0x300/0x300
> [81248.855166]  [<ffffffff810bd5f0>] ? rescuer_thread+0x300/0x300
> [81248.855166]  [<ffffffff810c288c>] kthread+0xec/0x110
> [81248.855166]  [<ffffffff810c27a0>] ? kthread_create_on_node+0x1b0/0x1b0
> [81248.855166]  [<ffffffff81dbba5f>] ret_from_fork+0x3f/0x70
> [81248.855166]  [<ffffffff810c27a0>] ? kthread_create_on_node+0x1b0/0x1b0
>
> [81248.855166] kworker/u3:4    D ffff8800290439f8 12456  1066      2 0x00000000
> [81248.855166] Workqueue: kloopd1 loop_queue_write_work
> [81248.855166]  ffff8800290439f8 ffffffff82368540 ffff8800332f9f00 ffff8800290439e8
> [81248.855166]  ffff880029044000 ffffffff00000000 ffff88001366cfd8 ffffffff00000002
> [81248.855166]  ffff8800332f9f00 ffff880029043a18 ffffffff81db608e ffff880029043a18
> [81248.855166] Call Trace:
> [81248.855166]  [<ffffffff81db608e>] schedule+0x3e/0x90
> [81248.855166]  [<ffffffff81db9f3d>] rwsem_down_write_failed+0x13d/0x340
> [81248.855166]  [<ffffffff810cce16>] ? try_to_wake_up+0x1f6/0x330
> [81248.855166]  [<ffffffff814f2e46>] ? xfs_file_buffered_aio_write+0x66/0x250
> [81248.855166]  [<ffffffff817c76a3>] call_rwsem_down_write_failed+0x13/0x20
> [81248.855166]  [<ffffffff81db96bf>] ? down_write+0x3f/0x60
> [81248.855166]  [<ffffffff814fdea4>] xfs_ilock+0x154/0x1d0
> [81248.855166]  [<ffffffff814f2e46>] xfs_file_buffered_aio_write+0x66/0x250
> [81248.855166]  [<ffffffff8179264d>] ? kblockd_schedule_work+0x1d/0x30
> [81248.855166]  [<ffffffff8179dbc5>] ? blk_mq_kick_requeue_list+0x15/0x20
> [81248.855166]  [<ffffffff814f3136>] xfs_file_write_iter+0x106/0x120
> [81248.855166]  [<ffffffff811cbac3>] vfs_iter_write+0x63/0xa0
> [81248.855166]  [<ffffffff81a97828>] lo_write_bvec+0x58/0x100
> [81248.855166]  [<ffffffff81a99248>] loop_handle_cmd.isra.28+0x958/0xaa0
> [81248.855166]  [<ffffffff81a9941f>] loop_queue_write_work+0x8f/0xd0
> [81248.855166]  [<ffffffff810bd02e>] process_one_work+0x14e/0x410
> [81248.855166]  [<ffffffff810bd63e>] worker_thread+0x4e/0x460
> [81248.855166]  [<ffffffff810bd5f0>] ? rescuer_thread+0x300/0x300
> [81248.855166]  [<ffffffff810c288c>] kthread+0xec/0x110
> [81248.855166]  [<ffffffff810c27a0>] ? kthread_create_on_node+0x1b0/0x1b0
> [81248.855166]  [<ffffffff81dbba5f>] ret_from_fork+0x3f/0x70
> [81248.855166]  [<ffffffff810c27a0>] ? kthread_create_on_node+0x1b0/0x1b0
> --
> Dave Chinner
> david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ