Date:   Fri, 28 Oct 2016 12:58:08 -0400
From:   Tejun Heo <tj@...nel.org>
To:     torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
        mingo@...hat.com, peterz@...radead.org, axboe@...nel.dk,
        tytso@....edu, jack@...e.com, adilger.kernel@...ger.ca
Cc:     linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, kernel-team@...com, mingbo@...com
Subject: [PATCHSET RFC] sched, jbd2: mark sleeps on journal->j_checkpoint_mutex as iowait

Hello,

When there's heavy metadata operation traffic on ext4, the journal
fills up quickly and the majority of filesystem users end up blocking
on journal->j_checkpoint_mutex with a stack trace similar to the
following.

 [<ffffffff8c32e758>] __jbd2_log_wait_for_space+0xb8/0x1d0
 [<ffffffff8c3285f6>] add_transaction_credits+0x286/0x2a0
 [<ffffffff8c32876c>] start_this_handle+0x10c/0x400
 [<ffffffff8c328c5b>] jbd2__journal_start+0xdb/0x1e0
 [<ffffffff8c30ee5d>] __ext4_journal_start_sb+0x6d/0x120
 [<ffffffff8c2d713e>] __ext4_new_inode+0x64e/0x1330
 [<ffffffff8c2e9bf0>] ext4_create+0xc0/0x1c0
 [<ffffffff8c2570fd>] path_openat+0x124d/0x1380
 [<ffffffff8c258501>] do_filp_open+0x91/0x100
 [<ffffffff8c2462d0>] do_sys_open+0x130/0x220
 [<ffffffff8c2463de>] SyS_open+0x1e/0x20
 [<ffffffff8c7ec5b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
 [<ffffffffffffffff>] 0xffffffffffffffff

Because the sleeps on the mutex aren't accounted as iowait, the system
doesn't show the usual signs of being bogged down by IOs - both iowait
and /proc/stat:procs_blocked stay misleadingly low.  While propagation
of iowait through locking constructs is far from being strict, heavy
contention on j_checkpoint_mutex is easy to trigger and is obviously
iowait, so getting it right can help users quite a bit in tracking
down the issue.
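
For reference, the two indicators mentioned above can be read from
user space roughly as follows.  This is only a sketch: struct
io_pressure, parse_stat_line() and parse_proc_stat() are made-up
names, and the code assumes the usual /proc/stat layout where iowait
is the fifth value on the aggregate "cpu" line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two counters discussed above. */
struct io_pressure {
	unsigned long long iowait_ticks;   /* 5th value on the aggregate "cpu" line */
	unsigned long long procs_blocked;  /* value on the "procs_blocked" line */
};

enum { HAVE_IOWAIT = 1, HAVE_BLOCKED = 2 };

/* Inspect one line; returns which field (if any) was filled in. */
int parse_stat_line(const char *line, struct io_pressure *out)
{
	/* "cpu " with a trailing space skips the per-CPU "cpuN" lines. */
	if (strncmp(line, "cpu ", 4) == 0 &&
	    sscanf(line + 4, "%*u %*u %*u %*u %llu", &out->iowait_ticks) == 1)
		return HAVE_IOWAIT;
	if (strncmp(line, "procs_blocked ", 14) == 0 &&
	    sscanf(line + 14, "%llu", &out->procs_blocked) == 1)
		return HAVE_BLOCKED;
	return 0;
}

/* Parse /proc/stat-formatted text; 0 on success, -1 if a field is missing. */
int parse_proc_stat(const char *text, struct io_pressure *out)
{
	const char *line = text;
	int seen = 0;

	assert(text && out);
	while (line && *line) {
		seen |= parse_stat_line(line, out);
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return seen == (HAVE_IOWAIT | HAVE_BLOCKED) ? 0 : -1;
}
```

Feeding it the contents of /proc/stat twice, a few seconds apart,
and comparing the results is enough to see whether either number
moves while tasks are piled up on the mutex.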

Due to the way io_schedule() is implemented, it is currently hairy to
add an io variant to an existing interface - the schedule() call
itself, which is usually buried deep, has to be replaced with
io_schedule().  As we already have current->in_iowait to mark the task
as sleeping for iowait, this can be made easy by breaking up
io_schedule() into multiple steps so that the preparation and marking
can be done before calling an existing interface and the actual iowait
accounting can be done from inside the scheduler.
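
To make the split concrete, here is a user-space toy model of the
idea.  It is not the kernel code: every toy_* name is invented for
illustration, and having prepare return the old flag value as a token
for finish to restore is an assumption about how nesting would be
handled.

```c
#include <assert.h>

/* Toy stand-ins for kernel state; all names here are illustrative. */
struct toy_task { int in_iowait; };	/* models current->in_iowait */

static struct toy_task toy_current;
static int toy_nr_iowait;		/* tasks currently sleeping as iowait */
static int toy_iowait_sleeps;		/* total sleeps accounted as iowait */
static int toy_plain_sleeps;		/* total sleeps not accounted as iowait */

/* Step 1: mark the task before calling into an existing interface. */
int toy_io_schedule_prepare(void)
{
	int old = toy_current.in_iowait;

	toy_current.in_iowait = 1;
	return old;			/* token so nested users can restore */
}

/* Step 3: restore whatever the flag was before prepare. */
void toy_io_schedule_finish(int token)
{
	toy_current.in_iowait = token;
}

/* Step 2: the scheduler does the accounting, keyed on the flag alone. */
void toy_schedule(void)
{
	int io = toy_current.in_iowait;

	if (io) {
		toy_nr_iowait++;
		toy_iowait_sleeps++;
	} else {
		toy_plain_sleeps++;
	}
	/* ... the task would sleep here ... */
	if (io)
		toy_nr_iowait--;
	assert(toy_nr_iowait >= 0);
}
```

The point of the model is that toy_schedule() never needs to know
who called it; any sleep between prepare and finish is counted as
iowait, however deep the call chain.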

What do you think?

This patchset contains the following four patches.

 0001-sched-move-IO-scheduling-accounting-from-io_schedule.patch
 0002-sched-separate-out-io_schedule_prepare-and-io_schedu.patch
 0003-mutex-add-mutex_lock_io.patch
 0004-jbd2-use-mutex_lock_io-for-journal-j_checkpoint_mute.patch

0001-0002 implement io_schedule_prepare/finish().
0003 implements mutex_lock_io() using io_schedule_prepare/finish().
0004 uses mutex_lock_io() on journal->j_checkpoint_mutex.
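
The shape of 0003 can be sketched with a single-threaded user-space
model.  Again, this is not the patch itself: the model_* names, the
fake mutex and its holder_ticks field are all hypothetical and exist
only to show that the io variant wraps the existing lock path without
touching the sleep buried inside it.

```c
#include <assert.h>

static int model_in_iowait;	/* models current->in_iowait */
static int model_plain_sleeps;	/* sleeps not accounted as iowait */
static int model_iowait_sleeps;	/* sleeps accounted as iowait */

/* Model scheduler: accounts each sleep according to the flag. */
static void model_schedule(void)
{
	if (model_in_iowait)
		model_iowait_sleeps++;
	else
		model_plain_sleeps++;
}

/* Fake mutex: the simulated owner lets go after holder_ticks sleeps. */
struct model_mutex {
	int locked;
	int holder_ticks;
};

/* Existing interface: the sleep is buried in the contended path. */
void model_mutex_lock(struct model_mutex *m)
{
	while (m->locked) {
		model_schedule();
		if (--m->holder_ticks <= 0)
			m->locked = 0;	/* simulated owner unlocks */
	}
	m->locked = 1;
}

/* io variant: mark, call the unchanged interface, restore. */
void model_mutex_lock_io(struct model_mutex *m)
{
	int token = model_in_iowait;	/* prepare: remember the old value */

	model_in_iowait = 1;
	model_mutex_lock(m);
	model_in_iowait = token;	/* finish: restore it */
}
```

In this model, a call site converts by switching from
model_mutex_lock() to model_mutex_lock_io() with no other change,
which is the kind of conversion 0004 describes for
journal->j_checkpoint_mutex.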

This patchset is also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-mutex_lock_io

Thanks, diffstat follows.

 fs/jbd2/commit.c       |    2 -
 fs/jbd2/journal.c      |   14 ++++++-------
 include/linux/mutex.h  |    4 +++
 include/linux/sched.h  |    8 ++-----
 kernel/locking/mutex.c |   24 ++++++++++++++++++++++
 kernel/sched/core.c    |   52 +++++++++++++++++++++++++++++++++++++------------
 6 files changed, 79 insertions(+), 25 deletions(-)

--
tejun
