linux-kernel - Re: Local DoS through write heavy I/O on CFQ & Deadline

Message-ID: <20121014211735.GU2739@dastard>
Date:	Mon, 15 Oct 2012 08:17:35 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Alex Bligh <alex@...x.org.uk>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Local DoS through write heavy I/O on CFQ & Deadline

On Thu, Oct 11, 2012 at 01:23:32PM +0100, Alex Bligh wrote:
> We have noticed significant I/O scheduling issues on both the CFQ and the
> deadline scheduler where a non-root user can starve any other process of
> any I/O for minutes at a time. The problem is more serious using CFQ but is
> still an effective local DoS vector using Deadline.
> 
> A simple way to generate the problem is:
> 
>   dd if=/dev/zero of=- bs=1M count=50000 | dd if=- of=myfile bs=1M count=50000
> 
> (note use of 2 dd's is to avoid alleged optimisation of the writing dd
> from /dev/zero). zcat-ing a large file with stdout redirected to a file
> produces a similar error. Using ionice to set idle priority makes no
> difference.
> 
> To instrument the problem we wrote a Python script which does a MySQL
> select and update every 10 seconds, and times the execution of the update.
> This normally takes milliseconds, but under user-generated load we can
> push it to indefinite (on CFQ) and over a minute (on deadline).
> Postgres is affected in a similar manner (i.e. it is not MySQL specific).
> Simultaneously we have captured the output of 'vmstat 1 2' and
> /proc/meminfo, with appropriate timestamps.

Well, mysql is stuck in fsync(), so of course it's going to have
problems with write latency:

[ 3840.268303] [<ffffffff812650d5>] jbd2_log_wait_commit+0xb5/0x130
[ 3840.268308] [<ffffffff8108aa50>] ? add_wait_queue+0x60/0x60
[ 3840.268313] [<ffffffff81211248>] ext4_sync_file+0x208/0x2d0

And postgres gets stuck there too. So what you are seeing is likely
an ext4 problem, not an IO scheduler problem.

Suggestion: try the same test with XFS. If the problem still exists,
then it *might* be an IO scheduler problem. If it goes away, then
it's an ext4 problem.
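
For comparing the two filesystems without a database in the way, a
small fsync latency probe may be enough. What follows is only a rough
sketch in Python, assuming that a plain append plus fsync() is a fair
stand-in for the MySQL update; the function names, the probe file name
and the 4 KiB payload are all made up for illustration and are not
taken from the original script.

```python
import os
import tempfile
import time


def time_fsync(path, payload=b"x" * 4096):
    """Append payload to path, fsync it, and return the elapsed seconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.monotonic()
        os.write(fd, payload)
        os.fsync(fd)  # blocks until the filesystem reports the data as durable
        return time.monotonic() - start
    finally:
        os.close(fd)


def probe(directory, samples=3, interval=0.0):
    """Collect several fsync latency samples using a scratch file in directory."""
    # "fsync_probe.dat" is a hypothetical scratch file name.
    path = os.path.join(directory, "fsync_probe.dat")
    latencies = []
    try:
        for _ in range(samples):
            latencies.append(time_fsync(path))
            time.sleep(interval)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return latencies


if __name__ == "__main__":
    # Assumption: replace the temporary directory with one that lives on the
    # filesystem under test (ext4 or XFS), and run the dd load alongside.
    with tempfile.TemporaryDirectory() as d:
        for i, lat in enumerate(probe(d, samples=3, interval=0.0)):
            print(f"sample {i}: fsync took {lat * 1000:.2f} ms")
```

Run it once on ext4 and once on XFS while the dd load is going, with
the same IO scheduler in both cases, and compare the numbers.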

Cheers,

Dave.

-- 
Dave Chinner
david@...morbit.com