linux-kernel - Re: [PATCHSET 1/3 v2 block/for-4.1/core] writeback: cgroup writeback support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150325160115.GR3880@htj.duckdns.org>
Date:	Wed, 25 Mar 2015 12:01:15 -0400
From:	Tejun Heo <tj@...nel.org>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	axboe@...nel.dk, linux-kernel@...r.kernel.org, jack@...e.cz,
	hch@...radead.org, hannes@...xchg.org,
	linux-fsdevel@...r.kernel.org, lizefan@...wei.com,
	cgroups@...r.kernel.org, linux-mm@...ck.org, mhocko@...e.cz,
	clm@...com, fengguang.wu@...el.com, david@...morbit.com,
	gthelen@...gle.com
Subject: Re: [PATCHSET 1/3 v2 block/for-4.1/core] writeback: cgroup writeback
 support

Hello, Vivek.

On Wed, Mar 25, 2015 at 11:40:22AM -0400, Vivek Goyal wrote:
> I have 32G of RAM on my system and I setup a write bandwidth of 1MB/s
> on the disk and allowed a dd to run. That dd quickly consumed 5G of
> page cache before it reached to a steady state. Sounds like too much
> of cache consumption which will be drained at a speed of 1MB/s. Not
> sure if this is expected or bdi back-pressure is not being applied soon
> enough.

Ooh, the system will happily dirty certain amount of memory regardless
of the writeback speed.  The default is bg_thresh 10% and thresh 20%
which puts the target ratio at 15%.  On a 32G system this is ~4.8G, so
sounds about right.  This is intentional as otherwise we may end up
threshing worloads which can perfectly fit in the memory due to slow
backing device.  e.g. a workload which has 4G dirty footprint would
work perfectly fine in the above setup regardless of the speed of the
backing device.  If we capped dirty memory at, say, 120s of write
bandwidth, which is 120MB in this case, that workload would suffer
horribly for no good reason.

The proportional distribution of dirty pages is really just
proportional.  If you don't have higher bw backing device active on
the system, whatever is active, however slow that may be, get to
consume the entirety of the allowable dirty memory.  This doesn't
necessarily make sense for things like USB sticks, so we have per-bdi
max_ratio which can be set from userland for devices which aren't
supposed to host those sort of workloads (as you aren't gonna run a DB
workload on your thumbdrive).

So, that's where that 5G amount came from, but while you're
excercising cgroup writeback path, it isn't really doing anything
differently from before if you don't configure memcg limits.  This is
the same behavior which would happen in the global case.  Try to
configure different cgroups w/ different memory limits and write to
devices with differing write speeds.  They all will converge to ~15%
of the allowable memory in each cgroup and the dirty pages in each
cgroup will be distributed according to each device's writeback speed
in that cgroup.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/