[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <cover.1582581887.git.schatzberg.dan@gmail.com>
Date: Mon, 24 Feb 2020 17:17:44 -0500
From: Dan Schatzberg <schatzberg.dan@...il.com>
To: unlisted-recipients:; (no To-header on input)
Cc: Dan Schatzberg <schatzberg.dan@...il.com>,
Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>,
Li Zefan <lizefan@...wei.com>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>, Roman Gushchin <guro@...com>,
Shakeel Butt <shakeelb@...gle.com>,
Chris Down <chris@...isdown.name>,
Yang Shi <yang.shi@...ux.alibaba.com>,
Thomas Gleixner <tglx@...utronix.de>,
linux-block@...r.kernel.org (open list:BLOCK LAYER),
linux-kernel@...r.kernel.org (open list),
cgroups@...r.kernel.org (open list:CONTROL GROUP (CGROUP)),
linux-mm@...ck.org (open list:CONTROL GROUP - MEMORY RESOURCE
CONTROLLER (MEMCG))
Subject: [PATCH v3 0/3] Charge loop device i/o to issuing cgroup
Changes since V3:
* Fix race on loop device destruction and deferred worker cleanup
* Ensure charge on shmem_swapin_page works just like getpage
* Minor style changes
Changes since V2:
* Deferred destruction of workqueue items so in the common case there
is no allocation needed
Changes since V1:
* Split out and reordered patches so cgroup charging changes are
separate from kworker -> workqueue change
* Add mem_css to struct loop_cmd to simplify logic
The loop device runs all i/o to the backing file on a separate kworker
thread which results in all i/o being charged to the root cgroup. This
allows a loop device to be used to trivially bypass resource limits
and other policy. This patch series fixes this gap in accounting.
A simple script to demonstrate this behavior on cgroupv2 machine:
'''
#!/bin/bash
set -e
CGROUP=/sys/fs/cgroup/test.slice
LOOP_DEV=/dev/loop0
if [[ ! -d $CGROUP ]]
then
sudo mkdir $CGROUP
fi
grep oom_kill $CGROUP/memory.events
# Set a memory limit, write more than that limit to tmpfs -> OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
dd if=/dev/zero of=/tmp/file bs=1M count=256" || true
grep oom_kill $CGROUP/memory.events
# Set a memory limit, write more than that limit through loopback
# device -> no OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
truncate -s 512m /tmp/backing_file
losetup $LOOP_DEV /tmp/backing_file
dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
losetup -D $LOOP_DEV" || true
grep oom_kill $CGROUP/memory.events
'''
Naively charging cgroups could result in priority inversions through
the single kworker thread in the case where multiple cgroups are
reading/writing to the same loop device. This patch series does some
minor modification to the loop driver so that each cgroup can make
forward progress independently to avoid this inversion.
With this patch series applied, the above script triggers OOM kills
when writing through the loop device as expected.
Dan Schatzberg (3):
loop: Use worker per cgroup instead of kworker
mm: Charge active memcg when no mm is set
loop: Charge i/o to mem and blk cg
drivers/block/loop.c | 246 +++++++++++++++++++++++++++++++------
drivers/block/loop.h | 14 ++-
include/linux/memcontrol.h | 6 +
kernel/cgroup/cgroup.c | 1 +
mm/memcontrol.c | 11 +-
mm/shmem.c | 4 +-
6 files changed, 235 insertions(+), 47 deletions(-)
--
2.17.1
Powered by blists - more mailing lists