Message-Id: <20200916185823.5347-1-shy828301@gmail.com>
Date:   Wed, 16 Sep 2020 11:58:21 -0700
From:   Yang Shi <shy828301@...il.com>
To:     linux-mm@...ck.org, linux-fsdevel@...r.kernel.org
Cc:     shy828301@...il.com, linux-kernel@...r.kernel.org
Subject: [RFC PATCH 0/2] Remove shrinker's nr_deferred

Recently, a huge one-off slab drop was seen on some VFS-metadata-heavy workloads. It turned out
that the shrinker had accumulated a huge number of nr_deferred objects.

I managed to reproduce this problem with a kernel build workload plus a negative dentry
generator.

First, run the following kernel build test script:

# Number of CPUs, used for make -j
NR_CPUS=`cat /proc/cpuinfo | grep -e processor | wc -l`

cd /root/Buildarea/linux-stable

# Build the kernel 1500 times, each run in a fresh memory cgroup
# (cgroup v1) limited to 4G
for i in `seq 1500`; do
        cgcreate -g memory:kern_build
        echo 4G > /sys/fs/cgroup/memory/kern_build/memory.limit_in_bytes

        # Drop page cache and reclaimable slab before each build
        echo 3 > /proc/sys/vm/drop_caches
        cgexec -g memory:kern_build make clean > /dev/null 2>&1
        cgexec -g memory:kern_build make -j$NR_CPUS > /dev/null 2>&1

        cgdelete -g memory:kern_build
done

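That generates a huge number of deferred objects because of GFP_NOFS allocations: super_cache_scan()
returns SHRINK_STOP when __GFP_FS is not set, so the scan work it skips is added to nr_deferred instead.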

Then run the following negative dentry generator script:

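NR_CPUS=`cat /proc/cpuinfo | grep -e processor | wc -l`

# Move this shell, and every loop it spawns, into the "test" memory cgroup
mkdir /sys/fs/cgroup/memory/test
echo $$ > /sys/fs/cgroup/memory/test/tasks

# One background loop per CPU: each cat of a random 64-character name
# fails with ENOENT and leaves a negative dentry in the cache
for i in `seq $NR_CPUS`; do
        while true; do
                FILE=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 64`
                cat $FILE 2>/dev/null
        done &
done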

kswapd then shrinks half of the dentry cache in a single shrinker invocation, as the following
trace shows:

	kswapd0-475   [028] .... 305968.252561: mm_shrink_slab_start: super_cache_scan+0x0/0x190 0000000024acf00c: nid: 0 objects to shrink 4994376020 gfp_flags GFP_KERNEL cache items 93689873 delta 45746 total_scan 46844936 priority 12
	kswapd0-475   [021] .... 306013.099399: mm_shrink_slab_end: super_cache_scan+0x0/0x190 0000000024acf00c: nid: 0 unused scan count 4994376020 new scan count 4947576838 total_scan 8 last shrinker return val 46844928

There was a huge number of deferred objects before the shrinker was called. The behavior does match
the code, but it might not be desirable from the user's point of view.

IIUC, the deferred objects were used to balance slab against page cache, but since commit
9092c71bb724dba2ecba849eae69e5c9d39bd3d2 ("mm: use sc->priority for slab shrink targets") the two
have been decoupled. As that commit states, "these two things have nothing to do with each other".

So why do we still have to keep it around? I can imagine that a huge amount of slab might accumulate
if deferred objects were not taken into account. However, nowadays most workloads are constrained by
memcg, which limits kmem usage (now by default), so maintaining deferred objects no longer seems that
useful. Removing nr_deferred would simplify the shrinker logic a lot.

I may have overlooked other important use cases of nr_deferred; comments are much appreciated.