Message-ID: <CAHbLzkrOS12pi8WEXyUgYEQ4gy0S9iVrEeBp-2Ypyn=1bthZRA@mail.gmail.com>
Date: Wed, 20 Apr 2022 15:24:49 -0700
From: Yang Shi <shy828301@...il.com>
To: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Linux MM <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Chinner <dchinner@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Shakeel Butt <shakeelb@...gle.com>
Subject: Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface

On Fri, Apr 15, 2022 at 5:28 PM Roman Gushchin <roman.gushchin@...ux.dev> wrote:
>
> There are 50+ different shrinkers in the kernel, many with their own bells and
> whistles. Under memory pressure the kernel applies some pressure to each of
> them in the order in which they were created/registered in the system. Some
> of them can contain only a few objects, some can be quite large. Some can be
> effective at reclaiming memory, some not.
>
> The only existing debugging mechanism is a couple of tracepoints in
> do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They don't
> cover everything, though: shrinkers which report 0 objects never show up, and
> there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> scan function, which is not always enough (e.g. it is hard to guess which super
> block's shrinker it is given only "super_cache_scan"). They are a passive
> mechanism: there is no way to call into the counting and scanning of an
> individual shrinker and profile it.
>
> To provide better visibility and debug options for memory shrinkers
> this patchset introduces a /sys/kernel/shrinker interface, to some extent
> similar to /sys/kernel/slab.
>
> For each shrinker registered in the system a folder is created. The folder
> contains "count" and "scan" files, which allow triggering the count_objects()
> and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> and scan_memcg_node are additionally provided. They allow getting per-memcg
> and/or per-node object counts and shrinking only a specific memcg/node.
>
> To make debugging more pleasant, the patchset also names all shrinkers,
> so that sysfs entries can have more meaningful names.
>
> Usage examples:

Thanks, Roman. A follow-up question: why do we have to implement this
in the kernel if we just count the objects? It seems userspace tools such
as drgn could achieve that too :-). Actually, I wrote a drgn script to
debug a problem a few months ago; it iterates over a specific memcg's
lru_list and counts the objects by their state.

>
> 1) List registered shrinkers:
> $ cd /sys/kernel/shrinker/
> $ ls
> dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
> kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
> sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
> sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
> sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
> sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
> sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
>
> 2) Get information about a specific shrinker:
> $ cd sb-btrfs-24/
> $ ls
> count count_memcg count_memcg_node count_node scan scan_memcg scan_memcg_node scan_node
>
> 3) Count objects on the system/root cgroup level
> $ cat count
> 212
>
> 4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
> $ cat count_node
> 209 3
>
> 5) Count objects for each memcg (output format: cgroup inode, count)
> $ cat count_memcg
> 1 212
> 20 96
> 53 817
> 2297 2
> 218 13
> 581 30
> 911 124
> <CUT>
>
> 6) Same but with a per-node output
> $ cat count_memcg_node
> 1 209 3
> 20 96 0
> 53 810 7
> 2297 2 0
> 218 13 0
> 581 30 0
> 911 124 0
> <CUT>
>
> 7) Don't display cgroups with fewer than 500 attached objects
> $ echo 500 > count_memcg
> $ cat count_memcg
> 53 817
> 1868 886
> 2396 799
> 2462 861
>
> 8) Don't display cgroups with fewer than 500 attached objects (sum over all nodes)
> $ echo "500" > count_memcg_node
> $ cat count_memcg_node
> 53 810 7
> 1868 886 0
> 2396 799 0
> 2462 861 0
>
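
The filtering in examples 7 and 8 could also be done in userspace on the
unfiltered output. A minimal Python sketch, assuming the count_memcg_node
format shown above (cgroup inode followed by one count per node) and the
"..." truncation marker mentioned in this cover letter; the function names
are made up for illustration:

```python
from typing import Dict, List


def parse_count_memcg_node(text: str) -> Dict[int, List[int]]:
    """Map cgroup inode -> list of per-node object counts."""
    counts: Dict[int, List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "..." marker printed on truncated output.
        if not fields or fields[0] == "...":
            continue
        ino, *per_node = (int(f) for f in fields)
        counts[ino] = per_node
    return counts


def filter_by_total(counts: Dict[int, List[int]],
                    threshold: int) -> Dict[int, List[int]]:
    """Keep cgroups whose object count summed over all nodes >= threshold."""
    return {ino: nodes for ino, nodes in counts.items()
            if sum(nodes) >= threshold}


# Sample taken from example 6 (the part before <CUT>).
sample = """1 209 3
20 96 0
53 810 7
2297 2 0
218 13 0
581 30 0
911 124 0
"""

big = filter_by_total(parse_count_memcg_node(sample), 500)
print(big)  # {53: [810, 7]}
```

On a live system the text would come from reading count_memcg_node in the
shrinker's directory instead of the inline sample.
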
> 9) Scan system/root shrinker
> $ cat count
> 212
> $ echo 100 > scan
> $ cat scan
> 97
> $ cat count
> 115
>
> 10) Scan individual memcg
> $ echo "1868 500" > scan_memcg
> $ cat scan_memcg
> 193
>
> 11) Scan individual node
> $ echo "1 200" > scan_node
> $ cat scan_node
> 2
>
> 12) Scan individual memcg and node
> $ echo "1868 0 500" > scan_memcg_node
> $ cat scan_memcg_node
> 435
>
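
The scan files in examples 9-12 appear to take space-separated fields in the
order memcg inode, node, number of objects, with the fields that don't apply
left out. That ordering is inferred from the examples above, not from the
patches themselves. A small Python sketch (scan_request() is a hypothetical
helper) that builds the file name and the string to write:

```python
from typing import List, Optional, Tuple


def scan_request(nr_to_scan: int,
                 memcg_ino: Optional[int] = None,
                 node: Optional[int] = None) -> Tuple[str, str]:
    """Return (file name, payload) for a scan request.

    The field order (memcg inode, node, count) is an assumption based on
    the usage examples in the cover letter.
    """
    name = "scan"
    fields: List[int] = []
    if memcg_ino is not None:
        name += "_memcg"
        fields.append(memcg_ino)
    if node is not None:
        name += "_node"
        fields.append(node)
    fields.append(nr_to_scan)
    return name, " ".join(str(f) for f in fields)


print(scan_request(100))                          # ('scan', '100')
print(scan_request(500, memcg_ino=1868))          # ('scan_memcg', '1868 500')
print(scan_request(200, node=1))                  # ('scan_node', '1 200')
print(scan_request(500, memcg_ino=1868, node=0))  # ('scan_memcg_node', '1868 0 500')
```

The payload would then be written to that file in the shrinker's directory
and the result read back from the same file, as in the shell examples.
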
> If the output doesn't fit into a single page, "...\n" is printed at the end of
> output.
>
>
> Roman Gushchin (5):
>   mm: introduce sysfs interface for debugging kernel shrinker
>   mm: memcontrol: introduce mem_cgroup_ino() and
>     mem_cgroup_get_from_ino()
>   mm: introduce memcg interfaces for shrinker sysfs
>   mm: introduce numa interfaces for shrinker sysfs
>   mm: provide shrinkers with names
>
> arch/x86/kvm/mmu/mmu.c | 2 +-
> drivers/android/binder_alloc.c | 2 +-
> drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 3 +-
> drivers/gpu/drm/msm/msm_gem_shrinker.c | 2 +-
> .../gpu/drm/panfrost/panfrost_gem_shrinker.c | 2 +-
> drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
> drivers/md/bcache/btree.c | 2 +-
> drivers/md/dm-bufio.c | 2 +-
> drivers/md/dm-zoned-metadata.c | 2 +-
> drivers/md/raid5.c | 2 +-
> drivers/misc/vmw_balloon.c | 2 +-
> drivers/virtio/virtio_balloon.c | 2 +-
> drivers/xen/xenbus/xenbus_probe_backend.c | 2 +-
> fs/erofs/utils.c | 2 +-
> fs/ext4/extents_status.c | 3 +-
> fs/f2fs/super.c | 2 +-
> fs/gfs2/glock.c | 2 +-
> fs/gfs2/main.c | 2 +-
> fs/jbd2/journal.c | 2 +-
> fs/mbcache.c | 2 +-
> fs/nfs/nfs42xattr.c | 7 +-
> fs/nfs/super.c | 2 +-
> fs/nfsd/filecache.c | 2 +-
> fs/nfsd/nfscache.c | 2 +-
> fs/quota/dquot.c | 2 +-
> fs/super.c | 2 +-
> fs/ubifs/super.c | 2 +-
> fs/xfs/xfs_buf.c | 2 +-
> fs/xfs/xfs_icache.c | 2 +-
> fs/xfs/xfs_qm.c | 2 +-
> include/linux/memcontrol.h | 9 +
> include/linux/shrinker.h | 25 +-
> kernel/rcu/tree.c | 2 +-
> lib/Kconfig.debug | 9 +
> mm/Makefile | 1 +
> mm/huge_memory.c | 4 +-
> mm/memcontrol.c | 23 +
> mm/shrinker_debug.c | 792 ++++++++++++++++++
> mm/vmscan.c | 66 +-
> mm/workingset.c | 2 +-
> mm/zsmalloc.c | 2 +-
> net/sunrpc/auth.c | 2 +-
> 42 files changed, 957 insertions(+), 47 deletions(-)
> create mode 100644 mm/shrinker_debug.c
>
> --
> 2.35.1
>