Message-ID: <0414cab3-32d6-c60a-d3c8-96fc72064ba0@openvz.org>
Date:   Sun, 31 Jul 2022 18:37:15 +0300
From:   Vasily Averin <vvs@...nvz.org>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Tejun Heo <tj@...nel.org>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>,
        linux-kernel@...r.kernel.org, kernel@...nvz.org,
        Shakeel Butt <shakeelb@...gle.com>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Michal Koutný <mkoutny@...e.com>,
        Muchun Song <songmuchun@...edance.com>,
        Michal Hocko <mhocko@...e.com>
Subject: [PATCH 0/3] enable memcg accounting for kernfs objects

This patch set enables memcg accounting for kernfs-related objects.

Originally it was a part of patch set
"memcg: accounting for objects allocated by mkdir cgroup"
https://lore.kernel.org/all/0fe836b4-5c0f-0e32-d511-db816d359748@openvz.org/

The patches have received approval from several developers,
however the respected Michal Hocko pointed out that, if necessary,
cgroup consumption can be restricted via the cgroup.max.descendants
limit without additional accounting of allocated memory.
I still disagree with him; I think that memory limits work better,
but I could not give any new substantial arguments, so the discussion
stalled and the patches were frozen in limbo until better times.

However 3 of these patches affect not only cgroups, and I hope
to get help from kernfs maintainers.

Kernfs nodes are quite small kernel objects, however there are a few
scenarios where they consume a significant share of all allocated memory.
I am aware of the following cases, but I am sure there are many
others.

1) creating a new netdevice allocates ~50Kb of memory, where ~10Kb
   was allocated for 80+ kernfs nodes.

2) cgroupv2 mkdir allocates ~60Kb of memory, ~10Kb of which are kernfs
   structures.

3) Shakeel Butt reports that Google has workloads which create 100s
   of subcontainers and they have observed high system overhead
   without memcg accounting of kernfs.
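
To put the first two cases in proportion, here is a small back-of-the-envelope
sketch in Python. It uses only the approximate figures quoted above; the
dictionary layout and the helper name kernfs_share are illustrative and are
not part of the patches.

```python
# Approximate figures quoted in cases 1) and 2) above.
SCENARIOS = {
    "netdevice creation": {"total_kb": 50, "kernfs_kb": 10},
    "cgroupv2 mkdir": {"total_kb": 60, "kernfs_kb": 10},
}


def kernfs_share(total_kb, kernfs_kb):
    """Fraction of the memory allocated by an operation that is kernfs-related."""
    return kernfs_kb / total_kb


for name, figures in SCENARIOS.items():
    share = kernfs_share(**figures)
    print(f"{name}: about {share:.0%} of ~{figures['total_kb']}Kb is kernfs")
```

With these rough inputs, kernfs objects make up roughly a fifth to a sixth
of the memory allocated per operation.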

My experiments with an LXC container on a Fedora node show that
creating a new kernfs node usually allocates a few other objects:

Allocs  Alloc   Allocation
number  size
--------------------------------------------
1   +  128      (__kernfs_new_node+0x4d)        kernfs node
1   +   88      (__kernfs_iattrs+0x57)          kernfs iattrs
1   +   96      (simple_xattr_alloc+0x28)       simple_xattr(*)
1       32      (simple_xattr_set+0x59)
1       8       (__kernfs_new_node+0x30)

'+' -- to be accounted

(*) simple_xattr in this scenario was allocated directly during
kernfs node creation for the selinux label. Even here it consumes a
noticeable part of the newly allocated memory.
However please keep in mind that an xattr can also be allocated later,
via setxattr system calls; its size is controlled by userspace
and can reach 64Kb per call. kernfs objects live in memory,
so it is important to account them.
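
The table above can be summed up per kernfs node with a short Python sketch.
The sizes and '+' marks are copied from the table; the variable names are
illustrative, and the result only describes this one experiment, not kernfs
in general.

```python
# (label, size in bytes, marked '+' in the table, i.e. to be accounted)
ALLOCATIONS = [
    ("kernfs node", 128, True),
    ("kernfs iattrs", 88, True),
    ("simple_xattr", 96, True),
    ("simple_xattr_set+0x59", 32, False),
    ("__kernfs_new_node+0x30", 8, False),
]

# Total memory allocated per new kernfs node in this experiment.
total_bytes = sum(size for _, size, _ in ALLOCATIONS)

# Portion that the '+' rows cover.
accounted_bytes = sum(size for _, size, accounted in ALLOCATIONS if accounted)

print(f"allocated per kernfs node: {total_bytes} bytes")
print(f"to be accounted:           {accounted_bytes} bytes "
      f"({accounted_bytes / total_bytes:.0%})")
```

In this experiment the three allocations marked '+' cover most of the
memory allocated per node; the two small remaining allocations are left
unaccounted.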

Originally the patches were split to simplify their review,
however if required I can merge them together.

Vasily Averin (3):
  memcg: enable accounting for kernfs nodes
  memcg: enable accounting for kernfs iattrs
  memcg: enable accounting for struct simple_xattr

 fs/kernfs/mount.c | 6 ++++--
 fs/xattr.c        | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

-- 
2.25.1
