linux-kernel - Re: [External] Re: [PATCH 0/9] Shrink the list lru size on memory cgroup removal

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210505011331.GM1872259@dread.disaster.area>
Date:   Wed, 5 May 2021 11:13:31 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Muchun Song <songmuchun@...edance.com>
Cc:     Roman Gushchin <guro@...com>, Matthew Wilcox <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Yang Shi <shy828301@...il.com>, alexs@...nel.org,
        Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        Wei Yang <richard.weiyang@...il.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [External] Re: [PATCH 0/9] Shrink the list lru size on memory
 cgroup removal

On Mon, May 03, 2021 at 02:33:21PM +0800, Muchun Song wrote:
> On Mon, May 3, 2021 at 7:58 AM Dave Chinner <david@...morbit.com> wrote:
> > > If the user wants to insert the allocated object to its lru list in
> > > the feature. The
> > > user should use list_lru_kmem_cache_alloc() instead of kmem_cache_alloc().
> > > I have looked at the code closely. There are 3 different kmem_caches that
> > > need to use this new API to allocate memory. They are inode_cachep,
> > > dentry_cache and radix_tree_node_cachep. I think that it is easy to migrate.
> >
> > It might work, but I think you may have overlooked the complexity
> > of inode allocation for filesystems. i.e.  alloc_inode() calls out
> > to filesystem allocation functions more often than it allocates
> > directly from the inode_cachep.  i.e.  Most filesystems provide
> > their own ->alloc_inode superblock operation, and they allocate
> > inodes out of their own specific slab caches, not the inode_cachep.
> 
> I didn't realize this before. You are right. Most filesystems
> have their own kmem_cache instead of inode_cachep.
> We need a lot of filesystems special to be changed.
> Thanks for your reminder.
> 
> >
> > And then you have filesystems like XFS, where alloc_inode() will
> > never be called, and implement ->alloc_inode as:
> >
> > /* Catch misguided souls that try to use this interface on XFS */
> > STATIC struct inode *
> > xfs_fs_alloc_inode(
> >         struct super_block      *sb)
> > {
> >         BUG();
> >         return NULL;
> > }
> >
> > Because all the inode caching and allocation is internal to XFS and
> > VFS inode management interfaces are not used.
> >
> > So I suspect that an external wrapper function is not the way to go
> > here - either internalising the LRU management into the slab
> > allocation or adding the memcg code to alloc_inode() and filesystem
> > specific routines would make a lot more sense to me.
> 
> Sure. If we introduce kmem_cache_alloc_lru, all filesystems
> need to migrate to kmem_cache_alloc_lru. I cannot figure out
> an approach that does not need to change filesystems code.

Right, I don't think there's a way to avoid changing all the
filesystem code if we are touching the cache allocation routines.
However, if we hide it all inside the allocation routine, then
the changes to each filesystem is effectively just a 1-liner like:

-	inode = kmem_cache_alloc(inode_cache, GFP_NOFS);
+	inode = kmem_cache_alloc_lru(inode_cache, sb->s_inode_lru, GFP_NOFS);

Or perhaps, define a generic wrapper function like:

static inline void *
alloc_inode_sb(struct superblock *sb, struct kmem_cache *cache, gfp_flags_t gfp)
{
	return kmem_cache_alloc_lru(cache, sb->s_inode_lru, gfp);
}

And then each filesystem ends up with:

-	inode = kmem_cache_alloc(inode_cache, GFP_NOFS);
+	inode = alloc_inode_sb(sb, inode_cache, GFP_NOFS);

so that all the superblock LRU stuff is also hidden from the
filesystems...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com