Message-ID: <Z5un85j5P2Z7B9sg@google.com>
Date: Thu, 30 Jan 2025 16:25:23 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sergey Senozhatsky <senozhatsky@...omium.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Minchan Kim <minchan@...nel.org>,
	Johannes Weiner <hannes@...xchg.org>, Nhat Pham <nphamcs@...il.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv1 1/6] zsmalloc: factor out pool locking helpers

On Thu, Jan 30, 2025 at 01:01:15PM +0900, Sergey Senozhatsky wrote:
> On (25/01/29 16:59), Yosry Ahmed wrote:
> > On Wed, Jan 29, 2025 at 03:43:47PM +0900, Sergey Senozhatsky wrote:
> > > We currently have a mix of migrate_{read,write}_lock() helpers
> > > that lock zspages, but it's zs_pool that actually has a ->migrate_lock
> > > access to which is open-coded.  Factor out pool migrate locking
> > > into helpers; the zspage migration locking API will be renamed to
> > > reduce confusion.
> > 
> > But why did we remove "migrate" from the helpers' names if this is the
> > real migrate lock? It seems like the real problem is how the zspage lock
> > helpers are named, which is addressed later.
> 
> So this is deliberate.  I guess, just like with page-faults in
> zs_map_object(), this naming comes from the time when we had fewer
> locks and less functionality.  These days we have 3 zsmalloc locks
> that can "prevent migration" at different points, even the size
> class spin-lock.  But it's not only about migration anymore.  We
> also now have compaction (defragmentation) that these locks
> synchronize.  So I want a reader-writer naming scheme.

In this case shouldn't we also rename the lock itself?

> 
> > >  struct zs_pool {
> > > -	const char *name;
> > > +	/* protect page/zspage migration */
> > > +	rwlock_t migrate_lock;
> > >  
> > >  	struct size_class *size_class[ZS_SIZE_CLASSES];
> > >  	struct kmem_cache *handle_cachep;
> > > @@ -213,6 +214,7 @@ struct zs_pool {
> > >  	atomic_long_t pages_allocated;
> > >  
> > >  	struct zs_pool_stats stats;
> > > +	atomic_t compaction_in_progress;
> > >  
> > >  	/* Compact classes */
> > >  	struct shrinker *shrinker;
> > > @@ -223,11 +225,35 @@ struct zs_pool {
> > >  #ifdef CONFIG_COMPACTION
> > >  	struct work_struct free_work;
> > >  #endif
> > > -	/* protect page/zspage migration */
> > > -	rwlock_t migrate_lock;
> > > -	atomic_t compaction_in_progress;
> > > +
> > > +	const char *name;
> > 
> > Are the struct moves relevant to the patch or just to improve
> > readability?
> 
> Relevance is relative :)  That hunk came from the patch which also
> converted the lock to an rwsem.  I'll move it to a separate
> patch.
> 
> > I am generally scared to move hot members around unnecessarily
> > due to cache-line sharing problems -- although I don't know if
> > these are really hot.
> 
> Size classes are basically static - once we init the array it
> becomes RO.  Having migrate_lock at the bottom forces us to
> jump over a big RO zs_pool region; besides, we never use ->name
> (unlike migrate_lock), so having it at offset 0 is unnecessary.

What's special about offset 0 though?

My understanding is that we want to put RW-mostly and RO-mostly members
apart to land in different cachelines to avoid unnecessary bouncing.
Also, we want members that are usually accessed together placed next to
one another.

Placing the RW lock right next to the RO size classes seems like it will
result in unnecessary cacheline bouncing. For example, if a CPU holds
the lock and dirties the cacheline then all other CPUs have to drop it
and fetch it again when accessing the size classes.
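The usual way to express that separation, sketched below with a toy layout (member names and the 64-byte cacheline size are assumptions for illustration), is to keep the frequently-written lock on its own cacheline rather than adjacent to the read-mostly array:

```c
#include <stddef.h>

#define CACHELINE 64

/*
 * Toy layout: the RW-hot lock gets its own cacheline, so a writer
 * dirtying it does not invalidate the cacheline holding the RO-after-init
 * size_class pointers on other CPUs.
 */
struct pool_layout {
	_Alignas(CACHELINE) long migrate_lock;		/* RW-hot */
	_Alignas(CACHELINE) void *size_class[256];	/* RO after init */
};
```

The kernel spells this `____cacheline_aligned` rather than `_Alignas`, but the effect is the same: the two members can never share a line, at the cost of some padding.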

> zs_pool_stats and compaction_in_progress are modified only
> during compaction, which we do not run in parallel (only one
> CPU can do compaction at a time), so we can do something like

But other CPUs can read compaction_in_progress, even if they don't
modify it.
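A minimal model of that single-compactor flag, using C11 atomics as a stand-in for the kernel's atomic_t (function names are hypothetical), shows why readers still matter for placement: every CPU that checks the flag pulls in whichever cacheline it lives on, even though only one CPU ever writes it at a time.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int compaction_in_progress;

/* Only one CPU wins; everyone else sees the flag set and backs off. */
static bool try_start_compaction(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&compaction_in_progress,
					      &expected, 1);
}

static void end_compaction(void)
{
	atomic_store(&compaction_in_progress, 0);
}
```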

> 
> ---
> 
>  struct zs_pool {
> -       const char *name;
> +       /* protect page/zspage migration */
> +       rwlock_t migrate_lock;
>  
>         struct size_class *size_class[ZS_SIZE_CLASSES];
>
>
> -       struct kmem_cache *handle_cachep;
> -       struct kmem_cache *zspage_cachep;
>  
>         atomic_long_t pages_allocated;
>  
> -       struct zs_pool_stats stats;
> +       struct kmem_cache *handle_cachep;
> +       struct kmem_cache *zspage_cachep;
>  
>         /* Compact classes */
>         struct shrinker *shrinker;
> @@ -223,9 +223,9 @@ struct zs_pool {
>  #ifdef CONFIG_COMPACTION
>         struct work_struct free_work;
>  #endif
> -       /* protect page/zspage migration */
> -       rwlock_t migrate_lock;
> +       const char *name;
>         atomic_t compaction_in_progress;
> +       struct zs_pool_stats stats;
>  };
