Message-ID: <6vtpamir4bvn3snlj36tfmnmpcbd6ks6m3sdn7ewmoles7jhau@nbezqbnoukzv>
Date: Wed, 5 Feb 2025 11:43:16 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Andrew Morton <akpm@...ux-foundation.org>, Minchan Kim <minchan@...nel.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
On (25/02/04 17:19), Yosry Ahmed wrote:
> > sizeof(struct zs_page) change is one thing. Another thing is that
> > zspage->lock is taken from atomic sections, pretty much everywhere.
> > compaction/migration write-lock it under pool rwlock and class spinlock,
> > but both compaction and migration now bail out with -EAGAIN if the lock is
> > locked already, so that is sorted out.
> >
> > The remaining problem is map(), which takes zspage read-lock under pool
> > rwlock. The RFC series (which you hated with passion :P) converted all zsmalloc
> > locks into preemptible ones because of this - zspage->lock is a nested leaf-lock,
> > so it cannot schedule unless the locks it's nested under permit it (needless to
> > say, neither rwlock nor spinlock permits it).
>
> Hmm, so we want the lock to be preemptible, but we don't want to use an
> existing preemptible lock because it may be held from atomic context.
>
> I think one problem here is that the lock you are introducing is a
> spinning lock but the lock holder can be preempted. This is why spinning
> locks do not allow preemption. Others waiting for the lock can spin
> waiting for a process that is scheduled out.
>
> For example, the compaction/migration code could be sleeping holding the
> write lock, and a map() call would spin waiting for that sleeping task.
write-lock holders cannot sleep, that's the key part.
So the rules are:
1) writer cannot sleep
- migration/compaction runs in atomic context and grabs
write-lock only from atomic context
- write-locking function disables preemption before lock(), just to be
safe, and enables it after unlock()
2) writer does not spin waiting
- that's why there is only a write_try_lock() function
- compaction and migration bail out when they cannot lock the zspage
3) readers can sleep and can spin waiting for a lock
- other (even preempted) readers don't block new readers
- writers don't sleep, they always unlock
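For illustration only, the three rules could be sketched in userspace C roughly
like this (an assumed shape of such a lock, not the actual patch): the lock word
counts readers, -1 means write-locked, writers only try-lock, and readers spin
only while a writer holds it - which, per rule 1, is never for long, since the
writer cannot sleep:

```c
/* Illustrative sketch only, not the zsmalloc patch.
 * state: 0 = unlocked, -1 = write-locked, >0 = number of readers. */
#include <stdatomic.h>
#include <stdbool.h>

#define ZSL_UNLOCKED   0
#define ZSL_WRLOCKED  (-1)

struct zspage_lock {
	atomic_int state;
};

/* Rule 3: readers may spin; the writer never sleeps, so the spin is short. */
static void zspage_read_lock(struct zspage_lock *zl)
{
	int old;

	for (;;) {
		old = atomic_load_explicit(&zl->state, memory_order_relaxed);
		if (old == ZSL_WRLOCKED)
			continue;	/* real code would cpu_relax() here */
		/* readers don't block other readers: just bump the count */
		if (atomic_compare_exchange_weak_explicit(&zl->state, &old,
				old + 1, memory_order_acquire,
				memory_order_relaxed))
			return;
	}
}

static void zspage_read_unlock(struct zspage_lock *zl)
{
	atomic_fetch_sub_explicit(&zl->state, 1, memory_order_release);
}

/* Rule 2: writers never wait - try-lock only; callers (migration/
 * compaction) bail out on failure.  Rule 1: the kernel version would
 * also disable preemption for the duration of the write-locked section. */
static bool zspage_write_try_lock(struct zspage_lock *zl)
{
	int old = ZSL_UNLOCKED;

	return atomic_compare_exchange_strong_explicit(&zl->state, &old,
			ZSL_WRLOCKED, memory_order_acquire,
			memory_order_relaxed);
}

static void zspage_write_unlock(struct zspage_lock *zl)
{
	atomic_store_explicit(&zl->state, ZSL_UNLOCKED, memory_order_release);
}
```

so a writer that finds any reader count simply fails the CAS and returns
-EAGAIN to its caller, while map() callers only ever wait for a non-sleeping
writer.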
> I wonder if there's a way to rework the locking instead to avoid the
> nesting. It seems like sometimes we lock the zspage with the pool lock
> held, sometimes with the class lock held, and sometimes with no lock
> held.
>
> What are the rules here for acquiring the zspage lock?
Most of that code was not written by me, but I think the rule is to
disable "migration", be it via the pool lock or the class lock.
> Do we need to hold another lock just to make sure the zspage does not go
> away from under us?
Yes, the zspage cannot go away via the "normal" path:
zs_free(last object) -> zspage becomes empty -> free zspage
so while we have an active mapping it's only migration and compaction
that can free the zspage (its content is migrated away, so it becomes empty).
> Can we use RCU or something similar to do that instead?
Hmm, I don't know... zsmalloc is not "read-mostly"; it's whatever data
patterns the clients have. I suspect we'd need to synchronize RCU every
time a zspage is freed: in zs_free() [this one is complicated], or in
migration, or in compaction? Sounds like an anti-pattern for RCU?