Message-ID: <etumn4tax7g5c3wygn2aazmo5m7f4ydfji7ehno5i6jckkf27e@mu3fisrw5jcc>
Date: Thu, 13 Feb 2025 10:20:26 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Andrew Morton <akpm@...ux-foundation.org>, Kairui Song <ryncsn@...il.com>, Minchan Kim <minchan@...nel.org>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 12/18] zsmalloc: make zspage lock preemptible
On (25/02/12 17:14), Yosry Ahmed wrote:
> On Wed, Feb 12, 2025 at 03:27:10PM +0900, Sergey Senozhatsky wrote:
> > Switch over from rwlock_t to an atomic_t variable that takes a negative
> > value when the page is under migration, or positive values when the
> > page is used by zsmalloc users (object map, etc.). Using a per-zspage
> > rwsem is a little too memory heavy; a simple atomic_t should suffice.
>
> We should also explain that rwsem cannot be used due to the locking
> context (we need to hold it in an atomic context). Basically what you
> explained to me before :)
>
> > zspage lock is a leaf lock for zs_map_object(), where it's read-acquired.
> > Since this lock now permits preemption, extra care needs to be taken when
> > it is write-acquired - all writers grab it in atomic context, so they
> > cannot spin and wait for a (potentially preempted) reader to unlock zspage.
> > There are only two writers at this moment - migration and compaction. In
> > both cases we use write-try-lock and bail out if zspage is read-locked.
> > Writers, on the other hand, never get preempted, so readers can spin
> > waiting for the writer to unlock zspage.
>
> The details are important, but I think we want to concisely state the
> problem statement either before or after. Basically we want a lock that
> we *never* sleep while acquiring but *can* sleep while holding in read
> mode. This allows holding the lock from any context, but also being
> preemptible if the context allows it.
Ack.
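For anyone skimming the thread: the encoding this boils down to is
roughly the following (a sketch of the patch, using the ZS_PAGE_* names
from the hunks quoted below):

#define ZS_PAGE_UNLOCKED	0
#define ZS_PAGE_WRLOCKED	-1

/*
 * zspage->lock is an atomic_t:
 *	ZS_PAGE_WRLOCKED (-1)	- write-locked (migration/compaction)
 *	ZS_PAGE_UNLOCKED  (0)	- unlocked
 *	n > 0			- read-locked by n users (zs_map_object() etc.)
 */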
[..]
> > +/*
> > + * zspage locking rules:
>
> Also here we need to state our key rule:
> Never sleep while acquiring, preemptible while holding (if possible). The
> following rules are basically how we make sure we keep this true.
>
> > + *
> > + * 1) writer-lock is exclusive
> > + *
> > + * 2) writer-lock owner cannot sleep
> > + *
> > + * 3) writer-lock owner cannot spin waiting for the lock
> > + * - caller (e.g. compaction and migration) must check return value and
> > + * handle locking failures
> > + * - there is only a TRY variant of the writer-lock function
> > + *
> > + * 4) reader-lock owners (multiple) can sleep
> > + *
> > + * 5) reader-lock owners can spin waiting for the lock, in any context
> > + * - existing readers (even preempted ones) don't block new readers
> > + * - writer-lock owners never sleep, always unlock at some point
>
>
> May I suggest something more concise and to the point?
>
> /*
> * The zspage lock can be held from atomic contexts, but it needs to remain
> * preemptible when held for reading because it remains held outside of those
> * atomic contexts, otherwise we unnecessarily lose preemptibility.
> *
> * To achieve this, the following rules are enforced on readers and writers:
> *
> * - Writers are blocked by both writers and readers, while readers are only
> * blocked by writers (i.e. normal rwlock semantics).
> *
> * - Writers are always atomic (to allow readers to spin waiting for them).
> *
> * - Writers always use trylock (as the lock may be held by sleeping readers).
> *
> * - Readers may spin on the lock (as they can only wait for atomic writers).
> *
> * - Readers may sleep while holding the lock (as writes only use trylock).
> */
Looks good, thanks.
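(For completeness, both the counter and the lockdep map used below live
in struct zspage, roughly:

struct zspage {
	...
	atomic_t lock;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map lockdep_map;
#endif
};

the lockdep_map is what lets us reuse the rwsem lockdep annotations for
this hand-rolled lock.)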
> > + */
> > +static void zspage_read_lock(struct zspage *zspage)
> > +{
> > + atomic_t *lock = &zspage->lock;
> > + int old = atomic_read_acquire(lock);
> > +
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > + rwsem_acquire_read(&zspage->lockdep_map, 0, 0, _RET_IP_);
> > +#endif
> > +
> > + do {
> > + if (old == ZS_PAGE_WRLOCKED) {
> > + cpu_relax();
> > + old = atomic_read_acquire(lock);
> > + continue;
> > + }
> > + } while (!atomic_try_cmpxchg_acquire(lock, &old, old + 1));
> > +}
> > +
> > +static void zspage_read_unlock(struct zspage *zspage)
> > +{
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > + rwsem_release(&zspage->lockdep_map, _RET_IP_);
> > +#endif
> > + atomic_dec_return_release(&zspage->lock);
> > +}
> > +
> > +static __must_check bool zspage_try_write_lock(struct zspage *zspage)
>
> I believe zspage_write_trylock() would be closer to the normal rwlock
> naming.
It derived its name from the rwsem API. Can rename.
> > +{
> > + atomic_t *lock = &zspage->lock;
> > + int old = ZS_PAGE_UNLOCKED;
> > +
> > + WARN_ON_ONCE(preemptible());
>
> Hmm I know I may have been the one suggesting this, but do we actually
> need it? We disable preemption explicitly anyway before holding the
> lock.
This is just to make sure that the precondition for
"writer is always atomic" is satisfied. But I can drop it.
> > size_class_lock(class);
> > - /* the migrate_write_lock protects zpage access via zs_map_object */
> > - migrate_write_lock(zspage);
> > + /* the zspage write_lock protects zpage access via zs_map_object */
> > + if (!zspage_try_write_lock(zspage)) {
> > + size_class_unlock(class);
> > + pool_write_unlock(pool);
> > + return -EINVAL;
> > + }
> > +
> > + /* We're committed, tell the world that this is a Zsmalloc page. */
> > + __zpdesc_set_zsmalloc(newzpdesc);
>
> We used to do this earlier on, before any locks are held. Why is it
> moved here?
I want to do that only if zspage write-trylock has succeeded (we didn't
have any error-out paths before).
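
FWIW, the matching write-unlock on the success path just puts the lock
back to UNLOCKED with release semantics, mirroring zspage_read_unlock()
(again a sketch):

static void zspage_write_unlock(struct zspage *zspage)
{
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	rwsem_release(&zspage->lockdep_map, _RET_IP_);
#endif
	atomic_set_release(&zspage->lock, ZS_PAGE_UNLOCKED);
}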