Message-ID: <CAK-6q+hYROHr2eEyJ+j5VWpmwDQd+fA+ZmDTps4rDKXnCq6tHg@mail.gmail.com>
Date: Thu, 11 Sep 2025 10:38:33 -0400
From: Alexander Aring <aahringo@...hat.com>
To: Alessio Attilio <alessio.attilio.dev@...il.com>
Cc: teigland@...hat.com, gfs2@...ts.linux.dev, linux-kernel@...r.kernel.org,
Alessio Attilio <alessio.attilio@...ineer.com>
Subject: Re: [PATCH] * dlm: improve lock management and concurrency control
Hi,
On Wed, Sep 10, 2025 at 1:23 PM Alessio Attilio
<alessio.attilio.dev@...il.com> wrote:
>
> From: Alessio Attilio <alessio.attilio@...ineer.com>
>
> This patch introduces several improvements to lock handling in the DLM
> subsystem, focusing on thread safety, correctness, and code clarity.
>
> - Added explicit locking (spin_lock_bh/spin_unlock_bh) around accesses
> to proc->locks and proc->asts in dlm_clear_proc_locks, ensuring safe
> concurrent operations during lock cleanup.
> - Replaced del_proc_lock with direct, lock-protected list operations
> for improved clarity and correctness.
> - Updated send_unlock to set RSB_MASTER_UNCERTAIN only when releasing
> the last lock on an rsb, ensuring proper master confirmation.
> - Improved handling of persistent and non-persistent locks by setting
> appropriate flags (DLM_DFL_ORPHAN_BIT or DLM_IFL_DEAD_BIT) before
> orphaning or unlocking.
> - Removed outdated comments related to mutex protection and serialization
> assumptions, reflecting the updated concurrency model.
>
> Signed-off-by: Alessio Attilio <alessio.attilio.dev@...il.com>
The patch doesn't compile for me either. Please also make sure to run
"scripts/checkpatch.pl" on your patch before sending it.
> ---
> fs/dlm/lock.c | 99 ++++++++++++++++++++++++---------------------------
> 1 file changed, 46 insertions(+), 53 deletions(-)
>
> diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
> index 6dd3a524cd35..9170b5c09823 100644
> --- a/fs/dlm/lock.c
> +++ b/fs/dlm/lock.c
> @@ -3654,12 +3654,33 @@ static int send_convert(struct dlm_rsb *r, struct dlm_lkb *lkb)
> return error;
> }
>
> -/* FIXME: if this lkb is the only lock we hold on the rsb, then set
> - MASTER_UNCERTAIN to force the next request on the rsb to confirm
> - that the master is still correct. */
> -
> static int send_unlock(struct dlm_rsb *r, struct dlm_lkb *lkb)
> {
> + struct dlm_lkb *tmp;
> + int count = 0;
> +
> + list_for_each_entry(tmp, &r->res_grantqueue, lkb_statequeue) {
> + if (is_process_copy(tmp))
> + count++;
> + }
> + list_for_each_entry(tmp, &r->res_convertqueue, lkb_statequeue) {
> + if (is_process_copy(tmp))
> + count++;
> + }
> + list_for_each_entry(tmp, &r->res_waitqueue, lkb_statequeue) {
> + if (is_process_copy(tmp))
> + count++;
> + }
> +
> +/*
> + * When releasing the last lock on the rsb, we mark the master as uncertain.
> + * This ensures that the next lock request will verify the master node,
> + * maintaining consistency across the cluster.
> + */
After the unlock, the rsb should be moved to the inactive/toss state; if
it gets activated again later, a new lookup is done when necessary.
This, however, is not what happens at the point in the code you are
changing. Since you describe the problem as one of multiple concurrent
requests, the actual issue might be something different.
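To illustrate the lifecycle I mean, here is a minimal sketch with
made-up names (not the actual fs/dlm structures or locking):

#include <linux/list.h>
#include <linux/types.h>

/* Hypothetical sketch only: an rsb is parked on an inactive ("toss")
 * list when its last lock goes away, and a fresh master lookup happens
 * on reactivation, so unlock itself need not mark the master uncertain.
 */
struct rsb_sketch {
	struct list_head list;		/* on the active or inactive list */
	int lock_count;			/* locks still held on this rsb */
	bool master_valid;		/* result of the last lookup */
};

static LIST_HEAD(sketch_active);
static LIST_HEAD(sketch_inactive);	/* the "toss" state */

/* The last unlock parks the rsb instead of freeing it. */
static void sketch_unlock(struct rsb_sketch *r)
{
	if (--r->lock_count == 0) {
		r->master_valid = false;
		list_move(&r->list, &sketch_inactive);
	}
}

/* A later request reactivates the rsb and redoes the master lookup. */
static void sketch_request(struct rsb_sketch *r)
{
	if (!r->master_valid) {
		/* ... directory lookup to re-establish the master ... */
		r->master_valid = true;
	}
	list_move(&r->list, &sketch_active);
	r->lock_count++;
}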
There should be no cancel and unlock requests at the same time; cancel
is only for requests or converts, not for unlock. Maybe what is missing
here is a check for invalid API usage at [0]?
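Roughly, such a check could look like the sketch below (simplified and
hypothetical; the in-tree validation at [0] is the authoritative
reference):

/* Hypothetical argument check, loosely modeled on the validation in
 * fs/dlm/lock.c: reject a cancel when there is no request or convert
 * in flight that could be cancelled.
 */
static int sketch_validate_unlock_args(struct dlm_lkb *lkb,
				       struct dlm_args *args)
{
	/* cancel only applies to a pending request or convert */
	if ((args->flags & DLM_LKF_CANCEL) &&
	    lkb->lkb_status == DLM_LKSTS_GRANTED &&
	    !lkb->lkb_wait_type)
		return -EBUSY;

	return 0;
}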
As far as I understand your problem description, a
stacktrace/reproducer/coredump would be much more helpful here.
...
> - spin_lock_bh(&ls->ls_clear_proc_locks);
> -
> + spin_lock_bh(&proc->locks_spin);
> /* in-progress unlocks */
> list_for_each_entry_safe(lkb, safe, &proc->unlocking, lkb_ownqueue) {
> list_del_init(&lkb->lkb_ownqueue);
> set_bit(DLM_IFL_DEAD_BIT, &lkb->lkb_iflags);
> dlm_put_lkb(lkb);
> }
> + spin_unlock_bh(&proc->locks_spin);
>
> + spin_lock_bh(&proc->asts_spin);
> list_for_each_entry_safe(cb, cb_safe, &proc->asts, list) {
> list_del(&cb->list);
> dlm_free_cb(cb);
> }
> + spin_unlock_bh(&proc->asts_spin);
>
> - spin_unlock_bh(&ls->ls_clear_proc_locks);
After your patch this lock has only one remaining lock/unlock user, in
user.c, which makes the lock obsolete. However, it seems your user.c
changes are not part of the patch and somehow got missed.
- Alex
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/lock.c?h=v6.17-rc5#n2892