[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK-6q+hV41=KuZfFUpmJgm8YX8GVxoWZKiA_gGNHWXopt2VpBA@mail.gmail.com>
Date: Mon, 6 Jan 2025 11:14:35 -0500
From: Alexander Aring <aahringo@...hat.com>
To: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
Cc: gfs2@...ts.linux.dev, David Teigland <teigland@...hat.com>, linux-kernel@...r.kernel.org,
kernel-dev@...lia.com
Subject: Re: [PATCH] dlm: prevent NPD when writing a positive value to event_done
Hi,
On Fri, Dec 20, 2024 at 9:28 AM Thadeu Lima de Souza Cascardo
<cascardo@...lia.com> wrote:
>
> do_uevent returns the value written to event_done. In case it is a positive
> value, new_lockspace would undo all the work, and lockspace would not be
> set. __dlm_new_lockspace, however, would treat that positive value as a
> success due to commit 8511a2728ab8 ("dlm: fix use count with multiple
> joins").
>
> Down the line, device_create_lockspace would pass that NULL lockspace to
> dlm_find_lockspace_local, leading to a NULL pointer dereference:
>
This issue starts to be an issue since commit 4db41bf4f04f ("dlm:
remove ls_local_handle from struct dlm_ls"). Before
dlm_find_lockspace_local() was then returning NULL and
device_create_lockspace() returned -ENOENT as it wasn't found a NULL
lockspace handle, see [0].
After the patch it dlm_find_lockspace_local() will end in a
null-pointer dereference because we don't do this lookup of pointers
when we already should have this pointer already.
This behaviour is kind of weird (as it makes no sense to lookup NULL)
and yes, I didn't see that when I wrote commit 4db41bf4f04f ("dlm:
remove ls_local_handle from struct dlm_ls"). Just this commit signals
us we do something stupid with the error handling.
May I ask you which user space software do you use? As for
dlm_controld I can't see that there is a positive number case for
telling the event_done attribute. [1]
Your patch is fine anyway as a positive event_done should not be
handled as an error (so far I can say that from the current DLM UAPI
spec). We might need to review the whole error handling here again as
the caller is only interested if it failed or not. Even kernel
lockspaces clear positive error codes to zero whereas user space (our
case) doesn't do that.
- Alex
[0] https://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git/tree/fs/dlm/user.c?h=dlm-6.9#n431
[1] https://pagure.io/dlm/blob/main/f/dlm_controld/cpg.c#_1811
Powered by blists - more mailing lists