linux-kernel - Re: [PATCH] nfs: handle failure of nfs_get_lock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <21817f2c-2971-4568-9ae4-1ccc25f7f1ef@huawei.com>
Date: Sat, 19 Apr 2025 16:28:41 +0800
From: Li Lingfeng <lilingfeng3@...wei.com>
To: Jeff Layton <jlayton@...nel.org>, <trondmy@...nel.org>, <anna@...nel.org>,
	<bcodding@...hat.com>
CC: <linux-nfs@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<yukuai1@...weicloud.com>, <houtao1@...wei.com>, <yi.zhang@...wei.com>,
	<yangerkun@...wei.com>, <lilingfeng@...weicloud.com>
Subject: Re: [PATCH] nfs: handle failure of nfs_get_lock_context in unlock
 path


在 2025/4/17 20:43, Jeff Layton 写道:
> On Thu, 2025-04-17 at 20:24 +0800, Li Lingfeng wrote:
>> 在 2025/4/17 18:29, Jeff Layton 写道:
>>> On Thu, 2025-04-17 at 15:25 +0800, Li Lingfeng wrote:
>>>> When memory is insufficient, the allocation of nfs_lock_context in
>>>> nfs_get_lock_context() fails and returns -ENOMEM. If we mistakenly treat
>>>> an nfs4_unlockdata structure (whose l_ctx member has been set to -ENOMEM)
>>>> as valid and proceed to execute rpc_run_task(), this will trigger a NULL
>>>> pointer dereference in nfs4_locku_prepare. For example:
>>>>
>>>> BUG: kernel NULL pointer dereference, address: 000000000000000c
>>>> PGD 0 P4D 0
>>>> Oops: Oops: 0000 [#1] SMP PTI
>>>> CPU: 15 UID: 0 PID: 12 Comm: kworker/u64:0 Not tainted 6.15.0-rc2-dirty #60
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40
>>>> Workqueue: rpciod rpc_async_schedule
>>>> RIP: 0010:nfs4_locku_prepare+0x35/0xc2
>>>> Code: 89 f2 48 89 fd 48 c7 c7 68 69 ef b5 53 48 8b 8e 90 00 00 00 48 89 f3
>>>> RSP: 0018:ffffbbafc006bdb8 EFLAGS: 00010246
>>>> RAX: 000000000000004b RBX: ffff9b964fc1fa00 RCX: 0000000000000000
>>>> RDX: 0000000000000000 RSI: fffffffffffffff4 RDI: ffff9ba53fddbf40
>>>> RBP: ffff9ba539934000 R08: 0000000000000000 R09: ffffbbafc006bc38
>>>> R10: ffffffffb6b689c8 R11: 0000000000000003 R12: ffff9ba539934030
>>>> R13: 0000000000000001 R14: 0000000004248060 R15: ffffffffb56d1c30
>>>> FS: 0000000000000000(0000) GS:ffff9ba5881f0000(0000) knlGS:00000000
>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 000000000000000c CR3: 000000093f244000 CR4: 00000000000006f0
>>>> Call Trace:
>>>>    <TASK>
>>>>    __rpc_execute+0xbc/0x480
>>>>    rpc_async_schedule+0x2f/0x40
>>>>    process_one_work+0x232/0x5d0
>>>>    worker_thread+0x1da/0x3d0
>>>>    ? __pfx_worker_thread+0x10/0x10
>>>>    kthread+0x10d/0x240
>>>>    ? __pfx_kthread+0x10/0x10
>>>>    ret_from_fork+0x34/0x50
>>>>    ? __pfx_kthread+0x10/0x10
>>>>    ret_from_fork_asm+0x1a/0x30
>>>>    </TASK>
>>>> Modules linked in:
>>>> CR2: 000000000000000c
>>>> ---[ end trace 0000000000000000 ]---
>>>>
>>>> Free the allocated nfs4_unlockdata when nfs_get_lock_context() fails and
>>>> return NULL to terminate subsequent rpc_run_task, preventing NULL pointer
>>>> dereference.
>>>>
>>>> Fixes: f30cb757f680 ("NFS: Always wait for I/O completion before unlock")
>>>> Signed-off-by: Li Lingfeng <lilingfeng3@...wei.com>
>>>> ---
>>>>    fs/nfs/nfs4proc.c | 9 ++++++++-
>>>>    1 file changed, 8 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index 970f28dbf253..9f5689c43a50 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -7074,10 +7074,18 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
>>>>    	struct nfs4_unlockdata *p;
>>>>    	struct nfs4_state *state = lsp->ls_state;
>>>>    	struct inode *inode = state->inode;
>>>> +	struct nfs_lock_context *l_ctx;
>>>>    
>>>>    	p = kzalloc(sizeof(*p), GFP_KERNEL);
>>>>    	if (p == NULL)
>>>>    		return NULL;
>>>> +	l_ctx = nfs_get_lock_context(ctx);
>>>> +	if (!IS_ERR(l_ctx)) {
>>>> +		p->l_ctx = l_ctx;
>>>> +	} else {
>>>> +		kfree(p);
>>>> +		return NULL;
>>>> +	}
>>>>    	p->arg.fh = NFS_FH(inode);
>>>>    	p->arg.fl = &p->fl;
>>>>    	p->arg.seqid = seqid;
>>>> @@ -7085,7 +7093,6 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
>>>>    	p->lsp = lsp;
>>>>    	/* Ensure we don't close file until we're done freeing locks! */
>>>>    	p->ctx = get_nfs_open_context(ctx);
>>> Not exactly the same problem, but get_nfs_open_context() can fail too.
>>> Does it need error handling for that as well?
>> Hi,
>>
>> IIUC, nfs_open_context is allocated during file open and attached to
>> filp->private_data. Upon successful file opening, the context remains valid.
>> Post-lock acquisition, nfs_open_context can be retrieved via
>> file_lock->file->nfs_open_context chain. Thus get_nfs_open_context() here
>> should have non-failure guarantee in standard code paths.
>
> I'm not so sure. This function can get called from the rpc_release
> callback for a LOCK request:
>
> ->rpc_release
>      nfs4_lock_release
> 	nfs4_do_unlck
> 	    nfs4_alloc_unlockdata
>
> Can that happen after the open_ctx->lock_context.count goes to 0?
>
> Given that we have a safe failure path in this code, it seems like we
> ought to check for that here, just to be safe. If it really shouldn't
> happen like you say, then we could throw in a WARN_ON_ONCE() too.
Thank you for raising this concern.
During file open, the nfs_open_context is allocated, and
open_ctx->lock_context.count is initialized to 1. Based on the current
flow, I think it's unlikely for this counter to reach 0 during lock/unlock
operations since its decrement is tied to file closure.

However, I agree with your suggestion to add checks when
get_nfs_open_context fails. Furthermore, this check might also be
necessary not only in the unlock path but potentially in the lock path if
get_nfs_open_contextb fails there as well.

Additionally, I noticed that both the lock and unlock release callbacks
dereference nfs_open_context. If get_nfs_open_context were to fail
(assuming such a scenario is possible), this could lead to a NULL pointer
dereference. Instead of relying solely on WARN_ON_ONCE(), it might be
safer to halt the operation immediately upon detecting a failure in
get_nfs_open_context.

// unlock
nfs4_locku_release_calldata
  put_nfs_open_context
    __put_nfs_open_context
     // dereference nfs_open_context

// lock
nfs4_lock_release
  nfs4_do_unlck
   // dereference nfs_open_context
  put_nfs_open_context
   // dereference nfs_open_context

I'll incorporate your feedback and send a patchset soon.
>
>>>> -	p->l_ctx = nfs_get_lock_context(ctx);
>>>>    	locks_init_lock(&p->fl);
>>>>    	locks_copy_lock(&p->fl, fl);
>>>>    	p->server = NFS_SERVER(inode);
>>> Good catch:
>>>
>>> Reviewed-by: Jeff Layton <jlayton@...nel.org>
>>>