[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAM5tNy4AYfJ+AhX-UJ_orvuOkv=ctg=oCHrrOjTcfLz+rRrCsg@mail.gmail.com>
Date: Mon, 2 Dec 2024 09:06:06 -0800
From: Rick Macklem <rick.macklem@...il.com>
To: Chuck Lever <chuck.lever@...cle.com>
Cc: lilingfeng@...weicloud.com, "zhangjian (CG)" <zhangjian496@...wei.com>, jlayton@...nel.org,
neilb@...e.de, okorniev@...hat.com, kolga@...app.com, Dai.Ngo@...cle.com,
tom@...pey.com, Trond.Myklebust@...app.com, linux-nfs@...r.kernel.org,
linux-kernel@...r.kernel.org, yukuai1@...weicloud.com, houtao1@...wei.com,
yi.zhang@...wei.com, yangerkun@...wei.com, lilingfeng3@...wei.com
Subject: Re: [PATCH] nfsd: set acl_access/acl_default after getting successful
On Mon, Dec 2, 2024 at 8:33 AM Chuck Lever <chuck.lever@...cle.com> wrote:
>
> CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If in doubt, forward suspicious emails to IThelp@...uelph.ca.
>
>
> On Wed, Nov 27, 2024 at 07:37:42PM -0800, Rick Macklem wrote:
> > On Wed, Nov 27, 2024 at 7:14 PM Rick Macklem <rick.macklem@...il.com> wrote:
> > >
> > > On Wed, Nov 27, 2024 at 5:18 PM zhangjian (CG) <zhangjian496@...wei.com> wrote:
> > > >
> > > > there is one case when disk error cause get_inode_acl(inode,
> > > > ACL_TYPE_DEFAULT) return EIO,
> > > > resp->acl_access will not be null. posix_acl_release(resp->acl_default)
> > > > will trigger this warning.
> > > >
> > > >
> > > > > If getting acl_default fails, acl_access and acl_default will be released
> > > > > simultaneously. However, acl_access will still retain a pointer pointing
> > > > > to the released posix_acl, which will trigger a WARNING in
> > > > > nfs3svc_release_getacl like this:
> > > > >
> > > > > ------------[ cut here ]------------
> > > > > refcount_t: underflow; use-after-free.
> > > > > WARNING: CPU: 26 PID: 3199 at lib/refcount.c:28
> > > > > refcount_warn_saturate+0xb5/0x170
> > > > > Modules linked in:
> > > > > CPU: 26 UID: 0 PID: 3199 Comm: nfsd Not tainted
> > > > > 6.12.0-rc6-00079-g04ae226af01f-dirty #8
> > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > > > > 1.16.1-2.fc37 04/01/2014
> > > > > RIP: 0010:refcount_warn_saturate+0xb5/0x170
> > > > > Code: cc cc 0f b6 1d b3 20 a5 03 80 fb 01 0f 87 65 48 d8 00 83 e3 01 75
> > > > > e4 48 c7 c7 c0 3b 9b 85 c6 05 97 20 a5 03 01 e8 fb 3e 30 ff <0f> 0b eb
> > > > > cd 0f b6 1d 8a3
> > > > > RSP: 0018:ffffc90008637cd8 EFLAGS: 00010282
> > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83904fde
> > > > > RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88871ed36380
> > > > > RBP: ffff888158beeb40 R08: 0000000000000001 R09: fffff520010c6f56
> > > > > R10: ffffc90008637ab7 R11: 0000000000000001 R12: 0000000000000001
> > > > > R13: ffff888140e77400 R14: ffff888140e77408 R15: ffffffff858b42c0
> > > > > FS: 0000000000000000(0000) GS:ffff88871ed00000(0000)
> > > > > knlGS:0000000000000000
> > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 0000562384d32158 CR3: 000000055cc6a000 CR4: 00000000000006f0
> > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > Call Trace:
> > > > > <TASK>
> > > > > ? refcount_warn_saturate+0xb5/0x170
> > > > > ? __warn+0xa5/0x140
> > > > > ? refcount_warn_saturate+0xb5/0x170
> > > > > ? report_bug+0x1b1/0x1e0
> > > > > ? handle_bug+0x53/0xa0
> > > > > ? exc_invalid_op+0x17/0x40
> > > > > ? asm_exc_invalid_op+0x1a/0x20
> > > > > ? tick_nohz_tick_stopped+0x1e/0x40
> > > > > ? refcount_warn_saturate+0xb5/0x170
> > > > > ? refcount_warn_saturate+0xb5/0x170
> > > > > nfs3svc_release_getacl+0xc9/0xe0
> > > > > svc_process_common+0x5db/0xb60
> > > > > ? __pfx_svc_process_common+0x10/0x10
> > > > > ? __rcu_read_unlock+0x69/0xa0
> > > > > ? __pfx_nfsd_dispatch+0x10/0x10
> > > > > ? svc_xprt_received+0xa1/0x120
> > > > > ? xdr_init_decode+0x11d/0x190
> > > > > svc_process+0x2a7/0x330
> > > > > svc_handle_xprt+0x69d/0x940
> > > > > svc_recv+0x180/0x2d0
> > > > > nfsd+0x168/0x200
> > > > > ? __pfx_nfsd+0x10/0x10
> > > > > kthread+0x1a2/0x1e0
> > > > > ? kthread+0xf4/0x1e0
> > > > > ? __pfx_kthread+0x10/0x10
> > > > > ret_from_fork+0x34/0x60
> > > > > ? __pfx_kthread+0x10/0x10
> > > > > ret_from_fork_asm+0x1a/0x30
> > > > > </TASK>
> > > > > Kernel panic - not syncing: kernel: panic_on_warn set ...
> > > > >
> > > > > Clear acl_access/acl_default first and set both of them only when both
> > > > > ACLs are successfully obtained.
> > > > >
> > > > > Fixes: a257cdd0e217 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.")
> > > > > Signed-off-by: Li Lingfeng <lilingfeng@...weicloud.com>
> > > > > ---
> > > > > fs/nfsd/nfs3acl.c | 14 ++++++++------
> > > > > 1 file changed, 8 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c
> > > > > index 5e34e98db969..17579a032a5b 100644
> > > > > --- a/fs/nfsd/nfs3acl.c
> > > > > +++ b/fs/nfsd/nfs3acl.c
> > > > > @@ -29,10 +29,12 @@ static __be32 nfsd3_proc_getacl(struct svc_rqst *rqstp)
> > > > > {
> > > > > struct nfsd3_getaclargs *argp = rqstp->rq_argp;
> > > > > struct nfsd3_getaclres *resp = rqstp->rq_resp;
> > > > > - struct posix_acl *acl;
> > > > > + struct posix_acl *acl = NULL, *dacl = NULL;
> > > > > struct inode *inode;
> > > > > svc_fh *fh;
> > > > >
> > > > > + resp->acl_access = NULL;
> > > > > + resp->acl_default = NULL;
> > > (A) These two lines fix the bug, without other changes needed, I think...
> > Oops, I was wrong w.r.t. this. These two lines need to be repeated after the
> > posix_acl_relase() calls under "fail:".
> > > > > fh = fh_copy(&resp->fh, &argp->fh);
> > > > > resp->status = fh_verify(rqstp, &resp->fh, 0, NFSD_MAY_NOP);
> > > > > if (resp->status != nfs_ok)
> > > > > @@ -56,19 +58,19 @@ static __be32 nfsd3_proc_getacl(struct svc_rqst *rqstp)
> > > > > resp->status = nfserrno(PTR_ERR(acl));
> > > > > goto fail;
> > > > > }
> > > > > - resp->acl_access = acl;
> > > Because you deleted this line...
> > > > > }
> > > > > if (resp->mask & (NFS_DFACL|NFS_DFACLCNT)) {
> > > > > /* Check how Solaris handles requests for the Default ACL
> > > > > of a non-directory! */
> > > > > - acl = get_inode_acl(inode, ACL_TYPE_DEFAULT);
> > > > > - if (IS_ERR(acl)) {
> > > > > - resp->status = nfserrno(PTR_ERR(acl));
> > > > > + dacl = get_inode_acl(inode, ACL_TYPE_DEFAULT);
> > > > > + if (IS_ERR(dacl)) {
> > > > > + resp->status = nfserrno(PTR_ERR(dacl));
> > > > > goto fail;
> > > The goto fail here will not release the access acl, if I read the code
> > > correctly.
> > > > > }
> > > > > - resp->acl_default = acl;
> > > > > }
> > > > >
> > > > > + resp->acl_access = acl;
> > > > > + resp->acl_default = dacl;
> > > > > /* resp->acl_{access,default} are released in nfs3svc_release_getacl. */
> > > > > out:
> > > > > return rpc_success;
> > > >
> > > Actually, all that is needed to fix the bug is adding the two lines
> > > that initialize
> > > them both NUL, I think.. I marked that change (A) above.
> > Nope, I was wrong w.r.t. this part. You either need to set
> > resp->acl_access = acl;
> > resp->acl_default = dacl;
> > after the posix_acl_relase() calls or switch to using the local
> > acl and dacl variables for these posix_acl_release calls and stick
> > with what you did above w.r.t. resp->acl_access and resp->acl_default.
> >
> > Anyhow, I think the case I noted above where get_inode_acl(inode,
> > ACL_TYPE_DEFAULT)
> > fails will not release acl with your patch.
> >
> > rick
>
> Howdy -
>
> This one didn't make it into v6.13 because there are some
> outstanding (and ambiguous, at least to me) review comments. Can you
> address the comments, update the patch, and post it again?
In an effort to disambiguate my comments, I'll try again.
(And, yes, I did not do a good job last time;-)
1 - I think your patch fails for the case where the acl_access is acquired,
but the acl_default fails. For this case, I do not see how acl_access
would be posix_acl_release()'d and would leak the acl structure.
If you look at your patched version, resp->acl_access is not set when
the "goto fail" is executed in the default acl if block.
2 - Instead of your patch, the bug can be fixed by simply adding the two
lines:
resp->acl_access = NULL;
resp->acl_default = NULL;
after the posix_acl_release() calls below the "fail" label.
No other changes are required, from what I can see.
rick
>
> --
> Chuck Lever
>
Powered by blists - more mailing lists