[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <99567c82-3d15-4ffa-9430-f51cf456ffbc@huawei.com>
Date: Wed, 22 Jan 2025 09:43:44 +0800
From: Li Lingfeng <lilingfeng3@...wei.com>
To: NeilBrown <neilb@...e.de>, Jeff Layton <jlayton@...nel.org>
CC: Chuck Lever <chuck.lever@...cle.com>, <okorniev@...hat.com>,
<Dai.Ngo@...cle.com>, <tom@...pey.com>, <linux-nfs@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <yukuai1@...weicloud.com>,
<houtao1@...wei.com>, <yi.zhang@...wei.com>, <yangerkun@...wei.com>,
<lilingfeng@...weicloud.com>
Subject: Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
在 2025/1/22 9:15, NeilBrown 写道:
> On Wed, 22 Jan 2025, Jeff Layton wrote:
>> On Wed, 2025-01-15 at 10:03 -0500, Chuck Lever wrote:
>>> On 1/14/25 2:39 PM, Jeff Layton wrote:
>>>> On Tue, 2025-01-14 at 14:27 -0500, Jeff Layton wrote:
>>>>> On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
>>>>>> In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
>>>>>> list, gc may be triggered in another thread and immediately release this
>>>>>> nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
>>>>>>
>>>>>> All the places where unhash is done will also perform lru_remove, so there
>>>>>> is no need to do lru_remove separately here. After inserting the nfsd_file
>>>>>> into the nfsd_file_lru list, it can be released by relying on gc.
>>>>>>
>>>>>> Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
>>>>>> Signed-off-by: Li Lingfeng <lilingfeng3@...wei.com>
>>>>>> ---
>>>>>> fs/nfsd/filecache.c | 12 ++----------
>>>>>> 1 file changed, 2 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>>>> index a1cdba42c4fa..37b65cb1579a 100644
>>>>>> --- a/fs/nfsd/filecache.c
>>>>>> +++ b/fs/nfsd/filecache.c
>>>>>> @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
>>>>>> /* Try to add it to the LRU. If that fails, decrement. */
>>>>>> if (nfsd_file_lru_add(nf)) {
>>>>>> /* If it's still hashed, we're done */
>>>>>> - if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>>>>>> + if (list_lru_count(&nfsd_file_lru))
>>>>>> nfsd_file_schedule_laundrette();
>>>>>> - return;
>>>>>> - }
>>>>>>
>>>>>> - /*
>>>>>> - * We're racing with unhashing, so try to remove it from
>>>>>> - * the LRU. If removal fails, then someone else already
>>>>>> - * has our reference.
>>>>>> - */
>>>>>> - if (!nfsd_file_lru_remove(nf))
>>>>>> - return;
>>>>>> + return;
>>>>>> }
>>>>>> }
>>>>>> if (refcount_dec_and_test(&nf->nf_ref))
>>>>> I think this looks OK. Filecache bugs are particularly nasty though, so
>>>>> let's run this through a nice long testing cycle.
>>>>>
>>>>> Reviewed-by: Jeff Layton <jlayton@...nel.org>
>>>> Actually, I take it back. This is problematic in another way.
>>>>
>>>> In this case, we're racing with another task that is unhashing the
>>>> object, but we've put it on the LRU ourselves. What guarantee do we
>>>> have that the unhashing and removal from the LRU didn't occur before
>>>> this task called nfsd_file_lru_add()? That's why we attempt to remove
>>>> it here -- we can't rely on the task that unhashed it to do so at that
>>>> point.
>>>>
>>>> What might be best is to take and hold the rcu_read_lock() before doing
>>>> the nfsd_file_lru_add, and just release it after we do these racy
>>>> checks. That should make it safe to access the object.
>>>>
>>>> Thoughts?
>>> Holding the RCU read lock will keep the dereferences safe since
>>> nfsd_file objects are freed only after an RCU grace period. But will the
>>> logic in nfsd_file_put() work properly on totally dead nfsd_file
>>> objects? I don't see a specific failure mode there, but I'm not very
>>> imaginative.
>>>
>>> Overall, I think RCU would help.
>>>
>>>
>> To be clear, I think we need to drop e57420be100ab from your nfsd-
>> testing branch. The race I identified above is quite likely to occur
>> and could lead to leaks.
>>
>> If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
>> think the RCU approach is safe.
> I'm not convinced this is the right approach.
> I cannot see how nfsd_file_put() can race with unhashing. If it cannot
> then we can simply unconditionally call nfsd_file_schedule_laundrette().
>
> Can describe how the race can happen - if indeed it can.
At the beginning, our testing team discovered this issue on 5.10 using
their own test cases (I'm not clear on how their test cases were
constructed).
Then, I found that there might be a problem here, so I reproduced the
phenomenon in the above way.
I will ask if their test cases can be shared externally or if they can
describe what their test cases did.
> Note that we also need rcu protection in nfsd_file_lru_add() so that the
> nf doesn't get freed after it is added the the lru and before the trace
> point. If we don't end up needing rcu round the call, we will need it
> in the call.
>
> Thanks,
> NeilBrown
Powered by blists - more mailing lists