Message-ID: <d98f9dc4-1122-5e39-c09a-05c403b5a163@gmail.com>
Date: Wed, 4 Nov 2020 23:51:05 +0800
From: Wenle Chen <solomonchenclever@...il.com>
To: Olga Kornievskaia <aglo@...ch.edu>
Cc: Trond Myklebust <trondmy@...merspace.com>,
"anna.schumaker@...app.com" <anna.schumaker@...app.com>,
"chenwenle@...wei.com" <chenwenle@...wei.com>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"nixiaoming@...wei.com" <nixiaoming@...wei.com>
Subject: Re: [PATCH 2/2] NFS: Limit the number of retries
Olga Kornievskaia wrote on 2020/11/4 at 9:22 PM:
> On Wed, Nov 4, 2020 at 6:36 AM Wenle Chen <solomonchenclever@...il.com> wrote:
>>
>>
>>
>> Trond Myklebust wrote on 2020/11/3 at 1:45 AM:
>>> On Tue, 2020-11-03 at 00:24 +0800, Wenle Chen wrote:
>>>> We can't wait forever, even if the state
>>>> is always delayed.
>>>>
>>>> Signed-off-by: Wenle Chen <chenwenle@...wei.com>
>>>> ---
>>>> fs/nfs/nfs4proc.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index f6b5dc792b33..bb2316bf13f6 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -7390,15 +7390,17 @@ int nfs4_lock_delegation_recall(struct file_lock *fl, struct nfs4_state *state,
>>>>  {
>>>>  	struct nfs_server *server = NFS_SERVER(state->inode);
>>>>  	int err;
>>>> +	int retry = 3;
>>>>
>>>>  	err = nfs4_set_lock_state(state, fl);
>>>>  	if (err != 0)
>>>>  		return err;
>>>>  	do {
>>>>  		err = _nfs4_do_setlk(state, F_SETLK, fl, NFS_LOCK_NEW);
>>>> -		if (err != -NFS4ERR_DELAY)
>>>> +		if (err != -NFS4ERR_DELAY || retry == 0)
>>>>  			break;
>>>>  		ssleep(1);
>>>> +		--retry;
>>>>  	} while (1);
>>>>  	return nfs4_handle_delegation_recall_error(server, state, stateid, fl, err);
>>>>  }
>>>
>>> This patch will just cause the locks to be silently lost, no?
>>>
>> This loop was introduced in commit 3d7a9520f0c3e to simplify the delay
>> retry loop. Before this, the function nfs4_lock_delegation_recall would
>> return a -EAGAIN to do a whole retry loop.
>
> This commit was not simplifying retry but actually handling the error.
> Without it the error isn't handled and client falsely thinks it holds
> the lock. Limiting the number of retries as Trond points out would
> lead to the same problem which in the end is data corruption.
> Alternative would be to fail the application. However ERR_DELAY is a
> transient error and the server would, when ready, return something
> else. If server is broken and continues to do so then the server needs
> to be fixed (the client isn't coded to the broken server). I don't see a
> good argument for limiting the number of re-tries.
>
>> If the state is still delayed after three retries and three seconds of
>> waiting, I think we can go back through the whole retry loop and check
>> whether anything else has changed. It is just a proposal.
My thinking was that nfs_end_delegation_return would then get
err = -EAGAIN, check whether the client is still active, and retry.
Maybe I am wrong about this. I will study the code more carefully. Thanks.
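
To make the two positions in this thread concrete, here is a minimal user-space sketch. It is not kernel code. It assumes a hypothetical fake server (`struct fake_server`, `fake_setlk`) that answers with a delay error a fixed number of times, and two hypothetical helpers, `reclaim_bounded` and `return_delegation`. The inner loop gives up after a limited number of retries, but it reports -EAGAIN instead of pretending the lock was reclaimed, and the outer caller keeps retrying on -EAGAIN. Whether the real nfs_end_delegation_return path behaves this way is exactly the open question above and is not confirmed here. The one-second sleep is left out so the sketch runs instantly.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for NFS4ERR_DELAY; 10008 is the NFSv4 protocol value. */
#define SKETCH_ERR_DELAY 10008

/* Hypothetical fake server: delays the first `delays_left` requests. */
struct fake_server {
	int delays_left;
	int calls;
};

/* Hypothetical stand-in for the lock request sent to the server. */
static int fake_setlk(struct fake_server *srv)
{
	srv->calls++;
	if (srv->delays_left > 0) {
		srv->delays_left--;
		return -SKETCH_ERR_DELAY;
	}
	return 0;
}

/*
 * Bounded inner loop: one initial attempt plus at most max_retries
 * retries. If the server is still delaying, return -EAGAIN so the
 * caller knows the lock has NOT been reclaimed.
 */
static int reclaim_bounded(struct fake_server *srv, int max_retries)
{
	int err;

	assert(max_retries >= 0);
	do {
		err = fake_setlk(srv);
		if (err != -SKETCH_ERR_DELAY)
			return err;
	} while (max_retries-- > 0);
	return -EAGAIN;
}

/*
 * Outer loop: keep going while the inner loop reports -EAGAIN. A real
 * caller would re-check client state between rounds.
 */
static int return_delegation(struct fake_server *srv, int max_retries)
{
	int err;

	do {
		err = reclaim_bounded(srv, max_retries);
	} while (err == -EAGAIN);
	return err;
}
```

With a server that delays five times and max_retries = 3, `reclaim_bounded` alone stops after four requests and returns -EAGAIN, while `return_delegation` succeeds on the sixth request. If the caller dropped that -EAGAIN instead of retrying, the lock would be silently lost, which is the concern Trond and Olga raised.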