linux-kernel - Re: [PATCH 2/2] NFS: Limit the number of retries

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <d98f9dc4-1122-5e39-c09a-05c403b5a163@gmail.com>
Date:   Wed, 4 Nov 2020 23:51:05 +0800
From:   Wenle Chen <solomonchenclever@...il.com>
To:     Olga Kornievskaia <aglo@...ch.edu>
Cc:     Trond Myklebust <trondmy@...merspace.com>,
        "anna.schumaker@...app.com" <anna.schumaker@...app.com>,
        "chenwenle@...wei.com" <chenwenle@...wei.com>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "nixiaoming@...wei.com" <nixiaoming@...wei.com>
Subject: Re: [PATCH 2/2] NFS: Limit the number of retries



Olga Kornievskaia 於 2020/11/4 下午9:22 寫道:
> On Wed, Nov 4, 2020 at 6:36 AM Wenle Chen <solomonchenclever@...il.com> wrote:
>>
>>
>>
>> Trond Myklebust 於 2020/11/3 上午1:45 寫道:
>>> On Tue, 2020-11-03 at 00:24 +0800, Wenle Chen wrote:
>>>>     We can't wait forever, even if the state
>>>> is always delayed.
>>>>
>>>> Signed-off-by: Wenle Chen <chenwenle@...wei.com>
>>>> ---
>>>>    fs/nfs/nfs4proc.c | 4 +++-
>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index f6b5dc792b33..bb2316bf13f6 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -7390,15 +7390,17 @@ int nfs4_lock_delegation_recall(struct
>>>> file_lock *fl, struct nfs4_state *state,
>>>>    {
>>>>           struct nfs_server *server = NFS_SERVER(state->inode);
>>>>           int err;
>>>> +       int retry = 3;
>>>>
>>>>           err = nfs4_set_lock_state(state, fl);
>>>>           if (err != 0)
>>>>                   return err;
>>>>           do {
>>>>                   err = _nfs4_do_setlk(state, F_SETLK, fl,
>>>> NFS_LOCK_NEW);
>>>> -               if (err != -NFS4ERR_DELAY)
>>>> +               if (err != -NFS4ERR_DELAY || retry == 0)
>>>>                           break;
>>>>                   ssleep(1);
>>>> +               --retry;
>>>>           } while (1);
>>>>           return nfs4_handle_delegation_recall_error(server, state,
>>>> stateid, fl, err);
>>>>    }
>>>
>>> This patch will just cause the locks to be silently lost, no?
>>>
>> This loop was introduced in commit 3d7a9520f0c3e to simplify the delay
>> retry loop. Before this, the function nfs4_lock_delegation_recall would
>> return a -EAGAIN to do a whole retry loop.
> 
> This commit was not simplifying retry but actually handling the error.
> Without it the error isn't handled and client falsely thinks it holds
> the lock. Limiting the number of retries as Trond points out would
> lead to the same problem which in the end is data corruption.
> Alternative would be to fail the application. However ERR_DELAY is a
> transient error and the server would, when ready, return something
> else. If server is broken and continues to do so then the server needs
> to be fix (client isn't coded to the broken server). I don't see a
> good argument for limiting the number of re-tries.
> 
>> When we retried three times and waited three seconds, it was still in
>> delay. I think we can get a whole loop and check the other points if it
>> was changed or not. It is just a proposal.
In the function nfs_end_delegation_return, it would get the return 
err=-EAGAIN and check the client is active and get a retry. I has so 
thought. Maybe I think wrong. I will understand more carefully. Thinks.