linux-kernel - Re: [PATCH RFC v3] vfs: make fstatat retry once on ESTALE errors from getattr call

Message-ID: <4F959A36.2080402@RedHat.com>
Date:	Mon, 23 Apr 2012 14:06:46 -0400
From:	Steve Dickson <SteveD@...hat.com>
To:	Jeff Layton <jlayton@...hat.com>
CC:	linux-fsdevel@...r.kernel.org, linux-nfs@...r.kernel.org,
	linux-kernel@...r.kernel.org, miklos@...redi.hu,
	viro@...IV.linux.org.uk, hch@...radead.org,
	michael.brantley@...haw.com, sven.breuner@...m.fraunhofer.de,
	chuck.lever@...cle.com, pstaubach@...grid.com, malahal@...ibm.com,
	bfields@...ldses.org, trond.myklebust@....uio.no, rees@...ch.edu
Subject: Re: [PATCH RFC v3] vfs: make fstatat retry once on ESTALE errors
 from getattr call



On 04/23/2012 11:32 AM, Jeff Layton wrote:
> On Mon, 23 Apr 2012 10:55:24 -0400
> Steve Dickson <SteveD@...hat.com> wrote:
> 
>>
>>
>> On 04/20/2012 05:13 PM, Jeff Layton wrote:
>>> On Fri, 20 Apr 2012 16:18:37 -0400
>>> Steve Dickson <SteveD@...hat.com> wrote:
>>>
>>>> On 04/20/2012 10:40 AM, Jeff Layton wrote:
>>>>> I guess the questions at this point are:
>>>>>
>>>>> 1) How representative is Peter's mkdir_test() of a real-world workload?
>>>> Reading your email I had to wonder the same thing... What application
>>>> removes a hierarchy of directories in a loop from two different clients?
>>>> I would suspect not many, if any... especially over NFS...
>>>>  
>>>
>>> Peter's test just happens to demonstrate the problem well, but one
>>> could envision someone removing a hierarchy of directories on the
>>> server while we're trying to do other operations in it. At that point,
>>> we can easily end up hitting an ESTALE twice while doing the lookup and
>>> returning ESTALE back to userspace.
>> Just curious, what happens when you run Peter's mkdir_test() on a
>> local file system? Any errors returned? 
>>
>> I would think removing a hierarchy of directories while they are being
>> accessed has to cause even a local fs some type of havoc
>>
> 
> Peter's test only treats an ESTALE error as a failure since it was
> specifically designed to ensure that those didn't make it into
> userspace.
> 
> If you run 2 copies on the same local fs and strace it, then you'll see
> the syscalls get back things like ENOENT or EEXIST as they step on each
> others' toes in the mkdir()/rmdir() calls.
I figured as much... I just don't see any real-world applications removing
directory hierarchies without some type of synchronization or locking...
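
For illustration only, here is a minimal userspace sketch of the error
classification described above: a mkdir()/rmdir() churn loop that tolerates
errors such as ENOENT and EEXIST and counts only ESTALE as a failure. This is
not Peter's mkdir_test(); the names counts_as_failure() and churn_directory()
are hypothetical, and the loop is an assumption about what such a test might
look like.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: only ESTALE is treated as a test failure.
 * ENOENT, EEXIST and friends are expected when two copies race. */
static bool counts_as_failure(int err)
{
    return err == ESTALE;
}

/* Hypothetical churn loop: create and remove the same directory
 * `rounds` times and return how many calls failed with ESTALE.
 * Every other errno is tolerated, as described in the thread. */
static int churn_directory(const char *path, int rounds)
{
    assert(path != NULL && rounds >= 0);

    int stale = 0;
    for (int i = 0; i < rounds; i++) {
        if (mkdir(path, 0700) != 0 && counts_as_failure(errno))
            stale++;
        if (rmdir(path) != 0 && counts_as_failure(errno))
            stale++;
    }
    return stale;
}
```

Run as two concurrent copies against the same path, the loop would see the
ENOENT/EEXIST collisions mentioned above without reporting them.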
 
> 
>>>
>>>>>
>>>>> 2) if we assume that it is fairly representative of one, how can we
>>>>> achieve retrying indefinitely with NFS, or at least some large finite
>>>>> amount?
>>>> The amount of looping would be pure speculation. If the problem
>>>> cannot be handled by one simple retry I would say we simply pass the
>>>> error up to the app... It's an application issue...
>>>>  
>>>
>>> It's not an application issue. The application just asked the kernel
>>> to do an operation on a pathname. The only reason you're getting an
>>> ESTALE back in this situation is a shortcoming of the implementation.
>>>
>>> We passed it a pathname after all, not a filehandle. ESTALE really has
>>> no place as a return code in that situation...
>> We'll have to agree to disagree... I think any application that
>> removes hierarchies of files and directories w/out taking any
>> precautionary locking has a shortcoming in its own
>> implementation.
>>
> 
> I'm not saying they should never get an error in that situation. I'm
> just saying that an ESTALE return in this situation is wrong (or at
> least not helpful) since the syscall was provided a pathname not a
> filehandle or open fd or anything. When we still have the pathname,
> then we have the ability to reattempt on an ESTALE, and it would be
> preferable to do so.
Point. But if the reestablishment cannot be done in one try, then
I say we punt...
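
To make the "one retry, then punt" position concrete, here is a minimal
userspace sketch. It is not the patch under discussion: path_op_fn,
path_op_retry_once() and the stub are hypothetical names, and a real
implementation would presumably also force the second lookup to revalidate
cached state rather than simply calling the operation again.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: performs lookup + getattr on a pathname.
 * Returns 0 on success or a negative errno value. */
typedef int (*path_op_fn)(const char *path, void *ctx);

/* Run op on path; if it fails with -ESTALE, redo it exactly once,
 * then hand whatever the second attempt returned to the caller. */
static int path_op_retry_once(path_op_fn op, const char *path, void *ctx)
{
    assert(op != NULL && path != NULL);

    int err = op(path, ctx);
    if (err == -ESTALE)
        err = op(path, ctx);    /* single retry, then punt */
    return err;
}

/* Illustrative stub: fails with -ESTALE a configurable number of times. */
struct stale_stub {
    int stale_left;     /* remaining -ESTALE failures */
    int calls;          /* how many times the op ran */
};

static int stub_getattr(const char *path, void *ctx)
{
    struct stale_stub *s = ctx;

    (void)path;
    s->calls++;
    if (s->stale_left > 0) {
        s->stale_left--;
        return -ESTALE;
    }
    return 0;
}
```

With the stub set to fail once, the wrapper succeeds on the second call; set
to fail twice, the caller still sees -ESTALE, which is the case being argued
over in this thread.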

> 
>>>
>>>>>
>>>>> I have my doubts as to whether it would really be as big a problem for
>>>>> other filesystems as Miklos and others have asserted, but I'll take
>>>>> their word for it at the moment. What's the best way to contain this
>>>>> behavior to just those filesystems that want to retry indefinitely when
>>>>> they get an ESTALE? Would we need to go with an entirely new
>>>>> ESTALERETRY after all?
>>>>>
>>>> Introducing a new errno to handle this problem would be overkill IMHO...
>>>>
>>>> If we have to go to the looping approach, I would strongly suggest we
>>>> make the file systems register for this type of behavior...
>>>>
>>>
>>> Returning ESTALERETRY would be registering for it in a way and it is
>>> somewhat cleaner than having to go all the way back up to the fstype to
>>> figure out whether you want to retry it or not.
>> How would legacy apps handle this new errno, esp if they have logic
>> to take care of ESTALE errors?
>>
> 
> Userspace should never see that error. 
Why do you say this?  ESTALE, the errno, has been around forever...
It's defined in the errno man page: "ESTALE - Stale file handle (POSIX.1)"

> The idea is that this would be a
> kernel-internal error code that indicates to the VFS that it should
> retry the lookup and operation. If the kernel decides to give up after
> the FS returns ESTALERETRY, then we'd have to convert that error
> into ESTALE.
Yeah... I understand the idea... I just don't think another error
code is needed to handle this problem... 
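
For readers following the proposal, here is a rough userspace sketch of the
kernel-internal error code idea as described in the quoted text: the
filesystem returns a private code, the VFS retries while it keeps seeing that
code, and on giving up it converts the code to ESTALE so userspace never sees
it. The numeric value, the retry bound and every name below are assumptions
made for illustration; none of them come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical value for the proposed internal code; chosen only so that
 * it does not collide with the userspace-visible errno values. */
#define ESTALERETRY_SKETCH 1000

/* Hypothetical bound on retries before the VFS gives up. */
#define MAX_STALE_RETRIES 8

/* Hypothetical filesystem operation: 0 on success, negative code on error. */
typedef int (*fs_op_fn)(void *ctx);

/* Retry while the filesystem asks for it; never leak the internal code. */
static int vfs_call_with_stale_retry(fs_op_fn op, void *ctx)
{
    assert(op != NULL);

    for (int attempt = 0; attempt <= MAX_STALE_RETRIES; attempt++) {
        int err = op(ctx);
        if (err != -ESTALERETRY_SKETCH)
            return err;     /* success or an ordinary error */
    }
    return -ESTALE;         /* gave up: convert to the visible errno */
}

/* Illustrative stub that requests a retry a fixed number of times. */
struct retry_stub {
    int failures_left;
    int calls;
};

static int stub_fs_op(void *ctx)
{
    struct retry_stub *s = ctx;

    s->calls++;
    if (s->failures_left > 0) {
        s->failures_left--;
        return -ESTALERETRY_SKETCH;
    }
    return 0;
}
```

The conversion on the last line of vfs_call_with_stale_retry() is what keeps
the new code invisible to legacy applications that already handle ESTALE.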

> 
> It'd be preferable to me if we didn't require a new error code, but if
> different filesystems require different semantics from the VFS on an
> ESTALE return, then that is one way to achieve it.
> 
Well, I thought using the fs_flags to register for this type
of semantics was a good idea...
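
As a rough illustration of registering through fs_flags, the sketch below
models a filesystem type carrying an opt-in bit that the retry decision
consults. The struct, the flag name and the policy (one retry by default, a
larger bound for filesystems that opt in) are assumptions for the sake of
example, not anything defined by the kernel or by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opt-in bit a filesystem would set in its flags. */
#define FS_SKETCH_RETRY_ESTALE (1u << 0)

/* Simplified stand-in for a filesystem type with a flags word. */
struct fs_type_sketch {
    const char *name;
    unsigned int fs_flags;
};

/* Decide whether to retry after `err`, given how many retries were
 * already made. Filesystems without the flag get a single retry;
 * those with it may retry up to max_retries times (assumed policy). */
static bool should_retry_estale(const struct fs_type_sketch *fst, int err,
                                unsigned int retries_done,
                                unsigned int max_retries)
{
    assert(fst != NULL);

    if (err != -ESTALE)
        return false;

    unsigned int limit =
        (fst->fs_flags & FS_SKETCH_RETRY_ESTALE) ? max_retries : 1;
    return retries_done < limit;
}
```

Compared with a new error code, this keeps the decision keyed on the
filesystem type, at the cost of looking the type up at retry time.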

steved.