lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 11 Sep 2008 12:55:47 -0400
From:	Chuck Lever <chuck.lever@...cle.com>
To:	Aaron Straus <aaron@...finllc.com>
Cc:	Neil Brown <neilb@...e.de>,
	Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
	Trond Myklebust <trond.myklebust@....uio.no>,
	LKML Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

On Sep 9, 2008, at Sep 9, 2008, 3:46 PM, Aaron Straus wrote:
> Hi,
>
> On Sep 08 05:15 PM, Chuck Lever wrote:
>> I think I saw some recent work in Trond's development branch that
>> makes some changes in this area.  I will wait for him to respond to
>> this thread.
>
> One other piece of information.
>
> Of the bisected offending commit:
>
> commit e261f51f25b98c213e0b3d7f2109b117d714f69d
> Author: Trond Myklebust <Trond.Myklebust@...app.com>
> Date:   Tue Dec 5 00:35:41 2006 -0500
>
>    NFS: Make nfs_updatepage() mark the page as dirty.
>
>    This will ensure that we can call set_page_writeback() from within
>    nfs_writepage(), which is always called with the page lock set.
>
>    Signed-off-by: Trond Myklebust <Trond.Myklebust@...app.com>
>
>
> It seems to be this hunk which introduces the problem:
>
>
> @@ -628,7 +667,6 @@ static struct nfs_page *  
> nfs_update_request(struct nfs_open_context* ctx,
> 				return ERR_PTR(error);
> 			}
> 			spin_unlock(&nfsi->req_lock);
> -			nfs_mark_request_dirty(new);
> 			return new;
> 		}
> 		spin_unlock(&nfsi->req_lock);
>
>
>
> If I add that function call back in... the problem disappears.  I  
> don't
> know if this just papers over the real problem though?

It's possible that the NFS write path as it exists after this commit  
is not marking certain pages dirty at the same time as others.  An  
asynchronous flush call from the VM would catch some of the dirty  
pages, then later, others, resulting in the pages flushing to the  
server out of order.

However, I poked around a little bit in late model kernels yesterday  
and found that the changes from Trond I referred to earlier are  
already in 2.6.27.  These rework this area significantly, so reverting  
this hunk alone will be more difficult in recent kernels.

In addition I notice that one purpose of this commit is to change the  
page locking scheme in the NFS client's write path.  I wouldn't advise  
simply adding this nfs_mark_request_dirty() call back in.  It could  
result in deadlocks for certain workloads (perhaps not your specific  
workload, though).

A more thorough review of the NFS write and flush logic that exists in  
2.6.27 is needed if we choose to recognize this issue as a real problem.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ