Message-Id: <1185985900.6700.132.camel@localhost>
Date:	Wed, 01 Aug 2007 12:31:40 -0400
From:	Trond Myklebust <trond.myklebust@....uio.no>
To:	Raphael Manfredi <Raphael_Manfredi@...ox.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: PROBLEM: [2.6.22.1] Copying to full NFS dir

On Wed, 2007-08-01 at 17:00 +0200, Raphael Manfredi wrote:
> I've stumbled into a problem running 2.6.22.1 on both my NFS client and
> my NFS server.  I've just upgraded from 2.4.31, so I have no idea whether
> this is a new problem or if it is known in the 2.6.x series.
> 
> Here's a high-level description of the context:
> 
> * The NFS server has a directory which is full.
> * That directory is mounted on the NFS client.
> * The NFS client tries to "mv local-file /nfs/remote-dir/"
> * local-file is big (typically 700 MiB).
> 
> What happens is:
> 
> * The "mv" takes a long long time and eventually fails, of course.
> * The load on the NFS server (initially at 0) increases to about 8.
> * Any access to the NFS-mounted dir from the client whilst "mv" is in
>   progress stalls (e.g. ls -l /nfs/remote-dir).
> 
> I've tried to write my own "mv" in C to see which syscalls were involved.
> What happens is:
> 
> * All the write() succeed with no error.
> * The final close() returns -1 with either EINTR or ENOSPC.
> 
> I could not determine what makes close return EINTR or ENOSPC.
> 
> Problem is, under 2.4.31, the write() was immediately failing when writing
> to a full NFS partition.
> 
> This looks like an important bug, but I don't know if it is in the NFS-client
> or NFS-server side.  I'm tempted to say NFS-server, but that's more a hunch.

The answer appears to be that some filesystems really _suck_ when they
have to return errors: they take forever to return to the user. When the
client then sends several WRITE requests (it can cache huge numbers of
requests), the cumulative effect of the delays is quite noticeable, as
you can see above.
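
Because the client caches writes, write() can succeed and the error only
shows up later, which matches the close() failure reported above. A
minimal user-space sketch, assuming plain POSIX calls, of a copy loop
that checks every place a deferred error can surface. The helper names
write_all() and copy_and_commit() are made up for illustration and are
not part of any patch:

```c
#include <errno.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR.
 * Returns 0 on success, or a positive errno value on failure. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return errno;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy in_fd to out_fd, then flush and close out_fd.
 * Returns 0 on success, or the first errno value encountered.
 * out_fd is always closed; in_fd is left open for the caller. */
static int copy_and_commit(int in_fd, int out_fd)
{
    char buf[65536];
    int err = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            err = errno;
            break;
        }
        if (n == 0)
            break;              /* end of input */
        err = write_all(out_fd, buf, (size_t)n);
        if (err)
            break;
    }

    /* fsync() pushes cached writes to the server, so a deferred
     * error such as ENOSPC can be reported here. */
    if (err == 0 && fsync(out_fd) < 0)
        err = errno;

    /* close() may also report a deferred write error. It is not
     * retried, since the descriptor state after a failed close()
     * is not something this sketch relies on. */
    if (close(out_fd) < 0 && err == 0)
        err = errno;

    return err;
}
```

A caller implementing "mv" this way would unlink the source only when
copy_and_commit() returns 0, and would remove the partial destination
file otherwise.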

I've got a tentative client-side patch to deal with this sort of server.
Basically, when the client sees that a cached write returns an error,
it will stop caching and start doing O_SYNC-style writes until the
error condition clears. That won't fix the server-side problem, but it
does ensure that the application gets notified of the error as soon as
possible.
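
The policy described above can be sketched as a tiny state machine. This
is only an illustration in user-space C, not the code from the attached
patch: struct wb_state, wb_note_result() and wb_may_cache() are
hypothetical names, and it assumes that any successful write seen while
in synchronous mode means the error condition has cleared.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-file write-back state; not taken from the patch. */
struct wb_state {
    bool force_sync;   /* true while writes must be sent synchronously */
    int  last_error;   /* most recent write error seen, 0 if none */
};

/* Record the outcome of one WRITE (0 on success, errno value on error). */
static void wb_note_result(struct wb_state *st, int error)
{
    if (error != 0) {
        /* A write failed: stop caching so later errors are
         * reported to the application immediately. */
        st->force_sync = true;
        st->last_error = error;
    } else if (st->force_sync) {
        /* A write succeeded while in synchronous mode: assume the
         * error condition is over and allow caching again. */
        st->force_sync = false;
    }
}

/* May the next write be cached, or must it be sent synchronously? */
static bool wb_may_cache(const struct wb_state *st)
{
    return !st->force_sync;
}
```

Starting from a zeroed struct wb_state, wb_may_cache() returns true;
after wb_note_result(&st, ENOSPC) it returns false, and after a later
wb_note_result(&st, 0) it returns true again.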

Cheers
  Trond

Download attachment "linux-2.6.23-011-osync_on_error.dif" of type "message/rfc822" (5972 bytes)
