[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87tzpsh4xk.fsf@hades.wkstn.nix>
Date: Tue, 18 Sep 2007 07:18:47 +0100
From: Nix <nix@...eri.org.uk>
To: "J. Bruce Fields" <bfields@...ldses.org>
Cc: linux-kernel@...r.kernel.org
Subject: Re: [2.6.22.6] nfsd: fh_verify() `malloc failure' with lots of free memory leads to NFS hang
On 18 Sep 2007, J. Bruce Fields stated:
> On Tue, Sep 18, 2007 at 12:54:07AM +0100, Nix wrote:
>> The code which calls new_do_write() looks like this:
>>
>> ,----[ libio/fileops.c:_IO_new_file_xsputn() ]
>> | if (do_write)
>> | {
>> | count = new_do_write (f, s, do_write);
>> | to_do -= count;
>> | if (count < do_write)
>> | return n - to_do;
>> | }
>> `----
>>
>> This code handles partial writes followed by errors by returning a
>> suitable nonzero value, and immediate errors by returning -1.
>>
>> In either case the buffer will have been filled as much as possible by
>> that point, and will still be filled when (vf)printf() is next called.
>
> OK, I'm a little lost at this point (what's n? What's to_do?), but I'll
> take your word for it.
n is the total amount to write: to_do is the amount still unwritten. The
rest is buffered but written.
> I'd be kinda curious when exactly the behavior changed and why.
Same here. I'm sort of surprised that it *did* change. The last change
to anything in that function was in 2005, and the fragment shown is
ten years old.
I suspect some other change made it easier to see this pre-existing
behaviour. I wonder if something in glibc used to call __fpurge() for
you?
> Also I suppose we should check which version of nfs-utils that fix is in
> and make sure distributions are getting the fixed nfs-utils before they
> get the new libc, or we're going to see this bug a lot....
And since it looks like a kernel bug idiots like me are going to keep on
bugging the l-k list with a non-kernel bug.
> Let me know if the problem's fixed.
Well, the machine's still running after more than six hours, where before
it would freeze solid in half an hour or less.
So I'd say it's fixed :)
--
`Some people don't think performance issues are "real bugs", and I think
such people shouldn't be allowed to program.' --- Linus Torvalds
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists