linux-kernel - Re: 2.6.29.1: nfsd: page allocation failure


Message-ID: <20090617224553.GQ24040@fieldses.org>
Date:	Wed, 17 Jun 2009 18:45:53 -0400
From:	"J. Bruce Fields" <bfields@...ldses.org>
To:	Michael Tokarev <mjt@....msk.ru>
Cc:	Justin Piszcz <jpiszcz@...idpixels.com>,
	linux-kernel@...r.kernel.org
Subject: Re: 2.6.29.1: nfsd: page allocation failure - nfsd or kernel
	problem?

On Thu, Jun 18, 2009 at 12:24:57AM +0400, Michael Tokarev wrote:
> J. Bruce Fields wrote:
>> On Wed, Jun 17, 2009 at 02:39:06PM +0400, Michael Tokarev wrote:
>>> Justin Piszcz wrote:
>>>>
>>>> On Wed, 17 Jun 2009, Michael Tokarev wrote:
>>>>
>>>>> Michael Tokarev wrote:
>>>>>> Justin Piszcz wrote:
>>>>> ...
>>>>>
>>>>> Justin, by the way, what's the underlying filesystem on the server?
>>>>>
>>>>> I've seen this error on 2 machines already (both running 2.6.29.x x86-64),
>>>>> and in both cases the filesystem on the server was xfs.  May this be
>>>>> related somehow to http://bugzilla.kernel.org/show_bug.cgi?id=13375 ?
>>>>> That one is different, but also about xfs and nfs.  I'm trying to
>>>>> reproduce the problem on different filesystem...
>>>> Hello, I am also running XFS on 2.6.29.x x86-64.
>>>>
>>>> For me, the error happened when I was running an XFSDUMP from a client
>>>> (and dumping) the stream over NFS to the XFS server/filesystem.  This is
>>>> typically when the error occurs or during heavy I/O.
>>> Very similar load was here -- not xfsdump but tar and dump of an ext3
>>> filesystems.
>>>
>>> And no, it's NOT xfs-related: I can trigger the same issue easily on
>
> Note the NOT, in upper case ;)
>
>>> ext4 as well.  About 20 minutes of running 'dump' of another fs
>>> to the nfs mount and voila, nfs server reports the same page allocation
>>> failure.  Note that all file operations are still working, i.e. it
>>> produces good (not corrupted) files on the server.
>>
>> There's a possibly related report for 2.6.30 here:
>>
>> 	http://bugzilla.kernel.org/show_bug.cgi?id=13518
>
> Does not look similar.
>
> I repeated the issue here.  The slab which is growing here is buffer_head.
> It's growing slowly -- right now, after ~5 minutes of constant writes over
> nfs, its size is 428423 objects, growing at about 5000 objects/minute rate.
> When stopping writing, the cache shrinks slowly back to an acceptable
> size, probably when the data gets actually written to disk.

OK, so if it eventually shrinks back to normal then it's not really a
leak--perhaps there's some bad interaction between nfsd and the vm.

Could you explain in more detail what the symptoms are (other than just
a message in the logs)?

--b.

>
> It looks like we need a bug entry for this :)
>
> I'll re-try 2.6.30 hopefully tomorrow.
>
> /mjt
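
The slab growth described above (the buffer_head cache growing during
sustained NFS writes, then shrinking once writing stops) can be tracked
by sampling /proc/slabinfo. The following is a minimal sketch, not
anything posted in the thread: the function names are hypothetical, the
column order (name, active_objs, num_objs, objsize) is assumed to follow
the slabinfo 2.x layout, and reading /proc/slabinfo usually requires
root.

```python
from typing import NamedTuple, Optional


class SlabStat(NamedTuple):
    active_objs: int  # objects currently in use
    num_objs: int     # objects allocated in the cache
    objsize: int      # size of one object in bytes


def parse_slab(text: str, cache: str = "buffer_head") -> Optional[SlabStat]:
    """Return the counters for one cache from slabinfo-style text.

    Assumes the slabinfo 2.x column order: name, active_objs, num_objs,
    objsize. Returns None if the cache is not listed.
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == cache:
            return SlabStat(int(fields[1]), int(fields[2]), int(fields[3]))
    return None


def growth_per_minute(first: int, last: int, elapsed_seconds: float) -> float:
    """Average change in object count per minute between two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (last - first) * 60.0 / elapsed_seconds


def read_slab(cache: str = "buffer_head",
              path: str = "/proc/slabinfo") -> Optional[SlabStat]:
    """Take one sample from the live system (typically needs root)."""
    with open(path) as f:
        return parse_slab(f.read(), cache)
```

Calling read_slab() twice, some minutes apart, and passing the two
active_objs values to growth_per_minute() gives a rate comparable to the
objects/minute figure quoted above. A rate that turns negative after the
writes stop would match the "shrinks slowly back" observation rather
than a leak.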