linux-kernel - Re: [Bug 9182] Critical memory leak (dirty pages)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0712161016290.25065@bizon.gios.gov.pl>
Date:	Sun, 16 Dec 2007 10:33:20 +0100 (CET)
From:	Krzysztof Oledzki <olel@....pl>
To:	Andrew Morton <akpm@...ux-foundation.org>
cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Osterried <osterried@...se.de>, protasnb@...il.com,
	bugme-daemon@...zilla.kernel.org
Subject: Re: [Bug 9182] Critical memory leak (dirty pages)



On Sat, 15 Dec 2007, Andrew Morton wrote:

> On Sun, 16 Dec 2007 00:08:52 +0100 (CET) Krzysztof Oledzki <olel@....pl> wrote:
>
>>
>>
>> On Sat, 15 Dec 2007, bugme-daemon@...zilla.kernel.org wrote:
>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=9182
>>>
>>>
>>> ------- Comment #33 from protasnb@...il.com  2007-12-15 14:19 -------
>>> Krzysztof, I'd hate point you to a hard path (at least time consuming), but
>>> you've done a lot of digging by now anyway. How about git bisecting between
>>> 2.6.20-rc2 and rc1? Here is great info on bisecting:
>>> http://www.kernel.org/doc/local/git-quick.html
>>
>> As I'm smarter than git-bistect I can tell that 2.6.20-rc1-git8 is as bad
>> as 2.6.20-rc2 but 2.6.20-rc1-git8 with one patch reverted seems to be OK.
>> So it took me only 2 reboots. ;)
>>
>> The guilty patch is the one I proposed just an hour ago:
>>   http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.20.y.git;a=commitdiff_plain;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9
>>
>> So:
>>   - 2.6.20-rc1: OK
>>   - 2.6.20-rc1-git8 with fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9 reverted: OK
>>   - 2.6.20-rc1-git8: very BAD
>>   - 2.6.20-rc2: very BAD
>>   - 2.6.20-rc4: very BAD
>>   - >= 2.6.20: BAD (but not *very* BAD!)
>>
>
> well..  We have code which has been used by *everyone* for a year and it's
> misbehaving for you alone.

No, not for me alone. Probably only I and Thomas Osterried have systems 
where it is so easy to reproduce. Please note that the problem exists on 
my all systems, but only on one it is critical. It is enough to run
"sync; sleep 1; sunc; sleep 1; sync; grep Drirty /proc/meminfo" to be sure. 
With =>2.6.20-rc1-git8 it *never* falls to 0 an *all* my hosts but only 
on one it goes to ~200MB in about 2 weeks and then everything dies:
http://bugzilla.kernel.org/attachment.cgi?id=13824
http://bugzilla.kernel.org/attachment.cgi?id=13825
http://bugzilla.kernel.org/attachment.cgi?id=13826
http://bugzilla.kernel.org/attachment.cgi?id=13827

>  I wonder what you're doing that is different/special.
Me to. :|

> Which filesystem, which mount options

  - ext3 on RAID1 (MD): / - rootflags=data=journal
  - ext3 on LVM on RAID5 (MD)
  - nfs

/dev/md0 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
/dev/mapper/VolGrp0-usr on /usr type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-var on /var type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-squid_spool on /var/cache/squid/cd0 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-squid_spool2 on /var/cache/squid/cd1 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-news_spool on /var/spool/news type ext3 (rw,nosuid,nodev,noatime)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
owl:/usr/gentoo-nfs on /usr/gentoo-nfs type nfs (ro,nosuid,nodev,noatime,bg,intr,tcp,addr=192.168.129.26)


> what sort of workload?
Different, depending on a host: mail (postfix + amavisd + spamassasin + 
clamav + sqlgray), squid, mysql, apache, nfs, rsync, .... But it seems 
that the biggest problem is on the host running mentioned mail service.

Thanks.

Best regards,

 				Krzysztof Olędzki