lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20090129100509.6969970f.akpm@linux-foundation.org>
Date:	Thu, 29 Jan 2009 10:05:09 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Adam Tkac <vonsch@...il.com>
Cc:	Peter Palfrader <weasel@...ian.org>, linux-kernel@...r.kernel.org,
	debian-admin@...ts.debian.org, team@...urity.debian.org,
	libpam-modules@...kages.debian.org, stable@...nel.org
Subject: Re: 2.6.28, rlimits, performance and debian etch

On Thu, 29 Jan 2009 13:19:00 +0100 Adam Tkac <vonsch@...il.com> wrote:

> On Tue, Jan 27, 2009 at 03:17:03PM -0800, Andrew Morton wrote:
> > On Wed, 21 Jan 2009 12:52:19 +0100
> > Peter Palfrader <weasel@...ian.org> wrote:
> > 
> > > Hi,
> > > 
> > > I spent several hours trying to get to the bottom of a serious
> > > performance issue that appeared on one of our servers after upgrading to
> > > 2.6.28.  In the end it's what could be considered a userspace bug that
> > > was triggered by a change in 2.6.28.  Since this might also affect other
> > > people I figured I'd at least document what I found here, and maybe we
> > > can even do something about it:
> > > 
> > > 
> > > So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately
> > > the team maintaining our ftp archive complained that one of their
> > > scripts that previously ran in a few minutes still hadn't even come
> > > close to being done after an hour or so.  Downgrading to 2.6.27 fixed
> > > that.
> > > 
> > > Turns out that script is forking a lot and something in it or python or
> > > whereever closes all the file descriptors it doesn't want to pass on.
> > > That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and
> > > closes them all with a few exceptions.
> > > 
> > > Turns out that takes a long time when your limit -n is now 2^20 (1048576).
> > > 
> > > With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is
> > > now a thousand times that.
> > > 
> > > 2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to
> > > RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that
> > > allows, as the title implies, to set the limit for number of files to
> > > infinity.
> > > 
> > > Closer investigation showed that the broken default ulimit did not apply
> > > to "system" processes (like stuff started from init).  In the end I
> > > could establish that all processes that passed through pam_limit at one
> > > point had the bad resource limit.
> > > 
> > > Apparently the pam library in Debian etch (4.0) initializes the limits
> > > to some default values when it doesn't have any settings in limit.conf
> > > to override them.  Turns out that for nofiles this is RLIM_INFINITY.
> > > Commenting out "case RLIMIT_NOFILE" in pam_limit.c:267 of our pam
> > > package version 0.79-5 fixes that - tho I'm not sure what side effects
> > > that has.
> > > 
> > > Debian lenny (the upcoming 5.0 version) doesn't have this issue as it
> > > uses a different pam (version).
> > > 
> > > 
> > > I'm a bit unsure where to go from here.  Maybe the pam library in etch
> > > should be fixed.  Maybe the patch should be reverted (but then it may be
> > > more correct now and that's what the changelog entry suggests).
> > > As a stopgap measure I could also just define nofile in limits.conf.
> > > 
> > > Thanks for listening.  Also thanks to Rik and Nocholas who helped track
> > > some of this down.
> > > 
> > > Cheers,
> > > Peter
> > > 1. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
> > >    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
> > 
> > Ho hum, thanks.
> > 
> > Well, I think we just revert it for now.  We can bring it back later
> > if someone is thus inclined.  Along with some sort of opt-in control,
> > perhaps in /proc.  Which defaults to "off".
> > 
> 
> Hi all,
> 
> I don't think the "rlim_infinity" patch should be reverted. Let me try
> to explain why.
> 
> First, code which sets limits to RLIM_INFINITY is very bad idea and
> that code is Debian specific. I downloaded original pam 0.79 (stripped):
> 
> for(i = 0; i < RLIM_NLIMITS; i++) {
> 	...
> 	int r = getrlimit(i, &pl->limits[i].limit);
> 	...
> 
> as you can see original pam sets limits to inherited defaults. After
> code written above Debian adds their patch:
> 
> if (limits_not_defined_in_limits_conf) {
> 	...
> 	case RLIMIT_NOFILE:
> 	...
> 		pl->limits[i].limit.rlim_cur = RLIM_INFINITY;
> 		pl->limits[i].limit.rlim_max = RLIM_INFINITY;
> 	...
> }
> 
> so as you can see inherited default limits are overriden to infinity.
> In my opinion Debian should revert their patch which is, at least,
> pretty incorrect and unsecure.
> 
> Next argument is POSIX compatibility (from setrlimit() specification):
> 
> "The value RLIM_INFINITY, defined in <sys/resource.h>, shall be
> considered to be larger than any other limit value. If a call to
> getrlimit() returns RLIM_INFINITY for a resource, it means the
> implementation shall not enforce limits on that resource. Specifying
> RLIM_INFINITY as any resource limit value on a successful call to
> setrlimit() shall inhibit enforcement of that resource limit."
> 
> So kernel does what is expected. If you want "unlimited" number of
> descriptors, you have it.
> 
> Please consider again where exactly problem is, if in Debian patch or
> in kernel patch. From my point of view Debian patch should be
> reverted, not the kernel one.
> 

Sure, debian might well be wrong.  But the bottom line is that the
kernel changed, and people's machines broke.

If the kernel change was really really important then we might just
grit our teeth and live with the breakage.  But this change _wasn't_ a
terribly important one.  So I think we should back it out while we find
another way of implementing it which does not break currently deployed
installations.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ