Message-ID: <20080619184626.GH18557@fieldses.org>
Date:	Thu, 19 Jun 2008 14:46:26 -0400
From:	"J. Bruce Fields" <bfields@...ldses.org>
To:	"Weathers, Norman R." <Norman.R.Weathers@...ocophillips.com>
Cc:	Jeff Layton <jlayton@...chiereds.net>,
	linux-kernel@...r.kernel.org, linux-nfs@...r.kernel.org,
	Neil Brown <neilb@...e.de>
Subject: Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?

On Thu, Jun 19, 2008 at 10:53:28AM -0500, Weathers, Norman R. wrote:
> The kernel we were really seeing the problem with was 2.6.25.4, but I
> think we have figured out the size-4096 problem.  It was probably a
> mistake on my part, but it is worth spelling out so other NFS users
> don't make the same one.  I had found some performance tuning guides,
> and some of the suggested changes did seem to help, though of course I
> never got to test them under full load (800+ clients).  One suggestion
> was to raise the tcp_reordering tunable under /proc/sys/net/ipv4 from
> the default of 3 to 127.  We think that this was actually causing the
> issue.  I traced back through all of my changes, reverted this setting
> to the default of 3, and it immediately fixed the size-4096 hell.  It
> appears that the higher reordering setting just eats into memory,
> especially under high demand, which makes perfect sense if we are
> actually buffering packets for reordering while slamming the box with
> thousands of requests per minute.

OK, sounds plausible, though I won't pretend to understand exactly how
that reordering code is using memory.
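
For anyone else chasing something similar, both the tunable and the
slab growth can be checked without a reboot.  A minimal sketch (the
paths assume the usual procfs layout, and the size-4096 cache name is
that of the 2.6 SLAB allocator):

    # Show the current value; the stock default is 3.
    cat /proc/sys/net/ipv4/tcp_reordering

    # Revert to the default, either directly or via sysctl.
    echo 3 > /proc/sys/net/ipv4/tcp_reordering
    sysctl -w net.ipv4.tcp_reordering=3

    # Watch the size-4096 cache; the first numeric column is the
    # count of active objects.
    grep '^size-4096 ' /proc/slabinfo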

> We still have other performance issues, but they now look more like a
> bottleneck: the nodes do not appear to be backing off when the servers
> become congested.
...
> > So with that many clients all making requests to the server at once,
> > we'd start hitting that (serv->sv_nrthreads+3)*20 limit when the
> > number of threads was set to less than 30-50.  That doesn't seem to
> > be the point where you're seeing a change in behavior, though.
> > 
> 
> We were estimating that between 40 and 50 threads was the cutoff for
> being able to service all of the (current) requests at once.  I haven't
> ramped back up to that level yet; I wasn't comfortable letting it all
> hang back out in case we get into that hellish mode again, since it can
> be a pain to get into those systems once they are overloaded (even over
> serial, the login can simply time out).  We actually had to bring a
> second option online to help alleviate some of the congestion because
> the servers couldn't handle the workload.
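
(Just to sanity-check that estimate against the limit above: with ~800
clients, (serv->sv_nrthreads+3)*20 >= 800 requires at least 37 threads,
which lines up with your 40-50 cutoff.)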

Thanks for the update, and let us know if you figure out anything more.

--b.