Date:	Wed, 16 Jun 2010 12:39:05 -0700
From:	Mitchell Erblich <erblichs@...thlink.net>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org
Subject: Re: Proposed linux kernel changes : scaling  tcp/ip stack


On Jun 16, 2010, at 2:10 AM, Andi Kleen wrote:

> Mitchell Erblich <erblichs@...thlink.net> writes:
>> 
>> Summary: Don't use last free pages for TCP ACKs with GFP_ATOMIC for our
>> sk buf allocs. 1 line change in tcp_output.c with a new gfp.h arg, and a change
>> in the generic kernel. TBD.
>> 
>> This change should have no effect with normal available kernel mem allocs.
>> 
>> Assuming memory pressure ( WAITING for clean memory) we should be allocating
>> our last pages for input skbufs and not for xmit allocs.
> 
> How about you instrument a kernel and measure if this really happens
> frequently under reasonable loads?  That is you can probably
> use the existing dropped page counters in netstat 
> Stephen added some time ago.
> 
> Since soft irqs cannot really wait exhausted GFP_ATOMIC would normally
> lead to dropped packets. FWIW I am not aware of any serious dropped
> packets problem on normal loads.
> 
> Running a kernel with nearly zero free memory is dangerous anyways
> -- pretty much any kernel service can fail arbitrarily --
> if this happened frequently I suspect we would need generic
> VM solution for it.
> 
> -Andi
> 
> -- 
> ak@...ux.intel.com -- Speaking for myself only.
> --

Andi Kleen and group,

I actually did instrument memory years ago on an older Linux kernel for a
multi-core system/server. Also, as a last item, I disabled the OOM killer
via a /proc value when it wasn't needed. These changes were for a now
defunct Linux OS company that built a high-end Linux NAS server.

In general, an increasingly large percentage of memory is cached and
fragmented over time. So buddy allocators tend to fail if the memory is
continually held, and over time smaller and smaller page-order allocs fail.
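
To illustrate the fragmentation point, here is a minimal user-space sketch, not kernel code. It assumes a toy free-page map (an array of booleans) and a hypothetical helper, toy_max_free_order(), that reports the largest naturally aligned power-of-two block that is entirely free. The same number of free pages can satisfy very different orders depending on how they are scattered.

```c
#include <stddef.h>
#include <stdbool.h>

/* Toy model (hypothetical helper, not a kernel interface).
 * free_map[i] is true when page i is free.
 * Returns the largest order n for which some naturally aligned run of
 * 2^n pages is entirely free, or -1 if no page is free at all. */
static int toy_max_free_order(const bool *free_map, size_t npages)
{
    int best = -1;

    for (int order = 0; ((size_t)1 << order) <= npages; order++) {
        size_t block = (size_t)1 << order;
        bool found = false;

        /* Scan every aligned block of this order. */
        for (size_t start = 0; start + block <= npages && !found; start += block) {
            size_t i = 0;

            while (i < block && free_map[start + i])
                i++;
            found = (i == block);
        }
        if (!found)
            break;  /* no aligned order-n block means no order-(n+1) block */
        best = order;
    }
    return best;
}
```

With eight pages of which every other one is free, this returns 0 (only single pages can be handed out), while four adjacent free pages at the start of the map return 2.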

The instrumenting was to be able to repeat a condition to verify
that the changes were mostly transparent and added only minimal
load when the system was experiencing a lull.

A problem found was that Linux tracks free pages and not dirty pages.

However, I am starting small and simply asking:

Can we agree that GFP_NOWAIT is atomic, but just doesn't grab the
last pages?
#define GFP_NOWAIT	(GFP_ATOMIC & ~__GFP_HIGH)

Thus, in the general case of an atomic kernel memory consumer,
GFP_NOWAIT SHOULD be used.

And where a safety valve is needed to be able to clean or free kernel
memory, GFP_ATOMIC should be used.
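
A rough user-space sketch of the distinction, under stated assumptions: the flag values below are illustrative only (the real definitions are in include/linux/gfp.h and differ between kernel versions), and toy_alloc_ok() is a hypothetical stand-in for the allocator's watermark check, in which a __GFP_HIGH caller is allowed to dip further into the reserve.

```c
#include <stdbool.h>

/* Illustrative values only; see include/linux/gfp.h for the real ones.
 * The double-underscore names mirror the kernel's and are used here
 * purely for readability. */
#define __GFP_WAIT  0x10u   /* caller may sleep */
#define __GFP_HIGH  0x20u   /* caller may use emergency reserves */

#define GFP_ATOMIC  (__GFP_HIGH)
#define GFP_NOWAIT  (GFP_ATOMIC & ~__GFP_HIGH)

/* Toy watermark check (hypothetical, not kernel code): the request is
 * allowed only while the free page count is above the minimum
 * watermark, and __GFP_HIGH callers get that minimum halved. */
static bool toy_alloc_ok(unsigned int gfp, long free_pages, long min_pages)
{
    long min = min_pages;

    if (gfp & __GFP_HIGH)
        min -= min / 2;
    return free_pages > min;
}
```

In this model, with 80 free pages and a minimum of 100, a GFP_ATOMIC request still succeeds while a GFP_NOWAIT request is refused. Neither flag set includes __GFP_WAIT, so neither caller sleeps.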

Later, I will suggest changes to clean kernel memory while little
I/O is being done, so that if the memory later needs to be freed, it can be
done quickly.

Later, I will propose a /proc percentage variable that represents the
percentage of memory marked/saved/separated for rotating high-order page
allocs for consumers after the system has been up for weeks/months. This
work was initially done at another UNIX company, based on an internal
public paper.


Mitchell Erblich

	
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html