netdev - Re: Proposed linux kernel changes : scaling tcp/ip stack

Message-Id: <E4C28CAD-8179-4B4F-9F87-07F2D4587EC0@earthlink.net>
Date:	Wed, 16 Jun 2010 12:39:05 -0700
From:	Mitchell Erblich <erblichs@...thlink.net>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org
Subject: Re: Proposed linux kernel changes : scaling  tcp/ip stack

On Jun 16, 2010, at 2:10 AM, Andi Kleen wrote:

> Mitchell Erblich <erblichs@...thlink.net> writes:
>> 
>> Summary: Don't use last free pages for TCP ACKs with GFP_ATOMIC for our
>> sk buf allocs. 1 line change in tcp_output.c with a new gfp.h arg, and a change
>> in the generic kernel. TBD.
>> 
>> This change should have no effect with normal available kernel mem allocs.
>> 
>> Assuming memory pressure ( WAITING for clean memory) we should be allocating
>> our last pages for input skbufs and not for xmit allocs.
> 
> How about you instrument a kernel and measure if this really happens
> frequently under reasonable loads?  That is you can probably
> use the existing dropped page counters in netstat 
> Stephen added some time ago.
> 
> Since soft irqs cannot really wait, an exhausted GFP_ATOMIC would normally
> lead to dropped packets. FWIW I am not aware of any serious dropped
> packets problem on normal loads.
> 
> Running a kernel with nearly zero free memory is dangerous anyway
> -- pretty much any kernel service can fail arbitrarily --
> if this happened frequently I suspect we would need a generic
> VM solution for it.
> 
> -Andi
> 
> -- 
> ak@...ux.intel.com -- Speaking for myself only.
> --

Andi Kleen and group,

I actually did instrument memory years ago, on an older Linux kernel for a
multi-core system/server. I also threw out the OOM killer, as a last item,
via a /proc value when it wasn't needed. These changes were for a now-defunct
Linux OS company that built a high-end Linux NAS server.

In general, an increasingly large percentage of memory is cached and
fragmented over time. So buddy allocations tend to fail if the memory is
continually held, and over time smaller and smaller page-order allocs fail.
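
To illustrate the fragmentation point, here is a toy user-space sketch. It is
not kernel code: demo_order_available() and the flat array of per-page "free"
flags are hypothetical stand-ins for a real buddy allocator's free lists.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if some naturally aligned run of 2^order pages is entirely
 * free. page_free[i] is true when page i is free. A buddy-style allocator
 * can only hand out blocks that are aligned to their own size, so we step
 * through the array one block at a time.
 */
static bool demo_order_available(const bool *page_free, size_t npages,
                                 unsigned int order)
{
    size_t block = (size_t)1 << order;

    for (size_t start = 0; start + block <= npages; start += block) {
        bool all_free = true;

        for (size_t i = 0; i < block; i++) {
            if (!page_free[start + i]) {
                all_free = false;
                break;
            }
        }
        if (all_free)
            return true;
    }
    return false;
}
```

If every other page is held, half of memory is still free, yet the function
reports that no order-1 (two-page) block is available. Free-page counts alone
do not show this.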

The instrumentation was there so that I could repeat a condition and verify
that the changes were mostly transparent and added only minimal
load when the system was experiencing a lull.

A problem found was that Linux tracks free pages and not dirty pages.

However, I am starting small and simply asking:

Can we agree that GFP_NOWAIT is atomic, but just doesn't grab the
last pages?
#define GFP_NOWAIT	(GFP_ATOMIC & ~__GFP_HIGH)
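
To make the flag arithmetic concrete, here is a minimal user-space sketch. The
DEMO_ names and the bit value 0x20 are illustrative stand-ins, not the kernel's
definitions, and the sketch assumes GFP_ATOMIC consists of the high-priority
bit alone.

```c
/* Hypothetical stand-ins for the kernel flags; the bit value is illustrative. */
#define DEMO_GFP_HIGH   0x20u
#define DEMO_GFP_ATOMIC (DEMO_GFP_HIGH)
#define DEMO_GFP_NOWAIT (DEMO_GFP_ATOMIC & ~DEMO_GFP_HIGH)

/* Nonzero if a request with this mask is allowed to dip into the reserve. */
static int demo_may_use_reserve(unsigned int gfp_mask)
{
    return (gfp_mask & DEMO_GFP_HIGH) != 0;
}
```

Under that assumption the NOWAIT mask has no bits set: neither mask is allowed
to sleep, and the only difference is whether the reserve may be used.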

Thus, in the general case of an atomic kernel memory consumer,
GFP_NOWAIT SHOULD be used.

And where a safety valve is needed to be able to clean or free kernel memory,
GFP_ATOMIC should be used.
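
The rule above can be sketched as a small user-space model. Everything here is
hypothetical: the DEMO_ names, the consumer roles, and the watermark check,
which only assumes that a high-priority request may go down to half of the
minimum free-page level. It is a simplification, not the kernel's logic.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; the bit value is illustrative. */
#define DEMO_GFP_HIGH   0x20u
#define DEMO_GFP_ATOMIC (DEMO_GFP_HIGH)
#define DEMO_GFP_NOWAIT (DEMO_GFP_ATOMIC & ~DEMO_GFP_HIGH)

enum demo_consumer {
    DEMO_ORDINARY_ATOMIC, /* e.g. a transmit-side skb allocation */
    DEMO_FREES_MEMORY     /* a path whose progress cleans or frees memory */
};

/* Pick the mask by role: only memory-freeing paths get the reserve. */
static unsigned int demo_gfp_for(enum demo_consumer who)
{
    return who == DEMO_FREES_MEMORY ? DEMO_GFP_ATOMIC : DEMO_GFP_NOWAIT;
}

/* Toy watermark check: high-priority requests may go to half the minimum. */
static bool demo_alloc_ok(unsigned long free_pages, unsigned long min_pages,
                          unsigned int gfp_mask)
{
    unsigned long floor = min_pages;

    if (gfp_mask & DEMO_GFP_HIGH)
        floor -= floor / 2;
    return free_pages > floor;
}
```

With 60 free pages and a minimum of 100, the ordinary consumer is refused while
the memory-freeing path still succeeds, which is the behavior being proposed.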

Later, I will suggest changes to clean kernel memory while little
I/O is being done, so that if the memory later needs to be freed, it can be
freed quickly.

After that, I will suggest a /proc variable giving the percentage of memory
that is marked/saved/separated for rotating high-order page allocs, so that
consumers can still get them after the system has been up for weeks/months.
This work was initially done at another UNIX company, based on an internal
public paper.

Mitchell Erblich

> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
