lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1206663095.4122.82.camel@calx>
Date:	Thu, 27 Mar 2008 19:11:35 -0500
From:	Matt Mackall <mpm@...enic.com>
To:	David Miller <davem@...emloft.net>
Cc:	joe@...ches.com, ilpo.jarvinen@...sinki.fi,
	akpm@...ux-foundation.org, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, acme@...hat.com
Subject: Re: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot


On Thu, 2008-03-27 at 15:04 -0700, David Miller wrote:
> From: Joe Perches <joe@...ches.com>
> Date: Thu, 27 Mar 2008 12:10:50 -0700
> 
> > On Thu, 2008-03-27 at 14:38 +0200, Ilpo Järvinen wrote:
> > > Allyesconfig (v2.6.24-mm1):
> > 
> > I think this change is only good in severely memory
> > limited uses.  This will very likely negatively impact
> > high speed networking.  It's a speed/size trade off.
> 
> I severely doubt this, the bulk of the overhead of
> skb_put() is the atomic operation, not whether the
> instructions get executed inline or not.

More generally, we have to weigh the cost of a function call against the
cost of a cache miss here or -somewhere else-. That is, running multiple
copies of this code inline means that much other code gets pushed out of
cache. Further, consolidating multiple copies of this code into one
means it's that much more likely to already be in cache when we hit it.

In the 486 era, when CPU performance was close to 1:1 with memory,
branches were more expensive than sequential memory fetches, and
registers were scarce, inlining made a fair amount of sense.

But now we've moved very far away from that indeed: CPU is orders of
magnitude faster than memory, branches are quite cheap, and register
are.. well, not quite as scarce. All of that means that a cache miss is
much more expensive than a function call.

Inlining typical only makes sense for fairly trivial transformations
(something you'd consider doing with a macro) where the code to set up a
function call is comparable to the size of the function itself and code
in innermost loops. And if the inline is in a header file, it's probably
not in the latter class.

In the case of this patch, removing 60-100k from the network stack means
we're almost certainly avoiding a lot of cache misses in the big picture
while taking a few cycle hit per packet in the smallest scale.

-- 
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ