lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 11 Oct 2017 13:10:07 +0200
From:   Phil Sutter <phil@....cc>
To:     Stephen Hemminger <stephen@...workplumber.org>
Cc:     Michal Kubecek <mkubecek@...e.cz>, Hangbin Liu <haliu@...hat.com>,
        netdev@...r.kernel.org, Hangbin Liu <liuhangbin@...il.com>
Subject: Re: [PATCHv4 iproute2 2/2] lib/libnetlink: update rtnl_talk to
 support malloc buff at run time

On Tue, Oct 10, 2017 at 09:47:43AM -0700, Stephen Hemminger wrote:
> On Tue, 10 Oct 2017 08:41:17 +0200
> Michal Kubecek <mkubecek@...e.cz> wrote:
> 
> > On Mon, Oct 09, 2017 at 10:25:25PM +0200, Phil Sutter wrote:
> > > Hi Stephen,
> > > 
> > > On Mon, Oct 02, 2017 at 10:37:08AM -0700, Stephen Hemminger wrote:  
> > > > On Thu, 28 Sep 2017 21:33:46 +0800
> > > > Hangbin Liu <haliu@...hat.com> wrote:
> > > >   
> > > > > From: Hangbin Liu <liuhangbin@...il.com>
> > > > > 
> > > > > This is an update for 460c03f3f3cc ("iplink: double the buffer size also in
> > > > > iplink_get()"). After update, we will not need to double the buffer size
> > > > > every time when VFs number increased.
> > > > > 
> > > > > With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
> > > > > length parameter.
> > > > > 
> > > > > With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
> > > > > answer to avoid overwrite data in nlh, because it may has more info after
> > > > > nlh. also this will avoid nlh buffer not enough issue.
> > > > > 
> > > > > We need to free answer after using.
> > > > > 
> > > > > Signed-off-by: Hangbin Liu <liuhangbin@...il.com>
> > > > > Signed-off-by: Phil Sutter <phil@....cc>
> > > > > ---  
> > > > 
> > > > Most of the uses of rtnl_talk() don't need to this peek and dynamic sizing.
> > > > Can only those places that need that be targeted?  
> > > 
> > > We could probably do that, by having a buffer on stack in __rtnl_talk()
> > > which will be used instead of the allocated one if 'answer' is NULL. Or
> > > maybe even introduce a dedicated API call for the dynamically allocated
> > > receive buffer. But I really doubt that's feasible: AFAICT, that stack
> > > buffer still needs to be reasonably sized since the reply might be
> > > larger than the request (reusing the request buffer would be the most
> > > simple way to tackle this), also there is support for extack which may
> > > bloat the response to arbitrary size. Hangbin has shown in his benchmark
> > > that the overhead of the second syscall is negligible, so why care about
> > > that and increase code complexity even further?
> > > 
> > > Not saying it's not possible, but I just doubt it's worth the effort.  
> > 
> > Agreed. Current code is based on the assumption that we can estimate the
> > maximum reply length in advance and the reason for this series is that
> > this assumption turned out to be wrong. I'm afraid that if we replace
> > it by an assumption that we can estimate the maximum reply length for
> > most requests with only few exceptions, it's only matter of time for us
> > to be proven wrong again.
> > 
> > Michal Kubecek
> > 
> 
> For query responses, yes the response may be large. But for the common cases of
> add address or add route, the response should just be ack or error.

And with extack, error is comprised of the original request plus an
arbitrarily sized error message, so we can't just reuse the request
buffer and are back to "guessing" the right length again.

To get an idea of what we're talking about, I wrote a simple benchmark
which adds 256 * 254 (= 65024) addresses to an interface, then removes
them again one by one and measured the time that takes for binaries with
and without Hangbin's patches:

OP	Vanilla		Hangbin		Delta
--------------------------------------------------------
add	real 2m16.244s	real 2m27.964s	+11.72s	(108.6%)
	user 0m15.241s	user 0m17.295s	+2.054s	(113.5%)
	sys  1m40.229s	sys  1m48.239s	+8.01s	(108.0%)

remove	real 1m44.950s	real 1m47.044s	+2.094s	(102.0%)
	user 0m13.899s	user 0m14.723s	+0.824s (105.9%)
	sys  1m30.798s	sys  1m31.938s	+1.140s (101.3%)

So the overhead of the second syscall and dynamic memory allocation is
less than 10% overall. Given the short time a single call to 'ip'
typically takes, I don't think the difference is noticeable even in
highly performance critical applications.

Cheers, Phil

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ