Message-ID: <512F223B.4080801@redhat.com>
Date:	Thu, 28 Feb 2013 10:24:11 +0100
From:	Paolo Bonzini <pbonzini@...hat.com>
To:	Rusty Russell <rusty@...tcorp.com.au>
CC:	"Michael S. Tsirkin" <mst@...hat.com>,
	linux-kernel@...r.kernel.org,
	virtualization@...ts.linux-foundation.org
Subject: Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple
 sgs.

On 27/02/2013 12:21, Rusty Russell wrote:
>>> >> Baseline (before add_sgs):
>>> >>         2.840000-3.040000(2.927292)user
>>> >> 
>>> >> After add_sgs:
>>> >>         2.970000-3.150000(3.053750)user
>>> >> 
>>> >> After simplifying add_buf a little:
>>> >>         2.950000-3.210000(3.081458)user
>>> >> 
>>> >> After inlining virtqueue_add/vring_add_indirect:
>>> >>         2.920000-3.150000(3.026875)user
>>> >> 
>>> >> After passing in iteration functions (chained vs unchained):
>>> >>         2.760000-2.970000(2.883542)user
> Oops.  This result (and the next) is bogus.  I was playing with -O3, and
> accidentally left that in :(

Did you check which optimization actually improved speed so much?  Can
we do it ourselves, or use a GCC attribute to turn it on?  Looking at
the GCC manual and source, -O3 only enables a handful of extra
optimizations:

    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },

`-ftree-loop-distribute-patterns'
     This pass distributes the initialization loops and generates a
     call to memset zero.  For example, the loop

Doesn't matter.

    { OPT_LEVELS_3_PLUS, OPT_fpredictive_commoning, NULL, 1 },

Also doesn't matter.

    { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },

Can be done by us at the source level.
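
To illustrate what unswitching by hand looks like, here is a rough sketch.
The function names and the loop body are made up for the example; nothing
here is taken from virtio_ring.c.  The loop-invariant test on "chained" is
hoisted out of the loop, leaving two loops with no branch in the body:

```c
#include <stddef.h>

/* Hypothetical example, not code from virtio_ring.c. */

/* Before: the loop-invariant test on 'chained' sits inside the loop. */
static unsigned int count_switched(const unsigned int *len, size_t n,
				   int chained)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (chained)
			total += len[i];
		else
			total += 1;
	}
	return total;
}

/* After: the test is hoisted out, giving one specialized loop per case. */
static unsigned int count_unswitched(const unsigned int *len, size_t n,
				     int chained)
{
	unsigned int total = 0;
	size_t i;

	if (chained) {
		for (i = 0; i < n; i++)
			total += len[i];
	} else {
		for (i = 0; i < n; i++)
			total += 1;
	}
	return total;
}
```

Both versions compute the same result; the second one just costs some
code duplication.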

    { OPT_LEVELS_3_PLUS, OPT_ftree_vectorize, NULL, 1 },

Probably doesn't matter.

    { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },

`-fipa-cp-clone'
     Perform function cloning to make interprocedural constant
     propagation stronger.  When enabled, interprocedural constant
     propagation will perform function cloning when externally visible
     function can be called with constant arguments.

Can be done by adding new external APIs or marking functions as
always_inline.
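
A minimal sketch of the always_inline variant, with hypothetical names
(add_common, add_direct and add_indirect are not existing functions).  Each
wrapper passes a constant flag to a common helper that is forced inline, so
the compiler can drop the untaken branch in each copy.  In kernel code the
attribute would normally be spelled __always_inline:

```c
/* Hypothetical names; a sketch of "cloning" done by hand. */
static inline __attribute__((always_inline))
unsigned int add_common(unsigned int a, unsigned int b, int indirect)
{
	/*
	 * 'indirect' is a compile-time constant in each wrapper below,
	 * so after inlining only one of the two paths needs to remain.
	 */
	if (indirect)
		return a + b + 1;
	return a + b;
}

/* Externally visible entry point specialized for indirect == 0. */
unsigned int add_direct(unsigned int a, unsigned int b)
{
	return add_common(a, b, 0);
}

/* Externally visible entry point specialized for indirect == 1. */
unsigned int add_indirect(unsigned int a, unsigned int b)
{
	return add_common(a, b, 1);
}
```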

    { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },

`-fgcse-after-reload'
     When `-fgcse-after-reload' is enabled, a redundant load elimination
     pass is performed after reload.  The purpose of this pass is to
     cleanup redundant spilling.

Never saw it have any substantial effect.

    { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },

Can be done by us simply by adding more "inline" keywords.

Plus, -O3 makes *full* loop unrolling a bit more aggressive.  But
full loop unrolling requires loop bounds known at compile time, so I
doubt that is what helped here.
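
As for the GCC attribute question at the top: GCC does have a per-function
"optimize" attribute.  A sketch, assuming a made-up function sum_lengths()
as a stand-in; note that the GCC manual describes this attribute as meant
for debugging rather than production code, so it would be more of an
experiment to narrow down where the speedup comes from than a real fix:

```c
/*
 * Sketch: ask GCC to compile this one function as if with -O3.
 * sum_lengths() is a hypothetical stand-in, not a virtio_ring function.
 */
__attribute__((optimize("O3")))
unsigned int sum_lengths(const unsigned int *len, unsigned int n)
{
	unsigned int i, total = 0;

	for (i = 0; i < n; i++)
		total += len[i];
	return total;
}
```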

Paolo