Date:	Thu, 27 Mar 2008 14:37:59 +0200
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Andrew Morton <akpm@...ux-foundation.org>,
	David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Cc:	Arnaldo Carvalho de Melo <acme@...hat.com>
Subject: [PATCH 0/7]: uninline some net related static inline in .h bloaters

Hi all,

As suggested by Andrew, I reran the uninlining of everything in .h
files (on v2.6.25-rc2-mm1 this time) to get rid of the (un)likely
profiling overhead that showed up in the allyesconfig results. The
config was manually tweaked from allyesconfig to exclude a number of
debug-related options [1]. Other setup: 32-bit x86, gcc (GCC) 4.1.2
20070626 (Red Hat 4.1.2-13). I also tweaked the uninlining machinery a
bit and got even better coverage than last time.

IS_ERR is now successfully off the top :-). The numbers are smaller
than what was previously measured, especially when a function has a
large number of call sites and contains (un)likely. Nevertheless,
mostly the same functions show up among the top bloaters as with plain
allyesconfig. My earlier, smaller-scale tests yielded a similar
conclusion. Interesting newcomers, however, are the list-related
functions, which had non-inlined debug versions with allyesconfig.

Ok, here's the top of the list (5000+ bytes):

-60744  855 funcs, 861 +, 61605 -, diff: -60744 --- skb_put 
-33129  42 funcs, 199 +, 33328 -, diff: -33129 --- cfi_build_cmd 
-32338  1182 funcs, 594 +, 32932 -, diff: -32338 --- atomic_dec_and_test 
-22906  1208 funcs, 462 +, 23368 -, diff: -22906 --- list_del 
-18399  468 funcs, 282 +, 18681 -, diff: -18399 --- netif_wake_queue 
-14283  14 funcs, 365 +, 14648 -, diff: -14283 --- cfi_send_gen_cmd 
-13890  341 funcs, 189 +, 14079 -, diff: -13890 --- skb_push 
-12853  35 funcs, 114 +, 12967 -, diff: -12853 --- le_key_k_type 
-12178  382 funcs, 157 +, 12335 -, diff: -12178 --- dev_alloc_skb 
-11675  1178 funcs, 1471 +, 13146 -, diff: -11675 --- list_add_tail 
-10996  1688 funcs, 1144 +, 12140 -, diff: -10996 --- raw_local_irq_enable 
-10838  1832 funcs, 3763 +, 14601 -, diff: -10838 --- __list_add 
-10283  401 funcs, 208 +, 10491 -, diff: -10283 --- __dev_alloc_skb 
-9697  338 funcs, 221 +, 9918 -, diff: -9697 --- skb_pull 
-8963  836 funcs, 1903 +, 10866 -, diff: -8963 --- list_add 
-7939  29 funcs, 304 +, 8243 -, diff: -7939 --- __vlan_hwaccel_rx 
-7937  238 funcs, 314 +, 8251 -, diff: -7937 --- i_size_read 
-7768  106 funcs, 254 +, 8022 -, diff: -7768 --- __nlmsg_put 
-7569  376 funcs, 246 +, 7815 -, diff: -7569 --- __skb_pull 
-7467  658 funcs, 503 +, 7970 -, diff: -7467 --- list_del_init 
-7360  192 funcs, 131 +, 7491 -, diff: -7360 --- skb_trim 
-7257  186 funcs, 70 +, 7327 -, diff: -7257 --- dst_release 
-6943  234 funcs, 109 +, 7052 -, diff: -6943 --- __skb_trim 
-6923  71 funcs, 296 +, 7219 -, diff: -6923 --- nlmsg_put 
-6662  475 funcs, 410 +, 7072 -, diff: -6662 --- add_timer 
-6535  380 funcs, 1491 +, 8026 -, diff: -6535 --- ___arch__swab64 
-6457  372 funcs, 1429 +, 7886 -, diff: -6457 --- __fswab64 
-5625  48 funcs, 79 +, 5704 -, diff: -5625 --- tty_insert_flip_char 
-5615  25 funcs, 284 +, 5899 -, diff: -5615 --- vlan_hwaccel_receive_skb 
-5404  23 funcs, 509 +, 5913 -, diff: -5404 --- jhash 
-5396  214 funcs, 155 +, 5551 -, diff: -5396 --- dev_to_shost 
-5310  110 funcs, 133 +, 5443 -, diff: -5310 --- skb_header_pointer 
-5169  403 funcs, 131 +, 5300 -, diff: -5169 --- pci_write_config_byte 
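
For anyone post-processing the list, here is a minimal Python sketch of
a parser for the line format shown above. The field names (total,
funcs, added, removed, diff) and the check that "+" minus "-" equals
the reported diff are my reading of the output, not something the tools
document; parse_line and is_consistent are hypothetical helpers, not
part of inline-tools.

```python
import re

# Assumed line format, taken from the list above:
#   <total>  <N> funcs, <added> +, <removed> -, diff: <diff> --- <name>
LINE_RE = re.compile(
    r"^\s*(-?\d+)\s+(\d+) funcs, (\d+) \+, (\d+) -, diff: (-?\d+) --- (\S+)\s*$"
)


def parse_line(line):
    """Return a dict for one result line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    total, funcs, added, removed, diff, name = m.groups()
    return {
        "name": name,
        "total": int(total),
        "funcs": int(funcs),
        "added": int(added),
        "removed": int(removed),
        "diff": int(diff),
    }


def is_consistent(entry):
    """Check that bytes added minus bytes removed matches both diff fields."""
    return entry["added"] - entry["removed"] == entry["diff"] == entry["total"]


# Three lines copied from the list above.
SAMPLE = """\
-60744  855 funcs, 861 +, 61605 -, diff: -60744 --- skb_put
-33129  42 funcs, 199 +, 33328 -, diff: -33129 --- cfi_build_cmd
-10838  1832 funcs, 3763 +, 14601 -, diff: -10838 --- __list_add
"""

entries = [e for e in map(parse_line, SAMPLE.splitlines()) if e is not None]

if __name__ == "__main__":
    for e in entries:
        status = "ok" if is_consistent(e) else "MISMATCH"
        print(e["name"], e["diff"], status)
```

Lines that do not match the pattern (such as the entries without
details in the full list) are skipped rather than treated as errors.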

Please note that because only one inline was tested (one function
uninlined) at a time, the actual benefit of removing multiple inlines
may well be less than the sum of the individual results (especially
when something calls a __-prefixed function of the same name, e.g.,
dev_alloc_skb, which basically just calls __dev_alloc_skb).
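
To make the caveat concrete, the following Python sketch flags entries
in the list above where both X and __X appear and adds up their
individual diffs. Matching on names alone is only a heuristic I am
assuming here; it does not prove that X calls __X, and the naive sum it
prints is at best an optimistic bound on the combined saving, not a
measurement. wrapper_pairs is a hypothetical helper.

```python
# Per-function diffs in bytes, copied from the list above.
TOP = {
    "skb_put": -60744,
    "dev_alloc_skb": -12178,
    "__dev_alloc_skb": -10283,
    "__list_add": -10838,
    "list_add": -8963,
    "skb_pull": -9697,
    "__skb_pull": -7569,
    "__nlmsg_put": -7768,
    "nlmsg_put": -6923,
    "skb_trim": -7360,
    "__skb_trim": -6943,
}


def wrapper_pairs(diffs):
    """Return sorted (X, __X) pairs where both names are present."""
    return sorted(
        (name, "__" + name)
        for name in diffs
        if not name.startswith("__") and "__" + name in diffs
    )


pairs = wrapper_pairs(TOP)

# Naive sum of the individual results; the real combined saving is
# expected to be smaller because each function was uninlined alone.
naive_sum = sum(TOP[outer] + TOP[inner] for outer, inner in pairs)

if __name__ == "__main__":
    for outer, inner in pairs:
        print(outer, TOP[outer], inner, TOP[inner])
    print("naive sum for paired entries:", naive_sum)
```

The only way to get the real figure for a pair is to uninline both
functions in one build and run the comparison again.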

~210 are 1000+ bytes (was ~250 with allyesconfig) and ~350 are 500+
bytes (was ~440 previously). The full list has a small number of
entries without any details, which indicate compile failures, and a
somewhat larger set of cases where the static inline was preprocessed
away due to #if blocks. Apart from that, it should be self-explanatory:
  http://www.cs.helsinki.fi/u/ijjarvin/inlines/sorted.v2.6.25-rc2-mm1

I include a couple of net-related uninlines in this patch series. I
don't include the jhash to lib/ patch this time (it was there
previously) because I haven't had time to finish it up.

Another view of the results, showing the size of the uninlined bodies,
is available here:
  http://www.cs.helsinki.fi/u/ijjarvin/inlines/bodies.v2.6.25-rc2-mm1

The tools I used are available here, except for the site-specific
distribution machinery (in addition, one needs a fairly recent codiff
from Arnaldo's toolset because some inline-related bugs were fixed
lately):

  http://www.cs.helsinki.fi/u/ijjarvin/inline-tools.git/

As stated earlier, a similar analysis should be performed on .h files
not under include/, but that would require minor modifications to
these tools.

--
 i.

[1] http://www.cs.helsinki.fi/u/ijjarvin/inlines/config.nodebug.v2.6.25-rc2-mm1
