Message-ID: <20150512114353.GA13699@gmail.com>
Date:	Tue, 12 May 2015 13:43:54 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Denys Vlasenko <dvlasenk@...hat.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Graf <tgraf@...g.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Bart Van Assche <bvanassche@....org>,
	Peter Zijlstra <peterz@...radead.org>,
	David Rientjes <rientjes@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] force inlining of spinlock ops


* Denys Vlasenko <dvlasenk@...hat.com> wrote:

> On 05/12/2015 09:44 AM, Ingo Molnar wrote:
> > 
> > * Denys Vlasenko <dvlasenk@...hat.com> wrote:
> > 
> >> With both gcc 4.7.2 and 4.9.2, sometimes gcc mysteriously doesn't inline
> >> very small functions we expect to be inlined. In particular,
> >> with this config: http://busybox.net/~vda/kernel_config
> >> there are more than a thousand copies of tiny spinlock-related functions:
> > 
> > That's an x86-64 allyesconfig AFAICS, right?
> 
> Close, but I disabled options which are clearly "heavy debugging" stuff.
> IOW: many developers run their work machines with lock debugging etc.,
> but few would constantly use something which slows the kernel down by a factor of 3!
> 
> So, CONFIG_KASAN is off. CONFIG_STAGING is also off. And a few others I forgot.
> 
> I'm using this config to see which inlines should be deinlined.
> For that, I need to cover all callsites of each inline.
> Thus, I need ~allyesconfig.
> 
> The discovery that there also exists the opposite problem (wrongly
> *un*inlined functions) was accidental.
> 
> 
> > It's not mysterious, but an effect of -Os plus allowing GCC to do 
> > inlining heuristics:
> > 
> >   CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> >   CONFIG_OPTIMIZE_INLINING=y
> > 
> > Does the problem go away if you unset either of these config options?
> 
> With CONFIG_CC_OPTIMIZE_FOR_SIZE off,
> the problem greatly diminishes, but is not eliminated.
> Testing allyesconfig would take too long, so I used defconfig instead.
> 
> On a defconfig kernel, the following functions smaller than 16 bytes
> of machine code are auto-deinlined:
> 
> #Calls_ Size(hex)_______   Name____________________
>       7 000000000000000b t hweight_long
>       5 000000000000000f t init_once
>       4 000000000000000d t cpumask_set_cpu
>       4 000000000000000b t udp_lib_close
>       4 0000000000000006 t udp_lib_hash
>       3 000000000000000a t nofill
>       3 0000000000000006 t sg_set_page.part.7
>       2 000000000000000f t udplite_sk_init
>       2 000000000000000f t ct_seq_next
>       2 000000000000000e t encode_cookie
>       2 000000000000000d t ktime_get_real
>       2 000000000000000b t spin_lock
>       2 000000000000000b t device_create_release
>       2 000000000000000b t cpu_smt_flags
>       2 000000000000000b t cpu_core_flags
>       2 0000000000000009 t default_write_file
>       2 0000000000000008 t __initcall_pl_driver_init6
>       2 0000000000000008 t __initcall_nf_defrag_init6
>       2 0000000000000008 t __initcall_hid_init6
>       2 0000000000000008 t __initcall_ch_driver_init6
>       2 0000000000000008 t default_read_file
>       2 0000000000000006 t wiphy_to_rdev.part.4
>       2 0000000000000006 t s_stop
>       2 0000000000000006 t sg_set_page.part.3
>       2 0000000000000006 t generic_print_tuple
>       2 0000000000000006 t exp_seq_stop
>       2 0000000000000006 t ct_seq_stop
>       2 0000000000000006 t ct_cpu_seq_stop
> 
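One way to produce a table like the one above is to scan the `nm -S` output of vmlinux and count how many separate local text (`t`) symbols share a name. The sketch below is a hypothetical helper, not the tool used for this measurement; it assumes each input line has the `nm -S` form `address size type name`, and it treats the count column as the number of out-of-line copies of that name.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 128

/* One distinct local-text symbol name and how many copies of it were seen. */
struct sym_count {
	char name[NAME_LEN];
	unsigned long long size;	/* size of the first copy seen */
	int copies;
};

/*
 * Hypothetical helper: takes lines of `nm -S` output ("addr size type name").
 * Local text symbols ('t') smaller than max_size bytes are tallied by name
 * into out[]. Returns the number of distinct names recorded, or -1 if out[]
 * is too small.
 */
int count_small_copies(const char *const *lines, int nlines,
		       unsigned long long max_size,
		       struct sym_count *out, int max_out)
{
	int n = 0;

	for (int i = 0; i < nlines; i++) {
		unsigned long long addr, size;
		char type;
		char name[NAME_LEN];
		int j;

		/* Skip anything that is not in "addr size type name" form. */
		if (sscanf(lines[i], "%llx %llx %c %127s",
			   &addr, &size, &type, name) != 4)
			continue;
		/* Only small, file-local text symbols are of interest. */
		if (type != 't' || size >= max_size)
			continue;

		/* Linear search for an existing entry with the same name. */
		for (j = 0; j < n; j++)
			if (strcmp(out[j].name, name) == 0)
				break;
		if (j == n) {
			if (n == max_out)
				return -1;
			strcpy(out[n].name, name);
			out[n].size = size;
			out[n].copies = 0;
			n++;
		}
		out[j].copies++;
	}
	return n;
}
```

Sorting the resulting entries by `copies` in descending order would give a listing shaped like the one quoted above; a linear search is enough for a sketch but would be slow on a full vmlinux symbol table.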
> In particular, one of the functions from my patches,
> spin_lock(), has been auto-deinlined:
> 
> ffffffff8108adb0 <spin_lock>:
> ffffffff8108adb0:       55                      push   %rbp
> ffffffff8108adb1:       48 89 e5                mov    %rsp,%rbp
> ffffffff8108adb4:       e8 37 db 81 00          callq  ffffffff818a88f0 <_raw_spin_lock>
> ffffffff8108adb9:       5d                      pop    %rbp
> ffffffff8108adba:       c3                      retq
> 
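The five instructions above are only a frame setup around a call to `_raw_spin_lock`, so each out-of-line copy adds an extra call/return to every lock acquisition. As the subject line says, the patch forces such spinlock wrappers to be inlined; the kernel spells the attribute `__always_inline`. The following is a user-space sketch of the idea only, assuming GCC/Clang's `always_inline` and `noinline` attributes and using hypothetical `toy_*` types built on C11 `atomic_flag` in place of the kernel's `spinlock_t`/`raw_spinlock_t`.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the kernel's raw_spinlock_t and spinlock_t. */
typedef struct { atomic_flag locked; } toy_raw_spinlock_t;
typedef struct { toy_raw_spinlock_t rlock; } toy_spinlock_t;

#define TOY_SPIN_LOCK_INIT { { ATOMIC_FLAG_INIT } }

/* Out-of-line lock routine, playing the role of _raw_spin_lock(). */
__attribute__((noinline)) void toy_raw_spin_lock(toy_raw_spinlock_t *lock)
{
	/* Spin until test-and-set observes the flag clear. */
	while (atomic_flag_test_and_set_explicit(&lock->locked,
						  memory_order_acquire))
		;
}

/* Returns nonzero if the lock was acquired without waiting. */
__attribute__((noinline)) int toy_raw_spin_trylock(toy_raw_spinlock_t *lock)
{
	return !atomic_flag_test_and_set_explicit(&lock->locked,
						   memory_order_acquire);
}

__attribute__((noinline)) void toy_raw_spin_unlock(toy_raw_spinlock_t *lock)
{
	atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/*
 * Thin wrappers. With plain "static inline", GCC may still emit a separate
 * copy (as happened to spin_lock above). always_inline makes GCC expand the
 * wrapper at every call site, leaving only the call to the raw routine.
 */
static inline __attribute__((always_inline)) void toy_spin_lock(toy_spinlock_t *lock)
{
	toy_raw_spin_lock(&lock->rlock);
}

static inline __attribute__((always_inline)) void toy_spin_unlock(toy_spinlock_t *lock)
{
	toy_raw_spin_unlock(&lock->rlock);
}
```

Disassembling a caller of `toy_spin_lock()` built with `-Os` should show a direct call to `toy_raw_spin_lock` rather than a call to a wrapper.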
> 
> > Furthermore, what is the size win on x86 defconfig with these options 
> > set?
> 
> CONFIG_OPTIMIZE_INLINING=y is in defconfig.
> 
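With CONFIG_OPTIMIZE_INLINING=y, a plain `inline` in kernel code is only a hint that GCC's heuristics may override; with the option off, the kernel's compiler headers redefine `inline` to carry `__attribute__((always_inline))` (the real definition also adds `notrace` and checks further conditions). The sketch below imitates that switch with a hypothetical `MY_OPTIMIZE_INLINING` macro and a hypothetical `my_inline` keyword rather than redefining `inline` itself; `popcount_ul()` is an illustrative reimplementation in the spirit of `hweight_long()` from the table above, not the kernel's code.

```c
/*
 * Hypothetical imitation of the kernel's inlining switch.
 * Build with -DMY_OPTIMIZE_INLINING to let GCC decide (like
 * CONFIG_OPTIMIZE_INLINING=y); without it, every my_inline function
 * is forced inline.
 */
#ifdef MY_OPTIMIZE_INLINING
#define my_inline inline
#else
#define my_inline inline __attribute__((always_inline))
#endif

/* Count set bits in w (Hamming weight). */
static my_inline unsigned int popcount_ul(unsigned long w)
{
	unsigned int n = 0;

	/* Clear the lowest set bit until none remain. */
	while (w) {
		w &= w - 1;
		n++;
	}
	return n;
}
```

Building a caller with `-Os -DMY_OPTIMIZE_INLINING` leaves GCC free to emit `popcount_ul` out of line, which is the situation the table above shows for `hweight_long`.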
> Size difference for CC_OPTIMIZE_FOR_SIZE:
> 
>     text    data     bss      dec    hex filename
> 12335864 1746152 1081344 15163360 e75fe0 vmlinux.CC_OPTIMIZE_FOR_SIZE=y
> 10373764 1684200 1077248 13135212 c86d6c vmlinux.CC_OPTIMIZE_FOR_SIZE=n
> 
> Decrease by about 19%.
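The `dec` column of size(1) is the sum text + data + bss, and a percentage depends on which build is taken as the baseline. For the text figures above, the difference is 1962100 bytes: about 18.9% relative to the smaller image, or about 15.9% relative to the larger one. A small check of that arithmetic, with hypothetical helper names:

```c
/* One row of size(1) output; all sizes in bytes. */
struct size_row {
	unsigned long text, data, bss;
};

/* size(1) reports "dec" as the sum of the three sections. */
unsigned long size_dec(const struct size_row *r)
{
	return r->text + r->data + r->bss;
}

/* Percentage change going from 'from' to 'to' (negative means smaller). */
double pct_change(unsigned long from, unsigned long to)
{
	return 100.0 * ((double)to - (double)from) / (double)from;
}
```

With the rows as labelled, `pct_change(12335864, 10373764)` is roughly -15.9, while `pct_change(10373764, 12335864)` is roughly +18.9.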

I suspect the labels in the 'filename' column are swapped?

In any case, the interesting measurement would not be an -Os comparison 
(-Os makes GCC's inlining decisions too erratic), but the size effect of your 
_patch_ that always-inlines spinlock ops, on both plain defconfig and 
defconfig with -Os.

Thanks,

	Ingo