lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20190613135653.310055383@infradead.org>
Date:   Thu, 13 Jun 2019 15:54:47 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     torvalds@...ux-foundation.org, mingo@...nel.org, bp@...en8.de,
        tglx@...utronix.de, luto@...nel.org, namit@...are.com,
        peterz@...radead.org
Cc:     linux-kernel@...r.kernel.org, Nadav Amit <nadav.amit@...il.com>
Subject: [PATCH v2 2/5] x86/percpu: Relax smp_processor_id()

Nadav reported that since this_cpu_read() became asm-volatile, many
smp_processor_id() users generated worse code due to the extra
constraints.

However since smp_processor_id() is reading a stable value, we can use
__this_cpu_read().

While this does reduce text size somewhat, this mostly results in code
movement to .text.unlikely as a result of more/larger .cold.
subfunctions. Less text on the hotpath is good for I$.

$ ./compare.sh defconfig-build1 defconfig-build2 vmlinux.o
setup_APIC_ibs                                             90         98   -12,+20
force_ibs_eilvt_setup                                     400        413   -57,+70
pci_serr_error                                            109        104   -54,+49
pci_serr_error                                            109        104   -54,+49
unknown_nmi_error                                         125        120   -76,+71
unknown_nmi_error                                         125        120   -76,+71
io_check_error                                            125        132   -97,+104
intel_thermal_interrupt                                   730        822   +92,+0
intel_init_thermal                                        951        945   -6,+0
generic_get_mtrr                                          301        294   -7,+0
generic_get_mtrr                                          301        294   -7,+0
generic_set_all                                           749        754   -44,+49
get_fixed_ranges                                          352        360   -41,+49
x86_acpi_suspend_lowlevel                                 369        363   -6,+0
check_tsc_sync_source                                     412        412   -71,+71
irq_migrate_all_off_this_cpu                              662        674   -14,+26
clocksource_watchdog                                      748        748   -113,+113
__perf_event_account_interrupt                            204        197   -7,+0
attempt_merge                                            1748       1741   -7,+0
intel_guc_send_ct                                        1424       1409   -15,+0
__fini_doorbell                                           235        231   -4,+0
bdw_set_cdclk                                             928        923   -5,+0
gen11_dsi_disable                                        1571       1556   -15,+0
gmbus_wait                                                493        488   -5,+0
md_make_request                                           376        369   -7,+0
__split_and_process_bio                                   543        536   -7,+0
delay_tsc                                                  96         89   -7,+0
hsw_disable_pc8                                           696        691   -5,+0
tsc_verify_tsc_adjust                                     215        228   -22,+35
cpuidle_driver_unref                                       56         49   -7,+0
blk_account_io_completion                                 159        148   -11,+0
mtrr_wrmsr                                                 95         99   -29,+33
__intel_wait_for_register_fw                              401        419   +18,+0
cpuidle_driver_ref                                         43         36   -7,+0
cpuidle_get_driver                                         15          8   -7,+0
blk_account_io_done                                       535        528   -7,+0
irq_migrate_all_off_this_cpu                              662        674   -14,+26
check_tsc_sync_source                                     412        412   -71,+71
irq_wait_for_poll                                         170        163   -7,+0
generic_end_io_acct                                       329        322   -7,+0
x86_acpi_suspend_lowlevel                                 369        363   -6,+0
nohz_balance_enter_idle                                   198        191   -7,+0
generic_start_io_acct                                     254        247   -7,+0
blk_account_io_start                                      341        334   -7,+0
perf_event_task_tick                                      682        675   -7,+0
intel_init_thermal                                        951        945   -6,+0
amd_e400_c1e_apic_setup                                    47         51   -28,+32
setup_APIC_eilvt                                          350        328   -22,+0
hsw_enable_pc8                                           1611       1605   -6,+0
                                             total   12985947   12985892   -994,+939

Reported-by: Nadav Amit <nadav.amit@...il.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
---
 arch/x86/include/asm/smp.h |    3 ++-
 include/linux/smp.h        |   45 +++++++++++++++++++++++++++++++--------------
 2 files changed, 33 insertions(+), 15 deletions(-)

--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -162,7 +162,8 @@ __visible void smp_call_function_single_
  * from the initial startup. We map APIC_BASE very early in page_setup(),
  * so this is correct in the x86 case.
  */
-#define raw_smp_processor_id() (this_cpu_read(cpu_number))
+#define raw_smp_processor_id()  this_cpu_read(cpu_number)
+#define __smp_processor_id() __this_cpu_read(cpu_number)
 
 #ifdef CONFIG_X86_32
 extern int safe_smp_processor_id(void);
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -181,29 +181,46 @@ static inline int get_boot_cpu_id(void)
 
 #endif /* !SMP */
 
-/*
- * smp_processor_id(): get the current CPU ID.
+/**
+ * raw_processor_id() - get the current (unstable) CPU id
+ *
+ * For then you know what you are doing and need an unstable
+ * CPU id.
+ */
+
+/**
+ * smp_processor_id() - get the current (stable) CPU id
+ *
+ * This is the normal accessor to the CPU id and should be used
+ * whenever possible.
  *
- * if DEBUG_PREEMPT is enabled then we check whether it is
- * used in a preemption-safe way. (smp_processor_id() is safe
- * if it's used in a preemption-off critical section, or in
- * a thread that is bound to the current CPU.)
+ * The CPU id is stable when:
+ *
+ *  - IRQs are disabled;
+ *  - preemption is disabled;
+ *  - the task is CPU affine.
  *
- * NOTE: raw_smp_processor_id() is for internal use only
- * (smp_processor_id() is the preferred variant), but in rare
- * instances it might also be used to turn off false positives
- * (i.e. smp_processor_id() use that the debugging code reports but
- * which use for some reason is legal). Don't use this to hack around
- * the warning message, as your code might not work under PREEMPT.
+ * When CONFIG_DEBUG_PREEMPT; we verify these assumption and WARN
+ * when smp_processor_id() is used when the CPU id is not stable.
  */
+
+/*
+ * Allow the architecture to differentiate between a stable and unstable read.
+ * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a
+ * regular asm read for the stable.
+ */
+#ifndef __smp_processor_id
+#define __smp_processor_id(x) raw_smp_processor_id(x)
+#endif
+
 #ifdef CONFIG_DEBUG_PREEMPT
   extern unsigned int debug_smp_processor_id(void);
 # define smp_processor_id() debug_smp_processor_id()
 #else
-# define smp_processor_id() raw_smp_processor_id()
+# define smp_processor_id() __smp_processor_id()
 #endif
 
-#define get_cpu()		({ preempt_disable(); smp_processor_id(); })
+#define get_cpu()		({ preempt_disable(); __smp_processor_id(); })
 #define put_cpu()		preempt_enable()
 
 /*


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ