lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 1 Jul 2020 18:35:04 +0200
From:   Juri Lelli <juri.lelli@...hat.com>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Anna-Maria Gleixner <anna-maria@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC PATCH 01/10] timer: Prevent base->clk from moving backward

Hi,

Thanks for the set, first of all. Reviewing and testing it.

On 01/07/20 03:10, Frederic Weisbecker wrote:
> When a timer is enqueued with a negative delta (ie: expiry is below
> base->clk), it gets added to the wheel as expiring now (base->clk).
> 
> Yet the value that gets stored in base->next_expiry, while calling
> trigger_dyntick_cpu(), is the initial timer->expires value. The
> resulting state becomes:
> 
> 	base->next_expiry < base->clk
> 
> On the next timer enqueue, forward_timer_base() may accidentally
> rewind base->clk. As a possible outcome, timers may expire way too
> early, the worst case being that the highest wheel levels get spuriously
> processed again.
> 
> To prevent from that, make sure that base->next_expiry doesn't get below
> base->clk.
> 
> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Anna-Maria Gleixner <anna-maria@...utronix.de>
> Cc: Juri Lelli <juri.lelli@...hat.com>
> ---
>  kernel/time/timer.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 398e6eadb861..9a838d38dbe6 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -584,7 +584,15 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
>  	 * Set the next expiry time and kick the CPU so it can reevaluate the
>  	 * wheel:
>  	 */
> -	base->next_expiry = timer->expires;
> +	if (time_before(timer->expires, base->clk)) {
> +		/*
> +		 * Prevent from forward_timer_base() moving the base->clk
> +		 * backward
> +		 */
> +		base->next_expiry = base->clk;
> +	} else {
> +		base->next_expiry = timer->expires;
> +	}
>  	wake_up_nohz_cpu(base->cpu);
>  }
>  
> @@ -896,10 +904,13 @@ static inline void forward_timer_base(struct timer_base *base)
>  	 * If the next expiry value is > jiffies, then we fast forward to
>  	 * jiffies otherwise we forward to the next expiry value.
>  	 */
> -	if (time_after(base->next_expiry, jnow))
> +	if (time_after(base->next_expiry, jnow)) {
>  		base->clk = jnow;
> -	else
> +	} else {
> +		if (WARN_ON_ONCE(time_before(base->next_expiry, base->clk)))
> +			return;

I actually ported it to latest RT tree (v5.6.17-rt10) w/o conflicts,
but hit this one above:

 ...
 000: Kernel command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.17-rt10 root=/dev/mapper/rhel_rt--qe--04-root ro crashkernel=auto resume=/dev/mapper/rhel_rt--qe--04-swap rd.lvm.lv=rhel_rt-qe-04/root rd.lvm.lv=rhel_rt-qe-04/swap console=ttyS0,115200
 000: mem auto-init: stack:off, heap alloc:off, heap free:off
 000: Memory: 2102240K/134089036K available (12292K kernel code, 2449K rwdata, 4332K rodata, 2248K init, 15392K bss, 2278548K reserved, 0K cma-reserved)
 000: random: get_random_u64 called from cache_random_seq_create+0x7c/0x140 with crng_init=0
 000: SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=24, Nodes=2
 000: Kernel/User page tables isolation: enabled
 000: ftrace: allocating 37680 entries in 148 pages
 000: ftrace: allocated 148 pages with 3 groups
 000: rcu: Preemptible hierarchical RCU implementation.
 000: rcu:        RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=24.
 000: rcu:        RCU priority boosting: priority 1 delay 500 ms.
 000: rcu:        RCU_SOFTIRQ processing moved to rcuc kthreads.
 000:     No expedited grace period (rcu_normal_after_boot).
 000:     Tasks RCU enabled.
 000: rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
 000: rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=24
 000: NR_IRQS: 524544, nr_irqs: 1432, preallocated irqs: 16
 000: random: crng done (trusting CPU's manufacturer)
 000: Console: colour VGA+ 80x25
 000: printk: console [ttyS0] enabled
 000: ACPI: Core revision 20200110
 000: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
 000: APIC: Switch to symmetric I/O mode setup
 000: DMAR: Host address width 46
 000: DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
 000: DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020df
 000: DMAR: DRHD base: 0x000000c7ffc000 flags: 0x1
 000: DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap d2078c106f0466 ecap f020df
 000: DMAR: RMRR base: 0x00000079190000 end: 0x00000079192fff
 000: DMAR: RMRR base: 0x000000791f4000 end: 0x000000791f7fff
 000: DMAR: RMRR base: 0x000000791de000 end: 0x000000791f3fff
 000: DMAR: RMRR base: 0x000000791cb000 end: 0x000000791dbfff
 000: DMAR: RMRR base: 0x000000791dc000 end: 0x000000791ddfff
 000: DMAR: RMRR base: 0x0000005a661000 end: 0x0000005a6a0fff
 000: DMAR-IR: IOAPIC id 10 under DRHD base  0xfbffc000 IOMMU 0
 000: DMAR-IR: IOAPIC id 8 under DRHD base  0xc7ffc000 IOMMU 1
 000: DMAR-IR: IOAPIC id 9 under DRHD base  0xc7ffc000 IOMMU 1
 000: DMAR-IR: HPET id 0 under DRHD base 0xc7ffc000
 000: DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
 000: DMAR-IR: Enabled IRQ remapping in x2apic mode
 000: x2apic enabled
 000: Switched APIC routing to cluster x2apic.
 000: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
 000: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x228d095820e, max_idle_ns: 440795295198 ns
 000: ------------[ cut here ]------------
 000: WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:897 add_timer_on+0x129/0x140
 000: Modules linked in:
 000: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.17-rt10 #1
 000: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/25/2017
 000: RIP: 0010:add_timer_on+0x129/0x140
 000: Code: 24 48 89 ef e8 e8 ca 7d 00 48 8b 44 24 08 65 48 33 04 25 28 00 00 00 75 1f 48 83 c4 10 5b 5d 41 5c c3 48 89 45 48 eb ca 0f 0b <0f> 0b eb c4 e8 53 94 e9 ff e9 7b ff ff ff e8 64 c4 f6 ff 0f 1f 40
 000: RSP: 0000:ffffffffb2c03e80 EFLAGS: 00010083
 000: RAX: 00000000fffb6c25 RBX: ffffffffb3886540 RCX: 0000000000000000
 000: RDX: 00000000fffb6c20 RSI: ffffffffb2c03e80 RDI: 0000000000000001
 000: RBP: ffff92687fa19300 R08: 0000000000000000 R09: 0000000000000001
 000: R10: ffffffffb2c5b4e0 R11: 3235393730343420 R12: 0000000000000000
 000: R13: ffffffffb2cd42c0 R14: 0000000000000000 R15: ffff92687fa27f40
 000: FS:  0000000000000000(0000) GS:ffff92687fa00000(0000) knlGS:0000000000000000
 000: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 000: CR2: ffff92610ac01000 CR3: 00000008c920e001 CR4: 00000000000606b0
 000: Call Trace:
 000:  clocksource_select_watchdog+0x144/0x1a0
 000:  __clocksource_register_scale+0x88/0xf0
 000:  tsc_init+0x1a1/0x268
 000:  start_kernel+0x4ae/0x56e
 000:  secondary_startup_64+0xb6/0xc0
 000: ---[ end trace 0000000000000001 ]---

Guess you might be faster to understand what I'm missing. :-)

Best,

Juri

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ