lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 13 Dec 2017 11:54:56 +0100
From:   Greg KH <gregkh@...uxfoundation.org>
To:     Rabin Vincent <rabin.vincent@...s.com>
Cc:     fweisbec@...il.com, riel@...hat.com, mingo@...hat.com,
        peterz@...radead.org, stable@...r.kernel.org,
        linux-kernel@...r.kernel.org, Rabin Vincent <rabinv@...s.com>
Subject: Re: [PATCH for v4.9-stable] sched: fix softirq time accounting

On Wed, Dec 13, 2017 at 11:11:16AM +0100, Rabin Vincent wrote:
> From: Rabin Vincent <rabinv@...s.com>
> 
> softirq time accounting is broken on v4.9.x if ksoftirqd runs.
> 
> With
> 	CONFIG_IRQ_TIME_ACCOUNTING=y
> 	# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
> 
> this test code:
> 
> 	struct tasklet_struct tasklet;
> 
> 	static void delay_tasklet(unsigned long data)
> 	{
> 		udelay(10);
> 		tasklet_schedule(&tasklet);
> 	}
> 
> 	tasklet_init(&tasklet, delay_tasklet, 0);
> 	tasklet_schedule(&tasklet);
> 
> results in:
> 
> $ while :; do grep cpu0 /proc/stat; done
> cpu0 5 0 80 25 16 107 1 0 0 0
> cpu0 5 0 80 25 16 107 0 0 0 0
> cpu0 5 0 80 25 16 107 0 0 0 0
> cpu0 5 0 80 25 16 107 0 0 0 0
> cpu0 5 0 81 25 16 107 0 0 0 0
> cpu0 5 0 81 25 16 107 1 0 0 0
> cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> cpu0 5 0 81 25 16 108 0 0 0 0
> cpu0 6 0 81 25 16 108 0 0 0 0
> cpu0 6 0 81 25 16 108 0 0 0 0
> 
> As can be seen, the softirq numbers are totally bogus.
> 
> When ksoftirq is running, irqtime_account_process_tick() increments
> cpustat[CPUSTAT_SOFTIRQ].  This causes the "nsecs_to_cputime64(irqtime)
> - cpustat[CPUSTAT_SOFTIRQ]" calculation in irqtime_account_update() to
> underflow the next time a softirq is handled leading to the above
> values.
> 
> The underflow bug was added by 57430218317e5b280 ("sched/cputime: Count
> actually elapsed irq & softirq time").
> 
> But ksoftirqd accounting was wrong even in earlier kernels.  In earlier
> kernels, after a ksoftirq run, the kernel would simply stop accounting
> softirq time spent outside of ksoftirqd until that (accumulated) time
> exceeded the time for which ksofirqd previously had run.
> 
> Fix both the underflow and the wrong accounting by using a counter
> specifically for the non-ksoftirqd softirq case.
> 
> This code has been fixed in current mainline by a499a5a14db
> ("sched/cputime: Increment kcpustat directly on irqtime account") [note
> also the followup 25e2d8c1b9e327e ("sched/cputime: Fix ksoftirqd cputime
> accounting regression")], but that patch is a part of the many changes
> for eliminating of cputime_t so it does not seem suitable for backport.

I _really_ only want to take the exact upstream patches, as every time
we do something like what you are proposing to do here, we get it wrong.
Seriously, our track record is horrible.  Like 90% wrong.

Can you try just those two patches instead?

thanks,

greg k-h

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ