Date: Thu, 30 Mar 2017 00:54:30 +0200
From: Frederic Weisbecker <fweisbec@...il.com>
To: Rik van Riel <riel@...hat.com>
Cc: Luiz Capitulino <lcapitulino@...hat.com>, Wanpeng Li <kernellwp@...il.com>,
    linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [BUG nohz]: wrong user and system time accounting

(Adding Thomas in Cc)

On Wed, Mar 29, 2017 at 04:08:45PM -0400, Rik van Riel wrote:
> On Wed, 2017-03-29 at 13:16 -0400, Luiz Capitulino wrote:
> > On Tue, 28 Mar 2017 13:24:06 -0400
> > Luiz Capitulino <lcapitulino@...hat.com> wrote:
> >
> > > 1. In my tracing I'm seeing that sometimes (always?) the
> > > time interval between two timer interrupts is less than 1ms
> >
> > I think that's the root cause.
> >
> > In this trace, we see the following:
> >
> > 1. On CPU15, we transition from user-space to kernel-space because
> > of a timer interrupt (it's the tick)
> >
> > 2. vtimer_delta() returns 0, because jiffies didn't change since the
> > last accounting
> >
> > 3. While CPU15 is executing in kernel-space, jiffies is updated
> > by CPU0
> >
> > 4. When going back to user-space, vtime_delta() returns non-zero
> > and the whole time is accounted for system time (observe how
> > the cputime parameter in account_system_time() is less than 1ms)
>
> In other words, the tick on cpu0 is aligned
> with the tick on the nohz_full cpus, and
> jiffies is advanced while the nohz_full cpus
> with an active tick happen to be in kernel
> mode?

Ah you found out faster than me :-)

> Frederic, can you think of any reason why
> the tick on nohz_full CPUs would end up aligned
> with the tick on cpu0, instead of running at some
> random offset?

tick_init_jiffy_update() takes that decision to align all ticks.

I'm not sure why. I don't see anything that could depend on that wide tick
synchronization. The jiffies update itself relies on ktime to check when to
update it. So even if the tick fires a bit later on CPU 1 than on CPU 0, the
jiffies updates should stay coherent and should never exceed 999us delay in the
worst case (for HZ=1000)

Now I might overlook something.

>
> A random offset, or better yet a somewhat randomized
> tick length to make sure that simultaneous ticks are
> fairly rare and the vtime sampling does not end up
> "in phase" with the jiffies incrementing, could make
> the accounting work right again.
>
> Of course, that assumes the above hypothesis is correct :)

I'm not sure that randomizing the tick start per CPU would be a right
solution. Somewhere in the world you can be sure the tick randomization of some
nohz_full CPU will coincide with the tick of CPU 0 :o)

Or we could force that tick on nohz_full CPUs to be far from CPU 0's tick...
I'm not sure such a solution would be accepted though.
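
The phase effect discussed above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not kernel code: simulate(), jiffies_at(), irq_us and update_latency_us are hypothetical names and values. The model assumes HZ=1000, that CPU 0 bumps jiffies a few microseconds after each tick boundary, that the nohz_full CPU spends irq_us in the kernel per tick, and that accounting charges whole jiffies to whichever context is being left whenever jiffies has changed since the last snapshot.

```python
HZ = 1000
TICK_US = 1_000_000 // HZ  # one jiffy is 1000 us when HZ=1000


def jiffies_at(t_us, update_latency_us):
    """Value of jiffies seen at time t_us.

    Assumption: CPU 0 increments jiffies update_latency_us after each
    tick boundary, because the update happens inside its tick handler.
    """
    return (t_us - update_latency_us) // TICK_US


def simulate(offset_us, irq_us=20, update_latency_us=5, n_ticks=1000):
    """Toy jiffies-granularity accounting on one nohz_full CPU.

    offset_us is the phase of this CPU's tick relative to CPU 0's tick.
    Returns (user_us, system_us) as the accounting would report them.
    """
    user_us = 0
    system_us = 0
    # The task starts in user mode right after an earlier tick returned.
    snapshot = jiffies_at(offset_us + irq_us, update_latency_us)
    for k in range(1, n_ticks + 1):
        enter = k * TICK_US + offset_us  # tick fires: user -> kernel
        leave = enter + irq_us           # tick handled: kernel -> user

        # Leaving user mode: charge elapsed jiffies (possibly 0) to user.
        now = jiffies_at(enter, update_latency_us)
        user_us += (now - snapshot) * TICK_US
        snapshot = now

        # Leaving kernel mode: charge elapsed jiffies (possibly 0) to system.
        now = jiffies_at(leave, update_latency_us)
        system_us += (now - snapshot) * TICK_US
        snapshot = now
    return user_us, system_us


if __name__ == "__main__":
    # The real split in this model is 98% user, 2% system (20 us per 1000 us).
    print("aligned ticks :", simulate(offset_us=0))
    print("offset 500 us :", simulate(offset_us=500))
```

In this model, with aligned ticks (offset 0) the jiffies update always lands while the CPU is in the kernel, so every jiffy is charged to system time. With a 500 us offset every jiffy is charged to user time instead. Sweeping offset_us over 0..999 shows that a window of irq_us offsets (20 out of 1000 here) still puts the jiffies update inside the kernel section, which is consistent with the concern that a randomized offset will coincide with CPU 0's tick on some CPU.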