lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 29 Jan 2013 11:46:03 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Frederic Weisbecker <fweisbec@...il.com>
Cc:	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Li Zhong <zhong@...ux.vnet.ibm.com>,
	Namhyung Kim <namhyung.kim@....com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Paul Gortmaker <paul.gortmaker@...driver.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Sedat Dilek <sedat.dilek@...il.com>,
	Gleb Natapov <gleb@...hat.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Tony Luck <tony.luck@...el.com>,
	Fenghua Yu <fenghua.yu@...el.com>
Subject: Re: [GIT PULL] cputime: Full dynticks task/cputime accounting v7

On Tue, 2013-01-29 at 16:12 +0100, Frederic Weisbecker wrote:

> I have set some cputime tracepoint and looked at the stacktrace. They
> are sorted by weight, which is the amount of cputime accounted. I
> can't find anything wrong on these. See attachment.

That's because a histogram doesn't show the issue. You need to see the
order of events. That is, you need to see that user time is calculated
again within a system call. I ran my trace again with stack trace
enabled and it shows the race nicely:

kernelsp-2367    7....  7088.483227: funcgraph_entry:                   |  syscall_trace_leave() {
kernelsp-2367    7d..1  7088.483228: bprint:               account_system_time : system: 17887
kernelsp-2367    7d..1  7088.483242: kernel_stack:         <stack trace>
=> __vtime_account_system (ffffffff810646e5)
=> vtime_user_enter (ffffffff81064a5e)
=> user_enter (ffffffff810d8b2b)
=> syscall_trace_leave (ffffffff8100d03f)
=> return_to_handler (ffffffff814c8dc5)
=> int_check_syscall_exit_work (ffffffff814c925f)

syscall_trace_leave() was called, we are leaving a system call and going
back to userspace. We calculate the system time and set the state to
IN_USER by user_enter().


kernelsp-2367    7d..1  7088.483244: bprint:               account_user_time : user: 15753
kernelsp-2367    7d..1  7088.483262: kernel_stack:         <stack trace>
=> vtime_account_user (ffffffff81064a12)
=> vtime_account_irq_enter (ffffffff81064b40)
=> irq_enter (ffffffff8103d36e)
=> smp_apic_timer_interrupt (ffffffff814caa83)
=> apic_timer_interrupt (ffffffff814c9bb2)
=> user_enter (ffffffff810d8b58)
=> syscall_trace_leave (ffffffff8100d03f)
=> return_to_handler (ffffffff814c8dc5)
=> int_check_syscall_exit_work (ffffffff814c925f)


But before we got to user space, we took an interrupt (smp_apic_timer_interrupt).

Now the vtime_account_irq_enter() sees the IN_USER set, and accounts the
time back to userspace. That was 16us it calculated incorrectly.


kernelsp-2367    7d.h1  7088.483267: bprint:               account_system_time : system: 23401
kernelsp-2367    7d.h1  7088.483286: kernel_stack:         <stack trace>
=> __vtime_account_system (ffffffff810646e5)
=> vtime_account_irq_exit (ffffffff810649b8)
=> irq_exit (ffffffff8103d39e)
=> smp_apic_timer_interrupt (ffffffff814caacf)
=> apic_timer_interrupt (ffffffff814c9bb2)
=> user_enter (ffffffff810d8b58)
=> syscall_trace_leave (ffffffff8100d03f)
=> return_to_handler (ffffffff814c8dc5)
=> int_check_syscall_exit_work (ffffffff814c925f)

Now we are leaving the interrupt and since the state is still IN_USER,
the system time is calculated.

kernelsp-2367    7d..2  7088.483287: bprint:               account_user_time : user: 19865
kernelsp-2367    7d..2  7088.483309: kernel_stack:         <stack trace>
=> vtime_account_user (ffffffff81064a12)
=> vtime_account_irq_enter (ffffffff81064b40)
=> __do_softirq (ffffffff8103d0f5)
=> call_softirq (ffffffff814ca2fc)
=> do_softirq (ffffffff81003fdc)
=> irq_exit (ffffffff8103d3df)
=> smp_apic_timer_interrupt (ffffffff814caacf)
=> apic_timer_interrupt (ffffffff814c9bb2)
=> user_enter (ffffffff810d8b58)
=> syscall_trace_leave (ffffffff8100d03f)
=> return_to_handler (ffffffff814c8dc5)
=> int_check_syscall_exit_work (ffffffff814c925f)

But oops, we are now running a softirq, which happens to get everything
wrong again, and calculates another 20us incorrectly towards user time.

Perhaps the best thing to do is to have the irq vtime determine usermode
or not with "user_mode(get_irq_regs())". I'll try it.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ