linux-kernel - sched/cputime: sig->prev

Date:	Thu, 04 Apr 2013 10:40:16 -0700
From:	Dave Hansen <dave@...1.net>
To:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
Subject: sched/cputime: sig->prev_stime underflow

With the 3.9-rcs (and probably much earlier) I'm seeing some weird top
output where the cpu time "spent" is millions of hours:

445 root      20   0     0    0    0 S    0  0.0  5124095h kworker/45:1
404 root      20   0     0    0    0 S    0  0.0  5124095h kworker/4:1
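
As a sanity check on that figure (pure arithmetic, not a trace of the
actual kernel-to-top conversion path): the largest unsigned 64-bit value,
read as a count of nanoseconds, truncates to 5124095 hours.  That is at
least consistent with a small negative difference wrapping around
somewhere.  The helper name ns_to_hours() below is made up for
illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a nanosecond count to whole hours,
 * truncating.  Not a kernel or procps function.
 */
static uint64_t ns_to_hours(uint64_t ns)
{
	uint64_t secs = ns / UINT64_C(1000000000);	/* ns -> seconds */

	return secs / 3600;				/* seconds -> hours */
}
```

Calling ns_to_hours(UINT64_MAX) gives 5124095, the same number top shows
in front of the "h".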

I see it mostly with kernel threads, but it doesn't seem to happen on my
distro kernel (3.5 era).  The suspect code is in thread_group_times():

	sig->prev_stime = max(sig->prev_stime, rtime - sig->prev_utime);

In my case, I caught it with rtime=34 and sig->prev_utime=35, so the
subtraction goes below zero and wraps.  This code _looks_ to be pretty
mature, coming in at commit 0cf55e1e in 2009.  The system I'm running on
_does_ have some non-sync'd TSCs, but they are at least being detected,
so I expect the fallout to be minimal:

	tsc: Marking TSC unstable due to check_tsc_sync_source failed
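
To make the rtime=34, sig->prev_utime=35 case concrete, here is a small
userspace sketch of the suspect line.  It assumes cputime_t behaves like
an unsigned 64-bit integer; the typedef and the function names are
stand-ins for illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Assumption: an unsigned 64-bit stand-in for the kernel's cputime_t. */
typedef uint64_t cputime_t;

static cputime_t cputime_max(cputime_t a, cputime_t b)
{
	return a > b ? a : b;
}

/*
 * Same shape as the quoted line:
 *   sig->prev_stime = max(sig->prev_stime, rtime - sig->prev_utime);
 * When rtime < prev_utime the unsigned subtraction wraps to a huge
 * value, and max() then picks that value.
 */
static cputime_t update_prev_stime(cputime_t prev_stime, cputime_t rtime,
				   cputime_t prev_utime)
{
	return cputime_max(prev_stime, rtime - prev_utime);
}
```

With prev_stime=0, rtime=34 and prev_utime=35 this returns UINT64_MAX,
and because of the max() the bogus value sticks on every later call.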

config:

	http://sr71.net/~dave/linux/config-bigbox-04042013.txt

The dumb fix here would seem to be to just check for "rtime <
sig->prev_utime" before doing the subtraction.  Any thoughts?
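
A rough userspace sketch of that check, under the same assumption that
cputime_t acts like an unsigned 64-bit integer.  Leaving prev_stime
untouched when rtime < prev_utime is my guess at reasonable behavior
here, not something settled; the typedef and function names are made up
for illustration.

```c
#include <stdint.h>

/* Assumption: an unsigned 64-bit stand-in for the kernel's cputime_t. */
typedef uint64_t cputime_t;

/*
 * Guarded variant of
 *   sig->prev_stime = max(sig->prev_stime, rtime - sig->prev_utime);
 * Skip the update when the subtraction would wrap, so prev_stime stays
 * monotonic and never picks up a wrapped value.
 */
static cputime_t update_prev_stime_checked(cputime_t prev_stime,
					   cputime_t rtime,
					   cputime_t prev_utime)
{
	cputime_t stime;

	if (rtime < prev_utime)
		return prev_stime;	/* would underflow: keep old value */

	stime = rtime - prev_utime;
	return stime > prev_stime ? stime : prev_stime;
}
```

With rtime=34 and prev_utime=35 this just hands back the old
prev_stime instead of a wrapped value.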