Message-ID: <14e201cd5ee9$c1b57ce0$452076a0$@com>
Date: Tue, 10 Jul 2012 15:17:12 -0700
From: "Alec Matusis" <alecm@...tango.com>
To: <linux-kernel@...r.kernel.org>
Subject: bogus utime and stime in /proc/<PID>/stat - possibly related to fs/proc/array.c change
I run a number of Linux servers and I noticed an interesting bug, possibly
related to a recent change in fs/proc/array.c.
After upgrading from Ubuntu kernel 2.6.24-26 to 2.6.32-40 (and higher), I
noticed that about once per month, suddenly, a user process causing the main
load on a given machine disappears from "top", but it still continues to run
normally (perhaps with a slight performance decrease). After this, the load
average of the system remains the same, but top shows no running
processes causing the load. This happened on a variety of new IBM System X
machines, all running different tasks (httpd 2.2, mysqld 5.1, Twisted Python
TCP servers).
I looked at a problematic process, and discovered that ps -o pcpu showed
crazily large numbers:
# ps -o pcpu,pid,cmd -p1587
%CPU PID CMD
317713124 1587 /nail/encap/mysql-5.1.60/libexec/mysqld
Then I looked at:
# cat /proc/1587/stat
1587 (mysqld) S 1212 1088 1088 0 -1 4202752 14307313 0 162 0 85773299069
4611685932654088833 0 0 20 0 52 0 3549 27255418880 5483524
18446744073709551615 4194304 11111617 140733749236976 140733749235984
8858659 0 552967 4102 26345 18446744073709551615 0 0 17 5 0 0 0 0
I noticed that the 14th and 15th entries, 85773299069 and 4611685932654088833
(utime and stime), had become abnormally large and were stuck. When the
server is in the normal state (i.e. the system load-causing process shows up
in top, and ps -o pcpu shows a reasonable %CPU), these numbers are up to 13
orders of magnitude smaller, e.g. 416786 and 602262, and they are advancing
by about 10 per second.
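
For anyone who wants to watch these two counters on their own machines, here
is a minimal sketch of pulling utime and stime out of a stat line. The function
names are made up for illustration, and STAT_LINE is just the first part of the
output shown above; the only assumption about the format is that comm (field 2)
is the parenthesised name, so everything after the last ')' starts at field 3.

```python
# Sketch: extract utime (field 14) and stime (field 15) from /proc/<PID>/stat.
# parse_cpu_times and read_cpu_times are hypothetical helper names.

# First 15+ fields of the stat output quoted in this message.
STAT_LINE = (
    "1587 (mysqld) S 1212 1088 1088 0 -1 4202752 14307313 0 162 0 "
    "85773299069 4611685932654088833 0 0 20 0 52 0 3549"
)


def parse_cpu_times(stat_line):
    """Return (utime, stime) in clock ticks from one stat line."""
    # comm may contain spaces or parentheses, so split after the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    utime = int(rest[14 - 3])
    stime = int(rest[15 - 3])
    return utime, stime


def read_cpu_times(pid):
    """Read the live counters for a PID (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_times(f.read())


if __name__ == "__main__":
    utime, stime = parse_cpu_times(STAT_LINE)
    print("utime ticks:", utime)
    print("stime ticks:", stime)
```

Calling read_cpu_times(pid) a few seconds apart should show whether the
counters are still advancing or are stuck.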
I do not understand what causes this problem, except that I know that
machines with 2.6.24-26 or earlier do not have this behavior, and since then
there was a change in fs/proc/array.c.
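
As a rough sanity check on the magnitudes, the arithmetic below compares the
stuck values with the normal ones quoted above and also measures how far the
stuck stime is from 2^62. This is only arithmetic on the numbers in this
message; the variable and function names are illustrative, and the closeness
to 2^62 may be a coincidence rather than a pointer to the cause.

```python
import math

# Values copied from the stat output and the "normal state" example above.
utime_bad = 85773299069
stime_bad = 4611685932654088833
utime_ok = 416786
stime_ok = 602262


def orders_of_magnitude(bad, ok):
    """Base-10 logarithm of the ratio between a stuck and a normal counter."""
    return math.log10(bad / ok)


# Distance of the stuck stime below 2**62 (Python ints are exact here).
gap_below_2_62 = 2**62 - stime_bad

if __name__ == "__main__":
    print("utime ratio, log10:", orders_of_magnitude(utime_bad, utime_ok))
    print("stime ratio, log10:", orders_of_magnitude(stime_bad, stime_ok))
    print("2**62 - stime:", gap_below_2_62)
    print("(2**62 - stime) - utime:", gap_below_2_62 - utime_bad)
```

With these numbers the stuck stime sits just below 2^62, and its distance from
2^62 differs from the stuck utime by only 2 ticks.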
I wrote this up in detail in
http://serverfault.com/questions/406489/load-causing-processes-disappearing-from-top-ps-o-pcpu-shows-bogus-numbers
If you have any comment on this, it'd be highly appreciated.
Thank you.
Alec Matusis