Message-ID: <20111202175932.GA3894@tiehlicka.suse.cz>
Date: Fri, 2 Dec 2011 18:59:33 +0100
From: Michal Hocko <mhocko@...e.cz>
To: "Artem S. Tashkinov" <t.artem@...os.com>
Cc: pomac@...or.com, linux-kernel@...r.kernel.org, rjw@...k.pl,
tino.keitel@...ei.de
Subject: Re: [PATCH] proc: Do not overflow get_{idle,iowait}_time for nohz
(was: Re: Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage)
On Fri 02-12-11 17:49:17, Michal Hocko wrote:
> On Fri 02-12-11 14:35:15, Michal Hocko wrote:
> > On Tue 29-11-11 11:38:47, Artem S. Tashkinov wrote:
> > > On Nov 29, 2011, Michal Hocko <mhocko@...e.cz> wrote:
> > >
> > > > As I have written in other email could you post your config and collect
> > > > the following data?
> > > > for i in `seq 30`;
> > > > do
> > > > cat /proc/stat > `date +'%s'`
> > > > sleep 1
> > > > done
> > > > export old_user=0 old_nice=0 old_sys=0 old_idle=0 old_iowait=0;
> > > >
> > > > # for all your available CPUs
> > > > grep cpu0 * | while read cpu user nice sys idle iowait rest;
> > > > do
> > > > echo $cpu $(($user-$old_user)) $(($nice-$old_nice)) $(($sys-$old_sys)) $(($idle-$old_idle)) $(($iowait-$old_iowait))
> > > > old_user=$user old_nice=$nice old_sys=$sys old_idle=$idle old_iowait=$iowait
> > > > done
> > >
> > > 1322566208:cpu0 5199 0 2931 357890604 2541
> > > 1322566209:cpu0 0 0 1 0 0
> > > 1322566210:cpu0 0 0 0 0 0
> > > 1322566211:cpu0 0 0 0 0 0
> [...]
> >
> > Could you post raw data as well? Ideally starting right after boot and
> > collected for more than 30s (longer better...)
>
> Ahh, missed that you attached data. And also noticed that you are using
> CONFIG_HZ_300 which explains the problem and why I cannot reproduce
> it.
>
> get_{idle,iowait}_time translates microseconds to cputime64_t and it uses
> usecs_to_cputime which is just an alias for usecs_to_jiffies and it does
> if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
> return MAX_JIFFY_OFFSET;
> which in your case (HZ=300) means that we overflow much more often than
> for HZ==100. The patch below should fix this:
And here is the one with a cleaned-up changelog. No functional changes.
---
From 107887016b91de59194a93c751d040b05d5e37fe Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@...e.cz>
Date: Fri, 2 Dec 2011 16:17:03 +0100
Subject: [PATCH] proc: Do not overflow get_{idle,iowait}_time for nohz
Since a25cac51 [proc: Consider NO_HZ when printing idle and iowait times]
we are reporting idle/iowait time also while a CPU is tickless. We rely
on get_{idle,iowait}_time functions to retrieve proper data.

These functions, however, use usecs_to_cputime to translate a time in
microseconds to cputime64_t. This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.

When we overflow depends on CONFIG_HZ, but especially for
CONFIG_HZ_300 the threshold is quite low (1431649781 usecs), so we keep
getting MAX_JIFFY_OFFSET for >3000s (!) until the unsigned int wraps.
Just for reference, CONFIG_HZ_100 has an overflow window of around 20s,
CONFIG_HZ_250 ~8s and CONFIG_HZ_1000 ~2s.

This resulted in a bug where people saw [h]top going mad, reporting 100%
CPU usage even though there was basically no CPU load. The reason was
simply that /proc/stat stopped reporting idle/iowait changes (and
reported MAX_JIFFY_OFFSET instead), so the only change happening was in
user/system time.

Let's use nsecs_to_jiffies64 instead, which doesn't reduce the precision
to a 32b type and is much more appropriate for cumulative time values
(unlike usecs_to_jiffies, which is intended for timeout calculations).

Signed-off-by: Michal Hocko <mhocko@...e.cz>
---
fs/proc/stat.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 42b274d..2a30d67 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -32,7 +32,7 @@ static cputime64_t get_idle_time(int cpu)
idle = kstat_cpu(cpu).cpustat.idle;
idle = cputime64_add(idle, arch_idle_time(cpu));
} else
- idle = usecs_to_cputime(idle_time);
+ idle = nsecs_to_jiffies64(1000 * idle_time);
return idle;
}
@@ -46,7 +46,7 @@ static cputime64_t get_iowait_time(int cpu)
/* !NO_HZ so we can rely on cpustat.iowait */
iowait = kstat_cpu(cpu).cpustat.iowait;
else
- iowait = usecs_to_cputime(iowait_time);
+ iowait = nsecs_to_jiffies64(1000 * iowait_time);
return iowait;
}
--
1.7.7.3
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic