linux-kernel - Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

Message-ID: <20101205131702.GE9138@n2100.arm.linux.org.uk>
Date:	Sun, 5 Dec 2010 13:17:02 +0000
From:	Russell King - ARM Linux <linux@....linux.org.uk>
To:	Mikael Pettersson <mikpe@...uu.se>,
	Venkatesh Pallipadi <venki@...gle.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>
Cc:	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

On Sun, Dec 05, 2010 at 01:32:37PM +0100, Mikael Pettersson wrote:
> Mikael Pettersson writes:
>  > The scenario is that I do a remote login to an ARM build server,
>  > use screen to start a sub-shell, in that shell start a largish
>  > compile job, detach from that screen, and from the original login
>  > shell I occasionally monitor the compile job with top or ps or
>  > by attaching to the screen.
>  > 
>  > With kernels 2.6.37-rc2 and -rc3 this causes the machine to become
>  > very sluggish: top takes forever to start, once started it shows no
>  > activity from the compile job (it's as if it's sleeping on a lock),
>  > and ps also takes forever and shows no activity from the compile job.
>  > 
>  > Rebooting into 2.6.36 eliminates these issues.
>  > 
>  > I do pretty much the same thing (remote login -> screen -> compile job)
>  > on other archs, but so far I've only seen the 2.6.37-rc misbehaviour
>  > on ARM EABI, specifically on an IOP n2100. (I have access to other ARM
>  > sub-archs, but haven't had time to test 2.6.37-rc on them yet.)
>  > 
>  > Has anyone else seen this? Any ideas about the cause?
> 
> (Re-followup since I just realised my previous followups were to Rafael's
> regressions mailbot rather than the original thread.)
> 
> > The bug is still present in 2.6.37-rc4.  I'm currently trying to bisect it.
> 
> git bisect identified
> 
> [305e6835e05513406fa12820e40e4a8ecb63743c] sched: Do not account irq time to current task
> 
> as the cause of this regression.  Reverting it from 2.6.37-rc4 (requires some
> hackery due to subsequent changes in the same area) restores sane behaviour.
> 
> The original patch submission talks about irq-heavy scenarios.  My case is the
> exact opposite: UP, !PREEMPT, NO_HZ, very low irq rate, essentially 100% CPU
> bound in userspace but expected to schedule quickly when needed (e.g. running
> top or ps or just hitting CR in one shell while another runs a compile job).
> 
> I've reproduced the misbehaviour with 2.6.37-rc4 on ARM/mach-iop32x and
> ARM/mach-ixp4xx, but ARM/mach-kirkwood does not misbehave, and other archs
> (x86 SMP, SPARC64 UP and SMP, PowerPC32 UP, Alpha UP) also do not misbehave.
> 
> So it looks like an ARM-only issue, possibly depending on platform specifics.
> 
> One difference I noticed between my Kirkwood machine and my ixp4xx and iop32x
> machines is that even though all have CONFIG_NO_HZ=y, the timer irq rate is
> much higher on Kirkwood, even when the machine is idle.

The above patch you point out is fundamentally broken.

+               rq->clock = sched_clock_cpu(cpu);
+               irq_time = irq_time_cpu(cpu);
+               if (rq->clock - irq_time > rq->clock_task)
+                       rq->clock_task = rq->clock - irq_time;

This means that we only update rq->clock_task when rq->clock - irq_time
is larger than the current rq->clock_task.  So, over time, rq->clock_task
ratchets up towards the maximum value that rq->clock can ever reach.  Or
in other words, the maximum value of sched_clock_cpu().

Once that has been reached, although rq->clock will wrap back to zero,
rq->clock_task will not, and so (I think) task execution time accounting
effectively stops dead.
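
To illustrate the failure mode described above, here is a minimal user-space
sketch, not kernel code.  The names fake_rq, fake_sched_clock() and
count_clock_task_advances() are hypothetical, and the sketch assumes a
sched_clock() value that is effectively 32 bits wide and counts nanoseconds,
with irq_time held at zero.  Only the comparison and assignment are taken
from the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rq: only the two fields the patch touches. */
struct fake_rq {
	uint64_t clock;
	uint64_t clock_task;
};

/*
 * Assumed clock source: a nanosecond value truncated to 32 bits, so it
 * wraps back to zero every 2^32 ns.
 */
static uint64_t fake_sched_clock(uint64_t real_ns)
{
	return (uint32_t)real_ns;
}

/* Same update rule as the quoted patch; irq_time is passed in by the caller. */
static void update_rq_clock(struct fake_rq *rq, uint64_t real_ns,
			    uint64_t irq_time)
{
	rq->clock = fake_sched_clock(real_ns);
	if (rq->clock - irq_time > rq->clock_task)
		rq->clock_task = rq->clock - irq_time;
}

/*
 * Advance real time in fixed steps starting after start_ns and return how
 * many of the updates actually moved rq->clock_task.
 */
static unsigned int count_clock_task_advances(struct fake_rq *rq,
					      uint64_t start_ns,
					      uint64_t step_ns,
					      unsigned int steps)
{
	unsigned int advances = 0;
	unsigned int i;

	assert(step_ns > 0);
	for (i = 1; i <= steps; i++) {
		uint64_t before = rq->clock_task;

		update_rq_clock(rq, start_ns + (uint64_t)i * step_ns, 0);
		if (rq->clock_task != before)
			advances++;
	}
	return advances;
}
```

Under these assumptions, stepping one second at a time from zero moves
clock_task on each of the first four updates.  Once the 32-bit value has
wrapped (after about 4.29 s), the next four updates leave clock_task stuck
at 4000000000 because rq->clock is now smaller than it.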

I guess this hasn't been noticed on x86 because it has a 64-bit
sched_clock, which takes a very long time to wrap.  However, on ARM,
where we tend to have 32-bit counters feeding sched_clock(), this value
will wrap far sooner.
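
As a rough guide to the difference in scale, the wrap period of a
free-running counter is 2^bits divided by its tick rate.  The helper below
is a hypothetical sketch; wrap_seconds() is not a kernel function, and the
rate you pass in is your own assumption, since the message above does not
give the counter frequencies of the affected platforms.

```c
#include <stdint.h>

/*
 * Hypothetical helper: seconds until a counter of the given width wraps
 * when it ticks at `hz` ticks per second.  Computes 2^bits / hz.
 */
static double wrap_seconds(unsigned int bits, double hz)
{
	double ticks = 1.0;
	unsigned int i;

	/* Build 2^bits by repeated doubling to avoid needing libm. */
	for (i = 0; i < bits; i++)
		ticks *= 2.0;
	return ticks / hz;
}
```

For example, assuming nanosecond resolution (hz = 1e9), a 32-bit value wraps
after roughly 4.3 seconds, while a 64-bit value takes on the order of
centuries.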