Message-ID: <19707.34405.791777.298955@pilspetsen.it.uu.se>
Date:	Sun, 5 Dec 2010 13:32:37 +0100
From:	Mikael Pettersson <mikpe@...uu.se>
To:	Mikael Pettersson <mikpe@...uu.se>
Cc:	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [BUG] 2.6.37-rc3 massive interactivity regression on ARM

Mikael Pettersson writes:
 > The scenario is that I do a remote login to an ARM build server,
 > use screen to start a sub-shell, in that shell start a largish
 > compile job, detach from that screen, and from the original login
 > shell I occasionally monitor the compile job with top or ps or
 > by attaching to the screen.
 > 
 > With kernels 2.6.37-rc2 and -rc3 this causes the machine to become
 > very sluggish: top takes forever to start, once started it shows no
 > activity from the compile job (it's as if it's sleeping on a lock),
 > and ps also takes forever and shows no activity from the compile job.
 > 
 > Rebooting into 2.6.36 eliminates these issues.
 > 
 > I do pretty much the same thing (remote login -> screen -> compile job)
 > on other archs, but so far I've only seen the 2.6.37-rc misbehaviour
 > on ARM EABI, specifically on an IOP n2100. (I have access to other ARM
 > sub-archs, but haven't had time to test 2.6.37-rc on them yet.)
 > 
 > Has anyone else seen this? Any ideas about the cause?

(Re-followup since I just realised my previous followups were to Rafael's
regressions mailbot rather than the original thread.)

> The bug is still present in 2.6.37-rc4.  I'm currently trying to bisect it.

git bisect identified

[305e6835e05513406fa12820e40e4a8ecb63743c] sched: Do not account irq time to current task

as the cause of this regression.  Reverting it from 2.6.37-rc4 (requires some
hackery due to subsequent changes in the same area) restores sane behaviour.

The original patch submission talks about irq-heavy scenarios.  My case is the
exact opposite: UP, !PREEMPT, NO_HZ, very low irq rate, essentially 100% CPU
bound in userspace but expected to schedule quickly when needed (e.g. running
top or ps or just hitting CR in one shell while another runs a compile job).
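
One way to put a number on "expected to schedule quickly" is to run a small wake-up latency probe in one shell while the compile job runs in another. The following is a minimal Python sketch and not part of the original report: the function names, the 10 ms interval and the sample count are illustrative assumptions. It only measures how late a short sleep returns, which is a rough proxy for the sluggishness described above.

```python
import time


def measure_wakeup_overshoot(interval=0.01, samples=100):
    """Return, per sample, how much longer than `interval` a sleep took (seconds).

    On a responsive system the overshoot stays small; if the probe is starved
    by a CPU-bound task, the overshoot grows.
    """
    overshoots = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval)
        overshoots.append(time.monotonic() - start - interval)
    return overshoots


def summarize(overshoots):
    """Return the median and maximum of a non-empty list of overshoots."""
    ordered = sorted(overshoots)
    return {"median": ordered[len(ordered) // 2], "max": ordered[-1]}
```

Calling summarize(measure_wakeup_overshoot()) once on an idle machine and once during the compile job, under both a good and a bad kernel, would give comparable figures.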

I've reproduced the misbehaviour with 2.6.37-rc4 on ARM/mach-iop32x and
ARM/mach-ixp4xx, but ARM/mach-kirkwood does not misbehave, and other archs
(x86 SMP, SPARC64 UP and SMP, PowerPC32 UP, Alpha UP) also do not misbehave.

So it looks like an ARM-only issue, possibly depending on platform specifics.

One difference I noticed between my Kirkwood machine and my ixp4xx and iop32x
machines is that even though all have CONFIG_NO_HZ=y, the timer irq rate is
much higher on Kirkwood, even when the machine is idle.
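
The timer irq rates on the different machines can be compared by sampling /proc/interrupts twice and dividing the counter deltas by the elapsed time. The following is a minimal Python sketch under that assumption, not something taken from the original report; the function names are illustrative, and the parser assumes the usual layout of a header row of CPU columns followed by "label: count ..." rows.

```python
import time


def read_irq_counts(text):
    """Parse /proc/interrupts content into {irq_label: count summed over CPUs}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        # The first ncpus fields are per-CPU counters; summary rows may have fewer.
        per_cpu = [int(f) for f in rest.split()[:ncpus] if f.isdigit()]
        counts[label.strip()] = sum(per_cpu)
    return counts


def irq_rates(before, after, seconds):
    """Return interrupts per second for every label present in both snapshots."""
    return {k: (after[k] - before[k]) / seconds for k in after if k in before}


def sample_irq_rates(interval=10.0, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart and return per-irq rates."""
    with open(path) as f:
        first = read_irq_counts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_irq_counts(f.read())
    return irq_rates(first, second, interval)
```

Running sample_irq_rates() on an idle machine and looking at the row for the timer interrupt gives the idle tick rate to compare across platforms.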

/Mikael
