Message-ID: <20130928141310.302530@gmx.com>
Date:	Sat, 28 Sep 2013 16:13:10 +0200
From:	"Tibor Billes" <tbilles@....com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Unusually high system CPU usage with recent kernels

Hi Paul,

I was just wondering if you received my last mail, because I haven't heard from
you for a while now.

Tibor


> ----- Original Message -----
> From: Tibor Billes
> Sent: 09/14/13 03:59 PM
> To: paulmck@...ux.vnet.ibm.com
> Subject: Re: Unusually high system CPU usage with recent kernels
> 
> > From: Paul E. McKenney Sent: 09/13/13 02:19 AM
> > On Wed, Sep 11, 2013 at 08:46:04AM +0200, Tibor Billes wrote:
> > > > From: Paul E. McKenney Sent: 09/09/13 10:44 PM
> > > > On Mon, Sep 09, 2013 at 09:47:37PM +0200, Tibor Billes wrote:
> > > > > > From: Paul E. McKenney Sent: 09/08/13 08:43 PM
> > > > > > On Sun, Sep 08, 2013 at 07:22:45PM +0200, Tibor Billes wrote:
> > > > > > > Good news Paul, the above patch did solve this issue :) I see no extra
> > > > > > > context switches, no extra CPU usage and no extra compile time.
> > > > > > 
> > > > > > Woo-hoo!!! ;-)
> > > > > > 
> > > > > > May I add your Tested-by to the fix?
> > > > > 
> > > > > Yes, please :)
> > > > 
> > > > Done! ;-)
> > > > 
> > > > > > > > Any idea why you couldn't reproduce this? Why did it only hit my system?
> > > > > > 
> > > > > > Timing, maybe? Another question is "Why didn't lots of people complain
> > > > > > about this?" It would be good to find out because it is quite possible
> > > > > > that there is some other bug that this patch is masking -- or even just
> > > > > > making less probable.
> > > > > 
> > > > > Good point!
> > > > > 
> > > > > > If you are interested, please build one of the old kernels but with
> > > > > > CONFIG_RCU_TRACE=y. Then run something like the following as root
> > > > > > concurrently with your workload:
> > > > > > 
> > > > > > sleep 10
> > > > > > echo 1 > /sys/kernel/debug/tracing/events/rcu/enable
> > > > > > sleep 0.01
> > > > > > echo 0 > /sys/kernel/debug/tracing/events/rcu/enable
> > > > > > cat /sys/kernel/debug/tracing/trace > /tmp/trace
> > > > > > 
> > > > > > Send me the /tmp/trace file, which will probably be a few megabytes in
> > > > > > size, so please compress it before sending. ;-) A good compression
> > > > > > tool should be able to shrink it by a factor of 20 or thereabouts.
> > > > > 
> > > > > Ok, I did that. Twice! The first is with commit
> > > > > 910ee45db2f4837c8440e770474758493ab94bf7, which was the first bad commit
> > > > > according to the bisection I did initially. Second with the current
> > > > > mainline 3.11. I have little idea of what the fields and lines mean in
> > > > > the RCU trace files, so I'm not going to guess if they are essentially
> > > > > the same or not, but it may provide more information to you. Both files
> > > > > were created by using a shell script containing the commands you
> > > > > suggested.
> > > > 
> > > > > So both traces correspond to bad cases, correct? They are both quite
> > > > impressive -- looks like you have quite the interrupt rate going on there!
> > > > Almost looks like interrupts are somehow getting enabled on the path
> > > > to/from idle.
> > > > 
> > > > Could you please also send along a trace with the fix applied?
> > > 
> > > Sure. The attached tar file contains traces of good kernels. The first is with
> > > version 3.9.7 (no patch applied) which was the last stable kernel I tried and
> > > didn't have this issue. The second is version 3.11 with your fix applied.
> > > Judging by the size of the traces, 3.11.0+ is still doing more work than
> > > 3.9.7.
> > 
> > Indeed, though quite a bit less than the problematic traces.
> > 
> > Did you have all three patches applied to 3.11.0, or just the last one?
> > If the latter, could you please try it with all three?
> 
> Only the last one was applied to 3.11.0. The attachment now contains the
> RCU trace with all three applied. It seems to be smaller in size, but still
> not close to 3.9.7.
> 
> > > > > I'm not sure about LKML policies about attaching not-so-small files to
> > > > > emails, so I've dropped LKML from the CC list. Please CC the mailing
> > > > > list in your reply.
> > > > 
> > > > Done!
> > > > 
> > > > Another approach is to post the traces on the web and send the URL to
> > > > LKML. But whatever works for you is fine by me.
> > > 
> > > Sending directly to you won again :) Could you please CC the list in your
> > > reply?
> > 
> > Done! ;-)
> 
> Could you please CC the list in your reply again? :)
> 
> Tibor

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
