linux-kernel - next-20130206 x86_64- high CPU usage, spinlock issue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <20493.1361221949@turing-police.cc.vt.edu>
Date:	Mon, 18 Feb 2013 16:12:29 -0500
From:	Valdis Kletnieks <Valdis.Kletnieks@...edu>
To:	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>
Cc:	linux-kernel@...r.kernel.org
Subject: next-20130206 x86_64- high CPU usage, spinlock issue

Over the weekend, I had a number of occurrences of my laptop becoming
unresponsive for periods of up to several minutes.  gkrellm monitors showed
near 100% system time for both CPUs (and the way X and other userspace programs
were behaving was consistent with them being starved for CPU for extended
periods - at one point gkrellm was unable to update the display for over 4
minutes).  The problem would hit, everything would get sluggish/stop for
anywhere from 15 seconds or so to several minutes - and then it would just
as mysteriously go away for no apparent reason, only to return a bit later...

The only other clue I was able to get was running 'perf top' to see where
the CPU was going - anytime the kernel was in this state, I'd see the top
two entries were consistently delay_tsc and do_raw_read_lock:

 25.29%  [kernel]                                        [k] delay_tsc
 23.47%  [kernel]                                        [k] do_raw_read_lock
  8.01%  [kernel]                                        [k] ext4_es_reclaim_extents_count
  7.44%  [kernel]                                        [k] ftrace_likely_update
  2.80%  [kernel]                                        [k] lock_release
  2.79%  [kernel]                                        [k] do_raw_read_unlock
  2.65%  [kernel]                                        [k] lock_acquire

 15.33%  [kernel]                                        [k] delay_tsc
 14.02%  [kernel]                                        [k] do_raw_read_lock
  6.80%  libc-2.17.so                                    [.] re_search_internal
  5.55%  libmagic.so.1.0.0                               [.] 0x000000000000af9a
  5.16%  [kernel]                                        [k] ftrace_likely_update
  4.79%  [kernel]                                        [k] ext4_es_reclaim_extents_count

 22.94%  [kernel]                                        [k] delay_tsc
 19.84%  [kernel]                                        [k] do_raw_read_lock
  6.92%  [kernel]                                        [k] ext4_es_reclaim_extents_count
  6.79%  [kernel]                                        [k] ftrace_likely_update
  2.62%  libc-2.17.so                                    [.] re_search_internal

I'm not sure why ext4_es-reclaim_extents_count appears a lot - the workload
that was running at the time was seeing some memory pressure due to a large
Firefox process, and a 'find | xargs file' command causing a lot of inode and
data read I/O, but there shouldn't have been a lot of reclaims of extents as
not much was being written anyplace?

I can try to reproduce the issue - any suggestions how to efficiently tell
who/what is doing the contending in do_raw_read_lock?

Content of type "application/pgp-signature" skipped