Message-ID: <AANLkTi=6OydKWE+gpo5vgQ1DWgDOiRRAjFB_c2ABehgG@mail.gmail.com>
Date:	Fri, 7 Jan 2011 17:30:46 -0800
From:	Mark Moseley <moseleymark@...il.com>
To:	Simon Kirby <sim@...tway.ca>
Cc:	linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: System CPU increasing on idle 2.6.36

On Fri, Jan 7, 2011 at 4:52 PM, Simon Kirby <sim@...tway.ca> wrote:
> On Fri, Jan 07, 2011 at 10:05:39AM -0800, Mark Moseley wrote:
>
>> NOTE: NFS/Kernel guys: I've left the 2.6.36.2 box still thrashing
>> about in case there's something you'd like me to look at.
>>
>> Ok, both my 2.6.36.2 kernel box and my 2.6.35->2.6.36 bisect box (was
>> bisected to ce7db282a3830f57f5b05ec48288c23a5c4d66d5 -- this is my
>> first time doing bisect, so I'll preemptively apologize for doing
>
> That's a merge commit, so that doesn't sound right (it doesn't contain
> changes by itself).  How were you running the bisect?  Were you
> definitely seeing a good case since v2.6.35?  Could you post your
> "git bisect log" so I can see if I can follow it?

Here's 'git bisect log':

# git bisect log
git bisect start
# bad: [f6f94e2ab1b33f0082ac22d71f66385a60d8157f] Linux 2.6.36
git bisect bad f6f94e2ab1b33f0082ac22d71f66385a60d8157f
# good: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect good 9fe6206f400646a2322096b56c59891d530e8d51
# good: [78417334b5cb6e1f915b8fdcc4fce3f1a1b4420c] Merge branch 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
git bisect good 78417334b5cb6e1f915b8fdcc4fce3f1a1b4420c
# good: [14a4fa20a10d76eb98b7feb25be60735217929ba] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect good 14a4fa20a10d76eb98b7feb25be60735217929ba
# bad: [ce7db282a3830f57f5b05ec48288c23a5c4d66d5] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
git bisect bad ce7db282a3830f57f5b05ec48288c23a5c4d66d5
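
Since that's a merge, I guess the next step is to look at what it
actually pulled in; something like this should list the commits that
came in on the merged side (going from memory, so double-check the
syntax):

# git log --oneline ce7db282a383^1..ce7db282a383
# git show --stat ce7db282a383    (the merge itself; usually no diff of its own)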

I did an initial clone of the mainline tree
(git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git)
and, after 'git bisect start', did 'git bisect good v2.6.35; git
bisect bad v2.6.36', but the tree it checked out seemed way too old.
I tried to approximate 2.6.35.9->2.6.36 by doing 'git bisect good
ea8a52f9f4bcc3420c38ae07f8378a2f18443970; git bisect bad
f6f94e2ab1b33f0082ac22d71f66385a60d8157f', but that had compilation
errors, so I did 'git bisect good
9fe6206f400646a2322096b56c59891d530e8d51' instead, which compiled ok.
Heh, did I mention this was my first outing with 'git bisect'? :)
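
For what it's worth, the overall loop I was trying to follow was
roughly the below; the build/boot/test steps are only approximate,
since I didn't keep exact notes:

# git bisect start
# git bisect bad v2.6.36      (the kernel showing the creeping system CPU)
# git bisect good v2.6.35     (the last one that behaved)
... then, for each revision bisect checks out:
# make oldconfig && make -j4 && make modules_install install
... reboot, put the box back under its NFS load, watch 'vmstat 1' and
'perf top' for a while, and report the result:
# git bisect good             (or 'git bisect bad')
... with 'git bisect log > bisect.log' along the way, so the session
can be replayed later with 'git bisect replay bisect.log'.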


> Yeah, I was seeing load proportional to these, but I don't think their
> growth is anything wrong.  It just seems to be something that is taking
> up CPU from flush processes as they grow, and something which is causing a
> thundering herd of reclaim with lots of spinlock contention.
>
> Hmmm.. I tried to see if drop_caches would clear nfs_inode_cache and thus
> the CPU usage.  The top of slabtop before doing anything:
>
>  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 1256343 1221799  97%    0.95K  38071       33   1218272K nfs_inode_cache
>
> # echo 2 > /proc/sys/vm/drop_caches
>
> this took about 40 seconds to execute, and perf top showed this for
> pretty much the whole time:
>
>             samples  pcnt function                 DSO
>             _______ _____ ________________________ ___________
>
>             1188.00 45.1% _raw_spin_lock           [kernel]
>              797.00 30.2% rpcauth_cache_shrinker   [kernel]
>              132.00  5.0% shrink_slab              [kernel]
>               60.00  2.3% queue_io                 [kernel]
>               59.00  2.2% _cond_resched            [kernel]
>               44.00  1.7% nfs_writepages           [kernel]
>               35.00  1.3% writeback_single_inode   [kernel]
>               34.00  1.3% writeback_sb_inodes      [kernel]
>               23.00  0.9% do_writepages            [kernel]
>               20.00  0.8% bit_waitqueue            [kernel]
>               15.00  0.6% redirty_tail             [kernel]
>               11.00  0.4% __iget                   [kernel]
>               11.00  0.4% __GI_strstr              libc-2.7.so
>
> slabtop then showed:
>
>  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 1256211 531720  42%    0.95K  38067       33   1218144K nfs_inode_cache
>
> and this doesn't change further with drop_caches=3 or 1.  It's strange
> that most of them were freed, but not all...  We do actually have
> RPC_CREDCACHE_DEFAULT_HASHBITS set to 10 at build time, which might
> explain a bigger credcache and time to free that.  Did you change your
> auth_hashtable_size at all?  Anyway, even with all of that, system CPU
> doesn't change.

Oh, now that's interesting. I'm setting RPC_CREDCACHE_DEFAULT_HASHBITS
to 8 (and have been since Jan, so like 2.6.33 or something) -- I was
having the same issue as in this thread,
http://www.spinics.net/lists/linux-nfs/msg09261.html. Bumping up
RPC_CREDCACHE_DEFAULT_HASHBITS was like a miracle on some of our
systems. One of these days I'll get around to switching to the 'right
way' of doing that, via a boot param, instead of the header hack.
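
(For the record, the 'right way' I keep putting off is, if I remember
correctly, the auth_hashtable_size parameter that sunrpc has these
days; I haven't double-checked the exact spelling, or whether it wants
a number of entries or a number of bits, so treat this as
approximate:)

 - on the kernel command line, with sunrpc built in:
     sunrpc.auth_hashtable_size=1024
 - or in /etc/modprobe.d/sunrpc.conf, with sunrpc as a module:
     options sunrpc auth_hashtable_size=1024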

In the bisected kernel, I definitely did *not* touch
RPC_CREDCACHE_DEFAULT_HASHBITS and it blew up anyway. If it hadn't
completely died, it'd have been nice to see what 'perf top' said.
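
If one of them gets into that state again I'll try to grab a profile
before it falls over, roughly:

# perf top
... or, to keep something around to look at after the fact:
# perf record -a -g sleep 30
# perf report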


> Ok, weird, Running "sync" actually caused "vmstat 1" to go from 25% CPU
> spikes every 5 seconds with nfs_writepages in the "perf top" every 5
> seconds to being idle!  /proc/vmstat showed only nr_dirty 8 and
> nr_writeback 0 before I ran "sync".  Either that, or it was just good
> timing.
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 6767892   9696  41088    0    0     5   180   15   70 62  7 29  1
>  0  0      0 6767892   9696  41088    0    0     0     0 1124 1978  0  0 100  0
>  0  0      0 6767892   9696  41088    0    0     0     0 1105 1967  0  0 100  0
>  0  0      0 6767892   9696  41088    0    0     0     0 1180 1984  0  2 98  0
>  2  0      0 6767892   9696  41088    0    0     0     0 2102 1968  0 22 78  0
>  0  0      0 6767768   9704  41084    0    0     0    52 1504 2022  0  8 91  1
>  0  0      0 6767768   9704  41084    0    0     0     0 1059 1965  0  0 100  0
>  0  0      0 6767768   9704  41088    0    0     0     0 1077 1982  0  0 100  0
>  0  0      0 6767768   9704  41088    0    0     0     0 1166 1972  1  2 97  0
>  0  0      0 6767768   9704  41088    0    0     0     0 1353 1990  0  6 94  0
>  0  0      0 6767644   9704  41100    0    0     0     0 2531 2050  0 27 72  0
>  0  0      0 6767644   9704  41100    0    0     0     0 1065 1974  0  0 100  0
>  1  0      0 6767644   9704  41100    0    0     0     0 1090 1967  0  0 100  0
>  0  0      0 6767644   9704  41100    0    0     0     0 1175 1971  0  2 98  0
>  0  0      0 6767644   9704  41100    0    0     0     0 1321 1967  0  6 94  0
>  1  0      0 6767644   9712  41100    0    0     0    52 1995 1995  0 18 81  1
>  0  0      0 6767644   9712  41100    0    0     0     0 1342 1964  0  6 94  0
>  0  0      0 6767644   9712  41100    0    0     0     0 1064 1974  0  0 100  0
>  0  0      0 6767644   9712  41100    0    0     0     0 1163 1968  0  2 98  0
>  0  0      0 6767644   9712  41100    0    0     0     0 1390 1976  1  6 94  0
>  0  0      0 6767768   9712  41112    0    0     0     0 1282 2018  0  1 99  0
>  0  0      0 6767768   9712  41112    0    0     0     0 2166 1972  0 24 76  0
>  0  0      0 6786260   9712  41112    0    0     0     0 1092 1989  0  0 100  0
>  0  0      0 6786260   9712  41112    0    0     0     0 1039 2003  0  2 98  0
>  0  0      0 6774588   9712  41112    0    0     0     0 1524 2015  3  9 89  0
>  0  1      0 6770124   9716  41112    0    0     0    48 1315 1990  3  0 87 11
>  1  0      0 6769628   9720  41112    0    0     0     4 2196 1966  0 20 79  0
>  0  0      0 6768364   9724  41108    0    0    36     0 1403 2031  1  5 94  1
>  0  0      0 6768388   9724  41144    0    0     0     0 1185 1969  0  2 98  0
>  0  0      0 6768388   9724  41144    0    0     0     0 1351 1975  0  6 94  0
>  0  0      0 6768140   9724  41156    0    0     0     0 1261 2020  0  1 99  0
>  0  0      0 6768140   9724  41156    0    0     0     0 1053 1973  0  0 100  0
>  1  0      0 6768140   9724  41156    0    0     0     0 2194 1971  1 26 73  0
>  0  0      0 6768140   9732  41148    0    0     0    52 1283 2088  0  4 96  0
>  2  0      0 6768140   9732  41156    0    0     0     0 1184 1965  0  2 98  0
>  0  0      0 6786368   9732  41156    0    0     0     0 1297 2000  0  4 97  0
>  0  0      0 6786384   9732  41156    0    0     0     0  998 1949  0  0 100  0
> (stop vmstat 1, run sync, restart vmstat 1)
>  1  0      0 6765304   9868  41736    0    0     0     0 1239 1967  0  0 100  0
>  0  0      0 6765304   9868  41736    0    0     0     0 1174 2035  0  0 100  0
>  0  0      0 6765304   9868  41736    0    0     0     0 1084 1976  0  0 100  0
>  1  0      0 6765172   9868  41736    0    0     0     0 1100 1975  0  0 100  0
>  0  0      0 6765304   9868  41760    0    0     0     0 1501 1997  1  1 98  0
>  0  0      0 6765304   9868  41760    0    0     0     0 1092 1964  0  0 100  0
>  1  0      0 6765304   9868  41760    0    0     0     0 1096 1963  0  0 100  0
>  0  0      0 6765304   9868  41760    0    0     0     0 1083 1967  0  0 100  0
>  0  0      0 6765304   9876  41752    0    0     0    76 1101 1980  0  0 99  1
>  0  0      0 6765304   9876  41760    0    0     0     4 1199 2097  0  2 98  0
>  0  0      0 6765304   9876  41760    0    0     0     0 1055 1964  0  0 100  0
>  0  0      0 6765304   9876  41760    0    0     0     0 1064 1977  0  0 100  0
>  0  0      0 6765304   9876  41760    0    0     0     0 1073 1968  0  0 100  0
>  0  0      0 6765304   9876  41760    0    0     0     0 1080 1976  0  0 100  0
>  0  0      0 6765156   9884  41772    0    0     0    20 1329 2034  1  1 98  1
>  0  0      0 6765180   9884  41780    0    0     0     0 1097 1971  0  1 99  0
>  0  0      0 6765180   9884  41780    0    0     0     0 1073 1960  0  0 100  0
>  0  0      0 6765180   9884  41780    0    0     0     0 1079 1976  0  0 100  0
>  0  0      0 6765172   9884  41780    0    0     0     0 1120 1980  0  0 100  0
>  0  0      0 6765180   9892  41772    0    0     0    60 1097 1982  0  0 100  0
>  0  0      0 6765180   9892  41780    0    0     0     0 1067 1969  0  0 100  0
>  0  0      0 6765180   9892  41780    0    0     0     0 1063 1973  0  0 100  0
>  0  0      0 6765180   9892  41780    0    0     0     0 1064 1969  0  1 100  0
>
> Hmmm.  I'll watch it and see if it climbs back up again.  That would mean
> that for now, echo 2 > /proc/sys/vm/drop_caches ; sync is a workaround
> for the problem we're seeing with system CPU slowly creeping up since
> 2.6.36.  I have no idea if this affects the sudden flush process spinlock
> contention and attempted writeback without much progress during normal
> load, but it certainly changes things here.
>
> http://0x.ca/sim/ref/2.6.37/cpu_after_drop_caches_2_and_sync.png

That's pretty painful to have to do, though (less painful than the box
becoming unusable, I'll admit). I wonder if, when I did a umount on
mine (which must have synced at least that mount), it gave things a
big enough kick for the flush-0:xx threads to unstick themselves;
presumably that's what happened with your sync (though I didn't drop
the caches).
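
Next time one of these boxes wedges like that I'll try the two halves
separately, to see whether it's the sync or the cache drop that
actually kicks the flush threads loose; roughly:

# grep nfs_inode_cache /proc/slabinfo    (baseline object count)
# sync
... watch 'vmstat 1' / 'perf top' for a bit, then:
# echo 2 > /proc/sys/vm/drop_caches
# grep nfs_inode_cache /proc/slabinfo    (see how much actually got freed)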
