Message-ID: <20121019205055.2b258d09@sacrilege>
Date: Fri, 19 Oct 2012 20:50:55 +0600
From: Mike Kazantsev <mk.fraggod@...il.com>
To: linux-mm@...ck.org
Cc: paul@...l-moore.com, netdev@...r.kernel.org
Subject: PROBLEM: Memory leak (at least with SLUB) from "secpath_dup" (xfrm)
in 3.5+ kernels
Good day,
There seems to be a large slab memory leak in the standard (kernel.org)
kernel with the specific configuration and workload I have here.
From what I can tell at the moment, it appears to be a leak in
IPSec-related xfrm code.
It has been really noticeable on several different physical machines
with the same kernel configuration but different workloads since I
upgraded to kernel 3.5.0.
Graph of total slab usage (+ total available RAM) on these machines:
http://i.imgur.com/IyPqA.png
The presence of a leak can clearly be seen over time, and it has caused
near-OOM conditions several times now.
Sharp drops in memory usage indicate reboots, which, I'm afraid, have
to be done at regular intervals under such conditions.
Initially I thought that it was triggered by heavy filesystem load, but
today I finally got around to rebooting one of the machines with
slub_debug=U, and that doesn't seem to be the case.
slabtop showed "kmalloc-64" being the 99% offender in the past, but
with recent kernels (3.6.1) it has changed to "secpath_cache".
alloc_calls in /sys/kernel/slab/secpath_cache/ lists only the following:
2779138 secpath_dup+0x1b/0x5a age=400/169538/326767 pid=0-1543 cpus=0-3
And free_calls lists these two lines:
2543886 <not-available> age=4295223985 pid=0 cpus=0
235252 __secpath_destroy+0x3e/0x43 age=1651/174629/327902 pid=0-1519 cpus=0-3
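A small sketch for comparing the two files, in case it helps anyone looking at the attached data. It assumes each line has the shape "<count> <caller> age=... pid=... cpus=..." as in the lines quoted above; the function name parse_calls is made up for illustration. With the quoted lines, the two free_calls counts add up to exactly the alloc_calls count (2543886 + 235252 = 2779138). Reading that as both files describing the same set of still-allocated objects is an assumption, not something confirmed here.

```python
import re

# Assumed line shape: "<count> <caller> age=... pid=... cpus=..."
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)")


def parse_calls(text):
    """Return {caller: count} for alloc_calls/free_calls style text.

    The "+0x.../0x..." offset suffix is dropped from the caller name.
    """
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank or unrecognized lines
        caller = m.group(2).split("+")[0]
        counts[caller] = counts.get(caller, 0) + int(m.group(1))
    return counts


# Lines copied from the report above.
ALLOC_CALLS = (
    "2779138 secpath_dup+0x1b/0x5a age=400/169538/326767 pid=0-1543 cpus=0-3\n"
)
FREE_CALLS = (
    "2543886 <not-available> age=4295223985 pid=0 cpus=0\n"
    "235252 __secpath_destroy+0x3e/0x43 age=1651/174629/327902 pid=0-1519 cpus=0-3\n"
)

allocs = parse_calls(ALLOC_CALLS)
frees = parse_calls(FREE_CALLS)
print("alloc_calls total:", sum(allocs.values()))
print("free_calls total: ", sum(frees.values()))
```

On a live system the same function could be fed the contents of /sys/kernel/slab/secpath_cache/alloc_calls and free_calls directly (readable with slub_debug=U, typically as root).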
Contents of all paths available in /sys/kernel/slab/secpath_cache/
and "slabtop -o" output should be attached to this mail.
These were taken after heavy network + fs i/o load (rsync from a
different machine over network) after ~10-20min.
"secpath_dup" seems to be an IPSec-related call, and all machines in
question communicate over IPSec almost exclusively all the time
(openswan-2.6.37 userspace at the moment).
As noted, the problem is highly reproducible - all I have to do is
run rsync or something similar between these nodes for a few minutes.
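To put a number on the growth while such a transfer is running, something like the following sketch could be used. It assumes the SLUB sysfs file /sys/kernel/slab/secpath_cache/objects exists and that its first whitespace-separated field is the total object count; the helper names (parse_objects, growth_per_second, sample) are hypothetical and were not used to produce the data in this report.

```python
import time


def parse_objects(text):
    """Return the first whitespace-separated field as an integer.

    Assumption: the "objects" file starts with the total object count,
    optionally followed by per-node fields such as "N0=...".
    """
    return int(text.split()[0])


def growth_per_second(samples):
    """Average growth rate between the first and last sample.

    samples is a list of (timestamp_seconds, object_count) tuples.
    """
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (n1 - n0) / (t1 - t0)


def sample(path, interval, count):
    """Read the object count from path, count times, interval seconds apart."""
    samples = []
    for i in range(count):
        with open(path) as f:
            samples.append((time.monotonic(), parse_objects(f.read())))
        if i < count - 1:
            time.sleep(interval)
    return samples


# Example use on an affected machine (not executed here):
#   s = sample("/sys/kernel/slab/secpath_cache/objects", interval=10, count=6)
#   print("objects/sec:", growth_per_second(s))
```

A rate that stays positive for as long as the rsync runs and never falls back afterwards would match the behaviour described above.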
All machines in question have x86_64 kernel 3.6.1 now, but I'll
probably update it to 3.6.2 in a moment.
Keywords:
linux kernel networking mm slub slab secpath_dup secpath_cache xfrm
ipsec 3.5 3.6 memory leak oom slabtop x86 x86_64 amd64
/proc/version:
Linux version 3.6.1-fg.mf_master (root@...thema) (gcc version 4.6.3
(Exherbo gcc-4.6.3-r1) ) #1 SMP Sat Oct 13 04:21:08 YEKT 2012
Other information about the system (as per REPORTING-BUGS) is
attached, also including slabtop and slub_debug-related /sys paths
output/contents.
--
Mike Kazantsev // fraggod.net
View attachment "cpuinfo.txt" of type "text/plain" (2948 bytes)
View attachment "iomem.txt" of type "text/plain" (1988 bytes)
View attachment "ioports.txt" of type "text/plain" (1450 bytes)
View attachment "lspci.txt" of type "text/plain" (25996 bytes)
View attachment "modules.txt" of type "text/plain" (2943 bytes)
View attachment "slabtop_output.txt" of type "text/plain" (1657 bytes)
Download attachment "slub_leak_secpath_dup_slub_debug.tar.gz" of type "application/x-gzip" (979 bytes)
View attachment "ver_linux.txt" of type "text/plain" (1355 bytes)
Download attachment "signature.asc" of type "application/pgp-signature" (199 bytes)