Date:	Fri, 19 Oct 2012 20:50:55 +0600
From:	Mike Kazantsev <mk.fraggod@...il.com>
To:	linux-mm@...ck.org
Cc:	paul@...l-moore.com, netdev@...r.kernel.org
Subject: PROBLEM: Memory leak (at least with SLUB) from "secpath_dup" (xfrm)
 in 3.5+ kernels

Good day,

There seems to be a large slab memory leak in the standard (kernel.org)
kernel with the specific configuration and workload I have here.
From what I can tell at the moment, it appears to be a leak in the
IPSec-related xfrm code.


It has been really noticeable on several different physical machines
with the same kernel configuration but different workloads since I
upgraded to kernel 3.5.0.

Graph of total slab usage (+ total available RAM) on these machines:

  http://i.imgur.com/IyPqA.png

The presence of a leak can clearly be seen over time, and it has caused
a near-OOM condition several times now.
Sharp drops in memory usage indicate reboots, which, I'm afraid, have
to be done at regular intervals under these conditions.
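
For anyone wanting to track the same trend without a graphing setup, here
is a minimal Python sketch that reads the slab total from /proc/meminfo.
It is not what produced the graph above; the function names are made up
for illustration, and it only assumes the usual "Name:  value kB" layout
of that file.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values.

    Most fields are reported in kB; a few (e.g. HugePages_*) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not "Name: <number> [unit]".
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def slab_fraction(fields):
    """Fraction of MemTotal currently accounted to slab caches."""
    return fields["Slab"] / fields["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file; path is overridable for testing."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling slab_fraction(read_meminfo()) periodically (from cron, say) and
logging the result should show the same steady climb between reboots.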

Initially I thought that it was triggered by heavy filesystem load, but
today I finally got around to rebooting one of the machines with
slub_debug=U, and that doesn't seem to be the case.

slabtop showed "kmalloc-64" being the 99% offender in the past, but
with recent kernels (3.6.1) it has changed to "secpath_cache".
alloc_calls in /sys/kernel/slab/secpath_cache/ lists only the following:

  2779138 secpath_dup+0x1b/0x5a age=400/169538/326767 pid=0-1543 cpus=0-3

And free_calls lists these two lines:

  2543886 <not-available> age=4295223985 pid=0 cpus=0
  235252 __secpath_destroy+0x3e/0x43 age=1651/174629/327902 pid=0-1519 cpus=0-3
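
To make these per-caller counts easier to compare, here is a small Python
sketch that tallies them. It assumes only the line layout visible above
(a leading count followed by the caller symbol); the function names are
hypothetical, and the files exist only when the cache has user tracking
enabled (slub_debug=U, as here).

```python
import re

# A leading object count, then the caller ("symbol+off/len" or "<not-available>").
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)")


def parse_slub_calls(text):
    """Return {caller: object count} for alloc_calls/free_calls style text."""
    totals = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # ignore blank or unexpected lines
        count, caller = int(m.group(1)), m.group(2)
        totals[caller] = totals.get(caller, 0) + count
    return totals


def read_calls(cache, kind, root="/sys/kernel/slab"):
    """Read e.g. read_calls("secpath_cache", "alloc_calls") from sysfs."""
    with open(f"{root}/{cache}/{kind}") as f:
        return parse_slub_calls(f.read())
```

Applied to the lines quoted above, the two free_calls entries add up to
2543886 + 235252 = 2779138, the same as the single alloc_calls count, so
both files appear to describe the same set of tracked objects.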

Contents of all paths available in /sys/kernel/slab/secpath_cache/
and "slabtop -o" output should be attached to this mail.
These were taken after ~10-20 minutes of heavy network and filesystem
I/O load (rsync from a different machine over the network).

"secpath_dup" seems to be an IPSec-related call, and all machines in
question communicate over IPSec almost exclusively all the time
(openswan-2.6.37 userspace at the moment).

As noted, the problem is highly reproducible: all I have to do is
run rsync or something similar between these nodes for a few minutes.
All machines in question run x86_64 kernel 3.6.1 now, but I'll
probably update them to 3.6.2 in a moment.


Keywords:
linux kernel networking mm slub slab secpath_dup secpath_cache xfrm
ipsec 3.5 3.6 memory leak oom slabtop x86 x86_64 amd64


/proc/version: 
  Linux version 3.6.1-fg.mf_master (root@...thema) (gcc version 4.6.3
  (Exherbo gcc-4.6.3-r1) ) #1 SMP Sat Oct 13 04:21:08 YEKT 2012

Other information about the system (as per REPORTING-BUGS) is
attached, also including slabtop and slub_debug-related /sys paths
output/contents.


-- 
Mike Kazantsev // fraggod.net

View attachment "cpuinfo.txt" of type "text/plain" (2948 bytes)

View attachment "iomem.txt" of type "text/plain" (1988 bytes)

View attachment "ioports.txt" of type "text/plain" (1450 bytes)

View attachment "lspci.txt" of type "text/plain" (25996 bytes)

View attachment "modules.txt" of type "text/plain" (2943 bytes)

View attachment "slabtop_output.txt" of type "text/plain" (1657 bytes)

Download attachment "slub_leak_secpath_dup_slub_debug.tar.gz" of type "application/x-gzip" (979 bytes)

View attachment "ver_linux.txt" of type "text/plain" (1355 bytes)

Download attachment "signature.asc" of type "application/pgp-signature" (199 bytes)
