Message-ID: <CAMnf+Ph-Bx=WzxTpXuc8H6vXtDf-7Z52ywDRt9Gm6WN0X14PFg@mail.gmail.com>
Date:   Wed, 8 May 2019 18:38:36 -0500
From:   Unknown Unknown <jdtxs00@...il.com>
To:     netdev@...r.kernel.org
Subject: XFRM/IPsec memory leak

Hello all,

I've been looking into a severe kernel memory leak (~120MB per day) in
xfrm/ipsec for the past few weeks and I'm a bit stuck on it. Here is
my configuration/setup and a bit of background.

==== Affected kernels (only tested x86-64) ====
3.x
4.4.x
4.14.x
4.19.x
5.0
5.1

==== Setup/config ====
CentOS 7.6.1810 64bit
KVM virtualization (QEMU)
strongSwan U5.7.2 - IKEv2 in tunnel mode, IPv4 traffic only.
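
Each tunnel is a plain IKEv2/tunnel-mode conn; a representative
definition (placeholder names, addresses and proposals, not my exact
config) looks like this:

conn tunnel-example
    keyexchange=ikev2
    type=tunnel
    left=192.0.2.1
    leftsubnet=10.0.1.0/24
    right=198.51.100.1
    rightsubnet=10.0.2.0/24
    ike=aes256-sha256-modp2048
    esp=aes256-sha256
    auto=start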

==== Some background ====
I have a few hundred IKEv2 tunnels established on a few virtual
machines, and I noticed them running out of memory and triggering the
OOM killer.

These virtual machines are running the 4.14 series kernel. I looked at
userspace memory usage with various tools (/proc/meminfo, top/htop)
and saw nothing accounting for the memory at all.

I reviewed slabtop and smem and saw that kernel non-cache memory usage
was extremely high, and that slabtop showed an excessive number of
objects in the kmalloc-1024 slab.

~]# grep -w "kmalloc-1024" /proc/slabinfo  | tail -n1
kmalloc-1024       35552  35856   1024   16    4 : tunables    0    0    0 : slabdata   2241   2241      0

~]# smem -tkw
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory          5.0G       3.8G       1.2G
userspace memory             637.3M      83.1M     554.2M
free memory                    2.2G       2.2G          0
----------------------------------------------------------
                               7.8G       6.1G       1.7G

~]# cat /proc/meminfo
MemTotal:        8170884 kB
MemFree:         2314448 kB
MemAvailable:    5655448 kB
Buffers:          297816 kB
Cached:          3501628 kB
SwapCached:            0 kB
Active:          2943096 kB
Inactive:        1427776 kB
Active(anon):     842004 kB
Inactive(anon):   160604 kB
Active(file):    2101092 kB
Inactive(file):  1267172 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                96 kB
Writeback:             0 kB
AnonPages:        563076 kB
Mapped:            88336 kB
Shmem:            431180 kB
Slab:             361640 kB
SReclaimable:     278508 kB
SUnreclaim:        83132 kB
KernelStack:        4928 kB
PageTables:        22036 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4085440 kB
Committed_AS:    1586480 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:    346112 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      507772 kB
DirectMap2M:     7880704 kB
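
If it helps anyone trying to reproduce this, the slab growth is easy
to watch over time with a simple sampling loop along these lines (the
interval is arbitrary):

~]# while true; do
        date
        grep -w "kmalloc-1024" /proc/slabinfo | tail -n1
        grep -E "Slab|SUnreclaim" /proc/meminfo
        sleep 600
    done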


I tried stopping every piece of software on the machine and even
manually clearing out all xfrm policies/states. Nothing reclaimed the
memory. Only fully rebooting the virtual machine gave it back.
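
(For clarity, "clearing out all xfrm policies/states" means flushing
them with iproute2, roughly as below; afterwards "ip xfrm state" and
"ip xfrm policy" list nothing, yet the slab usage stays where it was.)

~]# ip xfrm state flush
~]# ip xfrm policy flush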

==== Debugging it ====
From there, I decided to debug the issue further by building a 4.14
kernel with the KMEMLEAK config options enabled. This got me some
results and a clue:

~]# grep "comm" /sys/kernel/debug/kmemleak | grep -v "softirq" | awk '{print $2}' | cut -d '"' -f2 | sort | uniq -c | sort -n -r
16392 charon
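
(For anyone repeating this: the kernel was built with
CONFIG_DEBUG_KMEMLEAK=y, and scans were triggered/read through the
standard debugfs interface, roughly:)

~]# mount -t debugfs none /sys/kernel/debug   # if not already mounted
~]# echo scan > /sys/kernel/debug/kmemleak
~]# cat /sys/kernel/debug/kmemleak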

Here's the backtrace for one of them; they all look identical, just
with a different pointer address.

unreferenced object 0xffff8881b1185000 (size 1024):
comm "charon", pid 3878, jiffies 4703093548 (age 1220.692s)
hex dump (first 32 bytes):
80 50 1c 82 ff ff ff ff 00 00 00 00 00 00 00 00 .P..............
00 02 00 00 00 00 ad de 00 01 00 00 00 00 ad de ................
backtrace:
[<ffffffff817ed5da>] kmemleak_alloc+0x4a/0xa0
[<ffffffff8122f8de>] kmem_cache_alloc_trace+0xce/0x1d0
[<ffffffff81747530>] xfrm_policy_alloc+0x30/0x110
[<ffffffff81758395>] xfrm_policy_construct+0x25/0x230
[<ffffffff81758658>] xfrm_add_policy+0xb8/0x170
[<ffffffff81757894>] xfrm_user_rcv_msg+0x1b4/0x1e0
[<ffffffff816dae0f>] netlink_rcv_skb+0xdf/0x120
[<ffffffff81756a35>] xfrm_netlink_rcv+0x35/0x50
[<ffffffff816da55d>] netlink_unicast+0x18d/0x260
[<ffffffff816da90f>] netlink_sendmsg+0x2df/0x3d0
[<ffffffff8167916e>] sock_sendmsg+0x3e/0x50
[<ffffffff81679652>] SYSC_sendto+0x102/0x190
[<ffffffff8167b1ee>] SyS_sendto+0xe/0x10
[<ffffffff81003959>] do_syscall_64+0x79/0x1b0
[<ffffffff81800081>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<ffffffffffffffff>] 0xffffffffffffffff

So, I decided to upgrade to the 4.19 kernel to see if it was fixed.
While kmemleak no longer appeared to report anything, memory was still
being lost quickly on the machine.

I then upgraded to the 5.1 mainline kernel, but the kernel memory leak
was still happening, despite no reports from kmemleak.

==== Reproducing the problem ====

From my testing, the following can be done to reproduce the leak on
all kernel versions:
- Bring up multiple IKEv2 tunnels
- Pass IPv4 traffic through the tunnel(s) (if you simply bring up the
tunnel and pass no traffic, the leak does not seem to happen.)
- Observe kernel memory usage grow over time (a concrete example follows below)
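
As a concrete example of the traffic step: any traffic generator will
do; I'll sketch it with iperf3 and placeholder tunnel-subnet addresses
(run the server behind one peer, push traffic through the SAs for a
day, and watch kmalloc-1024 with the slabinfo loop shown earlier):

~]# iperf3 -s                                # on a host behind peer A
~]# iperf3 -c 10.0.2.10 -b 200M -t 86400     # on a host behind peer B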

With a load of ~100 IKEv2 tunnels and 200Mbps of traffic between all
of them, I saw a leak of ~121MB per 24 hours.
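
Assuming the leaked objects are the 1024-byte ones showing up in
slabtop, that works out to roughly 121 MB/day / 1024 bytes per object
= ~124,000 leaked objects per day, or about 1.4 objects per second
under that load.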

I have tried a variety of hardware (single CPU, dual CPU), NICs
(bridging/SR-IOV), and kernels, and it happens on every configuration
I tried.


Does anyone know what might be causing this or have any advice on
debugging this further?
