netdev - Re: Followup: Kernel memory leak on 4.11+ & 5.3.x with IPsec

Message-ID: <7faa5b0c-d404-8b5b-3f00-f3814c5a2487@gmail.com>
Date:   Fri, 1 Nov 2019 16:31:41 -0500
From:   JD <jdtxs00@...il.com>
To:     Steffen Klassert <steffen.klassert@...unet.com>
Cc:     netdev@...r.kernel.org
Subject: Re: Followup: Kernel memory leak on 4.11+ & 5.3.x with IPsec

On 11/1/2019 2:53 AM, Steffen Klassert wrote:
> On Wed, Oct 30, 2019 at 02:30:27PM -0500, JD wrote:
>> Here are some clear steps to reproduce:
>> - On your preferred OS, install an IPsec daemon/software
>> (strongswan/openswan/whatever)
>> - Setup a IKEv2 conn in tunnel mode. Use a RFC1918 private range for
>> your client IP pool. e.g: 10.2.0.0/16
>> - Enable IP forwarding (net.ipv4.ip_forward = 1)
>> - MASQUERADE the 10.2.0.0/16 range using iptables, e.g: "-A
>> POSTROUTING -s 10.2.0.0/16 -o eth0 -j MASQUERADE"
>> - Connect some IKEv2 clients (any device, any platform, doesn't
>> matter) and pass traffic through the tunnel.
>> ^^ It speeds up the leak if you have multiple tunnels passing traffic
>> at the same time.
>>
>> - Observe memory is lost over time and never recovered. Doesn't matter
>> if you restart the daemon, bring down the tunnels, or even unload
>> xfrm/ipsec modules. The memory goes into the void. Only way to reclaim
>> is by restarting completely.
>>
>> Please let me know if anything further is needed to diagnose/debug
>> this problem. We're stuck with the 4.9 kernel because all newer
>> kernels leak memory. Any help or advice is appreciated.
> Looks like we forget to free the page that we use for
> skb page fragments when deleting the xfrm_state.
>
> Can you please try the patch below? I don't have access
> to my test environment today, so this patch is untested.
> I'll try to do some tests on Monday.
>
>
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index c6f3c4a1bd99..f3423562d933 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -495,6 +495,8 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
>   		x->type->destructor(x);
>   		xfrm_put_type(x->type);
>   	}
> +	if (x->xfrag.page)
> +		put_page(x->xfrag.page);
>   	xfrm_dev_state_free(x);
>   	security_xfrm_state_free(x);
>   	xfrm_state_free(x);

Hi Steffen,

Thanks for your reply and patch. It applied cleanly to 5.3.8.

Early results look solid. I have been running a test for approximately
4 hours and memory usage appears to be holding steady.
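
For anyone who wants to repeat this kind of check, here is a minimal
sketch of sampling memory over time. It is not the script used for the
test above. The function names, the choice of the MemAvailable field,
and the default interval are all assumptions made for illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def drift_kb(samples):
    """Change between the first and last sample; negative means memory was lost."""
    return samples[-1] - samples[0]


def monitor(path="/proc/meminfo", interval_s=60, count=5):
    """Collect `count` MemAvailable readings (in kB), `interval_s` seconds apart.

    Hypothetical helper: MemAvailable is used as a rough proxy for memory
    that has gone missing, which is an assumption, not a precise leak meter.
    """
    samples = []
    for i in range(count):
        with open(path) as f:
            samples.append(parse_meminfo(f.read())["MemAvailable"])
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Calling drift_kb(monitor()) on the gateway while tunnels pass traffic
gives a rough trend. A value that keeps going more negative across runs
under a steady workload would be consistent with the leak described
earlier in the thread, though ordinary cache growth can also move this
number.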

I will keep the test running over the weekend to make sure, and I'll
follow up with you on Monday. If you still want to test the patch and
verify the fix yourself, please feel free.