lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <fdd9b2ee30894cfe8d9829d7f86f3fc3a8c77a1e.camel@redhat.com>
Date: Tue, 20 Feb 2024 11:28:31 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Florian Westphal <fw@...len.de>, netdev@...r.kernel.org
Cc: syzbot+bfde3bef047a81b8fde6@...kaller.appspotmail.com
Subject: Re: [PATCH net] net: ip_tunnel: do not adjust device headroom on
 xmit

On Fri, 2024-02-16 at 13:01 +0100, Florian Westphal wrote:
> syzkaller triggered following kasan splat:
> BUG: KASAN: use-after-free in __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170
> Read of size 1 at addr ffff88812fb4000e by task syz-executor183/5191
> [..]
>  kasan_report+0xda/0x110 mm/kasan/report.c:588
>  __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170
>  skb_flow_dissect_flow_keys include/linux/skbuff.h:1514 [inline]
>  ___skb_get_hash net/core/flow_dissector.c:1791 [inline]
>  __skb_get_hash+0xc7/0x540 net/core/flow_dissector.c:1856
>  skb_get_hash include/linux/skbuff.h:1556 [inline]
>  ip_tunnel_xmit+0x1855/0x33c0 net/ipv4/ip_tunnel.c:748
>  ipip_tunnel_xmit+0x3cc/0x4e0 net/ipv4/ipip.c:308
>  __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4954 [inline]
>  xmit_one net/core/dev.c:3548 [inline]
>  dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564
>  __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4349
>  dev_queue_xmit include/linux/netdevice.h:3134 [inline]
>  neigh_connected_output+0x42c/0x5d0 net/core/neighbour.c:1592
>  ...
>  ip_finish_output2+0x833/0x2550 net/ipv4/ip_output.c:235
>  ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323
>  ..
>  iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82
>  ip_tunnel_xmit+0x1dbc/0x33c0 net/ipv4/ip_tunnel.c:831
>  ipgre_xmit+0x4a1/0x980 net/ipv4/ip_gre.c:665
>  __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4954 [inline]
>  xmit_one net/core/dev.c:3548 [inline]
>  dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564
>  ...
> 
> The splat occurs because skb->data points past skb->head allocated
> area.  This is because neigh layer does:
>   __skb_pull(skb, skb_network_offset(skb));
> 
> ... but skb_network_offset() returns a negative offset and
> __skb_pull() arg is unsigned.  IOW, we skb->data gets "adjusted"
> by a huge value.
> 
> The negative value is returned because skb->head and skb->data distance is
> more than 64k and skb->network_header (u16) has wrapped around.
> 
> The bug is in the ip_tunnel infrastructure, which can cause
> dev->needed_headroom to increment ad infinitum.
> 
> The syzkaller reproducer consists of packets getting routed via a gre
> tunnel, and route of gre encapsulated packets pointing at another (ipip)
> tunnel.  The ipip encapsulation finds gre0 as next output device.
> 
> This results in the following pattern:
> 
> 1). First packet is to be sent out via gre0.
> Route lookup found an output device, ipip0.
> 
> 2).
> ip_tunnel_xmit for gre0 bumps gre0->needed_headroom based on the
> future output device, rt.dev->needed_headroom (ipip0).
> 
> 3).
> ip output / start_xmit moves skb on to ipip0. which runs the same
> code path again (xmit recursion).
> 
> 4).
> Routing step for the post-gre0-encap packet finds gre0 as output
> device to use for ipip0 encapsulated packet.
> 
> tunl0->needed_headroom is then incremented based on the (already
> bumped) gre0 device headroom.
> 
> This repeats for every future packet:
> 
> gre0->needed_headroom gets inflated because previous packets'
> ipip0 step incremented rt->dev (gre0) headroom, and ipip0 incremented
> because gre0 needed_headroom was increased.
> 
> For each subsequent packet, gre/ipip0->needed_headroom grows until
> post-expand-head reallocations result in a skb->head/data distance of
> more than 64k.
> 
> Once that happens, skb->network_header (u16) wraps around when
> pskb_expand_head tries to make sure that skb_network_offset() is
> unchanged after the headroom expansion/reallocation.
> 
> After this skb_network_offset(skb) returns a different (and
> negative) result post headroom expansion.
> 
> The next trip to neigh layer (or anything else that would __skb_pull
> the network header) makes skb->data point to a memory location outside
> skb->head area.
> 
> Remove this optimization.
> 
> Alternative would be to cap the needed_headroom update to a reasonable
> upperlimit such as 256 to prevent growth.

I fear this will cause visible regression for tunnels in metadata mode:
the egress device should not be available at creation time, so the
guessed needed_headroom should be low. I recall (very old) performance
tests showing head reallocation as quite noticeable while traversing
tunnels. 

I think enforcing the upper limit should be safer and hopefully not too
complex. We could use even an higher bound (say 512) as just need to
avoid 16bits overflow, right?

Thanks!

Paolo


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ