netdev - Re: Re: [Bug] net/ipv6: skb_over_panic in mld

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181205155725.uudvssixfdy4ul6h@diconico07.dev>
Date:   Wed, 5 Dec 2018 16:57:25 +0100
From:   Nicolas Belouin <nicolas.belouin@...di.net>
To:     Benjamin Poirier <bpoirier@...e.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org,
        "David S. Miller" <davem@...emloft.net>
Subject: Re: Re: [Bug] net/ipv6: skb_over_panic in mld_newpack

On 05/12 15:57, Benjamin Poirier wrote:
> On 2018/12/04 11:52, Nicolas Belouin wrote:
> > On 03/12 07:59, Eric Dumazet wrote:
> > > 
> > > 
> > > On 12/03/2018 07:20 AM, Nicolas Belouin wrote:
> > > > Hi,
> > > > I ran into a panic while adding an interface to a bridge with a vxlan
> > > > interface already attached to it, as it seems related mtu=9000.
> > > > 
> > > > I get the following panic info :
> > > > 
> > > > [ 2482.419893] br100: port 2(vif1.1) entered blocking state
> > > > [ 2482.425427] br100: port 2(vif1.1) entered forwarding state
> > > > [ 2482.431797] skbuff: skb_over_panic: text:ffffffff816e4f78 len:40 put:40 head:ffff880146449000 data:ffff880146458fd0 tail:0xfff8 end:0xec0 dev:vif1.1
> > > > [ 2482.442891] ------------[ cut here ]------------
> > > > [ 2482.448254] kernel BUG at /srv/jenkins/workspace/workspace/hosting-xen-dom0-kernel/build/src/linux-4.9/net/core/skbuff.c:105!
> > > > [ 2482.459009] invalid opcode: 0000 [#1] PREEMPT SMP
> > > > [ 2482.464371] Modules linked in:
> > > > [ 2482.469682] CPU: 19 PID: 1317 Comm: kworker/19:1 Not tainted 4.9.135-dom0-e9d15b2-x86_64-iaas #2
> > > > [ 2482.480362] Hardware name: Dell Inc. PowerEdge C8220/09N44V, BIOS 2.7.1 03/04/2015
> > > > [ 2482.491008] Workqueue: ipv6_addrconf addrconf_dad_work
> > > > [ 2482.496380] task: ffff88017eef1a00 task.stack: ffffc90001fcc000
> > > > [ 2482.501785] RIP: e030:[<ffffffff815ed71f>]  [<ffffffff815ed71f>] skb_panic+0x5f/0x70
> > > > [ 2482.512450] RSP: e02b:ffffc90001fcfba8  EFLAGS: 00010296
> > > > [ 2482.517817] RAX: 0000000000000088 RBX: ffff880117fb0800 RCX: 0000000000000000
> > > > [ 2482.528447] RDX: 0000000000000088 RSI: ffff880184cd03c8 RDI: ffff880184cd03c8
> > > > [ 2482.539085] RBP: ffffc90001fcfc00 R08: 00000000000006a8 R09: ffffffff81ea7359
> > > > [ 2482.549717] R10: ffff880180406f80 R11: 00000000000006a8 R12: ffff880147258cc0
> > > > [ 2482.560350] R13: ffffc90001fcfc20 R14: ffffffff81d13440 R15: 0000000000000000
> > > > [ 2482.570993] FS:  0000000000000000(0000) GS:ffff880184cc0000(0000) knlGS:0000000000000000
> > > > [ 2482.581646] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 2482.587039] CR2: 00007f5b17f032b0 CR3: 0000000001c08000 CR4: 0000000000042660
> > > > [ 2482.597675] Stack:
> > > > [ 2482.602958]  ffff880146458fd0 000000000000fff8 0000000000000ec0 ffff88017f3f0000
> > > > [ 2482.613619]  ffffffff815efa62 ffffffff816e4f78 ffff880117fb0800 ffffc90001fcfc20
> > > > [ 2482.624288]  ffff880147258cc0 ffff88017f3f0000 ffff880146502000 ffffc90001fcfc68
> > > > [ 2482.634955] Call Trace:
> > > > [ 2482.640254]  [<ffffffff815efa62>] ? skb_put+0x42/0x50
> > > > [ 2482.645633]  [<ffffffff816e4f78>] ? ip6_mc_hdr.constprop.36+0x58/0xd0
> > > > [ 2482.651045]  [<ffffffff816e511a>] ? mld_newpack+0x12a/0x1e0
> > > > [ 2482.656421]  [<ffffffff816e5257>] ? add_grhead.isra.28+0x87/0xa0
> > > > [ 2482.661825]  [<ffffffff816e60d6>] ? add_grec+0x446/0x4c0
> > > > [ 2482.667198]  [<ffffffff8108b06b>] ? __local_bh_enable_ip+0x1b/0xb0
> > > > [ 2482.672609]  [<ffffffff816e6328>] ? mld_send_initial_cr.part.29+0x58/0xa0
> > > > [ 2482.678022]  [<ffffffff816e83d6>] ? ipv6_mc_dad_complete+0x26/0x60
> > > > [ 2482.683441]  [<ffffffff816cc2cf>] ? addrconf_dad_completed+0x29f/0x2c0
> > > > [ 2482.688850]  [<ffffffff816e6b84>] ? ipv6_dev_mc_inc+0x194/0x2c0
> > > > [ 2482.694249]  [<ffffffff816cc3ee>] ? addrconf_dad_work+0xfe/0x3d0
> > > > [ 2482.699650]  [<ffffffff817484ed>] ? _raw_spin_unlock_irq+0xd/0x20
> > > > [ 2482.705052]  [<ffffffff8109de12>] ? process_one_work+0x142/0x3e0
> > > > [ 2482.710453]  [<ffffffff8109e112>] ? worker_thread+0x62/0x480
> > > > [ 2482.715848]  [<ffffffff8109e0b0>] ? process_one_work+0x3e0/0x3e0
> > > > [ 2482.721256]  [<ffffffff810a3472>] ? kthread+0xe2/0x100
> > > > [ 2482.726621]  [<ffffffff81028701>] ? __switch_to+0x261/0x6b0
> > > > [ 2482.732006]  [<ffffffff810a3390>] ? kthread_park+0x60/0x60
> > > > [ 2482.737379]  [<ffffffff81748c37>] ? ret_from_fork+0x57/0x70
> > > > [ 2482.742761] Code: 00 00 48 89 44 24 10 8b 87 b0 00 00 00 48 89 44 24 08 48 8b 87 c0 00 00 00 48 c7 c7 50 8e a2 81 48 89 04 24 31 c0 e8 b5 07 b6 ff <0f> 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 
> > > > [ 2482.759199] RIP  [<ffffffff815ed71f>] skb_panic+0x5f/0x70
> > > > [ 2482.764672]  RSP <ffffc90001fcfba8>
> > > > [ 2482.771186] ---[ end trace 6d0fe52ed049d841 ]---
> > > > [ 2482.776641] Kernel panic - not syncing: Fatal exception in interrupt
> > > > [ 2482.861621] Kernel Offset: disabled
> > > > 
> > > > I circumvented the bug by applying this patch:
> > > > diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> > > > index 21f6deb2aec9..2762c3dcc883 100644
> > > > --- a/net/ipv6/mcast.c
> > > > +++ b/net/ipv6/mcast.c
> > > > @@ -1605,8 +1605,6 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
> > > >                      IPV6_TLV_PADN, 0 };
> > > > 
> > > >         /* we assume size > sizeof(ra) here */
> > > > -       /* limit our allocations to order-0 page */
> > > > -       size = min_t(int, size, SKB_MAX_ORDER(0, 0));
> > > >         skb = sock_alloc_send_skb(sk, size, 1, &err);
> > > > 
> > > >         if (!skb)
> > > > 
> > > > The lines are introduced by commit 72e09ad107e78d69ff4d3b97a69f0aad2b77280f
> > > > stating that "order-2 GRP_ATOMIC allocations are very unreliable"
> > > > I then wonder if this statement is still relevant, or if such a patch
> > > > would be acceptable to you.
> > > 
> > > Thanks for the report, but this patch is not correct.
> > > 
> > > I rather suspect commit 1837b2e2bcd23137766555a63867e649c0b637f0 ("mld, igmp: Fix reserved tailroom calculation")
> > > is the problem.
> > > 
> > 
> > If I follow the code here, skb_over_panic is triggered if
> > skb->tail > skb->end, with the order-0 page limitation, and mtu 9000,
> > we can assume (in the case of PAGE_SIZE=4k) skb->end to be 4k.
> > 
> > Tail is set to 0 by the alloc, and then to LL_RESERVED_SPACE of the net
> > device by the skb_reserve. We then only change it during the skb_put
> > leading to the skb_over_panic by setting it to
> > LL_RESERVED_SPACE(dev) + sizeof(struct ipv6hdr)
> > So I don't see what is the role of the commit you suspect here, the only
> > cause I can see here is that the needed space is just larger than a 4k
> > page.
> > If you can explain to me why you think the reserved tailroom calculation
> > has an impact on this skb_put, I might be able to look at it more in
> > depth
> 
> Indeed, commit 1837b2e2bcd2 ("mld, igmp: Fix reserved tailroom
> calculation", v4.5) only changes the assignment of
> skb->reserved_tailroom which has no impact on the failure of the
> skb_put().
> 
> Given the panic splat that Nicolas posted, it looks like vif1.1 has
> LL_RESERVED_SPACE() === 65488, which is quite a lot. What kind of
> interface is vif1.1?
> 
> > 
> > The patch I posted is of course a bit harsh and I do not really expect
> > it to be the right solution, but unless I set a lesser MTU (<PAGE_SIZE),
> > I don't really see any way to get it to work with the order 0 page
> > limitation
> 
> It's not the right solution because high order atomic allocations are
> especially unreliable; reclaim and compaction are not possible at that
> time. igmp_sk has sk_allocation = GFP_ATOMIC.
> 
> Maybe there is some problem around the needed_headroom manipulations
> done in br_add_if().
> 
> Can you try again without your changes but with the following extra
> debugging prints and post the output (just the extra bridge messages and
> the skb_over_panic line are enough I think).
> 
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index 9b46d2dc4c22..a911e7653bc7 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -641,11 +641,15 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
>  	netdev_update_features(br->dev);
>  
>  	br_hr = br->dev->needed_headroom;
> +	br_info(br, "before: n_h %u port %s n_h %u\n", br_hr, dev->name,
> +		dev->needed_headroom);
>  	dev_hr = netdev_get_fwd_headroom(dev);
>  	if (br_hr < dev_hr)
>  		update_headroom(br, dev_hr);
>  	else
>  		netdev_set_rx_headroom(dev, br_hr);
> +	br_info(br, "after: n_h %u port %s n_h %u\n",
> +		br->dev->needed_headroom, dev->name, dev->needed_headroom);
>  
>  	if (br_fdb_insert(br, p, dev->dev_addr, 0))
>  		netdev_err(dev, "failed insert local address bridge forwarding table\n");
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 946de0e24c87..9b26e9201a92 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -97,10 +97,12 @@ EXPORT_SYMBOL(sysctl_max_skb_frags);
>  static void skb_panic(struct sk_buff *skb, unsigned int sz, void *addr,
>  		      const char msg[])
>  {
> -	pr_emerg("%s: text:%p len:%d put:%d head:%p data:%p tail:%#lx end:%#lx dev:%s\n",
> +	pr_emerg("%s: text:%p len:%d put:%d head:%p data:%p tail:%#lx end:%#lx dev:%s hhl:%u n_h:%u\n",
>  		 msg, addr, skb->len, sz, skb->head, skb->data,
>  		 (unsigned long)skb->tail, (unsigned long)skb->end,
> -		 skb->dev ? skb->dev->name : "<NULL>");
> +		 skb->dev ? skb->dev->name : "<NULL>",
> +		 skb->dev ? skb->dev->hard_header_len : 0,
> +		 skb->dev ? skb->dev->needed_headroom : 0);
>  	BUG();
>  }
>  

Thanks for your help, using your debug patch I got the value of needed_headroom:
USHRT_MAX - 64
And tracked it down to a legacy out of tree patch of ours I then fixed.
The patch was increasing/decreasing the needed_headroom without checking
for bounds...

Sorry for the noise then.

Regards,

Nicolas