Date:	Fri, 20 Apr 2012 18:49:27 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Basil Gor <basil.gor@...il.com>
Cc:	netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH] macvlan/macvtap: Fix vlan tagging on user read

Basil Gor <basil.gor@...il.com> writes:

> I did some additional code review, and it's easier to show on stack traces and
> by comparing macvtap with tun/tap driver.
>
> tun/tap device does not need to care about vlan tag stuff, as it gets skb with
> vlan id in the header and vlan_tci is not used.
>
> [97493.070321] tun_net_xmit devname vnet0 vlan_tci 0 vlan 0 proto 8100 len 64
> [97493.070327] Pid: 0, comm: swapper/2 Tainted: G           O 3.3.1-3.fc16.x86_64 #1
> [97493.070331] Call Trace:
> [97493.070334]  <IRQ>  [<ffffffffa02c6827>] tun_net_xmit+0x47/0x260 [tun]
> [97493.070347]  [<ffffffff814e8072>] dev_hard_start_xmit+0x332/0x6d0 <------ __vlan_put_tag is called
> [97493.070355]  [<ffffffff81503f5a>] sch_direct_xmit+0xfa/0x1d0
> [97493.070364]  [<ffffffff814e85b5>] dev_queue_xmit+0x1a5/0x640
> [97493.070377]  [<ffffffffa057c180>] ? br_flood+0xc0/0xc0 [bridge]
> [97493.070395]  [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
> [97493.070409]  [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
> [97493.070423]  [<ffffffffa057c1ec>] br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
> [97493.070438]  [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
> [97493.070457]  [<ffffffffa057c242>] br_forward_finish+0x22/0x60 [bridge]
> [97493.070471]  [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
> [97493.070485]  [<ffffffffa057c3dd>] __br_forward+0x5d/0xb0 [bridge]
> [97493.070495]  [<ffffffff814dafe4>] ? skb_clone+0x54/0xb0
> [97493.070508]  [<ffffffffa057bf1e>] deliver_clone+0x3e/0x60 [bridge]
> [97493.070523]  [<ffffffffa057c143>] br_flood+0x83/0xc0 [bridge]
> [97493.070534]  [<ffffffffa057c525>] br_flood_forward+0x15/0x20 [bridge]
> [97493.070544]  [<ffffffffa057d256>] br_handle_frame_finish+0x246/0x2a0 [bridge]
> [97493.070555]  [<ffffffffa057d444>] br_handle_frame+0x194/0x260 [bridge]
> [97493.070567]  [<ffffffffa057d2b0>] ? br_handle_frame_finish+0x2a0/0x2a0 [bridge]
> [97493.070581]  [<ffffffff814e56de>] __netif_receive_skb+0x1be/0x5c0  <------ vlan_untag is called
> [97493.070594]  [<ffffffff81088ba2>] ? default_wake_function+0x12/0x20
> [97493.070604]  [<ffffffff814e5ef1>] process_backlog+0xb1/0x170
> [97493.070613]  [<ffffffff814e718b>] net_rx_action+0x12b/0x270
> [97493.070623]  [<ffffffff8108daed>] ? sched_clock_cpu+0xbd/0x110
> [97493.070633]  [<ffffffff8105efb8>] __do_softirq+0xb8/0x230
> [97493.070644]  [<ffffffff810e3c30>] ? handle_irq_event+0x50/0x70
> [97493.070654]  [<ffffffff815fd49c>] call_softirq+0x1c/0x30
> [97493.070662]  [<ffffffff81016455>] do_softirq+0x65/0xa0
> [97493.070668]  [<ffffffff8105f3ce>] irq_exit+0x9e/0xc0
> [97493.070675]  [<ffffffff815fdd03>] do_IRQ+0x63/0xe0
> [97493.070682]  [<ffffffff815f42ae>] common_interrupt+0x6e/0x6e
> [97493.070686]  <EOI>  [<ffffffff8131b236>] ? intel_idle+0xe6/0x150
> [97493.070697]  [<ffffffff8131b218>] ? intel_idle+0xc8/0x150
> [97493.070705]  [<ffffffff814a4071>] cpuidle_idle_call+0xc1/0x280
> [97493.070713]  [<ffffffff8101322f>] cpu_idle+0xcf/0x120
> [97493.070720]  [<ffffffff815e3b1f>] start_secondary+0x282/0x284
>
> but macvtap device gets skb with vlan tag extracted in vlan_tci, and as the original driver
> code was mostly based on the tun/tap driver, the vlan handling was missed.
>
> [98143.863560] macvtap_receive devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
> [98143.863570] macvtap_forward devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
> [98143.863578] Pid: 0, comm: swapper/2 Tainted: G           O 3.3.1-3.fc16.x86_64 #1
> [98143.863583] Call Trace:
> [98143.863587]  <IRQ>  [<ffffffffa026819c>] macvtap_forward+0x8c/0x1b0 [macvtap]
> [98143.863606]  [<ffffffffa0268314>] macvtap_receive+0x54/0x60 [macvtap]
> [98143.863623]  [<ffffffffa02c05db>] macvlan_handle_frame+0xbb/0x2c0 [macvlan]
> [98143.863635]  [<ffffffffa02c0520>] ? macvlan_broadcast+0x160/0x160 [macvlan]
> [98143.863646]  [<ffffffff814e56de>] __netif_receive_skb+0x1be/0x5c0   <------ vlan_untag is called
> [98143.863653]  [<ffffffff814e67e3>] netif_receive_skb+0x23/0x90
> [98143.863660]  [<ffffffff814e6c09>] ? dev_gro_receive+0x1b9/0x2b0
> [98143.863667]  [<ffffffff814e6950>] napi_skb_finish+0x50/0x70
> [98143.863673]  [<ffffffff814e6f45>] napi_gro_receive+0xf5/0x140
> [98143.863697]  [<ffffffffa0241fab>] e1000_receive_skb+0x5b/0x70 [e1000e] 

Actually, __vlan_hwaccel_put_tag is called here, not vlan_untag.

> [98143.863718]  [<ffffffffa0244b21>] e1000_clean_rx_irq+0x2f1/0x400 [e1000e]
> [98143.863737]  [<ffffffffa02432e8>] e1000_clean+0x78/0x2c0 [e1000e]
> [98143.863745]  [<ffffffff814e718b>] net_rx_action+0x12b/0x270
> [98143.863752]  [<ffffffff8108daed>] ? sched_clock_cpu+0xbd/0x110
> [98143.863759]  [<ffffffff8105efb8>] __do_softirq+0xb8/0x230
> [98143.863767]  [<ffffffff810e3c30>] ? handle_irq_event+0x50/0x70
> [98143.863775]  [<ffffffff815fd49c>] call_softirq+0x1c/0x30
> [98143.863782]  [<ffffffff81016455>] do_softirq+0x65/0xa0
> [98143.863788]  [<ffffffff8105f3ce>] irq_exit+0x9e/0xc0
> [98143.863796]  [<ffffffff815fdd03>] do_IRQ+0x63/0xe0
> [98143.863803]  [<ffffffff815f42ae>] common_interrupt+0x6e/0x6e
> [98143.863807]  <EOI>  [<ffffffff8131b236>] ? intel_idle+0xe6/0x150
> [98143.863818]  [<ffffffff8131b218>] ? intel_idle+0xc8/0x150
> [98143.863826]  [<ffffffff814a4071>] cpuidle_idle_call+0xc1/0x280
> [98143.863834]  [<ffffffff8101322f>] cpu_idle+0xcf/0x120
> [98143.863841]  [<ffffffff815e3b1f>] start_secondary+0x282/0x284
>
> and as Eric Biederman noted, why not add the vlan header back at the last moment, in
> macvtap_put_user? That would work for user space applications which read
> /dev/tapX, but in the kvm case the actual reading is done by the vhost_net driver, and this
> driver does skb_peek on the macvtap queue to get the next packet size before
> reading (in handle_rx).
>
> [98143.863878] vhost peek_head_len devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
>
> so, it gets skb len without vlan tag and then performs read with buffer smaller than needed
>
> [98143.863885] macvtap_do_read buflen 102 <--- 90 + (vnet_hdr_sz 12 bytes)
> [98143.863889]  devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
> __vlan_put_tag is called here
> [98143.863894] macvtap_do_read reallen 106 <--- 90 + 4 + (vnet_hdr_sz 12)
> [98143.863898]  devname macvtap0 vlan_tci 0 vlan 0 proto 8100 len 94
> [98143.863904] Pid: 7289, comm: vhost-7236 Tainted: G           O 3.3.1-3.fc16.x86_64 #1
> [98143.863935] Call Trace:
> [98143.863944]  [<ffffffffa0268593>] macvtap_do_read+0x243/0x420 [macvtap]
> [98143.863954]  [<ffffffff81088b90>] ? try_to_wake_up+0x2b0/0x2b0
> [98143.863962]  [<ffffffffa02687ba>] macvtap_recvmsg+0x4a/0x70 [macvtap]
> [98143.863971]  [<ffffffffa02e149e>] handle_rx+0x39e/0x6e0 [vhost_net]
> [98143.863983]  [<ffffffffa02e17f5>] handle_rx_net+0x15/0x20 [vhost_net]
> [98143.863996]  [<ffffffffa02de84c>] vhost_worker+0xcc/0x150 [vhost_net]
> [98143.864008]  [<ffffffffa02de780>] ? __vhost_add_used_n+0x110/0x110 [vhost_net]
> [98143.864020]  [<ffffffff81079af3>] kthread+0x93/0xa0
> [98143.864032]  [<ffffffff815fd3a4>] kernel_thread_helper+0x4/0x10
> [98143.864044]  [<ffffffff81079a60>] ? kthread_freezable_should_stop+0x70/0x70
> [98143.864056]  [<ffffffff815fd3a0>] ? gs_change+0x13/0x13
>
> things get more interesting when we take another case into account. When one kvm guest sends
> a packet on the same macvlan to another guest, macvtap gets skb with vlan id in the header
> and vlan_tci is not used.
>
> [99564.523943] macvtap_forward devname (null) vlan_tci 0 vlan 0 proto 8100 len 94
> [99564.523946] Pid: 8849, comm: vhost-8797 Tainted: G           O 3.3.1-3.fc16.x86_64 #1
> [99564.523947] Call Trace:
> [99564.523952]  [<ffffffffa02de19c>] macvtap_forward+0x8c/0x1b0 [macvtap]
> [99564.523963]  [<ffffffffa02c0502>] macvlan_broadcast+0x142/0x160 [macvlan]
> [99564.523967]  [<ffffffffa02c146d>] macvlan_start_xmit+0x14d/0x178 [macvlan]
> [99564.523969]  [<ffffffffa02df378>] macvtap_get_user+0x388/0x420 [macvtap]
> [99564.523971]  [<ffffffffa02df43b>] macvtap_sendmsg+0x2b/0x30 [macvtap]
> [99564.523973]  [<ffffffffa026bb3d>] handle_tx+0x2dd/0x620 [vhost_net]
> [99564.523976]  [<ffffffffa026beb5>] handle_tx_kick+0x15/0x20 [vhost_net]
> [99564.523978]  [<ffffffffa026884c>] vhost_worker+0xcc/0x150 [vhost_net]
> [99564.523980]  [<ffffffffa0268780>] ? __vhost_add_used_n+0x110/0x110 [vhost_net]
> [99564.523984]  [<ffffffff81079af3>] kthread+0x93/0xa0
> [99564.523987]  [<ffffffff815fd3a4>] kernel_thread_helper+0x4/0x10
> [99564.523989]  [<ffffffff81079a60>] ? kthread_freezable_should_stop+0x70/0x70
> [99564.523991]  [<ffffffff815fd3a0>] ? gs_change+0x13/0x13
> [99564.523999] vhost peek_head_len devname (null) vlan_tci 0 vlan 0 proto 8100 len 94
> [99564.524003] macvtap_do_read buflen 106
> [99564.524004] macvtap_do_read reallen 106
>
> And we definitely want to have common rules for all cases. So we either
> 1. restore vlan headers from vlan_tci for any packets coming from outside of macvlan in
> macvtap_receive; then we don't need to fix vhost_net and we preserve the same vlan id
> policy that the tun/tap driver has. (my original patch)
> or

> 2. we extract vlan ids for packets coming from the same macvlan, fixing vhost_net to
> take vlan_tci into account and restoring vlan headers on
> macvtap_put_user

And 2 seems to be the right answer.
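
The length accounting that option 2 asks of vhost_net can be sketched in user space. This is only a model under stated assumptions, not the macvtap or vhost_net code: `struct pkt_model`, `wire_len` and `read_buflen` are hypothetical names, and the 0x1000 "tag present" bit is assumed from the vlan_tci value 100a shown in the traces above.

```c
#include <stddef.h>
#include <stdint.h>

#define VLAN_HLEN        4       /* bytes an 802.1Q tag adds on the wire */
#define VLAN_TAG_PRESENT 0x1000  /* assumed flag bit inside vlan_tci */

/* Hypothetical stand-in for the two skb fields that matter here. */
struct pkt_model {
    size_t   len;       /* payload length as seen by skb_peek */
    uint16_t vlan_tci;  /* out-of-band tag, 0 if the tag is inline or absent */
};

/* Length of the frame once the out-of-band tag is put back in the header. */
static size_t wire_len(const struct pkt_model *p)
{
    size_t len = p->len;

    if (p->vlan_tci & VLAN_TAG_PRESENT)
        len += VLAN_HLEN;
    return len;
}

/* Buffer size a reader needs: restored frame plus the vnet header. */
static size_t read_buflen(const struct pkt_model *p, size_t vnet_hdr_sz)
{
    return wire_len(p) + vnet_hdr_sz;
}
```

With the values from the traces, a 90 byte skb carrying vlan_tci 100a and a 94 byte skb with the tag already inline both need the same buffer once the 12 byte vnet header is added, which is the common rule being asked for.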

Not long ago we went a few rounds on this in the core parts of the
networking stack, and the rule that evolved was that we always strip the
vlan tag.  That is why we have the vlan_untag call in __netif_receive_skb():
it handles the small handful of networking devices that don't.
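
To make the "always strip" rule concrete, here is a rough user space model of moving an 802.1Q tag out of a raw Ethernet frame and putting it back. It is a sketch only and is not the kernel's vlan_untag or __vlan_put_tag; `model_untag` and `model_put_tag` are hypothetical helpers operating on a plain byte buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN 12      /* destination + source MAC */
#define VLAN_HLEN     4       /* TPID (2 bytes) + TCI (2 bytes) */
#define TPID_8021Q    0x8100

/*
 * Strip an inline 802.1Q tag. Returns the new frame length; *tagged says
 * whether a tag was found and *tci receives its value.
 */
static size_t model_untag(uint8_t *frame, size_t len, uint16_t *tci, int *tagged)
{
    uint16_t tpid;

    *tci = 0;
    *tagged = 0;
    /* Need both MACs, the tag, and the encapsulated ethertype. */
    if (len < MAC_ADDRS_LEN + VLAN_HLEN + 2)
        return len;
    tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != TPID_8021Q)
        return len;
    *tci = (uint16_t)((frame[14] << 8) | frame[15]);
    *tagged = 1;
    /* Close the 4 byte gap left by the tag. */
    memmove(frame + MAC_ADDRS_LEN, frame + MAC_ADDRS_LEN + VLAN_HLEN,
            len - MAC_ADDRS_LEN - VLAN_HLEN);
    return len - VLAN_HLEN;
}

/*
 * Re-insert a tag after the MAC addresses. Returns the new length, or 0
 * if the frame is too short or the buffer (cap bytes) has no room.
 */
static size_t model_put_tag(uint8_t *frame, size_t len, size_t cap, uint16_t tci)
{
    if (len < MAC_ADDRS_LEN || len + VLAN_HLEN > cap)
        return 0;
    memmove(frame + MAC_ADDRS_LEN + VLAN_HLEN, frame + MAC_ADDRS_LEN,
            len - MAC_ADDRS_LEN);
    frame[12] = TPID_8021Q >> 8;
    frame[13] = TPID_8021Q & 0xff;
    frame[14] = (uint8_t)(tci >> 8);
    frame[15] = (uint8_t)(tci & 0xff);
    return len + VLAN_HLEN;
}
```

The point of the model is that the frame is 4 bytes shorter while the tag lives out of band, so anything that sizes a buffer from the stripped length has to add those bytes back before the tag is restored.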

vhost/net.c is just buggy.  PF_PACKET sockets have been returning
the vlan_id in ancillary data for hardware like the e1000 since
at least 2005.
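
For reference, this is roughly how a PF_PACKET reader picks the tag out of the ancillary data. It is a sketch that assumes the socket has had PACKET_AUXDATA enabled with setsockopt() and that the message was filled in by recvmsg(); `vlan_tci_from_cmsg` is a hypothetical helper name, not an existing API.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/*
 * Walk the control messages of a received msghdr and copy out the VLAN
 * TCI from a PACKET_AUXDATA entry. Returns 1 if such an entry was found,
 * 0 otherwise.
 */
static int vlan_tci_from_cmsg(struct msghdr *msg, uint16_t *tci)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        struct tpacket_auxdata aux;

        if (c->cmsg_level != SOL_PACKET || c->cmsg_type != PACKET_AUXDATA)
            continue;
        if (c->cmsg_len < CMSG_LEN(sizeof(aux)))
            continue;
        /* Copy out rather than cast, to avoid alignment assumptions. */
        memcpy(&aux, CMSG_DATA(c), sizeof(aux));
        *tci = aux.tp_vlan_tci;
        return 1;
    }
    return 0;
}
```

A reader that wants the frame as it appeared on the wire has to combine this value with the payload itself, since the payload arrives with the tag already stripped.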

So I agree that both macvtap and vhost/net.c need to be fixed.

Eric