netdev - Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 3 Mar 2016 16:44:34 -0800
From:	Cong Wang <xiyou.wangcong@...il.com>
To:	Mahesh Bandewar <maheshb@...gle.com>
Cc:	Mahesh Bandewar <mahesh@...dewar.net>,
	David Miller <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	netdev <netdev@...r.kernel.org>, Tim Hockin <thockin@...gle.com>,
	Alex Pollitt <alex.pollitt@...aswitch.com>,
	Matthew Dupre <matthew.dupre@...aswitch.com>
Subject: Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing

On Thu, Mar 3, 2016 at 3:21 PM, Mahesh Bandewar <maheshb@...gle.com> wrote:
>
>
> On Thu, Mar 3, 2016 at 1:43 PM, Mahesh Bandewar <maheshb@...gle.com> wrote:
>>
>> On Wed, Mar 2, 2016 at 8:45 PM, Cong Wang <xiyou.wangcong@...il.com>
>> wrote:
>> >
>> > On Mon, Feb 29, 2016 at 2:08 PM, Mahesh Bandewar <mahesh@...dewar.net>
>> > wrote:
>> > > From: Mahesh Bandewar <maheshb@...gle.com>
>> > >
>> > > netif_receive_skb_core() dispatcher uses skb->dev device to send it
>> > > to the packet-handlers (e.g. ip_rcv, ipv6_rcv etc). These packet
>> > > handlers intern use the device passed to determine the net-ns to
>> > > further process these packets.  Now with the nomination logic, the
>> > > dispatcher will call netif_get_l3_dev() helper to select the device
>> > > to be used for this processing. Since l3_dev is initialized to self,
>> > > normal packet processing should not change.
>> > >
>> >
>> > So, if I understand your patches correctly, _logically_ the skb is still
>> > passed into the slave's netns via dev_forward_skb() but now goes over
>> > the iptable rules from the default netns by only changing the netns
>> > parameter to these hooks?
>> >
>> We are using different dev pointer for L3 processing than skb->dev. All
>> netns, routing etc, associated with this dev (l3_dev) should be used for
>> L3.
>>
>> > That is ugly... Logically, you should still need to continue to pass
>> > the skb upper to the stack in default netns until
>> > ip_local_deliver_finish().
>> >
>> >
>> > So, how about adding an iptable hook in ipvlan so that skb will
>> > continue traverse in the original stack and then moved into slave's
>> > netns? This might be harder since logically we need an L3 entrance
>> > to the stack.
>> >
>> > Thoughts?
>>
>> As you mentioned logically we should be able to pass the skb in master's
>> ns
>> until L3 processing is completed. This patch series attempts to do that by
>> disassociating this logic from skb->dev and adding it to l3_dev. This
>> should
>> include not just IPT but all that is done in L3 phase (IPT, routing etc.)
>> Also since dev->l3_dev is same as dev, this should not break any existing
>> logic.
>>
> Well, looking at the code I realized that I missed few places (especially
> routing
> logic) which continues using skb->dev in ingress path and should be
> corrected to
> use l3_dev. I'll correct those places and send the next version.


Look, even you yourself are missing something here. ;) This is exactly why
I suggest to consider another approach. Please don't introduce any code
that is hard to debug even for yourself. The struct net pointer is passed
around in kernel network subsystem almost everywhere, it is not easy to make
it bug-free by just switching skb->dev.


>> That's the generic implementation as far as the stack is concerned and
>> IPvlan
>> uses it to make the IPT hooks symmetric.
>>
>> Another IPT hook may be good enough (however I haven't
>> given much thought to it) for IPvlan, but this generic approach will be
>> for
>> whole of L3. Also currently this I have implemented for the ingress path
>> but that does not mean the same cannot be extended for the egress path
>> (in fact I'm thinking about that)
>

This is logically correct and easier to understand or debug, since IPT hook
is very common in network subsystem even ingress qdisc uses it.