linux-kernel - Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20190908101527.GA20499@splinter>
Date:   Sun, 8 Sep 2019 13:15:27 +0300
From:   Ido Schimmel <idosch@...sch.org>
To:     "Allan W. Nielsen" <allan.nielsen@...rochip.com>
Cc:     Jiri Pirko <jiri@...nulli.us>, David Miller <davem@...emloft.net>,
        andrew@...n.ch, horatiu.vultur@...rochip.com,
        alexandre.belloni@...tlin.com, UNGLinuxDriver@...rochip.com,
        ivecera@...hat.com, f.fainelli@...il.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.

On Tue, Sep 03, 2019 at 10:14:12AM +0200, Allan W. Nielsen wrote:
> The 09/03/2019 09:13, Ido Schimmel wrote:
> > On Mon, Sep 02, 2019 at 07:42:31PM +0200, Allan W. Nielsen wrote:
> > With these patches applied I assume I will see the following traffic
> > when running tcpdump on one of the netdevs exposed by the ocelot driver:
> > 
> > - Ingress: All
> > - Egress: Only locally generated traffic and traffic forwarded by the
> >   kernel from interfaces not belonging to the ocelot driver
> > 
> > The above means I will not see any offloaded traffic transmitted by the
> > port. Is that correct?
> Correct - but maybe we should change this.
> 
> In Ocelot and in LANxxxx (the part we are working on now), we can come pretty
> close. We can get the offloaded TX traffic to the CPU, but it will not be
> re-written (it will look like the ingress frame, which is not always the same as
> the egress frame, vlan tags an others may be re-written).

Yes, this is the same with mlxsw. You can trap the egress frames, but
they will reach the CPU unmodified via the ingress port.

> In some of our chips we can actually do this (not Ocelot, and not the LANxxxx
> part we are working on now) after the frame as been re-written.

Cool.

> > I see that the driver is setting 'offload_fwd_mark' for any traffic trapped
> > from bridged ports, which means the bridge will drop it before it traverses
> > the packet taps on egress.
> Correct.
> 
> > Large parts of the discussion revolve around the fact that switch ports
> > are not any different than other ports. Dave wrote "Please stop
> > portraying switches as special in this regard" and Andrew wrote "[The
> > user] just wants tcpdump to work like on their desktop."
> And we are trying to get as close to this as practical possible, knowing that it
> may not be exactly the same.
> 
> > But if anything, this discussion proves that switch ports are special in
> > this regard and that tcpdump will not work like on the desktop.
> I think it can come really close. Some drivers may be able to fix the TX issue
> you point out, others may not.
> 
> > Beside the fact that I don't agree (but gave up) with the new
> > interpretation of promisc mode, I wonder if we're not asking for trouble
> > with this patchset. Users will see all offloaded traffic on ingress, but
> > none of it on egress. This is in contrast to the sever/desktop, where
> > Linux is much more dominant in comparison to switches (let alone hw
> > accelerated ones) and where all the traffic is visible through tcpdump.
> > I can already see myself having to explain this over and over again to
> > confused users.
> > 
> > Now, I understand that showing egress traffic is inherently difficult.
> > It means one of two things:
> > 
> > 1. We allow packets to be forwarded by both the software and the
> > hardware
> > 2. We trap all ingressing traffic from all the ports
> If the HW cannot copy the egress traffic to the CPU (which our HW cannot), then
> you need to do both. All ingress traffic needs to go to the CPU, you need to
> make all the forwarding decisions in the CPU, to figure out what traffic happens
> to go to the port you want to monitor.
> 
> I really doubt this will work in real life. Too much traffic, and HW may make
> different forwarding decision that the SW (tc rules in HW but not in SW), which
> means that it will not be good for debugging anyway.

I agree.

> 
> > Both options can have devastating effects on the network and therefore
> > should not be triggered by a supposedly innocent invocation of tcpdump.
> Agree.
> 
> > I again wonder if it would not be wiser to solve this by introducing two
> > new flags to tcpdump for ingress/egress (similar to -Q in/out) capturing
> > of offloaded traffic. The capturing of egress offloaded traffic can be
> > documented with the appropriate warnings.
> Not sure I agree, but I will try to spend some more time considering it.
> 
> In the mean while, what TC action was it that Jiri suggestion we should use? The
> trap action is no good, and it prevents the forwarding in silicon, and I'm not
> aware of a "COPY-TO-CPU" action.

I agree. We would either need a new or just extend the existing one with
a new attribute.

> > Anyway, I don't want to hold you up, I merely want to make sure that the
> > above (assuming it's correct) is considered before the patches are
> > applied.
> Sounds good, and thanks for all the time spend on reviewing and asking the
> critical questions.

Thanks for bringing up these issues. I will be happy to review future
patches.