lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Wed, 23 Dec 2020 13:29:26 +0800
From:   John Wang <wangzhiqiang.bj@...edance.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Samuel Mendoza-Jonas <sam@...dozajonas.com>,
        Joel Stanley <joel@....id.au>,
        Lotus Xu <xuxiaohan@...edance.com>,
        郁雷 <yulei.sh@...edance.com>,
        "David S. Miller" <davem@...emloft.net>,
        Gavin Shan <gwshan@...ux.vnet.ibm.com>,
        "open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        Bo Chen <chenbo.gil@...edance.com>
Subject: Re: [External] Re: [PATCH] net/ncsi: Use real net-device for response handler

On Wed, Dec 23, 2020 at 10:25 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Tue, 22 Dec 2020 10:38:21 -0800 Samuel Mendoza-Jonas wrote:
> > On Tue, 2020-12-22 at 06:13 +0000, Joel Stanley wrote:
> > > On Sun, 20 Dec 2020 at 12:40, John Wang wrote:
> > > > When aggregating ncsi interfaces and dedicated interfaces to bond
> > > > interfaces, the ncsi response handler will use the wrong net device
> > > > to
> > > > find ncsi_dev, so that the ncsi interface will not work properly.
> > > > Here, we use the net device registered to packet_type to fix it.
> > > >
> > > > Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")
> > > > Signed-off-by: John Wang <wangzhiqiang.bj@...edance.com>
>
> This sounds like exactly the case for which orig_dev was introduced.
> I think you should use the orig_dev argument, rather than pt->dev.

will send a v2

>
> Can you test if that works?

Yes,  it works.

>
> > > Can you show me how to reproduce this?

On g220a, eth1 is the dedicated interface, eth0 is the ncsi interface

kernel cfg:
CONFIG_BONDING=y

cat /etc/systemd/network/00-bmc-bond1.netdev
[NetDev]
Name=bond1
Description=Bond eth0 and eth1
Kind=bond

[Bond]
Mode=active-backup

cat /etc/systemd/network/00-bmc-eth0.network
[Match]
Name=eth0
[Network]
Bond=bond1

cat /etc/systemd/network/00-bmc-eth0.network
[Match]
Name=eth1
[Network]
Bond=bond1
PrimarySlave=true

ip addr
....
6: bond1: <BROADCAST,MULTICAST,UP,LOWER_UP400> mtu 1500 qdisc noqueue qlen 1000
    link/ether b4:05:5d:8f:6a:ad brd ff:ff:ff:ff:ff:ff
    inet 169.254.11.178/16 brd 169.254.255.255 scope link bond1
       valid_lft forever preferred_lft forever
    inet 192.168.1.108/24 brd 192.168.1.255 scope global bond1
       valid_lft forever preferred_lft forever
    inet 10.2.16.118/24 brd 10.2.16.255 scope global bond1
       valid_lft forever preferred_lft forever
    inet6 fe80::b605:5dff:fe8f:6aad/64 scope link
...


Without this patch:
After bmc boots:
echo eth0 > /sys/class/net/bond1/bonding/active_slave
admin@...0a:~#
admin@...0a:~# echo eth0 > /sys/class/net/bond1/bonding/active_slave
[  105.964357] bond1: (slave eth0): making interface the new active one
admin@...0a:~# ping 10.2.16.1
PING 10.2.16.1 (10.2.16.1): 56 data bytes
64 bytes from 10.2.16.1: seq=0 ttl=255 time=7.096 ms
64 bytes from 10.2.16.1: seq=1 ttl=255 time=2.143 ms
64 bytes from 10.2.16.1: seq=2 ttl=255 time=2.111 ms
[  112.642734] ftgmac100 1e660000.ethernet eth0: NCSI Channel 0 timed out!
64 bytes from 10.2.16.1: seq=3 ttl=255 time=2.039 ms
64 bytes from 10.2.16.1: seq=4 ttl=255 time=2.037 ms
[  117.842814] ftgmac100 1e660000.ethernet eth0: NCSI: No channel with
link found, configuring channel 0
[  134.482746] ftgmac100 1e660000.ethernet eth0: NCSI Channel 0 timed out!
[  139.682820] ftgmac100 1e660000.ethernet eth0: NCSI: No channel with
link found, configuring channel 0

with this patch:
After bmc boots:

admin@...0a:~# echo eth0 > /sys/class/net/bond1/bonding/active_slave
[58332.123754] bond1: (slave eth0): making interface the new active one
admin@...0a:~# ping 10.2.16.1
PING 10.2.16.1 (10.2.16.1): 56 data bytes
64 bytes from 10.2.16.1: seq=0 ttl=255 time=7.279 ms
...
...
64 bytes from 10.2.16.1: seq=N ttl=255 time=2.037 ms



> > >
> > > I don't know the ncsi or net code well enough to know if this is the
> > > correct fix. If you are confident it is correct then I have no
> > > objections.
> >
> > This looks like it is probably right; pt->dev will be the original
> > device from ncsi_register_dev(), if a response comes in to
> > ncsi_rcv_rsp() associated with a different device then the driver will
> > fail to find the correct ncsi_dev_priv. An example of the broken case
> > would be good to see though.
>
> From the description sounds like the case is whenever the ncsi
> interface is in a bond, the netdev from the second argument is
> the bond not the interface from which the frame came. It should
> be possible to repro even with only one interface on the system,
> create a bond or a team and add the ncsi interface to it.
>
> Does that make sense? I'm likely missing the subtleties here.

:)  I guess so.

Powered by blists - more mailing lists