netdev - Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

Message-ID: <58900535.2050709@fb.com>
Date:   Mon, 30 Jan 2017 19:32:05 -0800
From:   Alexei Starovoitov <ast@...com>
To:     Saeed Mahameed <saeedm@....mellanox.co.il>
CC:     Tom Herbert <tom@...bertland.com>,
        Or Gerlitz <gerlitz.or@...il.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        David Miller <davem@...emloft.net>,
        Linux Netdev List <netdev@...r.kernel.org>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

On 1/30/17 1:18 PM, Saeed Mahameed wrote:
> On Mon, Jan 30, 2017 at 6:45 PM, Alexei Starovoitov <ast@...com> wrote:
>> On 1/29/17 1:11 AM, Saeed Mahameed wrote:
>>>
>>>
>>> ConnectX4/5 and hopefully so on .. provide three different isolated
>>> steering layers:
>>>
>>> 3. vport layer: available for any PF/VF vport nic driver instance
>>> (netdevice). It allows vlan/mac filtering, RSS hashing, n-tuple
>>> steering (for both encapsulated and nonencapsulated traffic) and RFS
>>> steering. (The code above only writes flow entries of a PF/VF to its
>>> own vport flow tables; there is another mechanism to propagate l2
>>> steering rules down to the eswitch from the vport layer.)
>>>
>>> 2. eswitch layer: Available for PFs only with
>>> HCA_CAP.vport_group_manager capability set.
>>> It allows steering between the PF and different VFs on the same host
>>> (vlan/mac steering and ACL filters in sriov legacy mode, and fancy
>>> n-tuple steering and offloads for switchdev mode - eswitch_offloads.c).
>>> If this table is not created, the default is to pass traffic through
>>> to the PF.
>>>
>>> 1. L2 table: Available for PFs only with HCA_CAP.vport_group_manager
>>> capability set.
>>> Needed for MH configurations; only the PF is allowed to, and should,
>>> write "request UC MAC - set_l2_table_entry" on behalf of the PF itself
>>> and its own VFs.
>>>
>>> - On a bare metal machine only layer 3 is required (all traffic is
>>> passed to the PF vport).
>>> - On a MH configuration layer 3 and 1 are required.
>>> - On a SRIOV configuration layer 3 and 2 are required.
>>> - On MH with SRIOV all layers are required.
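
The four cases above reduce to a simple rule: the vport layer is always needed, multi-host adds the L2 table, and SRIOV adds the eswitch. A minimal standalone sketch of that rule follows. It is plain C for illustration only, not mlx5 driver code, and the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical bit flags for the three steering layers described above. */
enum steering_layer {
	LAYER_L2_TABLE = 1u << 0, /* layer 1: L2 table */
	LAYER_ESWITCH  = 1u << 1, /* layer 2: eswitch */
	LAYER_VPORT    = 1u << 2, /* layer 3: vport */
};

/* Return the set of layers required for a given configuration. */
static unsigned int required_layers(bool multi_host, bool sriov)
{
	unsigned int layers = LAYER_VPORT; /* needed even on bare metal */

	if (multi_host)
		layers |= LAYER_L2_TABLE;
	if (sriov)
		layers |= LAYER_ESWITCH;
	return layers;
}
```

With this model, required_layers(false, false) yields only the vport layer, and required_layers(true, true) yields all three.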
>>>
>>> in the driver, eswitch and L2 layers are handled by PF@...itch.c.
>>>
>>> So for your question:
>>>
>>> PF always init_eswitch ( no eswitch -sriov- tables are created), and
>>> the eswitch will start listening for vport_change_events.
>>>
>>> On any steering change, a PF/VF netdev vport instance should call
>>> mlx5e_vport_context_update [1].
>>>
>>> vport_context_update is a FW command that will store the current
>>> UC/MC/VLAN list and promiscuity info of a vport.
>>>
>>> The FW will generate an event to the PF driver eswitch manager (vport
>>> manager) mlx5_eswitch_vport_event [2], and the PF eswitch will call
>>> set_l2_table_entry for each UC mac on each vport change event of any
>>> vport (including its own vport). If SRIOV is enabled, it will
>>> update the eswitch tables as well.
>>>
>>> To simplify my answer the function calls are:
>>> Vport VF/PF netdevice:
>>> mlx5e_set_rx_mode_work
>>>       mlx5e_vport_context_update
>>>          mlx5e_vport_context_update_addr_list  --> FW event will be
>>> generated to the PF eswitch manager
>>>
>>> PF eswitch manager(eswitch.c) on a vport change FW event:
>>> mlx5_eswitch_vport_event
>>>         esw_vport_change_handler
>>>              esw_vport_change_handle_locked
>>>                      esw_apply_vport_addr_list
>>>                                 esw_add_uc_addr
>>>                                        set_l2_table_entry --> this will
>>> update the l2 table in case MH is enabled.
>>
>>
>> all makes sense. To test this logic I added printk-s
>> to above functions, but I only see:
>> # ip link set eth0 addr 24:8a:07:47:2b:6e
>> [  148.861914] mlx5e_vport_context_update_addr_list: is_uc 1 err 0
>> [  148.875152] mlx5e_vport_context_update_addr_list: is_uc 0 err 0
>>
>> MLX5_EVENT_TYPE_NIC_VPORT_CHANGE doesn't come into mlx5_eq_int().
>
> Strange, I just double checked and I got those events on a bare-metal
> box running the latest net-next.
>
>> Yet nic seems to work fine. Packets come and go.
>>
>
> Is it multi host configuration or bare metal ?

multihost

> Do you have internal loopback traffic between different hosts ?

in a multihost? how can I check that?
Is there an ethtool command?

>> broken firmware or expected behavior?
>
> which driver did you test ? backported or net-next ?

both backported and net-next with Tom's patches.

> if it is backported driver please verify that on driver load the
> following occurs :
>
> 1. VPORTS change events are globally enabled:
> in mlx5_start_eqs@...c:
> async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);

this one is done.

> 2. UC address change events are enabled for vport 0 (PF):
> In eswitch_attach or on eswitch_init (depends on the kernel version) @eswitch.c
> esw_enable_vport(esw, 0, UC_ADDR_CHANGE); is called.

this one is not. Tom's proposal to compile out eswitch.c
removes the invocation of mlx5_eswitch_attach() and the
corresponding esw_enable_vport() call as well.
The question is: why is it necessary?
What will break if it's not done?
So far we don't see any adverse effects in either multihost
or baremetal setups.
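
To make the two conditions above concrete, here is a minimal standalone sketch of the gating as described in this thread: an event reaches the PF only if the type is enabled in the global async event mask (step 1) and the specific change is enabled on the vport (step 2). This is plain C for illustration, not driver code. It assumes the firmware applies both checks; the struct, the helper name, and the constant values are placeholders, not taken from the mlx5 headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration; see the mlx5 headers for the real ones. */
#define EVENT_TYPE_NIC_VPORT_CHANGE 0xd
#define UC_ADDR_CHANGE              (1u << 0)

/* Hypothetical model of per-vport event enablement (what esw_enable_vport sets). */
struct vport_model {
	uint32_t enabled_events;
};

/* True only if both the global mask bit and the per-vport bit are set. */
static bool vport_change_event_delivered(uint64_t async_event_mask,
					 const struct vport_model *vport,
					 uint32_t change)
{
	bool globally_enabled =
		(async_event_mask & (1ull << EVENT_TYPE_NIC_VPORT_CHANGE)) != 0;
	bool vport_enabled = (vport->enabled_events & change) != 0;

	return globally_enabled && vport_enabled;
}
```

Under this model, setting only the global mask bit while never enabling UC_ADDR_CHANGE on vport 0 delivers no event, which would be consistent with MLX5_EVENT_TYPE_NIC_VPORT_CHANGE never showing up in mlx5_eq_int() once eswitch.c is compiled out.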

> BTW folks, I am going to be on vacation for the rest of the week, so
> please expect slow responses.

have a great time off. I hope other mlx folks can answer.