netdev - Re: Latest net-next kernel 4.19.0+

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 31 Oct 2018 22:22:39 +0100
From:   Paweł Staszewski <pstaszewski@...are.pl>
To:     Saeed Mahameed <saeedm@...lanox.com>,
        "eric.dumazet@...il.com" <eric.dumazet@...il.com>,
        "xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "dmichail@...gle.com" <dmichail@...gle.com>
Subject: Re: Latest net-next kernel 4.19.0+



W dniu 31.10.2018 o 22:05, Saeed Mahameed pisze:
> On Tue, 2018-10-30 at 10:32 -0700, Cong Wang wrote:
>> On Tue, Oct 30, 2018 at 7:16 AM Eric Dumazet <eric.dumazet@...il.com>
>> wrote:
>>>
>>>
>>> On 10/30/2018 01:09 AM, Paweł Staszewski wrote:
>>>>
>>>> W dniu 30.10.2018 o 08:29, Eric Dumazet pisze:
>>>>> On 10/29/2018 11:09 PM, Dimitris Michailidis wrote:
>>>>>
>>>>>> Indeed this is a bug. I would expect it to produce frequent
>>>>>> errors
>>>>>> though as many odd-length
>>>>>> packets would trigger it. Do you have RXFCS? Regardless, how
>>>>>> frequently do you see the problem?
>>>>>>
>>>>> Old kernels (before 88078d98d1bb) were simply resetting
>>>>> ip_summed to CHECKSUM_NONE
>>>>>
>>>>> And before your fix (commit d55bef5059dd057bd), mlx5 bug was
>>>>> canceling the bug you fixed.
>>>>>
>>>>> So we now need to also fix mlx5.
>>>>>
>>>>> And of course use skb_header_pointer() in mlx5e_get_fcs() as I
>>>>> mentioned earlier,
>>>>> plus __get_unaligned_cpu32() as you hinted.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> No RXFCS
>>
>> Same with Pawel, RXFCS is disabled by default.
>>
>>
>>>> And this trace is rly frequently like once per 3/4 seconds
>>>> like below:
>>>> [28965.776864] vlan1490: hw csum failure
>>> Might be vlan related.
> Hi Pawel, is the vlan stripping offload disabled or enabled in your
> case ?
>
> To verify:
> ethtool -k <interface> | grep rx-vlan-offload
> rx-vlan-offload: on
> To set:
> ethtool -K <interface> rxvlan on/off
Enabled:
ethtool -k enp175s0f0
Features for enp175s0f0:
rx-checksumming: on
tx-checksumming: on
         tx-checksum-ipv4: on
         tx-checksum-ip-generic: off [fixed]
         tx-checksum-ipv6: on
         tx-checksum-fcoe-crc: off [fixed]
         tx-checksum-sctp: off [fixed]
scatter-gather: on
         tx-scatter-gather: on
         tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
         tx-tcp-segmentation: on
         tx-tcp-ecn-segmentation: off [fixed]
         tx-tcp-mangleid-segmentation: off
         tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: on [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]


>
> if the vlan offload is off then it will trigger the mlx5e vlan csum
> adjustment code pointed out by Eric.
>
> Anyhow, it should work in both cases, but i am trying to narrow down
> the possibilities.
>
> Also could it be a double tagged packet ?
no double tagged packets there


>
>
>> Unlike Pawel's case, we don't use vlan at all, maybe this is why we
>> see
>> it much less frequently than Pawel.
>>
>> Also, it is probably not specific to mlx5, as there is another report
>> which
>> is probably a non-mlx5 driver.
>>
> Cong, How often does this happen ? can you some how verify if the
> problematic packet has extra end padding after the ip payload ?
>
> It would be cool if we had a feature in kernel to store such SKB in
> memory when such issue occurs, and let the user dump it later (via
> tcpdump) and send the dump to the vendor for debug so we could just
> replay and see what happens.
>
>> Thanks.