linux-kernel - Re: [Intel-wired-lan] [PATCH] ice/ptp: fix the PTP worker retrying indefinitely if the link went down


Message-ID: <423a29e2-886d-2c41-16d4-a8fca5537c2e@intel.com>
Date:   Thu, 19 Jan 2023 11:24:59 -0800
From:   Jacob Keller <jacob.e.keller@...el.com>
To:     Daniel Vacek <neelx@...hat.com>
CC:     Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>,
        "David S. Miller" <davem@...emloft.net>,
        "Eric Dumazet" <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Richard Cochran <richardcochran@...il.com>,
        "Kolacinski, Karol" <karol.kolacinski@...el.com>,
        Siddaraju <siddaraju.dh@...el.com>,
        "Michalik, Michal" <michal.michalik@...el.com>,
        <netdev@...r.kernel.org>, <intel-wired-lan@...ts.osuosl.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [Intel-wired-lan] [PATCH] ice/ptp: fix the PTP worker retrying
 indefinitely if the link went down



On 1/19/2023 1:38 AM, Daniel Vacek wrote:
> On Wed, Jan 18, 2023 at 11:22 PM Jacob Keller <jacob.e.keller@...el.com> wrote:
>> On 1/18/2023 2:11 PM, Daniel Vacek wrote:
>>> On Wed, Jan 18, 2023 at 9:59 PM Jacob Keller <jacob.e.keller@...el.com> wrote:
>>>> On 1/18/2023 7:14 AM, Daniel Vacek wrote:
>>>> 1) request tx timestamp
>>>> 2) timestamp occurs
>>>> 3) link goes down while processing
>>>
>>> I was thinking this is the case that was reported to us. But then
>>> again, I'm not really experienced in this field.
>>>
>>
>> I think it might be, or at least something similar to this.
>>
>> I think that can be fixed with the link check you added. I believe we
>> also keep a copy of the current link status in the ice_ptp or
>> ice_ptp_tx structure, which could be used instead of reaching back
>> into the other structure.
> 
> If you're talking about ptp_port->link_up, that one is always false no
> matter the actual NIC link status. I wanted to use it at first, but after
> checking all 8 devices available in the dump data, it simply does not
> match net_dev->state or port_info->phy.link_info.link_info:
> 
> crash> net_device.name,state 0xff48df6f0c553000
>   name = "ens1f1",
>   state = 0x7,    // DOWN
> crash> ice_port_info.phy.link_info.link_info 0xff48df6f05dca018
>   phy.link_info.link_info = 0xc0,    // DOWN
> crash> ice_ptp_port.port_num,link_up 0xff48df6f05dd44e0
>   port_num = 0x1
>   link_up = 0x0,    // False
> 
> crash> net_device.name,state 0xff48df6f25e3f000
>   name = "ens1f0",
>   state = 0x3,    // UP
> crash> ice_port_info.phy.link_info.link_info 0xff48df6f070a3018
>   phy.link_info.link_info = 0xe1,    // UP
> crash> ice_ptp_port.port_num,link_up 0xff48df6f063184e0
>   port_num = 0x0
>   link_up = 0x0,    // False
> 
> crash> ice_ptp_port.port_num,link_up 0xff48df6f25b844e0
>   port_num = 0x2
>   link_up = 0x0,    // False even though this device is UP
> crash> ice_ptp_port.port_num,link_up 0xff48df6f140384e0
>   port_num = 0x3
>   link_up = 0x0,    // False even though this device is UP
> crash> ice_ptp_port.port_num,link_up 0xff48df6f055044e0
>   port_num = 0x0
>   link_up = 0x0,    // False even though this device is UP
> crash> ice_ptp_port.port_num,link_up 0xff48df6f251cc4e0
>   port_num = 0x1
>   link_up = 0x0,
> crash> ice_ptp_port.port_num,link_up 0xff48df6f33a9c4e0
>   port_num = 0x2
>   link_up = 0x0,
> crash> ice_ptp_port.port_num,link_up 0xff48df6f3bb7c4e0
>   port_num = 0x3
>   link_up = 0x0,
> 
> In other words, ice_ptp_port.link_up is always false and cannot be
> used. That's why I had to fall back to
> hw->port_info->phy.link_info.link_info.
> 
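
For readers decoding the dump by hand: the net_device.state values above can be checked bit by bit. The sketch below assumes the bit order of enum netdev_state_t in include/linux/netdevice.h (__LINK_STATE_START = 0, __LINK_STATE_PRESENT = 1, __LINK_STATE_NOCARRIER = 2). The constants are redefined locally and the helper name is hypothetical, so treat it as an illustration, not kernel code.

```c
#include <stdbool.h>

/* Bit positions assumed to mirror enum netdev_state_t in
 * include/linux/netdevice.h; redefined here for a userspace sketch. */
enum {
	SKETCH_LINK_STATE_START     = 0, /* device has been opened */
	SKETCH_LINK_STATE_PRESENT   = 1, /* device is present */
	SKETCH_LINK_STATE_NOCARRIER = 2, /* no carrier, i.e. link down */
};

/* Hypothetical helper: true if a dumped net_device.state value
 * describes an opened device that currently has carrier. */
static bool state_has_carrier(unsigned long state)
{
	bool started = state & (1UL << SKETCH_LINK_STATE_START);
	bool nocarrier = state & (1UL << SKETCH_LINK_STATE_NOCARRIER);

	return started && !nocarrier;
}
```

Under these assumptions, 0x7 (START | PRESENT | NOCARRIER) decodes as no carrier and 0x3 (START | PRESENT) as carrier present, matching the DOWN/UP annotations in the dump.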

Hmm. We call ice_ptp_link_change in ice_link_event, which is called from
ice_handle_link_event...

In ice_link_event, a local link_up variable is set based on
phy_info->link_info.link_info & ICE_AQ_LINK_UP.
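
The same test can be applied to the link_info bytes from the dump. This is a minimal sketch, assuming ICE_AQ_LINK_UP is bit 0 of link_info as defined in the driver's ice_adminq_cmd.h; it is redefined locally under a hypothetical name, and the helper is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match ICE_AQ_LINK_UP in the driver's ice_adminq_cmd.h. */
#define SKETCH_ICE_AQ_LINK_UP (1U << 0)

/* Mirrors the check described for ice_link_event(): the link is
 * considered up iff the ICE_AQ_LINK_UP bit of link_info is set. */
static bool link_info_is_up(uint8_t link_info)
{
	return (link_info & SKETCH_ICE_AQ_LINK_UP) != 0;
}
```

With that assumption, 0xc0 has bit 0 clear (down) and 0xe1 has it set (up), agreeing with the crash annotations.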

What kernel are you testing on? Does it include 6b1ff5d39228 ("ice:
always call ice_ptp_link_change and make it void")?

Prior to this commit, the field was only valid for E822 devices, but I
fixed that because it was used for other checks as well.

I am guessing that the Red Hat kernel you are using lacks several of
these cleanups and fixes.

With the current code in the net-next kernel, I believe we can safely use
the ptp_port->link_up field.
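
To illustrate the idea under discussion, gating the Tx timestamp worker's retry on a link-state flag, here is a simplified userspace sketch. The struct and function names are hypothetical stand-ins, not the ice driver's actual types, and it assumes link_up is kept current by the link-change path described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the driver's PTP port
 * and Tx tracker state; not the real ice structures. */
struct sketch_ptp_port {
	bool link_up;       /* assumed to be updated on link change */
	uint64_t tx_in_use; /* one bit per outstanding Tx timestamp */
};

/* Decide whether the Tx timestamp worker should run again.
 * With the link down, outstanding requests may never complete, so this
 * sketch drops them instead of retrying indefinitely. */
static bool sketch_tx_work_should_requeue(struct sketch_ptp_port *port)
{
	if (!port->link_up) {
		port->tx_in_use = 0; /* discard stale requests */
		return false;
	}

	/* Link is up: keep polling only while requests remain. */
	return port->tx_in_use != 0;
}
```

Dropping the outstanding requests when the link is down is only one possible policy; an actual fix could instead time them out or flush them at a different point.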

Thanks,
Jake