Message-ID: <CACjP9X_v9AFVNRgz2a-qJce+ZqR0TzRzyd4gPFufESoRXmCdJQ@mail.gmail.com>
Date: Thu, 19 Jan 2023 10:38:55 +0100
From: Daniel Vacek <neelx@...hat.com>
To: Jacob Keller <jacob.e.keller@...el.com>
Cc: Jesse Brandeburg <jesse.brandeburg@...el.com>,
Tony Nguyen <anthony.l.nguyen@...el.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Richard Cochran <richardcochran@...il.com>,
"Kolacinski, Karol" <karol.kolacinski@...el.com>,
Siddaraju <siddaraju.dh@...el.com>,
"Michalik, Michal" <michal.michalik@...el.com>,
netdev@...r.kernel.org, intel-wired-lan@...ts.osuosl.org,
linux-kernel@...r.kernel.org
Subject: Re: [Intel-wired-lan] [PATCH] ice/ptp: fix the PTP worker retrying
indefinitely if the link went down
On Wed, Jan 18, 2023 at 11:22 PM Jacob Keller <jacob.e.keller@...el.com> wrote:
> On 1/18/2023 2:11 PM, Daniel Vacek wrote:
> > On Wed, Jan 18, 2023 at 9:59 PM Jacob Keller <jacob.e.keller@...el.com> wrote:
> >> On 1/18/2023 7:14 AM, Daniel Vacek wrote:
> >> 1) request tx timestamp
> >> 2) timestamp occurs
> >> 3) link goes down while processing
> >
> > I was thinking this is the case we got reported. But then again, I'm
> > not really experienced in this field.
> >
>
> I think it might be, or at least something similar to this.
>
> I think that can be fixed with the link check you added. I think we
> actually have a copy of the current link status in the ice_ptp or
> ice_ptp_tx structure which could be used instead of having to check back
> to the other structure.
If you're talking about ptp_port->link_up, that one is always false no
matter the actual NIC link status. I wanted to use it at first, but after
checking all 8 devices available in the dump data, it just does not
match net_dev->state or port_info->phy.link_info.link_info:
crash> net_device.name,state 0xff48df6f0c553000
name = "ens1f1",
state = 0x7, // DOWN
crash> ice_port_info.phy.link_info.link_info 0xff48df6f05dca018
phy.link_info.link_info = 0xc0, // DOWN
crash> ice_ptp_port.port_num,link_up 0xff48df6f05dd44e0
port_num = 0x1
link_up = 0x0, // False
crash> net_device.name,state 0xff48df6f25e3f000
name = "ens1f0",
state = 0x3, // UP
crash> ice_port_info.phy.link_info.link_info 0xff48df6f070a3018
phy.link_info.link_info = 0xe1, // UP
crash> ice_ptp_port.port_num,link_up 0xff48df6f063184e0
port_num = 0x0
link_up = 0x0, // False
crash> ice_ptp_port.port_num,link_up 0xff48df6f25b844e0
port_num = 0x2
link_up = 0x0, // False even though this device is UP
crash> ice_ptp_port.port_num,link_up 0xff48df6f140384e0
port_num = 0x3
link_up = 0x0, // False even though this device is UP
crash> ice_ptp_port.port_num,link_up 0xff48df6f055044e0
port_num = 0x0
link_up = 0x0, // False even though this device is UP
crash> ice_ptp_port.port_num,link_up 0xff48df6f251cc4e0
port_num = 0x1
link_up = 0x0,
crash> ice_ptp_port.port_num,link_up 0xff48df6f33a9c4e0
port_num = 0x2
link_up = 0x0,
crash> ice_ptp_port.port_num,link_up 0xff48df6f3bb7c4e0
port_num = 0x3
link_up = 0x0,
In other words, ice_ptp_port.link_up is always false and cannot be
used. That's why I had to fall back to
hw->port_info->phy.link_info.link_info.
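
For reference, here is a minimal sketch of how the raw values in the dump above can be read as UP or DOWN. It is standalone userspace C, not driver code. The helper names and the SKETCH_* macros are made up for illustration. It assumes that bit 0 of link_info is the link-up flag (ICE_AQ_LINK_UP in the ice driver) and that bit 2 of net_device.state is __LINK_STATE_NOCARRIER; please double-check both against the kernel tree you are debugging.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 0 of phy.link_info.link_info means "link up"
 * (ICE_AQ_LINK_UP in the ice driver). Hypothetical local name. */
#define SKETCH_LINK_UP_BIT   0x01u

/* Assumption: bit 2 of net_device.state is __LINK_STATE_NOCARRIER.
 * Hypothetical local name. */
#define SKETCH_NOCARRIER_BIT (1ul << 2)

/* True when the PHY link_info byte reports the link as up. */
static inline bool sketch_phy_link_up(uint8_t link_info)
{
	return (link_info & SKETCH_LINK_UP_BIT) != 0;
}

/* True when the netdev state word does not have the no-carrier bit set. */
static inline bool sketch_netdev_carrier_ok(unsigned long state)
{
	return (state & SKETCH_NOCARRIER_BIT) == 0;
}
```

Under those assumptions, 0xe1 and 0x3 (ens1f0) both decode as UP, while 0xc0 and 0x7 (ens1f1) both decode as DOWN, which matches the annotations in the dump.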
--nX
> I'm just hoping not to re-introduce bugs related to the hardware
> interrupt counter that we had which results in preventing all future
> timestamp interrupts.
>
> > --nX
> >
> >> 1) link down
> >> 2) request tx timestamp rejected
> >>
> >> Thanks!
> >>
> >> -Jake
> >
>