netdev - Re: [Intel-wired-lan] [PATCH] ice/ptp: fix the PTP worker retrying indefinitely if the link went down

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <42e74619-f2d0-1079-28b1-61e9e17ae953@intel.com>
Date:   Wed, 18 Jan 2023 12:59:18 -0800
From:   Jacob Keller <jacob.e.keller@...el.com>
To:     Daniel Vacek <neelx@...hat.com>
CC:     Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>,
        "David S. Miller" <davem@...emloft.net>,
        "Eric Dumazet" <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Richard Cochran <richardcochran@...il.com>,
        "Kolacinski, Karol" <karol.kolacinski@...el.com>,
        Siddaraju <siddaraju.dh@...el.com>,
        "Michalik, Michal" <michal.michalik@...el.com>,
        <netdev@...r.kernel.org>, <intel-wired-lan@...ts.osuosl.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [Intel-wired-lan] [PATCH] ice/ptp: fix the PTP worker retrying
 indefinitely if the link went down



On 1/18/2023 7:14 AM, Daniel Vacek wrote:
> On Tue, Jan 17, 2023 at 7:47 PM Jacob Keller <jacob.e.keller@...el.com> wrote:
>> On 1/17/2023 10:15 AM, Daniel Vacek wrote:
>>> When the link goes down the ice_ptp_tx_tstamp_work() may loop forever trying
>>> to process the packets. In this case it makes sense to just drop them.
>>>
>>
>> Hi,
>>
>> Do you have some more details on what situation caused this state? Or is
>> this just based on code review?
>>
>> It won't loop forever because if link is down for more than 2 seconds
>> we'll discard the old timestamps which we assume are not going to arrive.
> 
> Oh, my bad. I should have improved the commit message. This
> comes from RHEL 8.4 where we do not have commit 0dd9286263923
> ("ice: handle discarding old Tx requests in ice_ptp_tx_tstamp")
> yet.
> 
> Though, I still consider 2 seconds being 'forever like'.
> 

It is somewhat forever like in CPU time, yes.

> What really happened is that after the link went down, the
> kthread worker 'ice-ptp-0000:10' being a high prio RT class
> task kept looping on CPU 33 for almost three minutes starving
> other tasks with lower priority. This resulted in RCU stall
> and finally in panic due to hung tasks.
> 

Right, because we essentially used to starve the task that was supposed
to clean this up. That is fixed with the commit you mentioned above.

> The machine is using the E810 NIC.
> 
>> [1193024.082447] ice 0000:10:00.1 ens1f1: NIC Link is Down
>> [1193024.087861] iavf 0000:10:09.1 net1: NIC Link is Down
>> [1193024.093077] iavf 0000:10:09.2 llsm1: NIC Link is Down
>> [1193024.099324] iavf 0000:10:09.3 llsm1: NIC Link is Down
>> [1193060.630500] ice 0000:10:00.2 ens1f2: NIC Link is up 25 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: Off, Autoneg Negotiated: False, Flow Contr>
>> [1193060.647921] iavf 0000:10:11.1 llsm2: NIC Link is Up Speed is 25 Gbps Full Duplex
>> [1193060.655575] iavf 0000:10:11.2 net2: NIC Link is Up Speed is 25 Gbps Full Duplex
>> [1193060.663140] iavf 0000:10:11.3 llsm2: NIC Link is Up Speed is 25 Gbps Full Duplex
>> [1193084.383798] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> [1193084.383799] rcu:   Tasks blocked on level-1 rcu_node (CPUs 32-47): P183189
>> [1193084.383804]        (detected by 0, t=60002 jiffies, g=379331397, q=294595)
>> [1193084.383807] runc            R  running task        0 183189 183175 0x00084080
>> [1193088.721408] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P161097 P183189 } 63616 jiffies s: 1074129 root: 0x4/.
>> [1193089.074856] rcu: blocking rcu_node structures: l=1:32-47:0x0/T
>>
>> [1193199.313106] INFO: task xfsaild/sda4:1428 blocked for more than 120 seconds.
>>
>> [1193201.399021] NMI backtrace for cpu 33
>> [1193201.399021] CPU: 33 PID: 1354 Comm: ice-ptp-0000:10 Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-305.49.1.rt7.121.el8_4.x86_64 #1
>>
>> [1193201.399050] Kernel panic - not syncing: hung_task: blocked tasks
>> [1193201.399052] CPU: 63 PID: 559 Comm: khungtaskd Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-305.49.1.rt7.121.el8_4.x86_64 #1
> 
>> crash> bc 33
>> PID: 1354     TASK: ff48df6f2c5c8000  CPU: 33   COMMAND: "ice-ptp-0000:10"
>>  #0 [fffffe000069ce38] crash_nmi_callback+0x33 at ffffffffb664cb63
>>  #1 [fffffe000069ce40] nmi_handle+0x5b at ffffffffb661c68b
>>  #2 [fffffe000069ce98] default_do_nmi+0x72 at ffffffffb6f4a562
>>  #3 [fffffe000069cec0] do_nmi+0x18d at ffffffffb661cd1d
>>  #4 [fffffe000069cef0] end_repeat_nmi+0x16 at ffffffffb7001544
>>     [exception RIP: delay_tsc+0x6d]
>>     RIP: ffffffffb6f4794d  RSP: ff7daf7a4aa2fd48  RFLAGS: 00000246
>>     RAX: 000000008cf66cff  RBX: 0006582e8cf66caf  RCX: 0000000000000000
>>     RDX: 000000000006582e  RSI: ffffffffb76eefe3  RDI: ffffffffb76d1b60
>>     RBP: 0006582e8cf5ef89   R8: 000000000000000c   R9: 0000000000000000
>>     R10: 0000000000000000  R11: 0000000000000000  R12: 00000000000249f2
>>     R13: 0000000000000021  R14: ff48df63ec9aa3e0  R15: ff48df57541d0300
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>> --- <NMI exception stack> ---
>>  #5 [ff7daf7a4aa2fd48] delay_tsc+0x6d at ffffffffb6f4794d
>>  ## [  endcall jmp 68] __const_udelay+0x37 at ffffffffb6f477f7
>>  #6 [ff7daf7a4aa2fd68] ice_sq_send_cmd+0x356 at ffffffffc0612ee6 [ice]
>>  ## [  inlined     c0] ice_sbq_send_cmd at drivers/net/ethernet/intel/ice/ice_common.c:1407
>>  #7 [ff7daf7a4aa2fdc0] ice_sbq_rw_reg+0xc4 at ffffffffc06147e4 [ice]
>>  #8 [ff7daf7a4aa2fe18] ice_read_phy_reg_e810+0x4d at ffffffffc0655b0d [ice]
>>  ## [  inlined     50] ice_read_phy_tstamp_e810 at drivers/net/ethernet/intel/ice/ice_ptp_hw.c:2608
>>  #9 [ff7daf7a4aa2fe50] ice_read_phy_tstamp+0x5a at ffffffffc06584ca [ice]
>>
>> #10 [ff7daf7a4aa2fe90] ice_ptp_tx_tstamp_work+0x95 at ffffffffc0654185 [ice]
>> #11 [ff7daf7a4aa2fee0] kthread_worker_fn+0xab at ffffffffb6701c4b
>> #12 [ff7daf7a4aa2ff10] kthread+0x15d at ffffffffb6701b7d
>>
>> crash> delay_fn
>> delay_fn = (void (*)(u64)) 0xffffffffb6f478e0 <delay_tsc>
>>
>> crash> bc 33 | grep -o 'R[BDA1][PX2]: 0[^ ]*'
>> RAX: 000000008cf66cff
>> RBX: 0006582e8cf66caf
>> RDX: 000000000006582e
>> RBP: 0006582e8cf5ef89
>> R12: 00000000000249f2
>>
>>         249f2 r12     cycles (150002)
>> 6582e8cf5ef89 rbp     bclock
>> 6582e8cf66cff rdx:rax now
>> 6582e8cf66caf rbx     last
>>
>>  962 ice_sq_send_cmd(struct ice_hw *hw, struct ice_ctl_q_info *cq,
>> --
>>  965 {
>> --
>> 1063         do {
>> 1064                 if (ice_sq_done(hw, cq))
>> 1065                         break;
>> 1066
>> 1067                 udelay(ICE_CTL_Q_SQ_CMD_USEC);
>> 1068                 total_delay++;
>> 1069         } while (total_delay < cq->sq_cmd_timeout);
>>
>>  19 /* 0x10c7 is 2**32 / 1000000 (rounded up) */
>>  20 #define udelay(n)                                                       \
>> --
>>  26                                 __const_udelay((n) * 0x10c7ul);         \
>>
>> crash> pd loops_per_jiffy
>> loops_per_jiffy = 1500000
>>
>> #define ICE_CTL_Q_SQ_CMD_USEC           100   /* Check every 100usec */
>>
>> crash> pd (100*0x10c7*4*250*loops_per_jiffy>>32)+1
>> 150002
>>
>> crash> p 100*0x10c7ul
>> 0x68dbc
>>
>> crash> dis ice_sq_send_cmd+0x346 13
>> 0xffffffffc0612ed6 <ice_sq_send_cmd+0x346>: jmp    0xffffffffc0612eef <ice_sq_send_cmd+0x35f>
>>
>> 0xffffffffc0612ed8 <ice_sq_send_cmd+0x348>: mov    $0x68dbc,%edi
>> 0xffffffffc0612edd <ice_sq_send_cmd+0x34d>: add    $0x1,%r13d
>> 0xffffffffc0612ee1 <ice_sq_send_cmd+0x351>: call   0xffffffffb6f477c0 <__const_udelay>
>> 0xffffffffc0612ee6 <ice_sq_send_cmd+0x356>: cmp    %r13d,0xc8(%rbp)
>> 0xffffffffc0612eed <ice_sq_send_cmd+0x35d>: jbe    0xffffffffc0612f05 <ice_sq_send_cmd+0x375>
>>
>> 0xffffffffc0612eef <ice_sq_send_cmd+0x35f>: mov    0xa0(%rbp),%eax
>> 0xffffffffc0612ef5 <ice_sq_send_cmd+0x365>: add    (%rbx),%rax
>> 0xffffffffc0612ef8 <ice_sq_send_cmd+0x368>: mov    (%rax),%edx
>> 0xffffffffc0612efa <ice_sq_send_cmd+0x36a>: movzwl 0x9a(%rbp),%eax
>> 0xffffffffc0612f01 <ice_sq_send_cmd+0x371>: cmp    %eax,%edx
>> 0xffffffffc0612f03 <ice_sq_send_cmd+0x373>: jne    0xffffffffc0612ed8 <ice_sq_send_cmd+0x348>
>>
>> 0xffffffffc0612f05 <ice_sq_send_cmd+0x375>: mov    0xa0(%rbp),%eax
>>
>> crash> dis delay_tsc 2
>> 0xffffffffb6f478e0 <delay_tsc>: nopl   0x0(%rax,%rax,1) [FTRACE NOP]
>> 0xffffffffb6f478e5 <delay_tsc+0x5>: push   %r13
>>
>> crash> rx ff7daf7a4aa2fd60
>> ff7daf7a4aa2fd60:  0000000000000001    // total_delay
> 
> It's the first time the loop in function ice_sq_send_cmd() is being iterated.
> The ice_sq_send_cmd() has to be called all over again.
> 
>> crash> dis ice_sq_send_cmd+0x2d
>> 0xffffffffc0612bbd <ice_sq_send_cmd+0x2d>: mov    %rsi,%rbp
>>
>> crash> dis delay_tsc 6
>> 0xffffffffb6f478e0 <delay_tsc>: nopl   0x0(%rax,%rax,1) [FTRACE NOP]
>> 0xffffffffb6f478e5 <delay_tsc+0x5>: push   %r13
>> 0xffffffffb6f478e7 <delay_tsc+0x7>: push   %r12
>> 0xffffffffb6f478e9 <delay_tsc+0x9>: mov    %rdi,%r12
>> 0xffffffffb6f478ec <delay_tsc+0xc>: mov    $0x1,%edi
>> 0xffffffffb6f478f1 <delay_tsc+0x11>: push   %rbp
>>
>> crash> rx ff7daf7a4aa2fd50
>> ff7daf7a4aa2fd50:  ff48df6f05dd6730
>>
>> crash> ice_ctl_q_info.sq_cmd_timeout,sq.head,sq.next_to_use ff48df6f05dd6730
>>   sq_cmd_timeout = 0x2710 = 10000,
>>   sq.head = 0x80300,
>>   sq.next_to_use = 0x20,
> 
> 
>> crash> runqc 33
>> CPU 33 RUNQUEUE: ff48df6f3f86c380
>>   CURRENT: PID: 1354     TASK: ff48df6f2c5c8000  COMMAND: "ice-ptp-0000:10"
>>   RT PRIO_ARRAY: ff48df6f3f86c5c0
>>      [ 89] PID: 1354     TASK: ff48df6f2c5c8000  COMMAND: "ice-ptp-0000:10"
>>      [ 98] PID: 161097   TASK: ff48df634a111ec0  COMMAND: "handler111"
>>   CFS RB_ROOT: ff48df6f3f86c430
>>      [100] PID: 583      TASK: ff48df6f37845c40  COMMAND: "kworker/33:1H"
>>      [120] PID: 4113728  TASK: ff48df5055db9ec0  COMMAND: "kworker/33:6"
>>      [120] PID: 183189   TASK: ff48df5d24aa8000  COMMAND: "runc"
>>
>> crash> ps -m 1354 161097 583 4113728 183189
>> [ 0 00:02:56.990] [RU]  PID: 1354     TASK: ff48df6f2c5c8000  CPU: 33   COMMAND: "ice-ptp-0000:10"
>> [ 0 00:02:56.990] [RU]  PID: 161097   TASK: ff48df634a111ec0  CPU: 33   COMMAND: "handler111"
>> [ 0 00:02:56.991] [RU]  PID: 4113728  TASK: ff48df5055db9ec0  CPU: 33   COMMAND: "kworker/33:6"
>> [ 0 00:02:57.015] [RU]  PID: 183189   TASK: ff48df5d24aa8000  CPU: 33   COMMAND: "runc"
>> [ 0 00:02:57.030] [RU]  PID: 583      TASK: ff48df6f37845c40  CPU: 33   COMMAND: "kworker/33:1H"
> 
> These runnable tasks have lower prio and hence they are being starved.
> 
>> crash> ps -mC33 | head -13
>> CPU: 33
>> [ 0 00:02:56.990] [RU]  PID: 1354     TASK: ff48df6f2c5c8000  CPU: 33   COMMAND: "ice-ptp-0000:10"
>> [ 0 00:02:56.990] [RU]  PID: 161097   TASK: ff48df634a111ec0  CPU: 33   COMMAND: "handler111"
>> [ 0 00:02:56.991] [RU]  PID: 4113728  TASK: ff48df5055db9ec0  CPU: 33   COMMAND: "kworker/33:6"
>> [ 0 00:02:56.991] [IN]  PID: 278      TASK: ff48df504fd08000  CPU: 33   COMMAND: "ksoftirqd/33"
>> [ 0 00:02:57.014] [IN]  PID: 277      TASK: ff48df504fcebd80  CPU: 33   COMMAND: "rcuc/33"
>> [ 0 00:02:57.015] [RU]  PID: 183189   TASK: ff48df5d24aa8000  CPU: 33   COMMAND: "runc"
>> [ 0 00:02:57.016] [IN]  PID: 146      TASK: ff48df5040268000  CPU: 33   COMMAND: "rcuop/16"
>> [ 0 00:02:57.020] [IN]  PID: 276      TASK: ff48df504fce8000  CPU: 33   COMMAND: "migration/33"
>> [ 0 00:02:57.024] [IN]  PID: 413      TASK: ff48df504f971ec0  CPU: 33   COMMAND: "rcuop/49"
>> [ 0 00:02:57.024] [IN]  PID: 494      TASK: ff48df504f43bd80  CPU: 33   COMMAND: "rcuop/59"
>> [ 0 00:02:57.029] [UN]  PID: 183233   TASK: ff48df5193618000  CPU: 33   COMMAND: "exe"
>> [ 0 00:02:57.030] [RU]  PID: 583      TASK: ff48df6f37845c40  CPU: 33   COMMAND: "kworker/33:1H"
> 
>> crash> ice_pf.pdev,int_name ff48df6f05dd4180
>>   pdev = 0xff48df6f37fe6000,
>>   int_name = "ice-0000:10:00.1:misc",
>>
>> crash> ice_hw.mac_type ff48df6f05dd4dc0
>>   mac_type = ICE_MAC_E810,
> 
>> crash> ice_ptp_tx.work.func,len,in_use,quad,quad_offset ff48df6f05dd44e0
>>   work.func = 0xffffffffc06540f0 <ice_ptp_tx_tstamp_work>,
>>   len = 0x40,
>>   in_use = 0xff48df63d0354cc0,
>>   quad = 0x1,
>>   quad_offset = 0x0,
>>
>> crash> rx 0xff48df63d0354cc0
>> ff48df63d0354cc0:  0000000000000001
> 
>> crash> ice_ptp_tx.tstamps ff48df6f05dd44e0
>>   tstamps = 0xff48df62d6e69800,
>>
>> crash> ice_tx_tstamp.start,cached_tstamp,skb 0xff48df62d6e69800
>>   start = 0x14717935c,
>>   cached_tstamp = 0xbe9c63f31b
>>   skb = 0xff48df5e20552f00,
>>
>> crash> pd jiffies-0x14717935c
>> 177354
> 
> The packet is waiting for almost three minutes. The mentioned
> commit 0dd9286263923 would definitely skip it and clear it's
> bit in the tx->in_use mask. But still only two seconds too late
> which needs to be addressed. And since the link went down
> I guess just dropping the packets is a reasonable approach.
> 
Yes, but we have to be careful of corner cases that could result in
completely stopping Tx timestamp interrupts until a device reset.

> Checking the latest upstream code in v6.2-rc4 I *guess* this
> can still happen as described above. Just not looping forever
> but rather only for 2 seconds. But doing nothing for two
> seconds is just as bad really. Hence the patch I'm suggesting.
> 
>> The trouble is that if a timestamp *does* arrive late, we need to ensure
>> that we never assign the captured time to the wrong packet, and that for
>> E822 devices we always read the correct number of timestamps (otherwise
>> we can get the logic for timestamp interrupt generation broken).
>>
>> Consider for example this flow for e810:
>>
>> 1) a tx packet with a timestamp request is scheduled to hw
>> 2) the packet begins being transmitted
>> 3) link goes down
>> 4) interrupt triggers, ice_ptp_tx_tstamp is run
>> 5) link is down, so we skip reading this timestamp. Since we don't read
>> the timestamp, we just discard the skb and we don't update the cached tx
>> timestamp value
>> 6) link goes up
>> 7) 2 tx packets with a timestamp request are sent and one of them is on
>> the same index as the packet in (1)
>> 8) the other tx packet completes and we get an interrupt
>> 9) the loop reads both timestamps. Since the tx packet in slot (1)
>> doesn't match its cached value it looks "new" so the function reports
>> the old timestamp to the wrong packet.
> 
> I don't really fully understand the HW nor the driver to be honest.
> For sure I can see it kept looping as the work is being re-queued
> due to tx->in_use bitmap not being cleared. And that can only
> happen when ice_read_phy_tstamp() returns with error or the
> returned &raw_tstamp is not valid or it matches the cached value
> (considering our version of ice_ptp_tx_tstamp() looks like the
> one in v5.15 stable tree). And that said the v5.15 stable needs
> backport of this fix as well I guess.
> 

Right.

>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/intel/ice/ice_ptp.c?h=v5.15.88#n1170
> 
>> Consider this flow for e822:
>>
>> 1) 2 tx packets with a timestamp request are scheduled to hw
>> 2) the packets begin being transmitted
>> 3) link goes down
>> 4) an interrupt for the Tx timestamp is triggered, but we don't read the
>> timestamps because we have link down and we skipped the ts_read.
>> 5) the internal e822 hardware counter is not decremented due to no reads
>> 6) no more timestamp interrupts will be triggered by hardware until we
>> read the appropriate number of timestamps
>>
>> I am not sure if link going up will properly reset and re-initialize the
>> Tx timestamp block but I suspect it does not. @Karol, @Siddaraju,
>> @Michal, do you recall more details on this?
>>
>> I understand the desire to avoid polling when unnecessary, but I am
>> worried because the hardware and firmware interactions here are pretty
>> complex and its easy to get this wrong. (see: all of the previous
>> patches and bug fixes we've been working on... we got this wrong a LOT
>> already...)
> 
> I see. Here my favorite quote applies: "Everybody falls the first time."
> 
> I totally understand and I cannot imagine developing and maintaining this
> without a solid test suite. On the other hand, some things (like especially
> races) could be tricky to reproduce.
>
Yea..

>> Without a more concrete explanation of what this fixes I'm worried about
>> this change :(
> 
> Hopefully I already managed to shed a bit more light here. Again, I'm sorry
> about the brief commit message.
> 

That's ok, this has provided a lot more detail thanks!

>> At a minimum I think I would only set drop_ts but not not goto skip_ts_read.
> 
> IIUC, that would still fail to clear the tx->in_use bit in case
> ice_read_phy_tstamp()
> returns with error. It would only work for the other case where no error is
> returned but rather the returned &raw_tstamp is invalid. I'll send a v2 of
> this fix trying to address the goto concern.
>

If drop_ts is set we should always be dropping the timestamp after
checking, even if we get an error. If there's a flow where that doesn't
happen we should fix that.

>> crash> ice_sbq_msg_input ff7daf7a4aa2fe2c
>> struct ice_sbq_msg_input {
>>   dest_dev = 0x2,
>>   opcode = 0x0,
>>   msg_addr_low = 0x1000,
>>   msg_addr_high = 0x309,
>>   data = 0x0
>> }
>> crash> ice_sbq_msg_req ff7daf7a4aa2fdc8
>> struct ice_sbq_msg_req {
>>   dest_dev = 0x2,
>>   src_dev = 0x0,
>>   opcode = 0x0,
>>   flags = 0x40,
>>   sbe_fbe = 0xf,
>>   func_id = 0x0,
>>   msg_addr_low = 0x1000,
>>   msg_addr_high = 0x309,
>>   data = 0x0
>> }
>> crash> ice_sbq_cmd_desc ff7daf7a4aa2fdd8
>> struct ice_sbq_cmd_desc {
>>   flags = 0x1400,
>>   opcode = 0xc00,
>>   datalen = 0x0,
>>   cmd_retval = 0x0,
>>   cookie_high = 0x0,
>>   cookie_low = 0x0,
>>   param0 = {
>>     cmd_len = 0xc,
>>     cmpl_len = 0xc
>>   },
>>   reserved = "",
>>   addr_high = 0x0,
>>   addr_low = 0x0
>> }
> 
> I see commit 1229b33973c7b ("ice: Add low latency Tx timestamp read") got
> rid of that kthread_work in favor of threaded irq. But the
> ice_misc_intr_thread_fn()
> keeps waking itself in the same manner as the work did keep re-queuing itself.
> So the loop still needs to be broken (i.e. the tx->in_use bitmap cleared). Even
> for v6.2-rc4.
> 


Yes.

>> That way if we happen to have a ready timestamp (for E822) we'll still
>> read it and avoid the miscounting from not reading a completed timestamp.
>>
>> This also ensures that on e810 the cached timestamp value is updated and
>> that we avoid the other situation.
>>
>> I'd still prefer if you have a bug report or more details on the failure
>> case. I believe even if we poll it should be no more than 2 seconds for
>> an old timestamp that never got sent to be discarded.
> 
> Unfortunately I only have this one report with a dump from customer's testing.
> The uptime was almost 14 days, so I'm not sure how easy it is to reproduce.
> 
> Again, hopefully the above covers enough details. I'm happy to share more
> if I missed some. But looping in a softirq (or worker thread) for two seconds
> is not really acceptable. That's just a wrong design. Right?
> 

Please re-send the patch with the goto concern addressed. (only set
drop_ts, and still read the timestamp just in case and make sure that if
the read function fails while drop_ts is set we still drop the ts).

I believe that alleviates my concerns regarding the potential link down
race with a completed timestamp above.

We also should already be preventing requesting new timestamps while
link is down so that should be sufficient to combine with this and cover
the three flows:

1) request tx timestamp
2) link down
3) timestamp never occurs

1) request tx timestamp
2) timestamp occurs
3) link goes down while processing

1) link down
2) request tx timestamp rejected

Thanks!

-Jake

> --nX
> 
>>> Signed-off-by: Daniel Vacek <neelx@...hat.com>
>>> ---
>>>  drivers/net/ethernet/intel/ice/ice_ptp.c | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
>>> index d63161d73eb16..c313177ba6676 100644
>>> --- a/drivers/net/ethernet/intel/ice/ice_ptp.c
>>> +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
>>> @@ -680,6 +680,7 @@ static bool ice_ptp_tx_tstamp(struct ice_ptp_tx *tx)
>>>       struct ice_pf *pf;
>>>       struct ice_hw *hw;
>>>       u64 tstamp_ready;
>>> +     bool link_up;
>>>       int err;
>>>       u8 idx;
>>>
>>> @@ -695,6 +696,8 @@ static bool ice_ptp_tx_tstamp(struct ice_ptp_tx *tx)
>>>       if (err)
>>>               return false;
>>>
>>> +     link_up = hw->port_info->phy.link_info.link_info & ICE_AQ_LINK_UP;
>>> +
>>>       for_each_set_bit(idx, tx->in_use, tx->len) {
>>>               struct skb_shared_hwtstamps shhwtstamps = {};
>>>               u8 phy_idx = idx + tx->offset;
>>> @@ -702,6 +705,12 @@ static bool ice_ptp_tx_tstamp(struct ice_ptp_tx *tx)
>>>               bool drop_ts = false;
>>>               struct sk_buff *skb;
>>>
>>> +             /* Drop packets if the link went down */
>>> +             if (!link_up) {
>>> +                     drop_ts = true;
>>> +                     goto skip_ts_read;
>>> +             }
>>> +
>>>               /* Drop packets which have waited for more than 2 seconds */
>>>               if (time_is_before_jiffies(tx->tstamps[idx].start + 2 * HZ)) {
>>>                       drop_ts = true;
>>
>