[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3383e090-2545-4a02-abab-f92d4cbaa357@molgen.mpg.de>
Date: Thu, 7 Nov 2024 06:48:57 +0100
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Christopher S Hall <christopher.s.hall@...el.com>
Cc: intel-wired-lan@...ts.osuosl.org, David Zage <david.zage@...el.com>,
Vinicius Gomes <vinicius.gomes@...el.com>, netdev@...r.kernel.org,
"Cadore Cataldo, Rodrigo" <rodrigo.cadore@...coustics.com>,
Corinna Vinschen <vinschen@...hat.com>,
Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
Mor Bar Gabay <morx.bar.gabay@...el.com>,
Avigail Dahan <avigailx.dahan@...el.com>,
Sasha Neftin <sasha.neftin@...el.com>
Subject: Re: [Intel-wired-lan] [PATCH iwl-net v3 2/6] igc: Lengthen the
hardware retry time to prevent timeouts
[Cc: +Sasha]
Dear Christopher,
Am 07.11.24 um 00:53 schrieb Hall, Christopher S:
>> From: Paul Menzel <pmenzel@...gen.mpg.de>
>> Sent: Wednesday, November 06, 2024 3:14 PM
>
>> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v3 2/6] igc: Lengthen the
>> hardware retry time to prevent timeouts
>> I’d use the more specific summary/title below:
>
> Will do.
>
>> igc: Lengthen hardware retry time to 4 μs to prevent timeouts
>>
>> Am 06.11.24 um 19:47 schrieb Christopher S M Hall:
>>> Lengthen the hardware retry timer to four microseconds.
>>>
>>> The i225/i226 hardware retries if it receives an inappropriate response
>>> from the upstream device. If the device retries too quickly, the root
>>> port does not respond.
>>
>> Any idea why? Is it documented somewhere?
>
> I do not. Theoretically, 1 us should work, but it does not. It could be a root
> port problem or an issue with i225/i226 NIC. I am not able to directly observe
> the state of either. 4 us has worked in all my testing I am comfortable with
> that value. 2 us also works, but given the limited hardware at my disposal
> I doubled the value to 4 us to be safe. PTM is not time critical. Typically,
> software initiates a transaction between 8 and 32 times per second. There
> is no performance impact for PTM or any other function of the card. The
> timeout occurs rarely, but if the retry time is too short the PTM state
> machine does not recover.
Thank you for clearing this up. If it’s not time critical, why not
revert the original patch and go back to 10 μs.
The referenced commit 6b8aa753a9f9 (igc: Decrease PTM short interval
from 10 us to 1 us) also says, that 1 μs was suggested by the hardware
team. Were you able to talk to them?
>>> The issue can be reproduced with the following:
>>>
>>> $ sudo phc2sys -R 1000 -O 0 -i tsn0 -m
>>>
>>> Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
>>> quickly reproduce the issue.
>>>
>>> PHC2SYS exits with:
>>>
>>> "ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
>>> fails
>>
>> Why four microseconds, and not some other value?
>
> See above.
It’d be great, if you extended the commit message.
>>> Fixes: 6b8aa753a9f9 ("igc: Decrease PTM short interval from 10 us to 1 us")
>>>
>>> -#define IGC_PTM_SHORT_CYC_DEFAULT 1 /* Default short cycle interval */
>>> +#define IGC_PTM_SHORT_CYC_DEFAULT 4 /* Default short cycle interval */
Maybe also add a comment, that 1 μs should work, but does not.
Kind regards,
Paul
Powered by blists - more mailing lists