netdev - Re: bnxt_en: Incorrect tx timestamp report

Message-Id: <CBBDA12F-05B4-4842-97BF-11B392AD21F1@avride.ai>
Date: Wed, 26 Mar 2025 15:50:03 +0200
From: Kamil Zaripov <zaripov-kamil@...ide.ai>
To: Vadim Fedorenko <vadim.fedorenko@...ux.dev>
Cc: Michael Chan <michael.chan@...adcom.com>,
 Jacob Keller <jacob.e.keller@...el.com>,
 Pavan Chebbi <pavan.chebbi@...adcom.com>,
 Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: bnxt_en: Incorrect tx timestamp report



> On 25 Mar 2025, at 12:41, Vadim Fedorenko <vadim.fedorenko@...ux.dev> wrote:
> 
> On 25/03/2025 10:13, Kamil Zaripov wrote:
>> 
>> I guess I don’t understand how it works. Am I right that if a userspace program changes the frequency of PHC devices 0, 1, 2, 3 (one for each port present in the NIC), the driver will send the PHC frequency change 4 times, but the firmware will drop 3 of these commands and pick up only one? How can I tell which PHC actually represents the adjustable clock and which ones are phony?
> 
> It can be any of PHC devices, mostly the first to try to adjust will be used.

I believe that randomly selecting one of the PHC clocks to control the actual PHC in the NIC, and directing commands received on the other clocks to /dev/null, is quite unexpected behavior for userspace applications.

>> Another thing that I cannot understand is the so-called RTC and non-RTC modes. Is there any documentation that describes them? Or specific parts of the driver that change its behavior between RTC and non-RTC mode?
> 
> Generally, non-RTC means free-running HW PHC clock with timecounter
> adjustment on top of it. With RTC mode every adjfine() call tries to
> adjust HW configuration to change the slope of PHC.

Just to clarify:

Am I right that in RTC mode:
1.1. All 64 bits of the PHC counter are stored on the NIC (both the “readable” 0–47 bits and the higher 48–63 bits).
1.2. When userspace attempts to change the PHC counter value (using adjtime or settime), these changes are propagated to the NIC via the PORT_MAC_CFG_REQ_ENABLES_PTP_ADJ_PHASE and FUNC_PTP_CFG_REQ_ENABLES_PTP_SET_TIME requests.
1.3. If one port of a four-port NIC is updated, the change is propagated to all other ports via the ASYNC_EVENT_CMPL_PHC_UPDATE_EVENT_DATA1_FLAGS_PHC_RTC_UPDATE event. As a result, all four instances of the bnxt_en driver receive the event with the high 48–63 bits of the counter in the payload. They then asynchronously read the 0–47 bits and update the timecounter struct’s nsec field.
1.4. If we ignore the bug related to unsynchronized reading of the higher (48–63) and lower (0–47) bits of the PHC counter, the time across each timecounter instance should remain in sync.
1.5. When userspace calls adjfine, it triggers the PORT_MAC_CFG_REQ_ENABLES_PTP_FREQ_ADJ_PPB request, causing the PHC tick rate to change.
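
To make the concern in 1.3 and 1.4 concrete, here is a minimal sketch of what can go wrong when the two halves of the counter are sampled at different moments. It is only an illustration of the arithmetic, not driver code: the helper name and the sample values are hypothetical, and it assumes the counter ticks in nanoseconds.

```python
MASK48 = (1 << 48) - 1  # the "readable" low 48 bits of the counter


def combine(high16, low48):
    """Join the upper bits (e.g. from an event payload) with a low-48-bit read."""
    return (high16 << 48) | (low48 & MASK48)


# Hypothetical true 64-bit counter values just before and just after
# the low 48 bits wrap around.
before_wrap = (5 << 48) | (MASK48 - 10)
after_wrap = before_wrap + 20  # low bits have wrapped, high bits are now 6

# Consistent snapshot: both halves taken at the same instant.
ok = combine(after_wrap >> 48, after_wrap & MASK48)

# Torn snapshot: high bits sampled before the wrap, low bits after it.
torn = combine(before_wrap >> 48, after_wrap & MASK48)

# The torn value is off by exactly one wrap period of the low 48 bits.
error = after_wrap - torn
```

With a nanosecond counter, one wrap period of 2**48 ns is roughly 3.26 days, so a single torn read near the wrap would produce a timestamp that is off by that amount.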

In non-RTC mode:
2.1. Only the lower 0–47 bits are stored on the NIC. The higher 48–63 bits are stored only in the timecounter struct.
2.2. When userspace tries to change the PHC counter via adjtime or settime, the change is reflected only in the timecounter struct.
2.3. Each timecounter instance may have its own nsec field value, potentially leading to different timestamps read from /dev/ptp[0-3].
2.4. When userspace calls adjfine, it only modifies the mul field in the cyclecounter struct, which means no real change occurs to the PHC tick rate on the hardware.
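
My understanding of points 2.1 to 2.4 can be sketched as a small model. This is a simplified illustration in the spirit of the kernel's timecounter/cyclecounter helpers, not the bnxt_en code: the class name, the mul/shift values and the sample readings are assumptions, and fractional nanoseconds are dropped for brevity.

```python
MASK48 = (1 << 48) - 1  # width of the free-running hardware counter


class SoftClock:
    """Simplified software clock layered on a free-running 48-bit counter."""

    def __init__(self, mul, shift, last_cycles=0):
        self.mul = mul                  # cycle-to-ns multiplier
        self.shift = shift              # cycle-to-ns shift
        self.last_cycles = last_cycles  # last raw hardware reading
        self.nsec = 0                   # software time, holds what the HW lacks

    def read(self, hw_cycles):
        # Masking the difference makes a wrap of the 48-bit counter harmless,
        # as long as the clock is read at least once per wrap period.
        delta = (hw_cycles - self.last_cycles) & MASK48
        self.nsec += (delta * self.mul) >> self.shift
        self.last_cycles = hw_cycles
        return self.nsec

    def adjtime(self, delta_ns):
        # Only the software field changes; the hardware counter is untouched.
        self.nsec += delta_ns

    def adjfine(self, base_mul, scaled_ppm):
        # scaled_ppm is parts per million with a 16-bit binary fraction.
        diff = base_mul * abs(scaled_ppm) // (1_000_000 << 16)
        self.mul = base_mul + diff if scaled_ppm >= 0 else base_mul - diff


BASE_MUL, SHIFT = 1 << 24, 24  # hypothetical: 1 cycle == 1 ns at nominal rate

# Two ports share one hardware counter but keep separate software state.
port0, port1 = SoftClock(BASE_MUL, SHIFT), SoftClock(BASE_MUL, SHIFT)
port0.read(1000)
port1.read(1000)
port0.adjtime(500)            # step only port0's clock
t0, t1 = port0.read(2000), port1.read(2000)

# Counter wrap is absorbed by the masked delta.
wrapped = SoftClock(BASE_MUL, SHIFT, last_cycles=MASK48 - 4)
elapsed = wrapped.read(5)

# +100 ppm changes only the multiplier of this one instance.
port0.adjfine(BASE_MUL, 100 << 16)
```

In this model the two instances report times 500 ns apart for the same hardware reading, which is the divergence described in 2.3, and the frequency adjustment never reaches the shared counter.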

And about the issue in general:
3.1. Firmware versions 230+ operate in non-RTC mode in all environments.
3.2. Firmware version 224 uses RTC mode because older driver versions were not designed to track overflows (the higher 48–63 bits of the PHC counter) on the driver side.


>>> The latest driver handles the rollover on its own and we don't need the firmware to tell us.
>>> I checked with the firmware team and I gather that the version you are using is very old.
>>> Firmware version 230.x onwards, you should not receive this event for rollovers.
>>> Is it possible for you to update the firmware? Do you have access to a more recent (230+) firmware?
>> Yes, I can update the firmware if you can tell me where I can find the latest firmware and the update instructions.
> 
> Broadcom's web site has pretty easy support portal with NIC firmware
> publicly available. Current version is 232 and it has all the
> improvements Pavan mentioned.

Yes, I have found the "Broadcom BCM57xx Fwupg Tools" archive with some precompiled binaries for the x86_64 platform. The problem is that our hosts are aarch64 and use Nix as a package manager, so it will take some time to make it work in our setup. I had hoped that there would be a firmware binary that I could pass directly to ethtool --flash.



> On 25 Mar 2025, at 14:24, Pavan Chebbi <pavan.chebbi@...adcom.com> wrote:
> 
>>> Yes, I can update the firmware if you can tell me where I can find the latest firmware and the update instructions.
>>> 
>> 
>> Broadcom's web site has pretty easy support portal with NIC firmware
>> publicly available. Current version is 232 and it has all the
>> improvements Pavan mentioned.
>> 
> Thanks Vadim for chiming in. I guess you answered all of Kamil's questions.

Yes, thank you for your help. Without your explanation, I would have spent a lot more time understanding it on my own.

> I am curious about Kamil's use case of running PTP on 4 ports (in a
> single host?) which seem to be using RTC mode.
> Like Vadim pointed out earlier, this cannot be an accurate config
> given we run a shared PHC.
> Can Kamil give details of his configuration?

I have a system equipped with a BCM57502 NIC that functions as a PTP grandmaster in a small local network. Four PTP clients — each connected to one of the NIC’s four ports — synchronize their time with the grandmaster using the PTP L2P2P protocol. To support this configuration, I run four ptp4l instances (one for each port) and a single phc2sys daemon to synchronize system time and PHC time by adjusting the PHC. Because the bnxt_en driver reports different PHC device indexes for each NIC port, the phc2sys daemon treats each PHC device as independent and adjusts their times separately.

We also have a similar setup with a different network card, the Intel E810-C, which has four ports as well. However, its ice driver exposes only one PHC device and probably reads the PHC counter in a different way. I do not remember similar issues with this setup.
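
The difference between the two setups can be checked by grouping interfaces by the PHC index they report. The sketch below is only an illustration: it assumes the text printed by `ethtool -T <iface>` contains a line of the form `PTP Hardware Clock: N` (the label may differ between ethtool versions), the interface names are hypothetical, and the sample text is made up rather than captured from real hardware.

```python
import re
from collections import defaultdict

# Assumed label; adjust if your ethtool version prints a different one.
PHC_LINE = re.compile(r"^PTP Hardware Clock:\s*(\d+)\s*$", re.MULTILINE)


def phc_index(ethtool_output):
    """Return the PHC index found in `ethtool -T` text, or None if absent."""
    match = PHC_LINE.search(ethtool_output)
    return int(match.group(1)) if match else None


def group_by_phc(outputs):
    """Map each PHC index to the sorted list of interfaces that report it."""
    groups = defaultdict(list)
    for iface, text in sorted(outputs.items()):
        groups[phc_index(text)].append(iface)
    return dict(groups)


def sample(index):
    # Made-up stand-in for real `ethtool -T` output.
    return "Time stamping parameters:\nPTP Hardware Clock: %d\n" % index


# One PHC device per port, as described for the bnxt_en setup.
per_port = group_by_phc({"eth%d" % i: sample(i) for i in range(4)})

# One PHC device shared by all ports, as described for the ice setup.
shared = group_by_phc({"eth%d" % i: sample(0) for i in range(4)})
```

Four groups of one interface each means a tool like phc2sys sees four clocks to discipline, while a single group of four means it sees one.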