Message-ID: <CALs4sv1KFsXLMJhsXTr2by1+UAXAiLTz90EQR5dJ4bqrs6xZCg@mail.gmail.com>
Date: Thu, 27 Mar 2025 18:46:45 +0530
From: Pavan Chebbi <pavan.chebbi@...adcom.com>
To: Kamil Zaripov <zaripov-kamil@...ide.ai>
Cc: Vadim Fedorenko <vadim.fedorenko@...ux.dev>, Michael Chan <michael.chan@...adcom.com>, 
	Jacob Keller <jacob.e.keller@...el.com>, Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: bnxt_en: Incorrect tx timestamp report

On Wed, Mar 26, 2025 at 7:20 PM Kamil Zaripov <zaripov-kamil@...ide.ai> wrote:
>
>
>
> > On 25 Mar 2025, at 12:41, Vadim Fedorenko <vadim.fedorenko@...ux.dev> wrote:
> >
> > On 25/03/2025 10:13, Kamil Zaripov wrote:
> >>
> >> I guess I don’t understand how it works. Am I right that if a userspace program changes the frequency of PHC devices 0,1,2,3 (one for each port present in the NIC), the driver will send the PHC frequency change 4 times, but the firmware will drop 3 of these frequency change commands and pick up only one? How can I understand which PHC will actually represent the adjustable clock and which ones are phony?
> >
> > It can be any of PHC devices, mostly the first to try to adjust will be used.
>
> I believe that randomly selecting one of the PHC clocks to control the actual PHC in the NIC and directing commands received on the other clocks to /dev/null is quite unexpected behavior for userspace applications.
>
> >> Another thing that I cannot understand is the so-called RTC and non-RTC mode. Is there any documentation that describes it? Or specific parts of the driver that change its behavior for RTC and non-RTC mode?
> >
> > Generally, non-RTC means free-running HW PHC clock with timecounter
> > adjustment on top of it. With RTC mode every adjfine() call tries to
> > adjust HW configuration to change the slope of PHC.
>
> Just to clarify:
>
> Am I right that in RTC mode:
> 1.1. All 64 bits of the PHC counter are stored on the NIC (both the “readable” 0–47 bits and the higher 48–63 bits).
In both RTC and non-RTC modes, the driver uses the lower 48 bits from
HW as cycles to feed the timecounter that the driver has mapped to the
PHC.
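
To illustrate the point above, here is a minimal sketch of a timecounter fed by a 48-bit cycle counter. It is a simplified Python model, not the bnxt_en code; the class and parameter names are hypothetical, and the mult/shift arithmetic is only assumed to follow the usual cyclecounter convention.

```python
MASK_48 = (1 << 48) - 1  # only the lower 48 bits come from the hardware


class TimeCounter:
    """Simplified software clock layered on a wrapping 48-bit cycle counter."""

    def __init__(self, mult, shift, start_cycles, start_ns=0):
        self.mult = mult            # cycles -> ns multiplier
        self.shift = shift          # ns = (cycles * mult) >> shift
        self.cycle_last = start_cycles & MASK_48
        self.nsec = start_ns        # accumulated time, kept only in software
        self.frac = 0               # sub-ns remainder carried between reads

    def read(self, hw_cycles):
        now = hw_cycles & MASK_48
        # Masked subtraction stays correct across a single 48-bit wrap.
        delta = (now - self.cycle_last) & MASK_48
        total = delta * self.mult + self.frac
        self.frac = total & ((1 << self.shift) - 1)
        self.nsec += total >> self.shift
        self.cycle_last = now
        return self.nsec


# Usage: the counter wraps between two reads, yet the elapsed time is 15 ns.
tc = TimeCounter(mult=1, shift=0, start_cycles=MASK_48 - 9)
elapsed_ns = tc.read(5)
```

The masked delta only works if the counter is read at least once per wrap period, which is why the software side has to poll it periodically.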

> 1.2. When userspace attempts to change the PHC counter value (using adjtime or settime), these changes are propagated to the NIC via the PORT_MAC_CFG_REQ_ENABLES_PTP_ADJ_PHASE and FUNC_PTP_CFG_REQ_ENABLES_PTP_SET_TIME requests.
True.

> 1.3. If one port of a four-port NIC is updated, the change is propagated to all other ports via the ASYNC_EVENT_CMPL_PHC_UPDATE_EVENT_DATA1_FLAGS_PHC_RTC_UPDATE event. As a result, all four instances of the bnxt_en driver receive the event with the high 48–63 bits of the counter in payload. They then asynchronously read the 0–47 bits and update the timecounter struct’s nsec field.
Not true in the latest Firmware.

> 1.4. If we ignore the bug related to unsynchronized reading of the higher (48–63) and lower (0–47) bits of the PHC counter, the time across each timecounter instance should remain in sync.
Well, no. It won't be very accurate. We designed non-RTC mode for such
use cases. But yes, your use case is not exactly what non-RTC caters
for.

> 1.5. When userspace calls adjfine, it triggers the PORT_MAC_CFG_REQ_ENABLES_PTP_FREQ_ADJ_PPB request, causing the PHC tick rate to change.
Correct. But only the first ever port that made the freq adj will
continue to make further freq adjustments. This was a policy decision,
not exactly random. There is an option in our tools to see which is
the interface that is currently making freq adjustments.

>
> In non-RTC mode:
> 2.1. Only the lower 0–47 bits are stored on the NIC. The higher 48–63 bits are stored only in the timecounter struct.
> 2.2. When userspace tries to change the PHC counter via adjtime or settime, the change is reflected only in the timecounter struct.
Correct.

> 2.3. Each timecounter instance may have its own nsec field value, potentially leading to different timestamps read from /dev/ptp[0-3].
Basically each of the timecounters is independent.

> 2.4. When userspace calls adjfine, it only modifies the mul field in the cyclecounter struct, which means no real change occurs to the PHC tick rate on the hardware.
Correct.
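
As a rough sketch of 2.4, the following Python function shows how a frequency adjustment can be applied purely in software by scaling the cyclecounter multiplier (the field is named `mult` in the Linux kernel's struct cyclecounter). The function name and the example base value are hypothetical; the only assumption taken from the PTP interface is that scaled_ppm is parts per million with a 16-bit binary fraction.

```python
def adjust_mult(base_mult, scaled_ppm):
    """Return a multiplier adjusted by scaled_ppm (ppm with 16 fractional bits).

    The hardware tick rate is untouched; only the cycles -> ns conversion
    factor used by the software timecounter changes.
    """
    diff = (base_mult * abs(scaled_ppm)) // (1_000_000 << 16)
    return base_mult - diff if scaled_ppm < 0 else base_mult + diff


# Usage: speed the software clock up, then slow it down, by 100 ppm.
BASE_MULT = 1 << 24                 # hypothetical nominal multiplier
faster = adjust_mult(BASE_MULT, 100 << 16)
slower = adjust_mult(BASE_MULT, -(100 << 16))
```

Because each port has its own timecounter, each one can end up with a different multiplier, which is consistent with the timecounters being independent in non-RTC mode.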

>
> And about issue in general:
> 3.1. Firmware versions 230+ operate in non-RTC mode in all environments.
No, the driver makes the choice of when to shift to non-RTC from RTC.
Currently this happens only in the multi-host environment, where each
port is used to synchronize a different Linux system clock.
But the 230+ versions have a change so that the rollover is no longer
tracked in FW, and the
ASYNC_EVENT_CMPL_PHC_UPDATE_EVENT_DATA1_FLAGS_PHC_RTC_UPDATE event is
deprecated.
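
For illustration only, here is one way rollover can be handled on the driver side without a firmware event: a 48-bit hardware timestamp is extended to a full-width value by picking the candidate closest to a recently read full-width reference. This is a generic technique sketched in Python with hypothetical names, not the actual bnxt_en implementation.

```python
MASK_48 = (1 << 48) - 1
HALF_48 = 1 << 47


def extend_48(ref_full, ts48):
    """Extend a 48-bit timestamp using a nearby full-width reference value.

    Assumes the timestamp was taken within half a wrap period of the
    reference, in either direction.
    """
    delta = (ts48 - (ref_full & MASK_48)) & MASK_48
    if delta >= HALF_48:
        delta -= 1 << 48        # the timestamp is slightly older than ref
    return ref_full + delta


# Usage: the reference was read just after a wrap, the timestamp just before.
ref = (3 << 48) + 5
full_ts = extend_48(ref, MASK_48 - 1)
```

Deriving the upper bits from the lower bits and one consistent reference avoids combining high and low halves that were read at different moments.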

> 3.2. Firmware version 224 uses RTC mode because older driver versions were not designed to track overflows (the higher 48–63 bits of the PHC counter) on the driver side.
>
>
> >>> The latest driver handles the rollover on its own and we don't need the firmware to tell us.
> >>> I checked with the firmware team and I gather that the version you are using is very old.
> >>> Firmware version 230.x onwards, you should not receive this event for rollovers.
> >>> Is it possible for you to update the firmware? Do you have access to a more recent (230+) firmware?
> >> Yes, I can update firmware if you can tell where can I find the latest firmware and the update instructions?
> >
> > Broadcom's web site has pretty easy support portal with NIC firmware
> > publicly available. Current version is 232 and it has all the
> > improvements Pavan mentioned.
>
> Yes, I have found the "Broadcom BCM57xx Fwupg Tools" archive with some precompiled binaries for the x86_64 platform. The problem is that our hosts are aarch64 and use Nix as a package manager, so it will take some time to make it work in our setup. I had just hoped that there is a firmware binary itself that I can pass to ethtool --flash.
>
>
>
> > On 25 Mar 2025, at 14:24, Pavan Chebbi <pavan.chebbi@...adcom.com> wrote:
> >
> >>> Yes, I can update firmware if you can tell where can I find the latest firmware and the update instructions?
> >>>
> >>
> >> Broadcom's web site has pretty easy support portal with NIC firmware
> >> publicly available. Current version is 232 and it has all the
> >> improvements Pavan mentioned.
> >>
> > Thanks Vadim for chiming in. I guess you answered all of Kamil's questions.
>
> Yes, thank you for help. Without your explanation, I would have spent a lot more time understanding it on my own.
>
> > I am curious about Kamil's use case of running PTP on 4 ports (in a
> > single host?) which seem to be using RTC mode.
> > Like Vadim pointed out earlier, this cannot be an accurate config
> > given we run a shared PHC.
> > Can Kamil give details of his configuration?
>
> I have a system equipped with a BCM57502 NIC that functions as a PTP grandmaster in a small local network. Four PTP clients — each connected to one of the NIC’s four ports — synchronize their time with the grandmaster using the PTP L2P2P protocol. To support this configuration, I run four ptp4l instances (one for each port) and a single phc2sys daemon to synchronize system time and PHC time by adjusting the PHC. Because the bnxt_en driver reports different PHC device indexes for each NIC port, the phc2sys daemon treats each PHC device as independent and adjusts their times separately.
>
If you are using a Broadcom NIC and have only one system time to
update, I don't see why we should have 4 PTP clients. Just one
instance of ptp4l running on one of the ports and one phc2sys is going
to be valid (and is sufficient?)
I am thinking out loud, the phc2sys daemon could be picking up all the
available clocks, but I think that needs to be modified, unless we
decide to stop exposing multiple clocks for the same PHC in our
design.
Of course, I am not sure if you have a requirement of 4 GMs to sync with.

> We also have a similar setup with a different network card, the Intel E810-C, which has four ports as well. However, its ice driver exposes only one PHC device and probably reads the PHC counter in a different way. I do not remember similar issues with this setup.
>
I think on the Intel NIC, this problem itself would not arise,
because you will run only 1 client each of ptp4l and phc2sys, right?
But I am not sure how you can run 4 GMs on the Intel NIC if you are
running that.
