Message-ID: <71bdcb46-83c3-496e-861f-cc0841fb26e3@engleder-embedded.com>
Date: Tue, 10 Dec 2024 20:57:50 +0100
From: Gerhard Engleder <gerhard@...leder-embedded.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: intel-wired-lan@...ts.osuosl.org, netdev@...r.kernel.org,
 anthony.l.nguyen@...el.com, przemyslaw.kitszel@...el.com,
 andrew+netdev@...n.ch, davem@...emloft.net, kuba@...nel.org,
 edumazet@...gle.com, pabeni@...hat.com, Gerhard Engleder <eg@...a.com>,
 Vitaly Lifshits <vitaly.lifshits@...el.com>
Subject: Re: [Intel-wired-lan] [PATCH iwl-next v2] e1000e: Fix real-time
 violations on link up

On 10.12.24 16:27, Bjorn Helgaas wrote:
> On Sun, Dec 08, 2024 at 07:49:50PM +0100, Gerhard Engleder wrote:
>> Link down and up triggers update of MTA table. This update executes many
>> PCIe writes and a final flush. Thus, PCIe will be blocked until all writes
>> are flushed. As a result, DMA transfers of other targets suffer from delay
>> in the range of 50us. This results in timing violations on real-time
>> systems during link down and up of e1000e.
> 
> These look like PCIe memory writes (not config or I/O writes), which
> are posted and do not require Completions.  Generally devices should
> not delay acceptance of posted requests for more than 10us (PCIe r6.0,
> sec 2.3.1).
> 
> Since you mention DMA to/from other targets, maybe there's some kind
> of fairness issue in the interconnect, which would suggest a
> platform-specific issue that could happen with devices other than
> e1000e.
> 
> I think it would be useful to get to the root cause of this, or at
> least mention the interconnect design where you saw the problem in
> case somebody trips over this issue with other devices.

Getting to the root cause would be interesting, but this problem happens on
a rather ancient platform: an Intel i3-2310E (Sandy Bridge) CPU launched in
2011, which still does its job as a robot controller. Intel support does
not answer questions about such old platforms; even for other timing
issues on the interconnect, Intel's support was limited. I will mention
the CPU more explicitly as the platform with this issue.

> The PCIe spec does have an implementation note that says drivers might
> need to restrict the programming model as you do here for designs that
> can't process posted requests fast enough.  If that's the case for
> e1000e, I would ask Intel whether other related devices might also be
> affected.

Intel's support has already ended even for newer CPUs, and this CPU is no
longer sold, so I won't get an answer. But in our experience, limiting the
number of outstanding posted writes always makes sense, at least for
real-time use. Even for our own FPGA-based PCIe target, which can consume
posted writes at full speed, we limit the number of posted writes to
reduce negative effects on real-time behavior. We have seen this on
multiple Intel platforms.

Gerhard
