Message-ID: <3cc0d360-8f51-4cdd-90fd-1fa0a199c2ba@amd.com>
Date: Mon, 17 Jun 2024 17:53:21 +0200
From: Christian König <christian.koenig@....com>
To: Xi Ruoyao <xry111@...111.site>, Icenowy Zheng <uwu@...nowy.me>,
 Alex Deucher <alexander.deucher@....com>, Pan Xinhui <Xinhui.Pan@....com>,
 David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>,
 Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>,
 Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>
Cc: amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org, loongarch@...ts.linux.dev
Subject: Re: [PATCH 1/2] drm/amdgpu: make duplicated EOP packet for GFX7/8
 have real content

On 17.06.24 at 17:35, Xi Ruoyao wrote:
> On Mon, 2024-06-17 at 22:30 +0800, Icenowy Zheng wrote:
>>> Two consecutive writes to the same bus address are perfectly legal
>>> under the PCIe specification and can happen all the time, even without
>>> this specific hw workaround.
>> Yes, I know, and I am not from Loongson, just a user trying to mess
>> around with it.
> There are some proposed "workarounds", like reducing the link width (from
> x16 to x8), tweaking the power management settings, etc.  Someone even
> claims that improving the heat sink of the LS7A chip can help work around
> this issue, but I'm really skeptical...

Well, when it's an ordering problem between writes and interrupts, then
nothing other than getting the order right will fix this. Otherwise there
is always a chance that the CPU won't see coherent results from PCIe devices.

In other words, if the CPU gets an interrupt but doesn't see the written
fence value, it will assume the work is not done. But since the hardware
won't trigger a second interrupt, the CPU will then wait forever for the
operation to finish.

This is not limited to GPUs; it can potentially happen with network or
disk I/O as well.

Regards,
Christian.
