Message-ID: <MN2PR12MB448814C93AA0664284E1DA32F7EA0@MN2PR12MB4488.namprd12.prod.outlook.com>
Date:   Mon, 9 Nov 2020 17:37:06 +0000
From:   "Deucher, Alexander" <Alexander.Deucher@....com>
To:     Thomas “illwieckz“ Debesse 
        <dev@...wieckz.net>, LKML <linux-kernel@...r.kernel.org>
CC:     "Koenig, Christian" <Christian.Koenig@....com>
Subject: RE: On disabling AGP without working alternative (PCI fallback is
 broken for years)

[AMD Public Use]

> -----Original Message-----
> From: Thomas “illwieckz“ Debesse <dev@...wieckz.net>
> Sent: Monday, November 9, 2020 6:41 AM
> To: LKML <linux-kernel@...r.kernel.org>
> Cc: Koenig, Christian <Christian.Koenig@....com>; Deucher, Alexander 
> <Alexander.Deucher@....com>
> Subject: On disabling AGP without working alternative (PCI fallback is 
> broken for years)
> 
> Hi, on May 12, 2020, a commit (ba806f9) was merged that disables AGP in
> the default build.
> 
> It was signed off by Christian König and reviewed by Alex Deucher.
> Distributions started to backport this commit; on the Ubuntu 20.04 LTS
> side this seems to have happened with 5.4.0-48-generic, which was
> built on Sep 10, 2020.
> 
> Around that time I noticed AGP computers experiencing lock-ups and
> other problems that made them unusable after the upgrade. After
> investigating by bisecting Linux versions, I reverted the commit and
> those computers worked again.
> 
> Commit message was:
> 
> > This means a performance regression for some GPUs, but also a bug 
> > fix for some others.
> 
> Unfortunately, this commit does not only introduce a performance
> regression; it also makes some computers unusable, possibly all of
> those with AMD CPUs.
> 
> One of the root causes may be that PCI GPUs have been broken for years
> on AMD platforms; this was tested and verified on:
> 
> - K8-based computer with AGP
> - K8-based computer with PCI Express
> - K10-based computer with AGP
> - Piledriver-based computer with PCI Express
> 
> The breakage was tested and reproduced from Linux 4.4 to Linux
> 5.10-rc2 (I have not tried anything older than 4.4).
> 
> PCI GPUs may be broken on some other platforms, but I have found that 
> testing on an Intel PC (with PCI Express) does not reproduce the issue 
> when the PCI GPU hardware is plugged in.
> 
> There are two patches I'm requesting comments on:
> 
> ## drm/radeon: make all PCI GPUs use 32 bits DMA bit mask
> 
> https://lkml.org/lkml/2020/11/5/307
> 
> This one is not enough to fix PCI GPUs, but it is enough to prevent
> r600_ring_test from failing on ATI PCI devices. Note that Nvidia PCI
> GPUs can't be fixed by this, and it uncovers another bug with AGP GPUs
> when AGP is disabled at build time. Also, this patch may make PCI GPUs
> work in a non-optimal way on platforms that accept them with a 40-bit
> DMA bit mask (like Intel-based computers, which already work without any patch).
> 
> This patch is inspired by the 2012 patch made to solve that issue on
> kernel 3.5: https://bugzilla.redhat.com/show_bug.cgi?id=785375
> 
> At the time, such a change may have been enough to fix the issue, but
> that is no longer true. More breakage may have been introduced since.
> 
> Also, this patch may become useless once other PCI bugs are fixed,
> who knows? At least it is an entry point for investigation.

I think you may be seeing fallout from this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=33b3ad3788aba846fc8b9a065fe2685a0b64f713
That patch led to screen corruption and other issues on older radeons.  It seemed to be related to AGP and/or HIGHMEM.  Disabling either of those fixes the issues.
I proposed reverting the change, but there was pushback in favor of finding the root cause:
https://www.spinics.net/lists/stable/msg413960.html


> 
> ## Revert "drm/radeon: disable AGP by default"
> 
> https://lkml.org/lkml/2020/11/5/308
> 
> This is the simple fix, but currently the only solution to get a
> display again on AMD hosts with an AGP port: without this revert, those
> computers have no alternative way to run a display (not even PCI GPUs).
> 
> I'm asking for comments on those patches. I may have reached my own 
> skill cap on kernel development anyway. I can repurpose hardware to 
> test any other patch and can contribute time for such testing. Unlike 
> AGP GPUs, PCI GPUs are hard to find, so you may appreciate the time 
> and availability offered.
> 
> The PCI GPU on AMD CPU issue was verified with both Nvidia (GS 8400GS
> rev.2) and ATI (Radeon HD 4350) PCI GPUs. These samples are not old
> cards from the previous millennium but capable
> ones: the TeraScale RV710 architecture on the ATI side and Tesla 1.0
> NV98 on the Nvidia side. Both support OpenGL 3.3 and feature 512 MB of
> VRAM. The ATI one has an HDMI port, and some variants of the Nvidia one
> (not the one I own, but with the same specification) are known to have an HDMI port too.
> 
> Also, fixing PCI GPUs may not be enough to fix AGP GPUs running as PCI
> ones, since fixing some (not all) issues on the PCI side raises new
> issues with AGP GPUs running as PCI ones, but not with native PCI GPUs (see below).
> 
> Bugs aside, one important consideration against disabling AGP is that
> very capable and not-that-old AGP hardware is still out there. For
> example, the ATI Radeon HD 4670 AGP
> (RV730 XT) was still sold brand new after 2010 and is a powerful,
> featureful GPU with 1 GB of VRAM and an HDMI port. Its performance is
> still pretty decent in competitive games. Compared with other
> open source drivers mainlined in Linux, to outperform this GPU a
> user has to get an Intel UHD 600 or an Nvidia GTX 1060 from 2016.
> 
> Another important consideration against disabling AGP is that,
> although PCI Express was introduced in 2004, AGP-compatible hardware
> was still being designed, produced and sold until quite recently,
> especially on the AMD side. Computers with quad-core 64-bit CPUs with
> virtualisation, 16 GB of RAM and an AGP port exist, and this is widely
> distributed consumer hardware, not esoteric special-purpose hardware.
> 
> So not only were powerful AGP GPUs still sold brand new in the
> current decade, but there were also very capable computers to host
> them. For those AGP computers, fixing the PCI GPU fallback is not a
> solution, because the PCI fallback itself is not a solution.
> 

For newer AGP hardware like the RV730 you point out (or anything newer than R300), there is no reason to run in AGP mode.  The on-chip GART is far superior.  The only chips where performance may be a problem are the older R1xx/R2xx radeons, and the issue there is more about the size of the TLB in the on-chip GART vs. the TLB in the AGP bridge.  Also, as Christian mentioned, AGP is PCI, so if PCI doesn't work, you have bigger problems.

Alex


> This whole range of hardware became unusable with the commit disabling
> AGP, with no alternative.
> 
> Not only do those AGP GPUs fail to work with the kernel's PCI fallback,
> but unplugging them and plugging in native PCI GPUs instead does not
> work either.
> 
> You'll find more details about the various issues in these bug reports.
> I've invested multiple full-time days testing and reproducing bugs on a
> wide range of hardware, and I've attached, quoted and commented on a lot of logs:
> 
> - https://bugs.launchpad.net/bugs/1899304
> > AGP disablement leaves GPUs without working alternative (PCI 
> > fallback is broken), makes very-capable ATI TeraScale GPUs unusable
> 
> - https://bugs.launchpad.net/bugs/1902981
> > AGP GPUs driven as PCI ones (when AGP is disabled at kernel build
> > time) are known to fail on K8 and K10 platforms
> 
> - https://bugs.launchpad.net/bugs/1902795
> > PCI graphics broken on AMD K8/K10/Piledriver platform (while it 
> > works on Intel) verified from Linux 4.4 to 5.10-rc2
> 
> I would like to be personally CC'ed on answers/comments posted to the
> list in response to my posting.
> 
> Thank you for your attention.
> 
> --
> Thomas “illwieckz” Debesse
