Message-ID: <YJ0+YbwSpxTrghpo@zn.tnic>
Date: Thu, 13 May 2021 16:57:37 +0200
From: Borislav Petkov <bp@...en8.de>
To: Alex Deucher <alexdeucher@...il.com>
Cc: "Joshi, Mukul" <Mukul.Joshi@....com>, x86-ml <x86@...nel.org>,
"Kasiviswanathan, Harish" <Harish.Kasiviswanathan@....com>,
lkml <linux-kernel@...r.kernel.org>,
"amd-gfx@...ts.freedesktop.org" <amd-gfx@...ts.freedesktop.org>
Subject: Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

On Thu, May 13, 2021 at 10:32:45AM -0400, Alex Deucher wrote:
> Right. The sys admin can query the bad page count and decide when to
> retire the card.

Yap, although the driver should actively "tell" the sysadmin when some
critical counts of retired VRAM pages are reached because I doubt all
admins would go look at those counts on their own.

Btw, you say "admin" - am I to understand that those are some high end
GPU cards with ECC memory? If consumer grade stuff has this too, then
the driver should very much warn on such levels on its own because
normal users won't know what and where to look.

Other than that, the big picture sounds good to me.

Thx.

--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette