Message-ID: <DM4PR12MB5263A7ABC342C37CE6891707EE519@DM4PR12MB5263.namprd12.prod.outlook.com>
Date:   Thu, 13 May 2021 23:14:30 +0000
From:   "Joshi, Mukul" <Mukul.Joshi@....com>
To:     Borislav Petkov <bp@...en8.de>,
        Alex Deucher <alexdeucher@...il.com>
CC:     x86-ml <x86@...nel.org>,
        "Kasiviswanathan, Harish" <Harish.Kasiviswanathan@....com>,
        lkml <linux-kernel@...r.kernel.org>,
        "amd-gfx@...ts.freedesktop.org" <amd-gfx@...ts.freedesktop.org>
Subject: RE: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran

[AMD Official Use Only - Internal Distribution Only]



> -----Original Message-----
> From: Borislav Petkov <bp@...en8.de>
> Sent: Thursday, May 13, 2021 10:58 AM
> To: Alex Deucher <alexdeucher@...il.com>
> Cc: Joshi, Mukul <Mukul.Joshi@....com>; x86-ml <x86@...nel.org>;
> Kasiviswanathan, Harish <Harish.Kasiviswanathan@....com>; lkml <linux-
> kernel@...r.kernel.org>; amd-gfx@...ts.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran
> 
> [CAUTION: External Email]
> 
> On Thu, May 13, 2021 at 10:32:45AM -0400, Alex Deucher wrote:
> > Right.  The sys admin can query the bad page count and decide when to
> > retire the card.
> 
> Yap, although the driver should actively "tell" the sysadmin when some critical
> counts of retired VRAM pages are reached because I doubt all admins would go
> look at those counts on their own.
> 
> Btw, you say "admin" - am I to understand that those are some high end GPU
> cards with ECC memory? If consumer grade stuff has this too, then the driver
> should very much warn on such levels on its own because normal users won't
> know what and where to look.
> 
> Other than that, the big picture sounds good to me.
> 

Now that you are OK with how page retirement works, let's revisit the original
question.

Are you OK with a new MCE priority (MCE_PRIO_ACCEL) or do you want us to use
something else?

Thanks,
Mukul

> Thx.
> 
> --
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
