lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 8 Aug 2023 12:24:11 -0400
From:   Yazen Ghannam <yazen.ghannam@....com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     yazen.ghannam@....com,
        "Limonciello, Mario" <mario.limonciello@....com>,
        linux-edac@...r.kernel.org, hdegoede@...hat.com,
        markgross@...nel.org, platform-driver-x86@...r.kernel.org,
        "Luck, Tony" <tony.luck@...el.com>, linux-kernel@...r.kernel.org,
        avadhut.naik@....com
Subject: Re: [PATCH 1/2] platform/x86/amd: Introduce AMD Address Translation
 Library

On 8/8/2023 11:58 AM, Borislav Petkov wrote:
> On Tue, Aug 08, 2023 at 11:18:07AM -0400, Yazen Ghannam wrote:
>> I think it would be better to avoid dependencies between independent things.
> 
> If they really are independent then I guess. Not that it all ends up in
> a twisty dependency where you wish you should've merged the two
> together. So think about all deps before you design this - it needs to
> handle all cases without hackery.
> 

I agree. My goal with this is to avoid the hackery. :)

I guess we'll see how it goes...

>> For example, amd_smn_read() is mostly used in amd64_edac. EDAC was the
>> original user of SMN accesses, and all the SMN stuff could have been
>> included in EDAC. However, SMN is not specifically for EDAC, so it was added
>> to amd_nb.c to be commonly available. Currently, SMN accesses are done in
>> other modules. I don't think it would have been a good idea to force other
>> modules or subsystems to require EDAC to be used.
> 
> What does that have to do with this? SMN access is generic and should be
> in amd_nb.c as it is needed by other stuff. EDAC, RAS, whatever are all
> users of that thing.
> 

Right, but that's my point. The translation is "generic" and tied to the 
Data Fabric which is the central "thing" in an modern AMD SoC. Anything 
that needs memory address translation for the fabric will use it.

>> This is my reasoning for a separate, independent module for the translation.
>> EDAC is the first user of this. But there will be future code that can
>> leverage this, like CXL, and even the MCE subsystem. And, yes, mce_amd may
>> be already loaded, but this isn't a given. A person may want MCE and CXL
>> support without wanting to use EDAC.
> 
> Is that a real use case or just a hypothetical thing?
> 

Real. There are actually two use cases for CXL. First is memory error 
reporting and page offline which is analogous to MCA. Second is general 
memory online/offline support which will be used for hot-plug/hot-swap 
cases.

In actuality, both CXL cases need the same functionality. Take an AMD 
Data Fabric "normalized" address and translate it to a system physical 
address for the OS to take action.

>> Furthermore, some things using the translation will be built-in, so the
>> translation module will need to be built-in.
> 
> This sounds weird.
> 

Yes, and this is my intention for using the "imply" Kconfig thing you 
see in the second patch. Any config options that need this code will 
"imply" it. And the default option for this code will take the strictest 
setting of all the options that "imply" it.

>> I agree. And I don't think much of the existing things in EDAC should be
>> moved out. But this is new code, so there's an opportunity to have it in a
>> more appropriate place.
>>
>> And, thinking on it more, this could be another example for future "common
>> RAS" functionality. Isn't that why the CEC is in drivers/ras?
> 
> It is there because it doesn't need EDAC at all. If your translation
> doesn't need EDAC and EDAC is going to be only a user of it, then good.
>

Yep, that's the intent.

> But if you're going to have to need the MCA error decoded by EDAC and
> then the error translation done by this thing, then you'd need to
> synchronize between the two. I'm not saying it is impossible - it should
> be well thought out first though before you go coding.
>

Yes, I agree. That's the reason for this code to take a raw struct mce, 
and do the translation independent of EDAC, etc. (for the MCA case).

The same will be the goal for CXL. This code will take a raw struct 
pci_dev, or whatever, and do the translation independent of other CXL code.

>> It seems like things go into EDAC because it's thought of as the de
>> facto RAS location.  But why have something in EDAC if it doesn't
>> provide EDAC functionality?  Other RAS things, like AER, APEI, etc.,
>> don't live in EDAC.
> 
> AER is part of PCI so we haven't considereed tying it into EDAC. And
> there wasn't any desire to do so.
> 
> As to APEI, there's ghes_edac...
> 

True. Though the intent for ghes_edac is for it to provide the EDAC 
functionality.

The other features that don't need EDAC functionality aren't in EDAC.


Going back to the dependency concerns... there are a lot of 
inter-dependencies between RAS code in various subsystems. Maybe some of 
these can be streamlined by moving common things to drivers/ras? But I 
guess this can be a discussion for a later time...

Thanks,
Yazen

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ