linux-kernel - Re: [PATCH 2/2] RAS: Introduce the FRU Memory Poison Manager

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4096ae55-62bb-4705-94dc-ccf90ee64988@amd.com>
Date: Wed, 14 Feb 2024 10:26:41 -0500
From: Yazen Ghannam <yazen.ghannam@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: yazen.ghannam@....com, tony.luck@...el.com, linux-edac@...r.kernel.org,
 linux-kernel@...r.kernel.org, avadhut.naik@....com, john.allen@....com,
 muralidhara.mk@....com, naveenkrishna.chatradhi@....com,
 sathyapriya.k@....com
Subject: Re: [PATCH 2/2] RAS: Introduce the FRU Memory Poison Manager

On 2/14/2024 6:36 AM, Borislav Petkov wrote:
> On Tue, Feb 13, 2024 at 09:35:16PM -0600, Yazen Ghannam wrote:
>> +static void update_fru_record(struct fru_rec *rec, struct mce *m)
>> +{
>> +	struct cper_sec_fru_mem_poison *fmp = get_fmp(rec);
>> +	struct cper_fru_poison_desc fpd;
>> +	u32 entry = 0;
>> +
>> +	mutex_lock(&fmpm_update_mutex);
>> +
>> +	init_fpd(&fpd, m);
>> +
>> +	/* This is the first entry, so just save it. */
>> +	if (!has_valid_entries(fmp->validation_bits))
>> +		goto save_fpd;
> 
> Not needed - if it is the first entry, it'll get saved there.
>

Get saved where?

For brand new records, the module will allocate them with the headers
and no descriptor entries (empty list).
  
>> +	/* Ignore already recorded errors. */
>> +	if (is_dup_fpd(rec, &fpd))
>> +		goto out_unlock;
>> +
>> +	if (fmp->nr_entries >= max_nr_entries) {
>> +		pr_warn("Exceeded number of entries for FRU 0x%016llx", fmp->fru_id);
>> +		goto out_unlock;
>> +	}
>> +
>> +	entry = fmp->nr_entries;
> 
> ...
> 
>> +static void retire_dram_row(u64 addr, u64 id, u32 cpu)
>> +{
>> +	struct atl_err a_err;
> 
> Yap, exactly, this should use atl_err and not struct mce.
>

Yes, tried to do *some* things generic.
  
>> +
>> +	memset(&a_err, 0, sizeof(struct atl_err));
>> +
>> +	a_err.addr = addr;
>> +	a_err.ipid = id;
>> +	a_err.cpu  = cpu;
>> +
>> +	amd_retire_dram_row(&a_err);
>> +}
>> +
>> +static int fru_mem_poison_handler(struct notifier_block *nb, unsigned long val, void *data)
>> +{
>> +	struct mce *m = (struct mce *)data;
>> +	struct fru_rec *rec;
>> +
>> +	if (!mce_is_memory_error(m))
>> +		return NOTIFY_DONE;
>> +
>> +	retire_dram_row(m->addr, m->ipid, m->extcpu);
>> +
>> +	/*
>> +	 * This should not happen on real errors. But it could happen from
> 
> What exactly is "This" here?
>

Ah right. The module should have created, or restored, a record for each FRU
in the system during module init. So the runtime handler should always find
a valid record for a FRU. The only exception I could think of, besides bugs,
is if the user does software error injection and a valid FRU ID doesn't get
set.
  
>> +	 * software error injection, etc.
>> +	 */
>> +	rec = get_fru_record(m->ppin);
>> +	if (!rec)
>> +		return NOTIFY_DONE;
>> +
>> +	update_fru_record(rec, m);
>> +
>> +	return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block fru_mem_poison_nb = {
>> +	.notifier_call  = fru_mem_poison_handler,
>> +	.priority	= MCE_PRIO_LOWEST,
>> +};
>> +
>> +static u32 get_cpu_from_fru_id(u64 fru_id)
> 
> Fold into the single callsite.
>

Ack.
  
>> +{
>> +	unsigned int cpu = 0;
>> +
>> +	/* Should there be more robust error handling if none found? */
>> +	for_each_online_cpu(cpu) {
>> +		if (topology_ppin(cpu) == fru_id)
>> +			break;
>> +	}
>> +
>> +	return cpu;
>> +}
>> +
>> +static void retire_mem_fmp(struct fru_rec *rec, u32 nr_entries, u32 cpu)
>> +{
>> +	struct cper_fru_poison_desc *fpd;
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < nr_entries; i++) {
>> +		fpd = get_fpd(rec, i);
>> +
>> +		if (fpd->hw_id_type != FPD_HW_ID_TYPE_MCA_IPID)
>> +			continue;
>> +
>> +		if (fpd->addr_type != FPD_ADDR_TYPE_MCA_ADDR)
>> +			continue;
>> +
>> +		retire_dram_row(fpd->addr, fpd->hw_id, cpu);
>> +	}
>> +}
>> +
>> +static void retire_mem_records(void)
>> +{
>> +	struct cper_sec_fru_mem_poison *fmp;
>> +	struct fru_rec *rec;
>> +	unsigned int i;
>> +	u32 cpu;
>> +
>> +	for_each_fru(i, rec) {
>> +		fmp = get_fmp(rec);
>> +
>> +		if (!has_valid_entries(fmp->validation_bits))
>> +			continue;
>> +
>> +		cpu = get_cpu_from_fru_id(fmp->fru_id);
> 
> Pass in that fmp thing into retire_dram_row() so that you can delay
> that get_cpu_from_fru_id() call until the moment you actually need it.
>

Okay.
  
>> +static int save_new_records(void)
>> +{
>> +	struct fru_rec *rec;
>> +	unsigned int i;
>> +	int ret = 0;
>> +
>> +	for_each_fru(i, rec) {
>> +		/* Skip restored records. Should these be fixed up? */
> 
> I don't understand that question.
>

I think there's a case where the record on storage can space for fewer
than the target number of entries.

For example, module wants to have space for 8 entries per record. A record
from storage has valid entries (which should be restored and memory retired),
but it only has space for 4 entries.

So should the module fix up the record from storage by adjusting its record
length and then writing it back down?
  
>> +		if (rec->hdr.record_length)
>> +			continue;
>> +
>> +		set_rec_fields(rec);
>> +
>> +		ret = update_record_on_storage(rec);
>> +		if (ret)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static bool is_valid_fmp(struct fru_rec *rec)
> 
> fmp_is_valid()
>

Ack.
  
>> +{
>> +	struct cper_sec_fru_mem_poison *fmp = get_fmp(rec);
>> +	u32 len = get_fmp_len(rec);
>> +
>> +	if (!fmp)
>> +		return false;
>> +
>> +	if (!len)
>> +		return false;
>> +
>> +	/* Checksum must sum to zero for the entire section. */
>> +	if (do_fmp_checksum(fmp, len))
>> +		return false;
>> +
>> +	if (!(fmp->validation_bits & FMP_VALID_ARCH_TYPE))
>> +		return false;
>> +
>> +	if (fmp->fru_arch_type != FMP_ARCH_TYPE_X86_CPUID_1_EAX)
>> +		return false;
>> +
>> +	if (!(fmp->validation_bits & FMP_VALID_ARCH))
>> +		return false;
>> +
>> +	if (fmp->fru_arch != cpuid_eax(1))
>> +		return false;
>> +
>> +	if (!(fmp->validation_bits & FMP_VALID_ID_TYPE))
>> +		return false;
>> +
>> +	if (fmp->fru_id_type != FMP_ID_TYPE_X86_PPIN)
>> +		return false;
>> +
>> +	if (!(fmp->validation_bits & FMP_VALID_ID))
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +static void restore_record(struct fru_rec *new, struct fru_rec *old)
>> +{
>> +	/* Records larger than max_rec_len were skipped earlier. */
>> +	size_t len = min(max_rec_len, old->hdr.record_length);
>> +
>> +	memcpy(new, old, len);
>> +}
> 
> Fold into the single call site.
>

Ack.
  
>> +
>> +static bool valid_record(struct fru_rec *old)
>> +{
>> +	struct fru_rec *new;
>> +
>> +	if (!is_valid_fmp(old)) {
>> +		pr_debug("Ignoring invalid record");
>> +		return false;
>> +	}
>> +
>> +	new = get_fru_record(old->fmp.fru_id);
>> +	if (!new) {
>> +		pr_debug("Ignoring record for absent FRU");
>> +		return false;
>> +	}
>> +
>> +	/* What if ERST has duplicate FRU entries? */
>> +	restore_record(new, old);
>> +
>> +	return true;
>> +}
>> +
>> +/*
>> + * Fetch saved records from persistent storage.
>> + *
>> + * For each found record:
>> + * - If it was not created by this module, then ignore it.
>> + * - If it is valid, then copy its data to the local cache.
>> + * - If it is not valid, then erase it.
>> + */
>> +static int get_saved_records(void)
>> +{
>> +	struct fru_rec *old;
>> +	u64 record_id;
>> +	int ret, pos;
>> +	ssize_t len;
>> +
>> +	/*
>> +	 * Assume saved records match current max size.
>> +	 *
>> +	 * However, this may not be true depending on module parameters.
> 
> This must work with module parameters, though. Or, as said and
> preferrably, there should not be any module parameters at all.
>

I agree though I'm not sure which course to take.

Generally, I think the code is *safe* as-is. But it isn't ideal.

If the module expects records with 8 entries, and storage has records
with 4, then we can still get the data. And the "fix up" question applies
from above.

If the module expects records with 4 entries, and storage has records
with 8, then the module will ignore the stored records. This isn't
ideal as the module misses out on possible valid information.
  
>> +	 */
>> +	old = kmalloc(max_rec_len, GFP_KERNEL);
>> +	if (!old) {
>> +		ret = -ENOMEM;
>> +		goto out;
>> +	}
>> +
>> +	ret = erst_get_record_id_begin(&pos);
>> +	if (ret < 0)
>> +		goto out_end;
>> +
>> +	while (!erst_get_record_id_next(&pos, &record_id)) {
>> +		/*
>> +		 * Make sure to clear temporary buffer between reads to avoid
>> +		 * leftover data from records of various sizes.
>> +		 */
>> +		memset(old, 0, max_rec_len);
>> +
>> +		len = erst_read_record(record_id, &old->hdr, max_rec_len,
>> +				       sizeof(struct fru_rec), &CPER_CREATOR_FMP);
>> +
>> +		/* Should this be retried if the temporary buffer is too small? */
> 
> Only when it turns out that it is necessary.
>

Right, but how do we know? I think it is necessary given what I just wrote above.
We don't want to miss out on valid info.

>> +		if (len < 0)
>> +			continue;
>> +
>> +		if (!valid_record(old))
>> +			erst_clear(record_id);
> 
> Where is the check which ignores the record not created by this module?
> 
> Because this clears all records it deems not valid and that thing needs
> to be really careful here and be sure what exactly it clears...
>

erst_read_record() only returns records with the given creator_id. If the
record doesn't have a matching creator_id, then we'll get a (len < 0).

Thanks,
Yazen