Date:	Tue, 25 Aug 2015 10:59:23 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	"Zhang, Jonathan Zhixiong" <zjzhang@...eaurora.org>
Cc:	Will Deacon <will.deacon@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H . Peter Anvin" <hpa@...or.com>,
	"linux-kernel @ vger . kernel . org" <linux-kernel@...r.kernel.org>,
	"linux-efi @ vger . kernel . org" <linux-efi@...r.kernel.org>,
	Matt Fleming <matt.fleming@...el.com>,
	Borislav Petkov <bp@...e.de>,
	Ard Biesheuvel <ard.biesheuvel@...aro.org>,
	Catalin Marinas <Catalin.Marinas@....com>,
	Matt Fleming <matt@...eblueprint.co.uk>
Subject: Re: [PATCH 2/2] acpi, apei: use appropriate pgprot_t to map GHES memory


* Zhang, Jonathan Zhixiong <zjzhang@...eaurora.org> wrote:

> 
> 
> On 8/22/2015 2:24 AM, Ingo Molnar wrote:
> >
> >* Jonathan (Zhixiong) Zhang <zjzhang@...eaurora.org> wrote:
> >
> >>From: "Jonathan (Zhixiong) Zhang" <zjzhang@...eaurora.org>
> >>
> >>With ACPI APEI firmware first handling, generic hardware error
> >>record is updated by firmware in GHES memory region. On an arm64
> >>platform, firmware updates GHES memory region with uncached
> >>access attribute, and then Linux reads stale data from cache.
> >
> >This paragraph *still* doesn't parse for me. It's not any English
> >I can recognize: what is a 'With ACPI APEI firmware first handling'?
> APEI is ACPI Platform Error Interface; it is part of ACPI spec,
> defining the aspect of hardware error handling. "firmware first
> handling" is a terminology used in APEI. It describes such mechanism
> that when hardware error happens, firmware intersects/handles such
> hardware error, formulates hardware error record and writes the record
> to GHES memory region, notifies the kernel through NMI/interrupt, then
> the kernel GHES driver grabs the error record from the GHES memory
> region.

Argh. So how about translating that to English and putting that misnomer into 
scare quotes, and saying something like:

  If the ACPI APEI firmware handles the error first (called "firmware first 
  handling"), the generic hardware error record is updated by the firmware in the 
  GHES memory region.

( Also note all the missing articles I added for readability. The rest of the 
  changelog is missing articles as well. )

> > ... plus what this changelog still doesn't mention is the most important part 
> > of any bug fix description: how does the user notice this in practice and why 
> > does he care?
>
> The changelog mentioned that Linux would read stale data from cache. When stale 
> data is read, kernel reports there is no new hardware error when there actually 
> is.

Note that this is the most valuable sentence so far, in this whole changelog and 
discussion. And we needed how many emails to get to this point?

Obviously, saying 'stale data' in itself does not mean much - it could mean a 
harmless inconsistency nobody really cares about, or in fact it could mean 
something more serious:

> [...] This may lead to further damage in various scenarios, such as error 
> propagation caused data corruption.

Please outline this better. How users are affected in practice is far more 
important than any other detail.

Thanks,

	Ingo