lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 3 Jun 2019 13:10:13 +0000
From:   Robert Richter <rrichter@...vell.com>
To:     James Morse <james.morse@....com>
CC:     Borislav Petkov <bp@...en8.de>, Tony Luck <tony.luck@...el.com>,
        "Mauro Carvalho Chehab" <mchehab@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 11/21] EDAC, ghes: Unify trace_mc_event() code with
 edac_mc driver

On 29.05.19 16:12:38, James Morse wrote:
> Hi Robert,
> 
> On 29/05/2019 09:44, Robert Richter wrote:
> > Almost duplicate code, remove it.
> 
> almost?

The grain ... as noted below.

> 
> 
> > Note: there is a difference in the calculation of the grain_bits,
> > using the edac_mc's version here.
> 
> But is it the right thing to do?
> 
> Is this an off-by-one bug being papered over as some cleanup?
> If so could you post a separate fix that can be picked up for an rc.
> 
> Do Marvell have firmware that populates this field?
> 
> ...
> 
> Unless the argument is no one cares about this...
> 
> >From ghes_edac_report_mem_error():
> |	/* Error grain */
> |	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
> |		e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
> 
> Fishy, why would the kernel page-size be relevant here?

That looked broken to me too, I did not put to much effort in fixing
the grain yet. So I just took the edac_mc version first in the
assumption, that one is working.

It looks like the intention here is to limit the grain to the page
size. But right, the calculation is wrong here. I am also going to
reply to your patch you sent on this.

> 
> If physical_addr_mask were the same as PAGE_MASK this wouldn't this always give ~0?
> (masking logic like this always does my head in)
> 
> /me gives it ago:
> | {1}[Hardware Error]:   physical_address: 0x00000000deadbeef
> | {1}[Hardware Error]:   physical_address_mask: 0xffffffffffff0000
> | {1}[Hardware Error]:   error_type: 6, master abort
> | EDAC MC0: 1 CE Master abort on unknown label ( page:0xdead offset:0xbeef
> | grain:-1 syndrome:0x0 - status(0x0000000000000001): reserved)
> 
> That 'grain:-1' is because the calculated e->grain was an unlikely 0xffffffffffffffff.
> Patch incoming, if you could test it on your platform that'd be great.
> 
> I don't think ghes_edac.c wants this '+1'.

The +1 looks odd to me also for the edac_mc driver, but I need to take
a closer look here as well as some logs suggest the grain is
calculated correctly.

I will do some further examination here and also respond to your
patch.

Thank you for review.

-Robert

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ