Message-ID: <CAFQmdRb9vsWyF06jppS5U7Wzuc+SzRgWL+hs5+es-GC=5e_8qg@mail.gmail.com>
Date:	Wed, 9 Jul 2014 16:00:04 -0700
From:	Havard Skinnemoen <hskinnemoen@...gle.com>
To:	"Luck, Tony" <tony.luck@...el.com>
Cc:	Borislav Petkov <bp@...en8.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ewout van Bekkum <ewout@...gle.com>
Subject: Re: [PATCH 5/6] x86-mce: check if no_way_out applies before deciding
 not to clear MCE banks.

On Wed, Jul 9, 2014 at 2:00 PM, Luck, Tony <tony.luck@...el.com> wrote:
> +       if (!(no_way_out && cfg->tolerant < 3))
>                 mce_clear_state(toclear);
>
> Style - I think this is easier to grok:
>
>         if (!no_way_out || cfg->tolerant >= 3)
>                 mce_clear_state(toclear);
>
> but not too strongly if others like the !(a && b) form.

I tend to agree with you. It came up during our internal review, and
others argued the other way. But since I'm in charge now, I'll change
it back ;-)
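
For reference, the two spellings are the same condition by De Morgan's
law. Below is a minimal user-space sketch that checks this over every
tolerant level from 0 to 3. The helper names (should_clear_negated,
should_clear_expanded, count_mismatches) are made up for illustration
and are not kernel functions; the plain bool/int parameters stand in
for no_way_out and cfg->tolerant.

```c
#include <assert.h>
#include <stdbool.h>

/* Form from the patch: negated conjunction, !(a && b). */
static bool should_clear_negated(bool no_way_out, int tolerant)
{
	return !(no_way_out && tolerant < 3);
}

/* Form suggested in review: De Morgan expansion, !a || !b. */
static bool should_clear_expanded(bool no_way_out, int tolerant)
{
	return !no_way_out || tolerant >= 3;
}

/* Count the inputs on which the two forms disagree (expected: none). */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int nwo = 0; nwo <= 1; nwo++) {
		for (int tolerant = 0; tolerant <= 3; tolerant++) {
			if (should_clear_negated(nwo, tolerant) !=
			    should_clear_expanded(nwo, tolerant))
				mismatches++;
		}
	}
	return mismatches;
}

/* Hypothetical self-check a caller could run once at startup. */
static void check_forms_equivalent(void)
{
	assert(count_mismatches() == 0);
}
```

So the choice between the two really is only about readability.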

> I'm never sure how to treat the crazy levels of "tolerant" though.  Do
> we really want to clear the banks?  In one sense we do ... we are still
> running and might see more UC errors. Since newer UC errors don't
> overwrite older ones, clearing the banks allows us to see how many
> errors are piling up and being ignored.
>
> But running with tolerant==3 is likely to end in tears ... should we erase
> the evidence on what bad things happened?

It probably doesn't make a huge difference since you're not supposed
to run with tolerant=3, but I kind of understood the logic to be that
if we're going to keep running, we need to clear the banks, and if
we're going to crash, we need to leave them intact so whatever runs
next gets a chance to look at them. So with tolerant==3, we are going
to continue running, and I think for debugging purposes, it's useful
to see how many additional errors are happening.
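
To make that reasoning concrete, here is a small user-space model, not
kernel code. Everything prefixed sim_/SIM_ is hypothetical. It assumes
that a bank which already holds a valid UC error keeps the old record
and only sets an overflow flag when a new UC error arrives (the VAL,
OVER and UC bit positions follow the MCi_STATUS layout), and that the
handler panics exactly when no_way_out is set and tolerant < 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_NUM_BANKS   4
#define SIM_STATUS_VAL  (1ULL << 63)	/* record is valid */
#define SIM_STATUS_OVER (1ULL << 62)	/* a later error was dropped */
#define SIM_STATUS_UC   (1ULL << 61)	/* uncorrected error */

struct sim_machine {
	uint64_t status[SIM_NUM_BANKS];	/* stand-ins for the bank status MSRs */
	unsigned int handled;		/* valid records the handler has seen */
};

/* "Hardware" logs a UC error; an existing valid record is not overwritten. */
static void sim_log_uc(struct sim_machine *m, int bank, uint16_t code)
{
	assert(bank >= 0 && bank < SIM_NUM_BANKS);

	if (m->status[bank] & SIM_STATUS_VAL) {
		m->status[bank] |= SIM_STATUS_OVER;
		return;
	}
	m->status[bank] = SIM_STATUS_VAL | SIM_STATUS_UC | code;
}

/*
 * Simplified handler: count valid banks, then clear them only if we
 * are going to keep running. Returns true if the model would panic,
 * in which case the banks are left intact for whatever runs next.
 */
static bool sim_handle(struct sim_machine *m, bool no_way_out, int tolerant)
{
	bool will_panic = no_way_out && tolerant < 3;

	for (int i = 0; i < SIM_NUM_BANKS; i++) {
		if (!(m->status[i] & SIM_STATUS_VAL))
			continue;
		m->handled++;
		if (!will_panic)
			m->status[i] = 0;
	}
	return will_panic;
}
```

In this model, with tolerant == 3 each handled error frees its bank, so
a later error shows up as a fresh record and can be counted. Without
the clear, it would only set the overflow flag on the stale record.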

Havard