Message-ID: <20090417131730.GN14687@one.firstfloor.org>
Date: Fri, 17 Apr 2009 15:17:30 +0200
From: Andi Kleen <andi@...stfloor.org>
To: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
Cc: Andi Kleen <andi@...stfloor.org>, hpa@...or.com,
linux-kernel@...r.kernel.org, mingo@...e.hu, tglx@...utronix.de
Subject: Re: [PATCH] [28/28] x86: MCE: Implement new status bits
On Fri, Apr 17, 2009 at 08:24:23PM +0900, Hidetoshi Seto wrote:
Note: I already have some fixes of my own for this one. I wrote
some new validation tools for the grader, which detected some problems.
> Andi Kleen wrote:
> > static struct severity {
> >     u64 mask;
> >     u64 result;
> >     unsigned char sev;
> >     unsigned char mcgmask;
> >     unsigned char mcgres;
> > +   unsigned char ser;
> > +   unsigned char context;
> >     char *msg;
> > } severities[] = {
> > +#define KERNEL .context = IN_KERNEL
> > +#define USER .context = IN_USER
> > +#define SER .ser = 1
> > +#define NOSER .ser = -1
>
> ser is unsigned or signed?
We only really use it as an abstract flag that is only compared for
equality, so it doesn't matter. I can change it to 2 or, better, define
another enum.
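
As a small illustration of why the enum variant is the safer of the two
options, here is a user-space sketch. It assumes int is wider than
unsigned char (true on x86), and the names ser_flag, SER_ANY,
SER_REQUIRED and SER_FORBIDDEN are hypothetical, not taken from the
patch. With an unsigned char field, assigning -1 stores UCHAR_MAX, and a
later comparison against -1 is done after promotion to int, so it may
not behave like a plain equality check on the original constant.

```c
#include <limits.h>

/* Hypothetical names for illustration; not taken from the patch. */
enum ser_flag { SER_ANY = 0, SER_REQUIRED, SER_FORBIDDEN };

/* Value actually held by an unsigned char after assigning -1. */
int stored_value(void)
{
    unsigned char ser = (unsigned char)-1;  /* wraps to UCHAR_MAX */
    return ser;
}

/* Models `ser == -1` for an unsigned char field. */
int uchar_flag_equals_minus_one(void)
{
    unsigned char ser = (unsigned char)-1;
    int promoted = ser;     /* the integer promotion done implicitly by == */
    return promoted == -1;  /* UCHAR_MAX compared with -1: false */
}

/* The same check with named enum constants has no such surprise. */
int enum_flag_matches(void)
{
    enum ser_flag ser = SER_FORBIDDEN;
    return ser == SER_FORBIDDEN;
}
```

With named constants the table entries and the comparison use the same
symbol, so the signedness of the storage type no longer matters.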
>
> > int mce_severity(struct mce *a, int tolerant, char **msg)
> > {
> >     struct severity *s;
> > @@ -51,11 +101,14 @@
> >             continue;
> >         if ((a->mcgstatus & s->mcgmask) != s->mcgres)
> >             continue;
> > -       if (s->sev > MCE_NO_SEVERITY && (a->status & MCI_STATUS_UC) &&
> > -           tolerant < 1)
> > -           return MCE_PANIC_SEVERITY;
> > +       if ((s->ser == 1 && !mce_ser) || (s->ser == -1 && mce_ser))
> > +           continue;
> > +       if (s->context && error_context(a) != s->context)
> > +           continue;
> >         if (msg)
> >             *msg = s->msg;
> > +       if (s->context == IN_KERNEL && panic_on_oops)
> > +           return MCE_PANIC_SEVERITY;
> >         return s->sev;
> >     }
> > }
>
> Where did you throw away the statements for "tolerant < 1"?
You mean why?
It didn't really fit into the new status bits and didn't improve
behaviour with recovery. I had originally planned to fit it in,
but after trying hard I gave up on that.
tolerant only has its old meaning now: whether to risk
do_exit in kernel context (slight risk of deadlock) or not.
This has the advantage that it doesn't change behaviour
(although, at least without MCA recovery, it didn't really matter
because you tended to always panic anyway).
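
For readers following the thread, the table walk in the quoted
mce_severity() hunk can be modeled in user space roughly as follows.
This is only a sketch under stated assumptions: the status bits (bit 0
as "valid", bit 1 as "uncorrected"), the rule table, the enum names and
the grade() function are all hypothetical, and the real table in the
patch is larger and also checks mcgstatus. The first rule that matches
wins; tolerant does not appear in the walk at all.

```c
#include <stddef.h>
#include <stdint.h>

/* All names and bit values below are hypothetical, for illustration only. */
enum context { CTX_ANY = 0, CTX_KERNEL, CTX_USER };
enum ser_req { SER_DONT_CARE = 0, SER_NEEDED, SER_ABSENT };
enum sev { SEV_NONE = 0, SEV_KEEP, SEV_ACTION_REQUIRED, SEV_PANIC };

struct rule {
    uint64_t mask, result;   /* rule matches when (status & mask) == result */
    enum ser_req ser;        /* recovery-support requirement */
    enum context context;    /* CTX_ANY matches every context */
    enum sev sev;
    const char *msg;
};

static const struct rule rules[] = {
    { .mask = 0x1, .result = 0x0, .sev = SEV_NONE, .msg = "not valid" },
    { .mask = 0x3, .result = 0x3, .ser = SER_NEEDED, .context = CTX_USER,
      .sev = SEV_ACTION_REQUIRED, .msg = "uncorrected, user context" },
    { .mask = 0x3, .result = 0x3, .ser = SER_NEEDED, .context = CTX_KERNEL,
      .sev = SEV_ACTION_REQUIRED, .msg = "uncorrected, kernel context" },
    { .mask = 0x3, .result = 0x3, .ser = SER_ABSENT,
      .sev = SEV_PANIC, .msg = "uncorrected, no recovery support" },
    { .mask = 0x0, .result = 0x0, .sev = SEV_KEEP, .msg = "default" },
};

enum sev grade(uint64_t status, int have_ser, enum context ctx,
               int panic_on_oops, const char **msg)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        const struct rule *r = &rules[i];

        if ((status & r->mask) != r->result)
            continue;
        if ((r->ser == SER_NEEDED && !have_ser) ||
            (r->ser == SER_ABSENT && have_ser))
            continue;
        if (r->context != CTX_ANY && ctx != r->context)
            continue;
        if (msg)
            *msg = r->msg;
        /* Kernel-context rules escalate when panic_on_oops is set. */
        if (r->context == CTX_KERNEL && panic_on_oops)
            return SEV_PANIC;
        return r->sev;
    }
    return SEV_PANIC;  /* not reached while the catch-all rule is last */
}
```

In this model, grade(0x3, 1, CTX_KERNEL, 1, NULL) escalates to
SEV_PANIC, while the same call with panic_on_oops set to 0 returns
SEV_ACTION_REQUIRED.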
-Andi
--
ak@...ux.intel.com -- Speaking for myself only.