linux-kernel - Re: [PATCH 17/63] edac_mce: Add an interface driver to report mce errors via edac

Message-ID: <20090925135626.GA8145@aftab>
Date:	Fri, 25 Sep 2009 15:56:26 +0200
From:	Borislav Petkov <borislav.petkov@....com>
To:	Mauro Carvalho Chehab <mchehab@...hat.com>
CC:	bluesmoke-devel@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH 17/63] edac_mce: Add an interface driver to report mce
 errors via edac

Hi,

On Fri, Sep 25, 2009 at 09:11:30AM -0300, Mauro Carvalho Chehab wrote:
> > >  		entry = rcu_dereference(mcelog.next);
> > >  		for (;;) {
> > >  			/*
> > > +			 * If edac_mce is enabled, it will check the error type
> > > +			 * and will process it, if it is a known error.
> > > +			 * Otherwise, the error will be sent through mcelog
> > > +			 * interface
> > > +			 */
> > > +			if (edac_mce_parse(mce))
> > > +				return;
> > 
> > for the third time (!): this may run in NMI context and as such does not
> > obey normal kernel locking rules, so you cannot safely use almost any
> > kernel resource that involves locking. This way, your hook calls into a
> > module, which is a very bad idea. Please remove that hook and put it in the
> > polling routine or somewhere more appropriate.
> 
> I have already answered you, but let me give a more complete explanation.
> 
> Certainly, all code called at this point should be carefully analyzed, so
> let's look at the complete implementation:
> 
> 1) edac_mce is not a module (see patch 18), so just calling a routine in
> edac_mce should be safe, even in NMI context;

No, I mean the ->check_error member: it could call into a module if
i7core_edac is built as one.

<snip the obvious non-registered module case>

> 3) i7core_edac only starts handling MCE events after it is loaded into memory
> and registered with edac_mce. If an error occurs before that, normal MCE
> handling happens;
> 
> 4) once registered, edac_mce will call this hook in i7core_edac:
> 
> static int i7core_mce_check_error(void *priv, struct mce *mce)
> {
> 	struct mem_ctl_info *mci = priv;
> 	struct i7core_pvt *pvt = mci->pvt_info;
> 	unsigned long flags;
> 
> 	/*
> 	 * Just let mcelog handle it if the error is
> 	 * outside the memory controller
> 	 */
> 	if (((mce->status & 0xffff) >> 7) != 1)
> 		return 0;
> 
> 	/* Bank 8 registers are the only ones that we know how to handle */
> 	if (mce->bank != 8)
> 		return 0;
> 
> 	/* Only handle if it is the right mc controller */
> 	if (cpu_data(mce->cpu).phys_proc_id != pvt->i7core_dev->socket) {
> 		debugf0("mc%d: ignoring mce log for socket %d. "
> 			"Another mc should get it.\n",
> 			pvt->i7core_dev->socket,
> 			cpu_data(mce->cpu).phys_proc_id);
> 		return 0;
> 	}

One problem here is the debug call, which is a printk(): printk() takes
locks internally, so calling it in NMI context can deadlock. That's why
MCEs are added to the lockless buffer in mce_log() and decoded later;
otherwise you could just as well printk() them right here.
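To illustrate, here is a minimal userspace C11 sketch (not the actual i7core_edac code) of the same three filter checks written so that they neither lock nor print. Instead of calling debugf0(), the wrong-socket case bumps a lock-free counter that a later, non-NMI path could report. The struct mce_rec type, the ignored_other_socket counter and mc_should_handle() are hypothetical names; the checks themselves mirror the quoted function.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct mce: only the fields
 * the filter looks at. */
struct mce_rec {
	uint64_t status;   /* MCi_STATUS; low 16 bits hold the MCA error code */
	int bank;          /* MCA bank that reported the error */
	int socket;        /* physical package of the reporting CPU */
};

/* Hypothetical counter bumped instead of calling printk() in NMI
 * context; a deferred, process-context path can read and report it. */
static atomic_ulong ignored_other_socket;

/* Returns true if the memory controller on my_socket should handle the
 * error. No locks, no printing, no allocation: only plain reads and one
 * lock-free atomic increment. */
static bool mc_should_handle(const struct mce_rec *m, int my_socket)
{
	/* Memory controller error codes have 1MMM_CCCC in their low byte;
	 * this check also requires bits 15:8 to be zero, i.e. the 16-bit
	 * code shifted right by 7 must equal exactly 1. */
	if (((m->status & 0xffff) >> 7) != 1)
		return false;

	/* Bank 8 is the only bank the quoted driver knows how to decode. */
	if (m->bank != 8)
		return false;

	/* Wrong socket: another controller instance should take it.
	 * Record the event lock-free rather than printing it here. */
	if (m->socket != my_socket) {
		atomic_fetch_add_explicit(&ignored_other_socket, 1,
					  memory_order_relaxed);
		return false;
	}
	return true;
}
```

A deferred path, such as the polling routine, could then read ignored_other_socket with atomic_load() and print it there, where printk() is safe.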

Generally, you need to keep NMI handlers as short as possible and
postpone parsing the MCEs until later.
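The lockless-buffer pattern that mce_log relies on can be sketched roughly as follows. This is a simplified userspace C11 model, assuming <stdatomic.h> compare-and-swap and release/acquire ordering in place of the kernel's cmpxchg() and write barriers; the names mce_ring, log_record() and drain() are hypothetical. The producer, which may run in NMI context, only claims a slot with a compare-and-swap, copies the record and publishes it by setting a finished flag. A consumer running later in normal context copies out finished entries and can decode or print them there.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOG_LEN 32  /* hypothetical capacity */

struct mce_rec {
	uint64_t status;
	int bank;
	int cpu;
};

struct log_slot {
	struct mce_rec rec;
	atomic_bool finished;  /* set only after rec is fully written */
};

struct mce_ring {
	atomic_uint next;      /* index of the next free slot */
	atomic_bool overflow;  /* set when new entries had to be dropped */
	struct log_slot entry[LOG_LEN];
};

/* Producer: no locks, no allocation, no printing. */
static void log_record(struct mce_ring *log, const struct mce_rec *m)
{
	unsigned int entry = atomic_load_explicit(&log->next,
						  memory_order_relaxed);

	for (;;) {
		/* Buffer full: drop the new entry and flag the overflow. */
		if (entry >= LOG_LEN) {
			atomic_store(&log->overflow, true);
			return;
		}
		/* Claim slot 'entry'; on failure 'entry' is reloaded with
		 * the current value of next and we retry. */
		if (atomic_compare_exchange_weak(&log->next, &entry, entry + 1))
			break;
	}
	log->entry[entry].rec = *m;
	/* Release: the record is visible before the finished flag. */
	atomic_store_explicit(&log->entry[entry].finished, true,
			      memory_order_release);
}

/* Consumer: runs later in normal context. Copies out consecutive
 * finished entries starting at *head and returns how many it copied. */
static unsigned int drain(struct mce_ring *log, unsigned int *head,
			  struct mce_rec *out, unsigned int max)
{
	unsigned int n = 0;

	while (n < max && *head < LOG_LEN &&
	       atomic_load_explicit(&log->entry[*head].finished,
				    memory_order_acquire)) {
		out[n++] = log->entry[*head].rec;
		(*head)++;
	}
	return n;
}
```

As in mce_log, new entries are dropped once the buffer is full, on the assumption that the earlier errors are the more interesting ones. Unlike the real code, this sketch never recycles slots: clearing and reusing the buffer while producers may still be writing needs extra synchronization that is left out here. A slot that has been claimed but not yet finished simply stops the consumer until the next poll.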

-- 
Regards/Gruss,
Boris.

Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632
