linux-kernel - [PATCH] e1000e: prevent concurrent access to NVRAM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.1.10.0810030041060.5549@apollo.tec.linutronix.de>
Date:	Fri, 3 Oct 2008 01:42:14 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Olaf Kirch <okir@...e.de>, Jiri Kosina <jkosina@...e.cz>,
	Jesse Brandeburg <jesse.brandeburg@...el.com>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-netdev@...r.kernel.org, kkeil@...e.de, agospoda@...hat.com,
	arjan@...ux.intel.com, david.graham@...el.com,
	bruce.w.allan@...el.com, john.ronciak@...el.com,
	chris.jones@...onical.com, tim.gardner@...el.com,
	airlied@...il.com, Ingo Molnar <mingo@...e.hu>
Subject: [PATCH] e1000e: prevent concurrent access to NVRAM

On Thu, 2 Oct 2008, Olaf Kirch wrote:

> On Thursday 02 October 2008 16:28:42 Jiri Kosina wrote:
> >
> > 15:50:52 linux-pr0e kernel: WARNING: at drivers/net/e1000e/ich8lan.c:424 e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]()
> > 15:50:52 linux-pr0e kernel: e1000e mutex contention. Owned by pid 4162
> > 15:50:52 linux-pr0e kernel: Call Trace:
> > 15:50:52 linux-pr0e kernel:  [<ffffffff8020e41e>] show_trace_log_lvl+0x41/0x58
> > 15:50:52 linux-pr0e kernel:  [<ffffffff80493716>] dump_stack+0x69/0x6f
> > 15:50:52 linux-pr0e kernel:  [<ffffffff8023ee54>] warn_slowpath+0xb4/0xdc
> > 15:50:52 linux-pr0e kernel:  [<ffffffffa022ce2e>] e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]
> > 15:50:52 linux-pr0e kernel:  [<ffffffffa02317ba>] e1000e_read_phy_reg_igp+0x19/0x64 [e1000e]
> > 15:50:52 linux-pr0e kernel:  [<ffffffffa02319f8>] e1000e_phy_has_link_generic+0x50/0xcc [e1000e]
> > 15:50:52 linux-pr0e kernel:  [<ffffffffa02306f9>] e1000e_check_for_copper_link+0x24/0x86 [e1000e]
> > 15:50:52 linux-pr0e kernel:  [<ffffffffa0236982>] e1000_watchdog_task+0x5c/0x5eb [e1000e]
> > 15:50:52 linux-pr0e kernel:  [<ffffffff8024ecdb>] run_workqueue+0xa4/0x14c
> > 15:50:52 linux-pr0e kernel:  [<ffffffff8024ee5b>] worker_thread+0xd8/0xe7
> > 15:50:52 linux-pr0e kernel:  [<ffffffff80251fe5>] kthread+0x47/0x73
> > 15:50:52 linux-pr0e kernel:  [<ffffffff8020d7a9>] child_rip+0xa/0x11
> 
> Looks like the e1000 watchdog racing with some dhclient activity (upping the interface).
> 
> I just noticed that the driver actually uses register pages. So it looks like it's
> possible to have something like this without the mutex:
> 
> 	process A selects page A
> 	process B selects page B
> 	process A writes to register at offset A'
> 
> So we may end up writing to the wrong register. I think I heard Vojtech mention
> that the e1000e also has a register based interface to erase/rewrite the NVM
> programmatically. Do we know at which offsets these registers live?

Linus,

can you please apply the patch below which prevents the concurrent
access to the NVRAM. It triggered the trace which Olaf reported above.

I worked out that patch on Sept 24th and it triggered a couple of
problems in the e1000e code right away. These have been addressed by
Jesse and are part of the patch series he posted in the last days.

Still, all we have in mainline is some "hopefully bug preventing"
patch which does not address the root cause at all.

The confirmed bugs where the nvram acquire code was called
concurrently are still in your tree and the prevention patch along
with the resulting bugfixes are stuck in some obscure intel QA
process.

Please apply at least the bug prevention patch below.

Thanks,

	tglx
---
Subject: e1000e prevent concurrent access and debug contention on NVM SWFALG
Date: Wed, 24 Sep 2008 00:45:08 -0700
From: Thomas Gleixner <tglx@...utronix.de>

Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
Cc: jesse.brandeburg@...el.com
---

 drivers/net/e1000e/ich8lan.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Index: linux-2.6/drivers/net/e1000e/ich8lan.c
===================================================================
--- linux-2.6.orig/drivers/net/e1000e/ich8lan.c
+++ linux-2.6/drivers/net/e1000e/ich8lan.c
@@ -382,6 +382,9 @@ static s32 e1000_get_variants_ich8lan(st
 	return 0;
 }
 
+static DEFINE_MUTEX(nvm_mutex);
+static pid_t nvm_owner = -1;
+
 /**
  *  e1000_acquire_swflag_ich8lan - Acquire software control flag
  *  @hw: pointer to the HW structure
@@ -395,6 +398,15 @@ static s32 e1000_acquire_swflag_ich8lan(
 	u32 extcnf_ctrl;
 	u32 timeout = PHY_CFG_TIMEOUT;
 
+	WARN_ON(preempt_count());
+
+	if (!mutex_trylock(&nvm_mutex)) {
+		WARN(1, KERN_ERR "e1000e mutex contention. Owned by pid %d\n",
+		     nvm_owner);
+		mutex_lock(&nvm_mutex);
+	}
+	nvm_owner = current->pid;
+
 	while (timeout) {
 		extcnf_ctrl = er32(EXTCNF_CTRL);
 		extcnf_ctrl |= E1000_EXTCNF_CTRL_SWFLAG;
@@ -409,6 +421,8 @@ static s32 e1000_acquire_swflag_ich8lan(
 
 	if (!timeout) {
 		hw_dbg(hw, "FW or HW has locked the resource for too long.\n");
+		nvm_owner = -1;
+		mutex_unlock(&nvm_mutex);
 		return -E1000_ERR_CONFIG;
 	}
 
@@ -430,6 +444,9 @@ static void e1000_release_swflag_ich8lan
 	extcnf_ctrl = er32(EXTCNF_CTRL);
 	extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
 	ew32(EXTCNF_CTRL, extcnf_ctrl);
+
+	nvm_owner = -1;
+	mutex_unlock(&nvm_mutex);
 }
 
 /**



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/