lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 2 Oct 2008 16:28:42 +0200 (CEST)
From:	Jiri Kosina <jkosina@...e.cz>
To:	Jesse Brandeburg <jesse.brandeburg@...el.com>
cc:	linux-kernel@...r.kernel.org, linux-netdev@...r.kernel.org,
	kkeil@...e.de, agospoda@...hat.com, arjan@...ux.intel.com,
	david.graham@...el.com, bruce.w.allan@...el.com,
	john.ronciak@...el.com, Thomas Gleixner <tglx@...utronix.de>,
	chris.jones@...onical.com, tim.gardner@...el.com,
	airlied@...il.com, Thomas Gleixner <tglx@...utronix.de>,
	Olaf Kirch <okir@...e.de>
Subject: Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG

On Mon, 29 Sep 2008, Jesse Brandeburg wrote:

> From: Thomas Gleixner <tglx@...utronix.de>
> 
> This patch adds a mutex to the e1000e driver that would help
> catch any collisions of two e1000e threads accessing hardware
> at the same time.
> 
> description and patch updated by Jesse
> 
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@...el.com>
> ---
> 
>  drivers/net/e1000e/ich8lan.c |   17 +++++++++++++++++
>  1 files changed, 17 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
> index a076079..57c6d2f 100644
> --- a/drivers/net/e1000e/ich8lan.c
> +++ b/drivers/net/e1000e/ich8lan.c
> @@ -366,6 +366,9 @@ static s32 e1000_get_variants_ich8lan(struct e1000_adapter *adapter)
>  	return 0;
>  }
>  
> +static DEFINE_MUTEX(nvm_mutex);
> +static pid_t nvm_owner = -1;
> +
>  /**
>   *  e1000_acquire_swflag_ich8lan - Acquire software control flag
>   *  @hw: pointer to the HW structure
> @@ -379,6 +382,15 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
>  	u32 extcnf_ctrl;
>  	u32 timeout = PHY_CFG_TIMEOUT;
>  
> +	WARN_ON(preempt_count());
> +
> +	if (!mutex_trylock(&nvm_mutex)) {
> +		WARN(1, KERN_ERR "e1000e mutex contention. Owned by pid %d\n",
> +		     nvm_owner);
> +		mutex_lock(&nvm_mutex);
> +	}
> +	nvm_owner = current->pid;
> +
>  	while (timeout) {
>  		extcnf_ctrl = er32(EXTCNF_CTRL);
>  		extcnf_ctrl |= E1000_EXTCNF_CTRL_SWFLAG;
> @@ -393,6 +405,8 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
>  
>  	if (!timeout) {
>  		hw_dbg(hw, "FW or HW has locked the resource for too long.\n");
> +		nvm_owner = -1;
> +		mutex_unlock(&nvm_mutex);
>  		return -E1000_ERR_CONFIG;
>  	}
>  
> @@ -414,6 +428,9 @@ static void e1000_release_swflag_ich8lan(struct e1000_hw *hw)
>  	extcnf_ctrl = er32(EXTCNF_CTRL);
>  	extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
>  	ew32(EXTCNF_CTRL, extcnf_ctrl);
> +
> +	nvm_owner = -1;
> +	mutex_unlock(&nvm_mutex);
>  }

A few minutes ago, I have actually just hit this, while debugging the 
issue on a kernel that had this patch included. 

I was not successful reproducing it yet though, but still it might be a 
pointer into direction where the real bug is.

15:49:07 linux-pr0e dhclient: Listening on LPF/eth1/00:15:58:c6:4a:ff
15:49:07 linux-pr0e dhclient: Sending on   LPF/eth1/00:15:58:c6:4a:ff
15:49:07 linux-pr0e dhclient: Sending on   Socket/fallback
15:49:07 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3
15:49:10 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
15:49:18 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 9
15:49:27 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 9
15:49:36 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 17
15:49:53 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 12
15:50:05 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3
15:50:08 linux-pr0e dhclient: No DHCPOFFERS received.
15:50:08 linux-pr0e dhclient: No working leases in persistent database - sleeping.
15:50:52 linux-pr0e kernel: ------------[ cut here ]------------
15:50:52 linux-pr0e kernel: WARNING: at drivers/net/e1000e/ich8lan.c:424 e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]()
15:50:52 linux-pr0e kernel: e1000e mutex contention. Owned by pid 4162
15:50:52 linux-pr0e kernel: Modules linked in: af_packet i915 drm ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tulip arc4 ecb snd_hda_intel snd_pcm crypto_blkcipher rtc_cmos snd_timer ppdev iwl3945 thinkpad_acpi pcmcia uvcvideo parport_pc rtc_core snd_page_alloc video rfkill i2c_i801 mac80211 iTCO_wdt compat_ioctl32 rtc_lib yenta_socket pcspkr joydev ohci1394 snd_hwdep rsrc_nonstatic output i2c_core btusb parport battery led_class videodev ac ieee1394 v4l1_compat e1000e wmi iTCO_vendor_support pcmcia_core button snd soundcore intel_agp cfg80211 bluetooth sg sr_mod cdrom sd_mod crc_t10dif ehci_hcd uhci_hcd usbcore edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix ahci pata_acpi libata scsi_mod dock thermal processor
15:50:52 linux-pr0e kernel: Pid: 7, comm: events/0 Tainted: G          2.6.27-rc7-7.10-default #1
15:50:52 linux-pr0e kernel: 
15:50:52 linux-pr0e kernel: Call Trace:
15:50:52 linux-pr0e kernel:  [<ffffffff8020e41e>] show_trace_log_lvl+0x41/0x58
15:50:52 linux-pr0e kernel:  [<ffffffff80493716>] dump_stack+0x69/0x6f
15:50:52 linux-pr0e kernel:  [<ffffffff8023ee54>] warn_slowpath+0xb4/0xdc
15:50:52 linux-pr0e kernel:  [<ffffffffa022ce2e>] e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]
15:50:52 linux-pr0e kernel:  [<ffffffffa02317ba>] e1000e_read_phy_reg_igp+0x19/0x64 [e1000e]
15:50:52 linux-pr0e kernel:  [<ffffffffa02319f8>] e1000e_phy_has_link_generic+0x50/0xcc [e1000e]
15:50:52 linux-pr0e kernel:  [<ffffffffa02306f9>] e1000e_check_for_copper_link+0x24/0x86 [e1000e]
15:50:52 linux-pr0e kernel:  [<ffffffffa0236982>] e1000_watchdog_task+0x5c/0x5eb [e1000e]
15:50:52 linux-pr0e kernel:  [<ffffffff8024ecdb>] run_workqueue+0xa4/0x14c
15:50:52 linux-pr0e kernel:  [<ffffffff8024ee5b>] worker_thread+0xd8/0xe7
15:50:52 linux-pr0e kernel:  [<ffffffff80251fe5>] kthread+0x47/0x73
15:50:52 linux-pr0e kernel:  [<ffffffff8020d7a9>] child_rip+0xa/0x11
15:50:52 linux-pr0e kernel: 
15:50:52 linux-pr0e kernel: ---[ end trace 6f68a3c748ede326 ]---
15:51:25 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3
15:51:28 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
15:51:36 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13
15:51:49 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13
15:52:02 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 18
15:52:15 linux-pr0e kernel: Machine check events logged

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ