Message-id: <4F1993AF.1020303@majjas.com>
Date: Fri, 20 Jan 2012 11:17:51 -0500
From: Michael Breuer <mbreuer@...jas.com>
To: Stephen Hemminger <shemminger@...tta.com>
Cc: Jarek Poplawski <jarkao2@...il.com>,
David Miller <davem@...emloft.net>,
Stephen Hemminger <shemminger@...ux-foundation.org>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: Regression: sky2 kernel between 3.1 and 3.2.1 (last known good
3.0.9)
On 1/20/2012 11:10 AM, Stephen Hemminger wrote:
> On Fri, 20 Jan 2012 09:24:38 -0500
> Michael Breuer<mbreuer@...jas.com> wrote:
>
>> On 1/16/2012 11:39 AM, Michael Breuer wrote:
>>> Synopsis:
>>>
>>> Receiving DMAR and other errors after approximately three days of
>>> uptime. The symptoms exactly match errors seen and then fixed around
>>> 2.6.32.4.
>>>
>>> While the system remains unaffected for too long to do a bisect, I was
>>> able to confirm that the problem exists in the 3.1 stable branch (I
>>> jumped from 3.0 to 3.2 when 3.2 was released).
>>>
>>> For now I reverted to the sky2.c from 3.0.9 and am running the rest of
>>> the kernel from 3.1.2, but won't be certain that this works until
>>> later in the week.
>>>
>>> Note that 20 seconds prior to the log extract below were DHCP renewal
>>> attempts on eth1, the issue below was on eth0. Not sure it's relevant,
>>> however back in 2010 a preceding DHCP event did turn out to be
>>> relevant to the manifestation of the bug.
>>>
>>> The 3.2.1-dirty I'm running is from git with a single local patch -
>>> for sidewinder force-feedback support (shouldn't be relevant to the
>>> sky2 issue).
>>>
>>> Log extract:
>>>
>>> Jan 16 05:49:46 mail kernel: [198230.628919] DRHD: handling fault
>>> status reg 2
>>> Jan 16 05:49:46 mail kernel: [198230.628925] sky2 0000:06:00.0: error
>>> interrupt status=0x80000000
>>> Jan 16 05:49:46 mail kernel: [198230.628929] DMAR:[DMA Read] Request
>>> device [06:00.0] fault addr fff78000
>>> Jan 16 05:49:46 mail kernel: [198230.628931] DMAR:[fault reason 06]
>>> PTE Read access is not set
>>> Jan 16 05:49:46 mail kernel: [198230.628939] sky2 0000:06:00.0: PCI
>>> hardware error (0x2010)
>>> Jan 16 05:49:53 mail dhclient[1616]: DHCPREQUEST on eth1 to
>>> 10.240.184.29 port 67
>>> Jan 16 05:50:01 mail kernel: [198246.288400] ------------[ cut here
>>> ]------------
>>> Jan 16 05:50:01 mail kernel: [198246.288408] WARNING: at
>>> net/sched/sch_generic.c:255 dev_watchdog+0x247/0x250()
>>> Jan 16 05:50:01 mail kernel: [198246.288411] Hardware name: System
>>> Product Name
>>> Jan 16 05:50:01 mail kernel: [198246.288413] NETDEV WATCHDOG: eth0
>>> (sky2): transmit queue 0 timed out
>>> Jan 16 05:50:01 mail kernel: [198246.288415] Modules linked in: tcp_lp
>>> cpufreq_stats ebtable_nat ebtables nf_conntrack_netbios_ns
>>> nf_conntrack_broadcast ip6table_mangle ip6table_filter ip6_tables
>>> iptable_mangle ipt_MASQUERADE iptable_nat nf_nat iptable_raw tun
>>> bridge stp llc lockd sit tunnel4 ipt_LOG nf_conntrack_ftp
>>> nf_conntrack_ipv6 nf_defrag_ipv6 xt_CHECKSUM xt_multiport xt_DSCP
>>> w83627ehf xt_mark xt_dscp hwmon_vid binfmt_misc raid1 btrfs sunrpc
>>> zlib_deflate libcrc32c snd_hda_codec_analog snd_ens1371 gameport
>>> snd_hda_intel snd_rawmidi snd_ac97_codec snd_hda_codec snd_hwdep
>>> ac97_bus snd_seq snd_seq_device snd_pcm gspca_spca505 snd_timer
>>> gspca_main snd videodev media soundcore i2c_i801 iTCO_wdt microcode
>>> v4l2_compat_ioctl32 snd_page_alloc i7core_edac sky2 edac_core pcspkr
>>> iTCO_vendor_support virtio_net virtio virtio_ring kvm_intel kvm uinput
>>> ipv6 raid456 async_raid6_recov async_pq raid6_pq async_xor
>>> firewire_ohci firewire_core pata_acpi ata_generic xor async_memcpy
>>> async_tx crc_itu_t pata_marvell nouveau ttm d
>>> Jan 16 05:50:01 mail kernel: rm_kms_helper drm i2c_algo_bit i2c_core
>>> mxm_wmi video [last unloaded: nf_conntrack_broadcast]
>>> Jan 16 05:50:01 mail kernel: [198246.288487] Pid: 0, comm: swapper/0
>>> Tainted: G W 3.2.1-dirty #1
>>> Jan 16 05:50:01 mail kernel: [198246.288489] Call Trace:
>>> Jan 16 05:50:01 mail kernel: [198246.288491]<IRQ>
>>> [<ffffffff81050a4f>] warn_slowpath_common+0x7f/0xc0
>>> Jan 16 05:50:01 mail kernel: [198246.288501] [<ffffffff8101f0bd>] ?
>>> lapic_next_event+0x1d/0x30
>>> Jan 16 05:50:01 mail kernel: [198246.288504] [<ffffffff81050b46>]
>>> warn_slowpath_fmt+0x46/0x50
>>> Jan 16 05:50:01 mail kernel: [198246.288509] [<ffffffff81009319>] ?
>>> read_tsc+0x9/0x20
>>> Jan 16 05:50:01 mail kernel: [198246.288513] [<ffffffff814a81e7>]
>>> dev_watchdog+0x247/0x250
>>> Jan 16 05:50:01 mail kernel: [198246.288518] [<ffffffff8105fbbb>]
>>> run_timer_softirq+0x12b/0x3b0
>>> Jan 16 05:50:01 mail kernel: [198246.288521] [<ffffffff814a7fa0>] ?
>>> qdisc_reset+0x50/0x50
>>> Jan 16 05:50:01 mail kernel: [198246.288525] [<ffffffff81057d18>]
>>> __do_softirq+0xa8/0x210
>>> Jan 16 05:50:01 mail kernel: [198246.288529] [<ffffffff8157496c>]
>>> call_softirq+0x1c/0x30
>>> Jan 16 05:50:01 mail kernel: [198246.288533] [<ffffffff810041e5>]
>>> do_softirq+0x65/0xa0
>>> Jan 16 05:50:01 mail kernel: [198246.288536] [<ffffffff810580fe>]
>>> irq_exit+0x8e/0xb0
>>> Jan 16 05:50:01 mail kernel: [198246.288539] [<ffffffff815750a3>]
>>> do_IRQ+0x63/0xe0
>>> Jan 16 05:50:01 mail kernel: [198246.288543] [<ffffffff8156ad2e>]
>>> common_interrupt+0x6e/0x6e
>>> Jan 16 05:50:01 mail kernel: [198246.288545]<EOI>
>>> [<ffffffff81307b6d>] ? intel_idle+0xed/0x150
>>> Jan 16 05:50:01 mail kernel: [198246.288551] [<ffffffff81307b4f>] ?
>>> intel_idle+0xcf/0x150
>>> Jan 16 05:50:01 mail kernel: [198246.288555] [<ffffffff8144d331>]
>>> cpuidle_idle_call+0xc1/0x280
>>> Jan 16 05:50:01 mail kernel: [198246.288559] [<ffffffff8100122a>]
>>> cpu_idle+0xca/0x120
>>> Jan 16 05:50:01 mail kernel: [198246.288563] [<ffffffff8154741e>]
>>> rest_init+0x72/0x74
>>> Jan 16 05:50:01 mail kernel: [198246.288568] [<ffffffff81b6abdd>]
>>> start_kernel+0x3b5/0x3c0
>>> Jan 16 05:50:01 mail kernel: [198246.288572] [<ffffffff81b6a32b>]
>>> x86_64_start_reservations+0x132/0x136
>>> Jan 16 05:50:01 mail kernel: [198246.288576] [<ffffffff81b6a140>] ?
>>> early_idt_handlers+0x140/0x140
>>> Jan 16 05:50:01 mail kernel: [198246.288580] [<ffffffff81b6a431>]
>>> x86_64_start_kernel+0x102/0x111
>>> Jan 16 05:50:01 mail kernel: [198246.288583] ---[ end trace
>>> bb26011d21a2b1d7 ]---
>>> Jan 16 05:50:01 mail kernel: [198246.288586] sky2 0000:06:00.0: eth0:
>>> tx timeout
>>> Jan 16 05:50:01 mail kernel: [198246.288593] sky2 0000:06:00.0: eth0:
>>> transmit ring 115 .. 10 report=115 done=115
>>>
>>>
>>>
>> FYI - I've been up for four days now without issues running on 3.2.1 +
>> sky2.c from 3.0.9. Looks like the issue is in fact in one of the
>> modifications made in sky2.c between those two releases.
> Since only you seem to be able to reproduce it, most likely the
> bisect burden will be on you. If you know it is only one file,
> then bisecting that file is fairly quick.
>
As of now, I have no reliable way to reproduce this, so each bisect step is
likely to take about 3-4 days, or longer if the kernel under test doesn't fail.
If anyone has suggestions for diagnostic code to add, or a reason to suspect
one version over another, that could reduce the time significantly.
There are also some windows during which I have to leave a stable version up.
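
For planning purposes, the cost of such a bisect can be estimated roughly.
The sketch below is only an illustration: it assumes a plain binary search
over the commits that touch sky2.c between the good and bad kernels, and the
commit count passed in is hypothetical (the real number would come from
something like `git log --oneline <good>..<bad> -- <path to sky2.c>`). The
3-4 days per step is the figure quoted above.

```python
def bisect_steps(candidate_commits: int) -> int:
    """Worst-case number of kernels to test in a binary search.

    candidate_commits is the number of commits that could be the first
    bad one (hypothetical input, not taken from the thread).
    """
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    # Integer form of ceil(log2(n)); avoids floating-point rounding.
    return (candidate_commits - 1).bit_length()


def bisect_days(candidate_commits: int, days_per_step=(3, 4)):
    """Return (low, high) total days, given days needed per test run."""
    steps = bisect_steps(candidate_commits)
    low, high = days_per_step
    return steps * low, steps * high


if __name__ == "__main__":
    # Example with an assumed 20 commits touching the driver.
    n = 20
    print(bisect_steps(n), "steps,", bisect_days(n), "days (low, high)")
```

A run that does not fail within the window only counts as "good" with some
uncertainty, so the real total may be higher than this estimate.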