Message-ID: <6ab7a83c84d6398ffc089f925da89658.squirrel@www.microway.com>
Date: Tue, 23 Aug 2011 13:16:03 -0400
From: rick@...roway.com
To: "Rafael J. Wysocki" <rjw@...k.pl>
Cc: linux-kernel@...r.kernel.org,
"Richard Houghton" <rhoughton@...roway.com>,
"ACPI Devel Mailing List" <linux-acpi@...r.kernel.org>,
"Len Brown" <lenb@...nel.org>,
"Matthew Garrett" <mjg59@...f.ucam.org>
Subject: Re: kernel oops and panic in acpi_atomic_read under 2.6.39.3. call trace included
Hi,
> Hi,
>
> On Monday, August 22, 2011, Rick Warner wrote:
> ...
>> Hi Rafael,
>>
>> Thanks for the off-list help in getting you this info.
>>
>> I had already rebuilt the kernel using the change I mentioned earlier
>> (test on !&g->error_status_address) since the call trace I got.
>>
>> I luckily still had a copy of the kernel and modules I built previously
>> using just your patch, so I undid my change to the ghes.c source, leaving
>> just your patch but not mine so it would match the ghes.ko module I ran
>> on. This is the output of gdb on that ghes.ko now:
>>
>> (gdb) l *ghes_read_estatus+0x38
>> 0x258 is in ghes_read_estatus (drivers/acpi/apei/ghes.c:296).
>> warning: Source file is more recent than executable.
>> 291 int rc;
>> 292 if (!g)
>> 293 return -EINVAL;
>> 294
>> 295 rc = acpi_atomic_read(&buf_paddr, &g->error_status_address);
>> 296 if (rc) {
>> 297 if (!silent && printk_ratelimit())
>> 298 pr_warning(FW_WARN GHES_PFX
>> 299 "Failed to read error status block address for hardware error source: %d.\n",
>> 300 g->header.source_id);
>>
>> The warning about the source being newer is because of the reverted
>> change in the ghes.c source mentioned above.
>
> OK, since &buf_paddr cannot be NULL, perhaps ghes is. Please check if the
> appended patch makes a difference.
>
> Thanks,
> Rafael
>
> ---
> drivers/acpi/apei/ghes.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux/drivers/acpi/apei/ghes.c
> ===================================================================
> --- linux.orig/drivers/acpi/apei/ghes.c
> +++ linux/drivers/acpi/apei/ghes.c
> @@ -393,11 +393,16 @@ static void ghes_copy_tofrom_phys(void *
>
> static int ghes_read_estatus(struct ghes *ghes, int silent)
> {
> - struct acpi_hest_generic *g = ghes->generic;
> + struct acpi_hest_generic *g;
> u64 buf_paddr;
> u32 len;
> int rc;
>
> + if (!ghes || !ghes->generic)
> + return -EINVAL;
> +
> + g = ghes->generic;
> +
> rc = acpi_atomic_read(&buf_paddr, &g->error_status_address);
> if (rc) {
> if (!silent && printk_ratelimit())
>
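The guard that the patch above adds can be sketched in user space. This is
only a minimal illustration, not the kernel code: struct ghes_sketch,
struct generic_sketch and read_status_address() are hypothetical stand-ins
that model just the two pointers being checked.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed to show the guard are modelled here. */
struct generic_sketch {
    unsigned long long error_status_address;
};

struct ghes_sketch {
    struct generic_sketch *generic;
};

/* Stores the address in *out and returns 0, or returns -EINVAL when
 * ghes or ghes->generic is NULL (the same condition the patch tests). */
static int read_status_address(const struct ghes_sketch *ghes,
                               unsigned long long *out)
{
    const struct generic_sketch *g;

    if (!ghes || !ghes->generic)
        return -EINVAL;

    g = ghes->generic;
    *out = g->error_status_address;
    return 0;
}
```

Note that a guard of this shape only validates the two pointers it names. It
says nothing about what the callee does with the address it is handed.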
Unfortunately, the machine panicked again with this patch in place. Here is
the latest call trace:
[64614.937968] BUG: unable to handle kernel NULL pointer dereference at (null)
[64614.945851] IP: [<ffffffff812a211d>] acpi_atomic_read+0x8d/0xcb
[64614.951817] PGD 2f8d40067 PUD 2f8cf8067 PMD 0
[64614.956346] Oops: 0000 [#1] PREEMPT SMP
[64614.960344] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
[64614.968265] CPU 14
[64614.970203] Modules linked in: md5 nfsd lockd nfs_acl auth_rpcgss sunrpc ipt_MASQUERADE iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables af_packet edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf xfs dm_mod igb joydev sr_mod cdrom pcspkr sg ioatdma button iTCO_wdt iTCO_vendor_support dca ghes hed i2c_i801 i7core_edac edac_core ext4 jbd2 crc16 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0 fan processor thermal thermal_sys ata_generic pata_atiixp arcmsr
[64615.024806]
[64615.026305] Pid: 10723, comm: cluster Not tainted 2.6.39.3-microwaycustom #5 Supermicro X8DTH-i/6/iF/6F/X8DTH
[64615.036291] RIP: 0010:[<ffffffff812a211d>] [<ffffffff812a211d>] acpi_atomic_read+0x8d/0xcb
[64615.044671] RSP: 0000:ffff88063fcc7da8 EFLAGS: 00010046
[64615.049994] RAX: 0000000000000000 RBX: ffff88063fcc7df0 RCX: 00000000bf7b6000
[64615.057132] RDX: 0000000000000000 RSI: 00000000bf7b6010 RDI: 00000000bf7b5ff0
[64615.064271] RBP: ffff88063fcc7dd8 R08: 00000000bf7b7000 R09: 0000000000000002
[64615.071411] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003044c20
[64615.078549] R13: 0000000000000000 R14: 00000000bf7b5ff0 R15: 0000000000000000
[64615.085688] FS: 0000000000000000(0000) GS:ffff88063fcc0000(0000) knlGS:0000000000000000
[64615.093771] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[64615.099517] CR2: 0000000000000000 CR3: 00000003015b1000 CR4: 00000000000006e0
[64615.106658] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[64615.113795] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[64615.120928] Process cluster (pid: 10723, threadinfo ffff8802fb3b6000, task ffff880301534640)
[64615.129361] Stack:
[64615.131386] 0000000000000000 00000000bf7b5ff0 00000000ffffffea ffff88032b1c3d40
[64615.138871] 0000000000000001 ffffc90003044ca8 ffff88063fcc7e18 ffffffffa01b7245
[64615.146354] 0000000000000000 0000000000000000 ffff88032b1c3d40 0000000000000000
[64615.153840] Call Trace:
[64615.156293] <NMI>
[64615.158442] [<ffffffffa01b7245>] ghes_read_estatus+0x55/0x180 [ghes]
[64615.164900] [<ffffffffa01b760c>] ghes_notify_nmi+0xbc/0x190 [ghes]
[64615.171182] [<ffffffff8150ddfd>] notifier_call_chain+0x4d/0x70
[64615.177116] [<ffffffff8150de63>] __atomic_notifier_call_chain+0x43/0x60
[64615.183824] [<ffffffff8150de91>] atomic_notifier_call_chain+0x11/0x20
[64615.190373] [<ffffffff8150dece>] notify_die+0x2e/0x30
[64615.195535] [<ffffffff8150b4f2>] do_nmi+0xa2/0x260
[64615.200430] [<ffffffff8150b150>] nmi+0x20/0x30
[64615.204981] [<ffffffff81029f6a>] ? native_write_msr_safe+0xa/0x10
[64615.211170] <<EOE>>
[64615.213276] <IRQ>
[64615.215609] [<ffffffff81011568>] intel_pmu_disable_all+0x38/0xb0
[64615.221710] [<ffffffff81010efa>] x86_pmu_disable+0x4a/0x50
[64615.227306] [<ffffffff810ea842>] perf_event_task_tick+0x1a2/0x2a0
[64615.233495] [<ffffffff81050750>] scheduler_tick+0x1b0/0x290
[64615.239165] [<ffffffff81066c29>] update_process_times+0x69/0x80
[64615.245193] [<ffffffff81088098>] tick_sched_timer+0x58/0x150
[64615.250956] [<ffffffff8107b7ef>] __run_hrtimer+0x6f/0x250
[64615.256459] [<ffffffff81088040>] ? tick_init_highres+0x20/0x20
[64615.262393] [<ffffffff8107bf7a>] hrtimer_interrupt+0xda/0x230
[64615.268244] [<ffffffff8101f5c6>] smp_apic_timer_interrupt+0x66/0xa0
[64615.274622] [<ffffffff815120f3>] apic_timer_interrupt+0x13/0x20
[64615.280633] <EOI>
[64615.282570] Code: fc 10 74 1f 77 08 41 80 fc 08 75 49 eb 0e 41 80 fc 20 74 17 41 80 fc 40 75 3b eb 15 8a 00 0f b6 c0 eb 11 66 8b 00 0f b7 c0 eb 09 <8b> 00 89 c0 eb 03 48 8b 00 48 89 03 e8 62 55 e2 ff eb 1d 41 0f
[64615.303108] RIP [<ffffffff812a211d>] acpi_atomic_read+0x8d/0xcb
[64615.309163] RSP <ffff88063fcc7da8>
[64615.312668] CR2: 0000000000000000
[64615.316007] ---[ end trace 3ab5dd3ba3391edf ]---
[64615.320637] Kernel panic - not syncing: Fatal exception in interrupt
[64615.326999] Pid: 10723, comm: cluster Tainted: G D 2.6.39.3-microwaycustom #5
[64615.334914] Call Trace:
[64615.337371] <NMI> [<ffffffff815071ee>] panic+0x9b/0x1b0
[64615.342837] [<ffffffff8150bb4a>] oops_end+0xea/0xf0
[64615.347828] [<ffffffff81031dc3>] no_context+0xf3/0x260
[64615.353081] [<ffffffff81032055>] __bad_area_nosemaphore+0x125/0x1e0
[64615.359456] [<ffffffff8103211e>] bad_area_nosemaphore+0xe/0x10
[64615.365389] [<ffffffff8150dd10>] do_page_fault+0x500/0x5a0
[64615.370985] [<ffffffff810eb839>] ? __perf_event_overflow+0x99/0x210
[64615.377357] [<ffffffff8150ae95>] page_fault+0x25/0x30
[64615.382516] [<ffffffff812a211d>] ? acpi_atomic_read+0x8d/0xcb
[64615.388365] [<ffffffff812a20f0>] ? acpi_atomic_read+0x60/0xcb
[64615.394224] [<ffffffffa01b7245>] ghes_read_estatus+0x55/0x180 [ghes]
[64615.400685] [<ffffffffa01b760c>] ghes_notify_nmi+0xbc/0x190 [ghes]
[64615.406959] [<ffffffff8150ddfd>] notifier_call_chain+0x4d/0x70
[64615.412887] [<ffffffff8150de63>] __atomic_notifier_call_chain+0x43/0x60
[64615.419594] [<ffffffff8150de91>] atomic_notifier_call_chain+0x11/0x20
[64615.426138] [<ffffffff8150dece>] notify_die+0x2e/0x30
[64615.431292] [<ffffffff8150b4f2>] do_nmi+0xa2/0x260
[64615.436180] [<ffffffff8150b150>] nmi+0x20/0x30
[64615.440730] [<ffffffff81029f6a>] ? native_write_msr_safe+0xa/0x10
[64615.446911] <<EOE>> <IRQ> [<ffffffff81011568>] intel_pmu_disable_all+0x38/0xb0
[64615.454467] [<ffffffff81010efa>] x86_pmu_disable+0x4a/0x50
[64615.460050] [<ffffffff810ea842>] perf_event_task_tick+0x1a2/0x2a0
[64615.466233] [<ffffffff81050750>] scheduler_tick+0x1b0/0x290
[64615.471908] [<ffffffff81066c29>] update_process_times+0x69/0x80
[64615.477933] [<ffffffff81088098>] tick_sched_timer+0x58/0x150
[64615.483691] [<ffffffff8107b7ef>] __run_hrtimer+0x6f/0x250
[64615.489202] [<ffffffff81088040>] ? tick_init_highres+0x20/0x20
[64615.495138] [<ffffffff8107bf7a>] hrtimer_interrupt+0xda/0x230
[64615.500989] [<ffffffff8101f5c6>] smp_apic_timer_interrupt+0x66/0xa0
[64615.507362] [<ffffffff815120f3>] apic_timer_interrupt+0x13/0x20
[64615.513375] <EOI>
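One reading of the trace above: RAX and CR2 are both 0, and the marked bytes
in the Code line, <8b> 00, decode as a 32-bit load through RAX. That suggests
the fault is a read through a NULL virtual address inside acpi_atomic_read
itself, not through ghes or ghes->generic. The sketch below is a user-space
illustration of a width-dispatched read that rejects a NULL mapping. It is an
assumption about the general shape of such a helper, and read_mapped() is a
hypothetical name, not the kernel's implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the kernel's acpi_atomic_read(): reads
 * bit_width bits through vaddr into *val. Returns -EINVAL for a NULL
 * pointer or an unsupported width instead of dereferencing. */
static int read_mapped(uint64_t *val, const volatile void *vaddr,
                       unsigned int bit_width)
{
    if (!val || !vaddr)
        return -EINVAL;

    switch (bit_width) {
    case 8:
        *val = *(const volatile uint8_t *)vaddr;
        break;
    case 16:
        *val = *(const volatile uint16_t *)vaddr;
        break;
    case 32:
        *val = *(const volatile uint32_t *)vaddr;
        break;
    case 64:
        *val = *(const volatile uint64_t *)vaddr;
        break;
    default:
        return -EINVAL;
    }
    return 0;
}
```

With a check of this kind, a missing mapping would surface as an error return
to the caller rather than as a page fault in NMI context.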
What should I try next?
Thanks,
Rick