Date:	Mon, 9 Nov 2015 12:21:49 -0800
From:	Guenter Roeck <linux@...ck-us.net>
To:	Jiang Liu <jiang.liu@...ux.intel.com>
Cc:	fandongdong <fandd@...pur.com>,
	Alex Williamson <alex.williamson@...hat.com>,
	Joerg Roedel <joro@...tes.org>,
	linux-kernel@...r.kernel.org
Subject: Re: Panic when cpu hot-remove

Gerry,

On Thu, Jun 25, 2015 at 04:11:36PM +0800, Jiang Liu wrote:
> On 2015/6/18 15:54, fandongdong wrote:
> > 
> > 
> >> On 2015/6/18 15:27, fandongdong wrote:
> >>
> >>
> >> On 2015/6/18 13:40, Jiang Liu wrote:
> >>> On 2015/6/17 22:36, Alex Williamson wrote:
> >>>> On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
> >>>>> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
> >>>>>> Hi maintainer,
> >>>>>>
> >>>>>> We found a problem: a panic happens when a CPU is hot-removed.
> >>>>>> We traced the problem using the calltrace information.
> >>>>>> An endless loop occurs in qi_check_fault() because the value of
> >>>>>> head never becomes equal to the value of tail.
> >>>>>> The code in question is as follows:
> >>>>>>
> >>>>>>
> >>>>>> do {
> >>>>>>         if (qi->desc_status[head] == QI_IN_USE)
> >>>>>>                 qi->desc_status[head] = QI_ABORT;
> >>>>>>         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
> >>>>>> } while (head != tail);
> >>>>> Hmm, this code iterates over only every second QI descriptor, and
> >>>>> tail probably points to a descriptor that is never visited (see
> >>>>> the sketch after the quoted thread).
> >>>>>
> >>>>> Jiang, can you please have a look?
> >>>> I think that part is normal: the way we use the queue is to always
> >>>> submit a work operation followed by a wait operation, so that we can
> >>>> determine when the work operation is complete.  That's done via
> >>>> qi_submit_sync().  We have had spurious reports of the queue getting
> >>>> impossibly out of sync, though.  I saw one that was somehow linked
> >>>> to the I/OAT DMA engine.  Roland Dreier saw something similar[1].
> >>>> I'm not sure if they're related to this, but it may be worth
> >>>> comparing.  Thanks,
> >>> Thanks, Alex and Joerg!
> >>>
> >>> Hi Dongdong,
> >>>     Could you please give some instructions on how to reproduce
> >>> this issue? I will try to reproduce it if possible.
> >>> Thanks!
> >>> Gerry
> >> Hi Gerry,
> >>
> >> We're running kernel 4.1.0 on a 4-socket system, and we want to
> >> take socket 1 offline.
> >> The steps are as follows:
> >>
> >> echo 1 > /sys/firmware/acpi/hotplug/force_remove
> >> echo 1 > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:01/eject
> Hi Dongdong,
> 	I failed to reproduce this issue on my side. Could you please
> help confirm the following?
> 1) Is this issue reproducible on your side?
> 2) Does this issue happen if you disable the irqbalance service on
>    your system?
> 3) Has the corresponding PCI host bridge been removed before removing
>    the socket?
> 
> From the log, we only noticed messages for CPU and memory, but none
> for PCI (IOMMU) devices. And this log message
> 	"[ 149.976493] acpi ACPI0004:01: Still not present"
> implies that the socket was powered off during the ejection.
> So the story may be that the socket was powered off while the host
> bridge on it was still in use.
> Thanks!
> Gerry
> 
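
A note on the loop discussed above: qi_submit_sync() enqueues
descriptors in work/wait pairs, so head and tail normally advance in
steps of two and share the same parity. The recovery walk quoted from
qi_check_fault() steps by two as well, so if the indices ever lose
that common parity (the "impossibly out of sync" case Alex mentions),
head can never reach tail and the loop spins forever. A minimal
user-space sketch of this, assuming only the quoted snippet
(recovery_walk_terminates and the ring size of 256 are illustrative,
not the driver's actual definitions):

#include <stdio.h>

#define QI_LENGTH 256	/* assumed power-of-two ring size */

/*
 * Model of the backwards walk in qi_check_fault():
 *	head = (head - 2 + QI_LENGTH) % QI_LENGTH;
 * With a stride of 2 on an even-sized ring, head only ever visits
 * slots of its own parity, so the walk terminates iff head and tail
 * share a parity.
 */
static int recovery_walk_terminates(int head, int tail)
{
	int steps = 0;

	do {
		head = (head - 2 + QI_LENGTH) % QI_LENGTH;
		if (++steps > QI_LENGTH)
			return 0;	/* would spin forever in the kernel */
	} while (head != tail);

	return 1;
}

int main(void)
{
	/* Normal case: work/wait pairs keep both indices even. */
	printf("head=4, tail=10 -> terminates: %d\n",
	       recovery_walk_terminates(4, 10));	/* prints 1 */

	/* Desync case: parities differ, the do/while never exits. */
	printf("head=4, tail=11 -> terminates: %d\n",
	       recovery_walk_terminates(4, 11));	/* prints 0 */

	return 0;
}

The parity condition is only a model; the real loop also consults
qi->desc_status[], but its termination depends only on the stride and
on the relative parity of head and tail.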

Was this problem ever resolved?

We are seeing the same (or a similar) problem randomly with our hardware.
No CPU hotplug is involved.

Any idea what I can do (or how I can help) to track down the problem?

Thanks,
Guenter

---
Sample traceback:

[  485.547997] Uhhuh. NMI received for unknown reason 29 on CPU 0.
[  485.633519] Do you have a strange power saving mode enabled?
[  485.715262] Kernel panic - not syncing: NMI: Not continuing
[  485.795750] CPU: 0 PID: 25109 Comm: cty Tainted: P        W  O 4.1.12-juniper-00687-g3de457e-dirty #1
[  485.932825] Hardware name: Juniper Networks, Inc. 0576/HSW RCB PTX, BIOS NGRE_v0.44 04/07/2015
[  486.057327]  0000000000000029 ffff88085f605df8 ffffffff80a9e179 0000000000000000
[  486.164220]  ffffffff80e53b4a ffff88085f605e78 ffffffff80a99b6f ffff88085f605e18
[  486.271116]  ffffffff00000008 ffff88085f605e88 ffff88085f605e28 ffffffff81019a00
[  486.378012] Call Trace:
[  486.413225]  <NMI>  [<ffffffff80a9e179>] dump_stack+0x4f/0x7b
[  486.496228]  [<ffffffff80a99b6f>] panic+0xbb/0x1e9
[  486.565393]  [<ffffffff802070ac>] unknown_nmi_error+0x9c/0xa0
[  486.648394]  [<ffffffff8020724c>] default_do_nmi+0x19c/0x1c0
[  486.730138]  [<ffffffff80207356>] do_nmi+0xe6/0x160
[  486.800564]  [<ffffffff80aa859b>] end_repeat_nmi+0x1a/0x1e
[  486.879793]  [<ffffffff8072a896>] ? qi_submit_sync+0x186/0x3f0
[  486.964051]  [<ffffffff8072a896>] ? qi_submit_sync+0x186/0x3f0
[  487.048307]  [<ffffffff8072a896>] ? qi_submit_sync+0x186/0x3f0
[  487.132564]  <<EOE>>  [<ffffffff80731823>] modify_irte+0x93/0xd0
[  487.219342]  [<ffffffff80731bd3>] intel_ioapic_set_affinity+0x113/0x1a0
[  487.314918]  [<ffffffff80732130>] set_remapped_irq_affinity+0x20/0x30
[  487.407979]  [<ffffffff802c5fec>] irq_do_set_affinity+0x1c/0x50
[  487.493494]  [<ffffffff802c607d>] setup_affinity+0x5d/0x80
[  487.572725]  [<ffffffff802c68b4>] __setup_irq+0x2c4/0x580
[  487.650695]  [<ffffffff8070ce80>] ? serial8250_modem_status+0xd0/0xd0
[  487.743755]  [<ffffffff802c6cf4>] request_threaded_irq+0xf4/0x1b0
[  487.831786]  [<ffffffff8070febf>] univ8250_setup_irq+0x24f/0x290
[  487.918560]  [<ffffffff80710c27>] serial8250_do_startup+0x117/0x5f0
[  488.009108]  [<ffffffff80711125>] serial8250_startup+0x25/0x30
[  488.093365]  [<ffffffff8070b779>] uart_startup.part.16+0x89/0x1f0
[  488.181397]  [<ffffffff8070c475>] uart_open+0x115/0x160
[  488.256852]  [<ffffffff806e9537>] ? check_tty_count+0x57/0xc0
[  488.339854]  [<ffffffff806ed95c>] tty_open+0xcc/0x610
[  488.412793]  [<ffffffff8073dc92>] ? kobj_lookup+0x112/0x170
[  488.493283]  [<ffffffff803b7e6f>] chrdev_open+0x9f/0x1d0
[  488.569992]  [<ffffffff803b1297>] do_dentry_open+0x217/0x340
[  488.651735]  [<ffffffff803b7dd0>] ? cdev_put+0x30/0x30
[  488.725934]  [<ffffffff803b2577>] vfs_open+0x57/0x60
[  488.797616]  [<ffffffff803bffbb>] do_last+0x3fb/0xee0
[  488.870557]  [<ffffffff803c2620>] path_openat+0x80/0x640
[  488.947270]  [<ffffffff803c3eda>] do_filp_open+0x3a/0x90
[  489.023984]  [<ffffffff80aa6098>] ? _raw_spin_unlock+0x18/0x40
[  489.108240]  [<ffffffff803d0ba7>] ? __alloc_fd+0xa7/0x130
[  489.186213]  [<ffffffff803b2909>] do_sys_open+0x129/0x220
[  489.264184]  [<ffffffff80402a4b>] compat_SyS_open+0x1b/0x20
[  489.344670]  [<ffffffff80aa8d65>] ia32_do_call+0x13/0x13

---
Similar traceback, but during PCIe hotplug:

Call Trace:
 <NMI>  [<ffffffff80a9218a>] dump_stack+0x4f/0x7b
 [<ffffffff80a8df39>] panic+0xbb/0x1df
 [<ffffffff8020728c>] unknown_nmi_error+0x9c/0xa0
 [<ffffffff8020742c>] default_do_nmi+0x19c/0x1c0
 [<ffffffff80207536>] do_nmi+0xe6/0x160
 [<ffffffff80a9b31b>] end_repeat_nmi+0x1a/0x1e
 [<ffffffff80723dc6>] ? qi_submit_sync+0x186/0x3f0
 [<ffffffff80723dc6>] ? qi_submit_sync+0x186/0x3f0
 [<ffffffff80723dc6>] ? qi_submit_sync+0x186/0x3f0
 <<EOE>>  [<ffffffff8072a325>] free_irte+0xe5/0x130
 [<ffffffff8072ba0f>] free_remapped_irq+0x2f/0x40
 [<ffffffff8023af33>] arch_teardown_hwirq+0x23/0x70
 [<ffffffff802c32d8>] irq_free_hwirqs+0x38/0x60
 [<ffffffff8023e0e3>] native_teardown_msi_irq+0x13/0x20
 [<ffffffff8020777f>] arch_teardown_msi_irq+0xf/0x20
 [<ffffffff8069e08f>] default_teardown_msi_irqs+0x5f/0xa0
 [<ffffffff8020775f>] arch_teardown_msi_irqs+0xf/0x20
 [<ffffffff8069e159>] free_msi_irqs+0x89/0x1a0
 [<ffffffff8069f165>] pci_disable_msi+0x45/0x50
 [<ffffffff80696d05>] cleanup_service_irqs+0x25/0x40
 [<ffffffff8069749e>] pcie_port_device_remove+0x2e/0x40
 [<ffffffff8069760e>] pcie_portdrv_remove+0xe/0x10

---
Similar, but at another location in qi_submit_sync:

Call Trace:
 <NMI>  [<ffffffff80a9218a>] dump_stack+0x4f/0x7b
 [<ffffffff80a8df39>] panic+0xbb/0x1df
 [<ffffffff8020728c>] unknown_nmi_error+0x9c/0xa0
 [<ffffffff8020742c>] default_do_nmi+0x19c/0x1c0
 [<ffffffff80207536>] do_nmi+0xe6/0x160
 [<ffffffff80a9b31b>] end_repeat_nmi+0x1a/0x1e
 [<ffffffff80a98c58>] ? _raw_spin_lock+0x38/0x40
 [<ffffffff80a98c58>] ? _raw_spin_lock+0x38/0x40
 [<ffffffff80a98c58>] ? _raw_spin_lock+0x38/0x40
 <<EOE>>  [<ffffffff80723e9d>] qi_submit_sync+0x25d/0x3f0
 [<ffffffff8072a325>] free_irte+0xe5/0x130
 [<ffffffff8072ba0f>] free_remapped_irq+0x2f/0x40
 [<ffffffff8023af33>] arch_teardown_hwirq+0x23/0x70
 [<ffffffff802c32d8>] irq_free_hwirqs+0x38/0x60
 [<ffffffff8023e0e3>] native_teardown_msi_irq+0x13/0x20
 [<ffffffff8020777f>] arch_teardown_msi_irq+0xf/0x20
 [<ffffffff8069e08f>] default_teardown_msi_irqs+0x5f/0xa0
 [<ffffffff8020775f>] arch_teardown_msi_irqs+0xf/0x20
 [<ffffffff8069e159>] free_msi_irqs+0x89/0x1a0
 [<ffffffff8069f165>] pci_disable_msi+0x45/0x50
 [<ffffffff80696d05>] cleanup_service_irqs+0x25/0x40
 [<ffffffff8069749e>] pcie_port_device_remove+0x2e/0x40
 [<ffffffff8069760e>] pcie_portdrv_remove+0xe/0x10
 [<ffffffff806896ed>] pci_device_remove+0x3d/0xc0

The NMIs seem more likely to occur during PCIe hotplug (possibly
because our testing generates a large number of PCIe hotplug events).

---
CPU information:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Genuine Intel(R) CPU @ 1.80GHz
stepping	: 1
microcode	: 0x14