Message-ID: <20171227161801.GE1410@arch-chirva.localdomain>
Date:   Wed, 27 Dec 2017 11:18:01 -0500
From:   Alexandru Chirvasitu <achirvasub@...il.com>
To:     Dou Liyang <douly.fnst@...fujitsu.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Dexuan Cui <decui@...rosoft.com>, Pavel Machek <pavel@....cz>,
        kernel list <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        "Maciej W. Rozycki" <macro@...ux-mips.org>,
        Mikael Pettersson <mikpelinux@...il.com>,
        Josh Poulson <jopoulso@...rosoft.com>,
        "Mihai Costache (Cloudbase Solutions SRL)" <v-micos@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Marc Zyngier <marc.zyngier@....com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Simon Xiao <sixiao@...rosoft.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Jork Loeser <Jork.Loeser@...rosoft.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
        KY Srinivasan <kys@...rosoft.com>
Subject: Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

As per instructions, I did the following:

(1)

Checked out

464e1d5 Linux 4.15-rc5

(after getting my copy up to date, fetching, pulling, etc.) and
compiled it as-is. Config attached (the one labeled 'np' for 'no
patch').

Result:

Boot with no extra parameters locks up after login, as before;

apic=debug does not panic, but locks up after login, as before;

noapic logs me in fine, but disables my wired connection (this is also
behaviour I noted previously). The system sees the Ethernet card and
brings it up, but dhclient will not connect.

(2)

Applied the patch you sent below to 464e1d5; again config attached,
labeled 'p' for 'patch'. I applied it manually because git apply was
giving me errors and I didn't want to hold us back while I debug (or
rather I should say 'learn to use git apply'; first time doing it).

In any case though, the changes were as you indicated. The diff I get
with 'git show' was

---------------------------------------------------------------
    irq/matrix: Remove the overused BUGON() in irq_matrix_assign_system()

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 0ba0dd8..9292d79 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -143,8 +143,8 @@ void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit,
        BUG_ON(m->online_maps > 1 || (m->online_maps && !replace));
 
        set_bit(bit, m->system_map);
-       if (replace) {
-               BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
+
+       if (replace && test_and_clear_bit(bit, cm->alloc_map)){
                cm->allocated--;
                m->total_allocated--;
        }
---------------------------------------------------------------

I deleted / inserted those lines myself, but I do believe they
precisely match what you sent. Compiled and installed the modified kernel. 

Result:

*Exactly* as above, on all three attempts (no parameters, 'apic=debug'
 and 'noapic').

---

So the patch doesn't seem to have an effect, and the panicking is no
longer happening in 4.15-rc5 anyway (with 'apic=debug' and no patch).

Perhaps I should go back to the original bad commit from the bisect I
did and apply the patch to *that*. I'll try, but cannot at this precise
moment. I'll get back in a bit.


On Wed, Dec 27, 2017 at 04:14:23PM +0800, Dou Liyang wrote:
> Hi Alexandru,
> 
> At 12/24/2017 04:01 AM, Alexandru Chirvasitu wrote:
> > On Sat, Dec 23, 2017 at 02:32:52PM +0100, Thomas Gleixner wrote:
> > > On Sat, 23 Dec 2017, Dexuan Cui wrote:
> > > 
> > > > > From: Alexandru Chirvasitu [mailto:achirvasub@...il.com]
> > > > > Sent: Friday, December 22, 2017 14:29
> > > > > 
> > > > > The output of that precise command run just now on a freshly-compiled
> > > > > copy of that commit is attached.
> > > > > 
> > > > > On Fri, Dec 22, 2017 at 09:31:28PM +0000, Dexuan Cui wrote:
> > > > > > > From: Alexandru Chirvasitu [mailto:achirvasub@...il.com]
> > > > > > > Sent: Friday, December 22, 2017 06:21
> > > > > > > 
> > > > > > > In the absence of logs, the best I can do at the moment is attach a
> > > > > > > picture of the screen I am presented with on the  boot
> > > > > > > attempt.
> > > > > > > Alex
> > > > > > 
> > > > > > The panic happens in irq_matrix_assign_system+0x4e/0xd0 in your picture.
> > > > > > IMO we should find which line of code causes the panic. I suppose
> > > > > > "objdump -D kernel/irq/matrix.o" can help to do that.
> > > > > > 
> > > > > > Thanks,
> > > > > > -- Dexuan
> > > > 
> > > > The BUG_ON panic happens at line 147:
> > > >                     BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
> > > > 
> 
> There are 2 bugs in your laptop:
> 
>   1. Hard lockups on both CPUs after login
>   2. panic with "apic=debug"
> 
> For the 2nd bug, please try the following patch (needs Thomas's
> confirmation :) ) in Linux 4.15-rc5. I think it can fix the panic.
> 
> If the 2nd bug is fixed, let's go back to the 1st bug:
> 
> Is Linus' current head, 4.15-rc5, bad as well?
> 
> If yes, please use "apic=debug" and provide the dmesg log.
> 
> Thanks,
> 	dou.
> 
> ------------------------8<-------------------------------------------
> 
> irq/matrix: Remove the overused BUGON() in irq_matrix_assign_system()
> 
> Currently, x86 marks the preallocated legacy interrupts when initializing
> IRQ(native_init_IRQ), but will clear them if they are not activated in
> vector_configure_legacy().
> 
> So, in irq_matrix_assign_system(), replacing an legacy vector which may
> not allocated in a cpumap->alloc_map[] with a system vector will trigger
> the BUGON();
> 
> Remove the BUGON().
> 
> Signed-off-by: Dou Liyang <douly.fnst@...fujitsu.com>
> ---
>  kernel/irq/matrix.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
> index 0ba0dd8863a7..876cbeab9ca2 100644
> --- a/kernel/irq/matrix.c
> +++ b/kernel/irq/matrix.c
> @@ -143,11 +143,12 @@ void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit,
>  	BUG_ON(m->online_maps > 1 || (m->online_maps && !replace));
> 
>  	set_bit(bit, m->system_map);
> -	if (replace) {
> -		BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
> +
> +	if (replace && test_and_clear_bit(bit, cm->alloc_map)){
>  		cm->allocated--;
>  		m->total_allocated--;
>  	}
> +
>  	if (bit >= m->alloc_start && bit < m->alloc_end)
>  		m->systembits_inalloc++;
> 
> -- 
> 
> 

View attachment "config-np" of type "text/plain" (194513 bytes)

View attachment "config-p" of type "text/plain" (194513 bytes)
