linux-kernel - Re: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20110928140933.GR5795@redhat.com>
Date:	Wed, 28 Sep 2011 10:09:33 -0400
From:	Don Zickus <dzickus@...hat.com>
To:	Seiji Aguchi <seiji.aguchi@....com>
Cc:	"Luck, Tony" <tony.luck@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Vivek Goyal <vgoyal@...hat.com>,
	Matthew Garrett <mjg@...hat.com>,
	"Chen, Gong" <gong.chen@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"dle-develop@...ts.sourceforge.net" 
	<dle-develop@...ts.sourceforge.net>,
	Satoru Moriya <satoru.moriya@....com>
Subject: Re: [RFC][PATCH -next] pstore: replace spin_lock with
 spin_trylock_irqsave in panic path

On Tue, Sep 27, 2011 at 03:46:08PM -0400, Seiji Aguchi wrote:
> Hi,
> 
> >Yes we care - saving panic data is most likely the single most important
> >thing that pstore does.  I just have severe doubts that it will actually
> >save anything useful if we just blindly continue if we can't get the lock.
> 
> I agree with Tony. We may not get useful information if pstore just blindly continues
> while other cpus are running.
> 
> >Is this patch based on a real-life case of a system deadlocking? I'd
> >like to know if we are just talking around the theoretical case that
> >the lock may be held at panic time - or something that has actually been
> >seen in real life.
> 
> This patch is _not_ based on real-life case. I would like to avoid potential deadlock.
> 
> If Don disagrees to  my "return" code, I have another idea which moves pstore_dump() behind smp_send_stop().
> smp_send_stop() stops other cpus by sending IPI.
> So pstore can continue reliably and get useful information by just busting spinlock.

Yeah, Vivek had a similar idea to have the common panic path mimic what
they do with kdump, stop all the cpus except for the crashing one, to
serialize the crashing path.  This would allow us to more easily bust
spinlocks without worrying about what the other cpus are doing.

The kdump solution involves using NMI whereas smp_send_stop (on x86)
avoids it because of past issues and instead uses the IRQ line.  This
won't work if pstore_dump uses a spin_try_lock_irqsave() because the IRQ
line will be disable and never get the smp_send_stop() message (unless I
am reading the code wrong).

[reads the kernel/panic.c code] oh, I see this already exists, you would
just move the smp_send_stop() command up a couple lines of code.

[Side note] perhaps we should change the behaviour of smp_send_stop to use
NMI and create a blacklist of machines to use the IRQ line instead.  I
assume the list of broken machines is small as Red Hat has been kdumping
kernels since 2.6.18 with little evidence that machines were failing
because NMI wasn't working properly.

Cheers,
Don

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/