Date:	Fri, 7 Dec 2012 23:43:03 +0000
From:	Seiji Aguchi <seiji.aguchi@....com>
To:	"Luck, Tony" <tony.luck@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:	"cbouatmailru@...il.com" <cbouatmailru@...il.com>,
	"ccross@...roid.com" <ccross@...roid.com>,
	"keescook@...omium.org" <keescook@...omium.org>,
	"dzickus@...hat.com" <dzickus@...hat.com>,
	"dle-develop@...ts.sourceforge.net" 
	<dle-develop@...ts.sourceforge.net>,
	Satoru Moriya <satoru.moriya@....com>
Subject: RE: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online

> Can all these things really happen (did you run into this problem on a real system?). Or is this just a theoretical problem.  Ugly (but
> practical) hacks might be OK to solve real problems. 

It is a theoretical problem right now.
But it is a timing issue, so it could actually happen on a real system.

> But do we really want them to fix problems that actually never happen?

If we find a problem, even a theoretical one, we can't say "It never actually happens."

I have two reasons for submitting this patch before actually reproducing the problem.

1)
When Linux, including pstore, is deployed in mission-critical systems, fixing a problem after it has
actually happened is too late, because a failure of those systems has a great impact on society as a whole.
Customers in this area ask us to fix problems as soon as possible.
On the other hand, this kind of timing issue is hard to reproduce,
so our support engineers often work all night to reproduce it.
It is a nightmare for us.

If we can fix it in advance with a small patch, that is really helpful for us.

2)
In the long term, I plan to add kmsg_dump to the kexec path because kdump may fail in the real world.
In that case, we need another source of troubleshooting information, like pstore, to find the root cause of the failure.

In fact, someone raised concerns about the reliability of kdump at LinuxCon Europe:
http://events.linuxfoundation.org/images/stories/pdf/lceu2012_holzheu.pdf

To convince the kexec maintainer to add kmsg_dump, I need to show that there is no problem in the pstore code
that could cause kdump to fail.

Seiji

