Message-ID: <A5ED84D3BB3A384992CBB9C77DEDA4D40FB28206@USINDEM103.corp.hds.com>
Date: Fri, 20 Jul 2012 00:39:24 +0000
From: Seiji Aguchi <seiji.aguchi@....com>
To: "Luck, Tony" <tony.luck@...el.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mikew@...gle.com" <mikew@...gle.com>,
"dzickus@...hat.com" <dzickus@...hat.com>,
"Matthew Garrett (mjg@...hat.com)" <mjg@...hat.com>
CC: "dle-develop@...ts.sourceforge.net"
<dle-develop@...ts.sourceforge.net>,
Satoru Moriya <satoru.moriya@....com>
Subject: RE: [RFC][PATCH v2 2/3] Hold multiple logs
Thank you for describing this in detail.
> Yes - if the OOPS is instrumental in the path leading to the hang/panic - then the OOPS is the first place to look for the root cause
> of the problem. But it will be a case-by-case analysis.
> Sometimes the OOPS might be unconnected. If possible, we'd like to log more information to allow detective work to decide whether
> there is a connection. But as I mentioned above, there are severe limits to how much better things get by storing more information.
I understand why you think 3 or 4 logs are reasonable.
There are some cases where the 2nd or 3rd oops is the critical one.
On the other hand, I have some enterprise customers who are sensitive to software failures and specify panic_on_oops=1.
In that case, they don't need 3 or 4 logs; 2 logs are enough.
So the kernel parameter should behave as follows (a rough sketch of the range check is below the list).
Log_num=1
- For users who want to hold just one log.
Log_num=2
- For users who can handle multiple logs but only care about the 1st oops (because they specify panic_on_oops=1).
Log_num=3 or 4
- For users who also care about the 2nd or 3rd oops.
Log_num=5 or more
- Invalid value.
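
To illustrate the range above, here is a rough sketch of how the check could be enforced if it were done as a module parameter of the pstore backend. The parameter name log_num, the default of 2, and the use of module_param_cb() are only my assumptions for illustration, not the actual patch:

/* Sketch only: a hypothetical "log_num" parameter restricted to 1..4. */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>

static unsigned int log_num = 2;        /* assumed default: hold 2 logs */

static int log_num_set(const char *val, const struct kernel_param *kp)
{
        unsigned int n;
        int ret = kstrtouint(val, 10, &n);

        if (ret)
                return ret;
        if (n < 1 || n > 4)             /* Log_num=5 or more is rejected */
                return -EINVAL;
        return param_set_uint(val, kp);
}

static const struct kernel_param_ops log_num_ops = {
        .set = log_num_set,
        .get = param_get_uint,
};
module_param_cb(log_num, &log_num_ops, &log_num, 0644);
MODULE_PARM_DESC(log_num, "Number of logs to hold in the backing store (1-4)");
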
If I have misunderstood anything, please let me know.
Seiji
> -----Original Message-----
> From: Luck, Tony [mailto:tony.luck@...el.com]
> Sent: Thursday, July 19, 2012 7:42 PM
> To: Seiji Aguchi; linux-doc@...r.kernel.org; linux-kernel@...r.kernel.org; mikew@...gle.com; dzickus@...hat.com; Matthew
> Garrett (mjg@...hat.com)
> Cc: dle-develop@...ts.sourceforge.net; Satoru Moriya
> Subject: RE: [RFC][PATCH v2 2/3] Hold multiple logs
>
> > If you are concerned about the multiple-OOPS case, I think a user app which logs from /dev/pstore to /var/log should be developed.
>
> Agreed - we need an app/daemon to do this.
>
> > Once it is developed, we don't need to care about the multiple-oops case, and the appropriate number is two.
>
> Only if you can guarantee that the app/daemon will run and save the first OOPS before the next occurs. Even if the system were
> running normally, this might be difficult to achieve... but in this case we know the system isn't running normally (it just OOPSed twice!).
>
> However - there is progressively less value in collecting additional consecutive OOPSes. Perhaps one is enough 90% or even 99% of
> the time. I'm naturally paranoid, so having two or three would make me feel happy that most of the remaining 10% or 1% of the cases
> were covered.
>
> > - In the case where the system is workable after the oops:
> > the user app will erase the entry in NVRAM,
> > and we can get the message via /var/log.
>
> Yes - the system can keep running after many types of OOPS - so the OOPS will be logged in /var/log (or by the app/daemon copying
> from pstore, or both).
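
To make this flow concrete, here is a minimal userspace sketch of the kind of copier we are discussing. It assumes pstore is mounted at /dev/pstore, that /var/log/pstore already exists as the destination, and that unlinking a pstore file is what erases the backing NVRAM record; those details (and the lack of a real daemon loop) are simplifications for illustration:

/* Sketch only: copy each pstore entry to /var/log, then erase it. */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>

#define PSTORE_DIR "/dev/pstore"      /* assumed mount point */
#define DEST_DIR   "/var/log/pstore"  /* assumed destination (must exist) */

static int copy_file(const char *src, const char *dst)
{
        FILE *in = fopen(src, "r");
        FILE *out = in ? fopen(dst, "w") : NULL;
        char buf[4096];
        size_t n;

        if (!in || !out)
                goto fail;
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
                if (fwrite(buf, 1, n, out) != n)
                        goto fail;
        fclose(in);
        return fclose(out);
fail:
        if (in)
                fclose(in);
        if (out)
                fclose(out);
        return -1;
}

int main(void)
{
        DIR *dir = opendir(PSTORE_DIR);
        struct dirent *de;

        if (!dir) {
                perror(PSTORE_DIR);
                return 1;
        }
        while ((de = readdir(dir)) != NULL) {
                char src[PATH_MAX], dst[PATH_MAX];

                if (de->d_name[0] == '.')
                        continue;       /* skip "." and ".." (and dotfiles) */
                snprintf(src, sizeof(src), "%s/%s", PSTORE_DIR, de->d_name);
                snprintf(dst, sizeof(dst), "%s/%s", DEST_DIR, de->d_name);
                if (copy_file(src, dst) == 0)
                        remove(src);    /* unlink erases the NVRAM record */
        }
        closedir(dir);
        return 0;
}

In practice this would run from a boot script or a daemon watching the pstore mount, but the copy-then-unlink order is the essential part: the NVRAM slot is only freed once the log is safely in /var/log.
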
>
> > - In the case where the system hangs or panics due to the oops:
> > the oops is the critical message, and we don't need to care about subsequent events.
>
> Yes - if the OOPS is instrumental in the path leading to the hang/panic - then the OOPS is the first place to look for the root cause
> of the problem. But it will be a case-by-case analysis.
> Sometimes the OOPS might be unconnected. If possible, we'd like to log more information to allow detective work to decide whether
> there is a connection. But as I mentioned above, there are severe limits to how much better things get by storing more information.
>
> -Tony