Message-ID: <20090629031804.GA6764@elte.hu>
Date:	Mon, 29 Jun 2009 05:18:04 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Arjan van de Ven <arjan@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Yinghai Lu <yinghai@...nel.org>
Cc:	linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
	akpm@...ux-foundation.org, netdev@...r.kernel.org, x86@...nel.org
Subject: Re: kerneloops.org report for the week


* Arjan van de Ven <arjan@...radead.org> wrote:

> Few "highlights" this week
> * mem_cgroup_add_lru_list (rank 2) is a high rising issue;
>   it's list corruption, question is why this is new
> * rank 13 (memcmp in the raid code) is also new
> * the warning in get_free_pages that has been discussed on lkml is dropping
>   from the ranks again
> 
> 
> This week, a total of 15273 oopses and warnings have been reported,
> compared to 13384 reports in the previous week.
> 
> 
> Rank 2: mem_cgroup_add_lru_list (warn)
> 	Reported 1554 times (1622 total reports)
> 	List corruption in the VM code
> 	This oops was last seen in version 2.6.30-git19, and first seen in 2.6.29.
> 	More info: http://www.kerneloops.org/searchweek.php?search=mem_cgroup_add_lru_list

At least one list corruption bug was fixed by:

   cb4cbcf: mm: fix incorrect page removal from LRU

> Rank 3: getnstimeofday (warning)
> 	Reported 1319 times (4893 total reports)
> 	[suspend resume] getnstimeofday() is called before timekeeping is resumed
> 	This oops was last seen in version 2.6.30, and first seen in 2.6.24.
> 	More info: http://www.kerneloops.org/searchweek.php?search=getnstimeofday

Probably caused by some buggy driver callback?

> Rank 7: hres_timers_resume (warning)
> 	Reported 763 times (2368 total reports)
> 	[suspend resume] hres_timers_resume() is incorrectly called with interrupts on
> 	This warning was last seen in version 2.6.30, and first seen in 2.6.24.7.
> 	More info: http://www.kerneloops.org/searchweek.php?search=hres_timers_resume

This is probably a driver incorrectly enabling irqs in a resume 
callback. This should be easier and more specific to debug with the 
lockdep-based annotation i suggested for the suspend code in various 
mails.

> Rank 8: generic_get_mtrr (warning)
> 	Reported 544 times (2061 total reports)
> 	BIOS bug where the MTRRs are not set up correctly
> 	This warning was last seen in version 2.6.30, and first seen in 2.6.25.3.
> 	More info: http://www.kerneloops.org/searchweek.php?search=generic_get_mtrr

I think this calls for enabling the x86 MTRR sanitizer by default: 
544 of this week's 15273 reports (about 3.6%) suggests that a 
significant proportion of Linux systems is affected by MTRR setup 
problems.

I.e. we should change:

config MTRR_SANITIZER_ENABLE_DEFAULT
        int "MTRR cleanup enable value (0-1)"
        range 0 1
        default "0"

To 'default "1"'. Any objections?

If the MTRR sanitizer is enabled then i think the above warning in 
generic_get_mtrr() should never trigger.
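
As a rough sketch of the proposal (showing only the lines quoted above; 
the rest of the Kconfig entry is omitted here and assumed to stay 
unchanged), the option would then read:

```
# Sketch only: the lines quoted in this mail, with the default flipped.
# Any other lines in the real entry are not shown here.
config MTRR_SANITIZER_ENABLE_DEFAULT
        int "MTRR cleanup enable value (0-1)"
        range 0 1
        default "1"
```

The only difference from the current entry is the value on the 
'default' line.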

	Ingo