Date:	Wed, 23 Jan 2008 01:00:17 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Andi Kleen <ak@...e.de>
Cc:	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	jbeulich@...ell.com, venkatesh.pallipadi@...el.com,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: CPA boot crash (was: [PATCH] [0/36] Great change_page_attr
	patch series v3)


* Andi Kleen <ak@...e.de> wrote:

> > because it interferes/interacts with CPA and the page table code. So
> 
> No that is not its main problem I believe. Main problem are all the 
> driver and other subsystem interactions (it is a little bit similar to 
> power management where you have lots of little bits all over right 
> instead of a single big one). [...]

that is (yet another) major misconception on your part. "Drivers" are an 
easy-to-blame target (i guess because there's no one out there to defend 
a vague "drivers" accusation), and they are not the problem here _at 
all_.

Drivers tell the architecture code which physical pages they'd like to 
have access to (or which page range they'd like to see different cache 
attributes on) and that's it. They are plain users of the ioremap() and 
change_page_attr() APIs. Nothing more, nothing less.

It is the utmost duty of architecture code to make those APIs 
fool-proof. Hardware _will_ mess up the physical parameters that get 
passed in every possible way - and drivers just try to use what the 
hardware tells them to use. So robustness is key and there's just no 
"driver reason" why these APIs cannot be robust.

so you are delusional if you think that the c_p_a() problems are "driver 
and other subsystem interactions".

And your analogy with power management could not be more mistaken. Power 
management and suspend/resume in particular is so complex because it is 
analogous to a _full bootup and shutdown cycle_, with the following 
hard-to-meet expectation from the user: 'this stuff must work all the 
time, and must be instantaneous'. Suspend/resume is an _incredibly 
complex_ machinery and the user does not realize this complexity (and 
does not accept its consequences). It is a codepath that is affected by 
tens and tens of thousands of lines of driver and core kernel code. Just 
one single mistake and "resume does not work".

ioremap() and change_page_attr() on the other hand are a small, 
few-hundred-line codebase with a stable and well-defined purpose. There 
are no significant "subsystem interactions" whatsoever.

by far the most intense and most high-frequency user of the 
change_page_attr() code is CONFIG_DEBUG_PAGEALLOC=y. It does a cpa call 
for every single page and slab allocation/freeing. But this debug 
feature ... is not enabled on the 64-bit side - why? So unfortunately we 
don't have any real robustness track record of the 64-bit side of the 
CPA code, and that's exactly the code your clflush and gbpages patches 
change.

oh, and due to that i'll probably revert these two patches of yours:

  Subject: x86: c_p_a(), change kernel_map_pages to not use c_p_a()
  Subject: x86: c_p_a(), change 32-bit back to init_mm semaphore locking

as with these changes you've removed _the_ most important stress-tester 
for the c_p_a() code: DEBUG_PAGEALLOC.

	Ingo