lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <49BFECF9.6090204@goop.org>
Date:	Tue, 17 Mar 2009 11:33:29 -0700
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Jan Beulich <jbeulich@...ell.com>
CC:	mingo@...e.hu, tglx@...utronix.de, linux-kernel@...r.kernel.org,
	hpa@...or.com
Subject: Re: [PATCH] x86: create a non-zero sized bm_pte only when	 needed

Jan Beulich wrote:
>> Does this depend on CONFIG_EARLY_PRINTK_DBGP being set?  And what's so 
>> special about FIX_DBGP_BASE, that we should hard-code it in here?  Is it 
>> just that its the first non-arch-dependent fixmap slot?  Or something 
>> else?  Will it break if we move FIX_DBGP_BASE?
>>     
>
> No, it is indeed tied to that one fixmap entry, as this is what the 'early
> initialization of the fixmap area' (commented such in head_32.S, and
> uncommented equivalent exists in head_64.S) is about, albeit without
> explicit tying to the respective fixmap entry (which makes this code
> even more fragile than my change might seem).
>   

I had to dig back to mid 2007 to find Eric's changeset referring to "USB 
debug", but as far as I can see the DBGP code didn't appear in-tree 
until mid 2008.  That's a lot of archaeology to dig through to 
understand this change.

>> Is the space saving here just the 1 page for bm_pte[]?
>>     
>
> Yes.
>
>   
>> Wouldn't we do as well by making it initdata?
>>     
>
> No, because the table may be retained past boot.
>   

No, early_ioremaps are not allowed to exist beyond boot-time.  The 
ioremap code will complain about any extant mappings in 
check_early_ioremap_leak().

But bm_pte might be used for other fixmap slots, so it can't really be 
released unless we carefully rearrange things.

>> I'm picking on this change because its breaking Xen PV booting...
>>     
>
> Hmm, I don't think there's anything that should make it break. Any
> details?
>   

dmi_present() faults because the pointer returned from early_ioremap is 
bad: the L2 entry is empty.  Xen boot doesn't go through head.S, and has 
no particular requirement for extremely early fixmap access, so there's 
no implicit fixmap pte setup.

user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 00000000000a0000 - 0000000000100000 (reserved)
 user: 0000000000100000 - 0000000000eaf000 (usable)
 user: 0000000000eaf000 - 0000000000f32000 (reserved)
 user: 0000000000f32000 - 0000000010000000 (usable)
(XEN) d1:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffffffff400000:
(XEN)  L4[0x1ff] = 0000000078c80067 0000000000000203
(XEN)  L3[0x1ff] = 0000000078c41067 0000000000000204
(XEN)  L2[0x1fa] = 0000000000000000 ffffffffffffffff 
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 1 (vcpu#0) crashed on cpu#1:
(XEN) ----[ Xen-3.4-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e033:[<ffffffff8034e155>]
(XEN) RFLAGS: 0000000000000207   EM: 1   CONTEXT: pv guest
(XEN) rax: ffffffffff410000   rbx: ffffffff80665e88   rcx: 0000000000000003
(XEN) rdx: 000000000000000f   rsi: ffffffffff400000   rdi: ffffffff80665e88
(XEN) rbp: ffffffff80665e78   rsp: ffffffff80665e30   r8:  ffffffff80665c68
(XEN) r9:  ffffffff80621080   r10: 0000000000000002   r11: 0000000000000519
(XEN) r12: ffffffffff400000   r13: ffffffffff400000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000078e01000   cr2: ffffffffff400000



>>>  static __initdata int after_paging_init;
>>> -static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
>>> +#define __FIXADDR_TOP (-PAGE_SIZE)
>>>   
>>>       
>> Will this break in a 32-bit PV kernel where FIXADDR_TOP is shifted?
>>     
>
> Not as long as the shifting happens in 2Mb steps (and when I wrote the
> patch [which was a while back] I checked that there are other assumptions
> about the shift only happening in 2Mb increments).
>
>   
>> This seriously needs a good inline comment.
>>     
>
> Why only is it always me who is asked for extensive inline comments, when
> other code in the same area is happily being accepted without even being
> self-commenting (which, if you read the construct carefully, I believe my
> change is)? As noted above, the dependency on which page table slot
> need early initialization is completely hidden behind hardcoded literal numbers
> at least for x86-64. This is what indeed would need a comment (or better
> yet, replacing of the hardcoded numbers by proper symbolics, in which
> case I would think a comment would quickly become redundant).
>   

The change needs to stand on its own.  I'm fairly familiar with this 
area of this area of code, but the implicit dependency on the fixmap 
setup in head*.S wasn't at all obvious.  I searched the tree for 
references to FIX_DBGP_BASE to work out why it was the slot you were 
using here, but again, I couldn't work it out.  And conceptually, making 
the init of something as central as early_ioremap a side-effect of the 
init for a debug device just doesn't make much sense to me.

My concern is that this change makes everything a bit more brittle, with 
more non-obvious long-range dependencies which we'll need to keep under 
control as the code changes, but with only a tiny (potential) memory 
saving as an upside.

    J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ