Date:	Thu, 15 Jan 2009 12:03:52 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Jan Beulich <jbeulich@...ell.com>
Cc:	tglx@...utronix.de, hpa@...or.com, Nick Piggin <npiggin@...e.de>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] i386: fix assumed to be contiguous leaf page tables
	for kmap_atomic region


* Jan Beulich <jbeulich@...ell.com> wrote:

> Debugging and original patch from Nick Piggin <npiggin@...e.de>
> 
> The early fixmap pmd entry inserted at the very top of the KVA is causing the
> subsequent fixmap mapping code to not provide physically linear pte pages over
> the kmap atomic portion of the fixmap (which relies on said property to
> calculate pte addresses).
> 
> This has caused weird boot failures in kmap_atomic much later in the boot
> process (initial userspace faults) on a 32-bit PAE system with a larger number
> of CPUs (smaller CPU counts tend not to run over into the next page and so
> don't exhibit the problem).
> 
> Solve this by attempting to clear out the page table, copying any of its
> entries to the new one. Also, add a BUG if a nonlinear condition is encountered
> and can't be resolved, which might save some hours of debugging if this fragile
> scheme ever breaks again...
> 
> Signed-off-by: Nick Piggin <npiggin@...e.de>
> 
> Once we have such logic, we can also use it to eliminate the early ioremap
> trickery around the page table setup for the fixmap area. This also fixes
> potential issues with FIX_* entries sharing the leaf page table with the early
> ioremap ones getting discarded by early_ioremap_clear() and not restored by
> early_ioremap_reset(). It also eliminates the temporary unavailability of
> early fixed mappings (which depends on the configuration, namely NR_CPUS)
> while the fixmap area page tables are being constructed.
> 
> Finally, also replace the hard coded calculation of the initial table space
> needed for the fixmap area with a proper one, allowing kernels configured for
> large CPU counts to actually boot.
> 
> Signed-off-by: Jan Beulich <jbeulich@...ell.com>
> 
> ---
>  arch/x86/include/asm/io.h |    1 -
>  arch/x86/mm/init_32.c     |   46 +++++++++++++++++++++++++++++++++++++++++++---
>  arch/x86/mm/ioremap.c     |   25 -------------------------
>  3 files changed, 43 insertions(+), 29 deletions(-)

looks good, but could you please do the following cleanup:

> @@ -154,6 +154,11 @@ page_table_range_init(unsigned long star
>  	unsigned long vaddr;
>  	pgd_t *pgd;
>  	pmd_t *pmd;
> +#ifdef CONFIG_HIGHMEM
> +	pte_t *lastpte = NULL;
> +	int pmd_idx_kmap_begin = fix_to_virt(FIX_KMAP_END) >> PMD_SHIFT;
> +	int pmd_idx_kmap_end = fix_to_virt(FIX_KMAP_BEGIN) >> PMD_SHIFT;
> +#endif
>  
>  	vaddr = start;
>  	pgd_idx = pgd_index(vaddr);
> @@ -165,7 +170,43 @@ page_table_range_init(unsigned long star
>  		pmd = pmd + pmd_index(vaddr);
>  		for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end);
>  							pmd++, pmd_idx++) {
> -			one_page_table_init(pmd);
> +			pte_t *pte;
> +
> +			pte = one_page_table_init(pmd);
> +#ifdef CONFIG_HIGHMEM
> +			/*
> +			 * Something (early fixmap) may already have put a pte
> +			 * page here, which causes the page table allocation
> +			 * to become nonlinear. Attempt to fix it, and if it
> +			 * is still nonlinear then we have to bug.
> +			 */
> +			if (pmd_idx_kmap_begin != pmd_idx_kmap_end
> +			    && (vaddr >> PMD_SHIFT) >= pmd_idx_kmap_begin
> +			    && (vaddr >> PMD_SHIFT) <= pmd_idx_kmap_end
> +			    && ((__pa(pte) >> PAGE_SHIFT) < table_start
> +				|| (__pa(pte) >> PAGE_SHIFT) >= table_end)) {
> +				pte_t *newpte;
> +				int i;
> +
> +				BUG_ON(after_init_bootmem);
> +				newpte = alloc_low_page();
> +				for (i = 0; i < PTRS_PER_PTE; i++)
> +					set_pte(newpte + i, pte[i]);
> +
> +				paravirt_alloc_pte(&init_mm,
> +						   __pa(newpte) >> PAGE_SHIFT);
> +				set_pmd(pmd, __pmd(__pa(newpte)|_PAGE_TABLE));
> +				BUG_ON(newpte != pte_offset_kernel(pmd, 0));
> +				__flush_tlb_all();
> +
> +				paravirt_release_pte(__pa(pte) >> PAGE_SHIFT);
> +				pte = newpte;
> +			}
> +			BUG_ON(vaddr < fix_to_virt(FIX_KMAP_BEGIN - 1)
> +			       && vaddr > fix_to_virt(FIX_KMAP_END)
> +			       && lastpte && lastpte + PTRS_PER_PTE != pte);
> +			lastpte = pte;
> +#endif

please introduce a helper function here. That way both the ugly #ifdefs 
move out of this function (to a more logical place), and the very visible 
80-column line-length artifacts vanish from the code as well.
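
For concreteness, one possible shape for such a helper, moving the 
CONFIG_HIGHMEM block above out of the loop body (the name and exact 
signature are only illustrative, not something the patch defines):

```c
static pte_t * __init
page_table_kmap_check(pte_t *pte, pmd_t *pmd, unsigned long vaddr,
		      pte_t *lastpte)
{
#ifdef CONFIG_HIGHMEM
	/*
	 * Something (early fixmap) may already have put a pte
	 * page here, which causes the page table allocation
	 * to become nonlinear. Attempt to fix it, and if it
	 * is still nonlinear then we have to bug.
	 */
	int pmd_idx_kmap_begin = fix_to_virt(FIX_KMAP_END) >> PMD_SHIFT;
	int pmd_idx_kmap_end = fix_to_virt(FIX_KMAP_BEGIN) >> PMD_SHIFT;

	if (pmd_idx_kmap_begin != pmd_idx_kmap_end
	    && (vaddr >> PMD_SHIFT) >= pmd_idx_kmap_begin
	    && (vaddr >> PMD_SHIFT) <= pmd_idx_kmap_end
	    && ((__pa(pte) >> PAGE_SHIFT) < table_start
		|| (__pa(pte) >> PAGE_SHIFT) >= table_end)) {
		pte_t *newpte;
		int i;

		BUG_ON(after_init_bootmem);
		newpte = alloc_low_page();
		for (i = 0; i < PTRS_PER_PTE; i++)
			set_pte(newpte + i, pte[i]);

		paravirt_alloc_pte(&init_mm, __pa(newpte) >> PAGE_SHIFT);
		set_pmd(pmd, __pmd(__pa(newpte)|_PAGE_TABLE));
		BUG_ON(newpte != pte_offset_kernel(pmd, 0));
		__flush_tlb_all();

		paravirt_release_pte(__pa(pte) >> PAGE_SHIFT);
		pte = newpte;
	}
	BUG_ON(vaddr < fix_to_virt(FIX_KMAP_BEGIN - 1)
	       && vaddr > fix_to_virt(FIX_KMAP_END)
	       && lastpte && lastpte + PTRS_PER_PTE != pte);
#endif
	return pte;
}
```

The loop body then shrinks to something like 

	pte = page_table_kmap_check(one_page_table_init(pmd),
				    pmd, vaddr, pte);

with the pte from the previous iteration doubling as lastpte, so both 
the #ifdef and the pmd_idx_kmap_* locals disappear from 
page_table_range_init() entirely.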

Thanks,

	Ingo
