Message-ID: <4EEB8320.4010809@oracle.com>
Date:	Fri, 16 Dec 2011 09:42:56 -0800
From:	Yinghai Lu <yinghai.lu@...cle.com>
To:	Jacob Shin <jacob.shin@....com>
CC:	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"Herrmann3, Andreas" <Andreas.Herrmann3@....com>,
	"x86@...nel.org" <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Joerg Roedel <joerg.roedel@....com>
Subject: Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes
 above 4 GB from direct mapping.
On 12/16/2011 08:20 AM, Jacob Shin wrote:
> On Wed, Dec 14, 2011 at 05:14:25PM -0600, Jacob Shin wrote:
>> On Wed, Dec 14, 2011 at 02:42:50PM -0800, H. Peter Anvin wrote:
>>> On 10/20/2011 03:26 PM, Jacob Shin wrote:
>>>> On Thu, 2011-10-20 at 17:20 -0500, H. Peter Anvin wrote:
>>>>> On 10/20/2011 02:15 PM, Jacob Shin wrote:
>>>>>> On systems with very large memory (1 TB in our case), BIOS may report a
>>>>>> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
>>>>>> these from the direct mapping.
>>>>>
>>>>>> +			if (ei->type == E820_RESERVED)
>>>>>> +				continue;
>>>>>
>>>>> This should probably be ei->type != E820_RAM or something similar.  I
>>>>> haven't looked yet, what does the < 4 GiB code do?
>>>>
>>>> Hm, okay, it calls e820_end_of_low_ram_pfn() which effectively is !=
>>>> E820_RAM.
>>>>
>>>> I'll fix this, test, then resend.
>>>>
>>>
>>> I never got any kind of updated patch, did I?
>>
>> No, I never sent one out, because it would have still only covered > 4GB, and
>> in later emails, you said that you wanted a general one that covered all x86.
>>
>> I'll give it another shot at the generic patch, making a special case for the
>> < 1MB ISA region.
>>
> 
> Here is the new patch, thanks!
> 
> From dad99fe54eb26d4022a48f1f9b88c21f77809282 Mon Sep 17 00:00:00 2001
> From: Jacob Shin <jacob.shin@....com>
> Date: Thu, 15 Dec 2011 10:56:14 -0500
> Subject: [PATCH] x86: Only include address ranges marked as E820_RAM in kernel direct mapping
> 
> Currently, 0 ~ max_low_pfn is first mapped, then 4GB ~ max_pfn is
> mapped. On some systems that have large memory holes that occur
> within those two regions, we end up with PATs that mark pages that
> are not backed by actual DRAM -- as cacheable.
> 
> This patch first maps 0 ~ 1MB ISA region, then iterates over the
> E820 to map useable E820_RAM ranges.
> 
> Cc: stable@...nel.org   # > 2.6.32
> Signed-off-by: Jacob Shin <jacob.shin@....com>
> Reviewed-by: Andreas Herrmann <Andreas.Herrmann3@....com>
> ---
>  arch/x86/kernel/setup.c |   29 ++++++++++++++++++++++++++---
>  1 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index cf0ef98..eae6b41 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -691,6 +691,8 @@ early_param("reservelow", parse_reservelow);
>  
>  void __init setup_arch(char **cmdline_p)
>  {
> +	int i;
> +
>  #ifdef CONFIG_X86_32
>  	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
>  	visws_early_detect();
> @@ -932,13 +934,34 @@ void __init setup_arch(char **cmdline_p)
>  	init_gbpages();
>  
>  	/* max_pfn_mapped is updated here */
> -	max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
> +	max_low_pfn_mapped = init_memory_mapping(0, 0x100000);
>  	max_pfn_mapped = max_low_pfn_mapped;
>  
> +	for (i = 0; i < e820.nr_map; i++) {
> +		struct e820entry *ei = &e820.map[i];
> +		u64 start = ei->addr;
> +		u64 end = ei->addr + ei->size;
> +
> +		if (ei->type != E820_RAM)
> +			continue;
> +
> +		if (start < 0x100000)
> +			continue;
> +#ifdef CONFIG_X86_32
> +		if ((start >> PAGE_SHIFT) >= max_low_pfn)
> +			continue;
> +
> +		if ((end >> PAGE_SHIFT) > max_low_pfn)
> +			end = max_low_pfn << PAGE_SHIFT;
> +#endif
> +		max_pfn_mapped = init_memory_mapping(start, end);
> +
> +		if ((end >> PAGE_SHIFT) == max_low_pfn)
> +			max_low_pfn_mapped = max_pfn_mapped;
> +	}
> +
>  #ifdef CONFIG_X86_64
>  	if (max_pfn > max_low_pfn) {
> -		max_pfn_mapped = init_memory_mapping(1UL<<32,
> -						     max_pfn<<PAGE_SHIFT);
>  		/* can we preseve max_low_pfn ?*/
>  		max_low_pfn = max_pfn;
>  	}

No, you change the meaning of max_low_pfn_mapped and max_pfn_mapped, for x86_64 at least.

Before your patch:
max_low_pfn_mapped is the end of the mapped pfn range below 4G.
max_pfn_mapped is the end of the mapped pfn range overall.

After your patch, those two variables no longer mean that the memory in
[0, max_low_pfn_mapped) and [4g<<12, max_pfn_mapped) is really mapped.

so in arch/x86/platform/efi/efi.c
                if (end_pfn <= max_low_pfn_mapped
                    || (end_pfn > (1UL << (32 - PAGE_SHIFT))
                        && end_pfn <= max_pfn_mapped))
                        va = __va(md->phys_addr);
                else
                        va = efi_ioremap(md->phys_addr, size, md->type);
and other users of these two variables will have problems.
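
To make the objection concrete, here is a minimal user-space sketch, not kernel code. The memory layout (RAM at 0-2G, 4G-8G and 12G-16G, with a hole at 8G-12G), the `GB_TO_PFN` macro, and the names `two_limit_check` and `covered_by_one_range` are all assumptions made for illustration. `two_limit_check` mirrors the condition in the efi.c snippet above; `covered_by_one_range` walks the ranges that were actually mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PFN_4G (1ULL << (32 - PAGE_SHIFT))
/* Hypothetical helper: convert a size in GiB to a page frame number. */
#define GB_TO_PFN(gb) ((unsigned long long)(gb) << (30 - PAGE_SHIFT))

struct pfn_range {
	unsigned long long start;	/* first pfn in the range */
	unsigned long long end;		/* one past the last pfn */
};

/* Assumed example layout: only these pfn ranges are direct mapped. */
const struct pfn_range example_mapped[] = {
	{ 0,             GB_TO_PFN(2)  },
	{ GB_TO_PFN(4),  GB_TO_PFN(8)  },
	{ GB_TO_PFN(12), GB_TO_PFN(16) },	/* 8G-12G is a hole */
};
#define EXAMPLE_NR (sizeof(example_mapped) / sizeof(example_mapped[0]))

/* Same shape as the efi.c test: trusts two upper limits only. */
bool two_limit_check(unsigned long long end_pfn,
		     unsigned long long max_low_pfn_mapped,
		     unsigned long long max_pfn_mapped)
{
	return end_pfn <= max_low_pfn_mapped ||
	       (end_pfn > PFN_4G && end_pfn <= max_pfn_mapped);
}

/* Ground truth: is [start_pfn, end_pfn) inside one mapped range? */
bool covered_by_one_range(const struct pfn_range *r, size_t nr,
			  unsigned long long start_pfn,
			  unsigned long long end_pfn)
{
	size_t i;

	assert(start_pfn < end_pfn);
	for (i = 0; i < nr; i++)
		if (start_pfn >= r[i].start && end_pfn <= r[i].end)
			return true;
	return false;
}
```

With max_low_pfn_mapped at the 2G pfn and max_pfn_mapped at the 16G pfn, a region at 9G-10G passes `two_limit_check` even though `covered_by_one_range` reports that it lies in the hole, so a caller would use `__va()` on memory that has no direct mapping.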

To solve your problem:
1. unmap the HT range?
2. or introduce a mapped_pfn_range array?
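
Option 2 could look roughly like the following user-space sketch. This is only one possible shape under stated assumptions, not an existing kernel interface: `struct mapped_pfn_table`, `mapped_pfn_add`, `mapped_pfn_range_ok` and the `MAX_MAPPED_RANGES` limit are hypothetical names, and ranges are assumed to be recorded in ascending address order, as they would be when iterating over a sorted E820 map.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MAPPED_RANGES 32	/* hypothetical fixed table size */

struct mapped_pfn_range {
	unsigned long long start;	/* first mapped pfn */
	unsigned long long end;		/* one past the last mapped pfn */
};

struct mapped_pfn_table {
	struct mapped_pfn_range r[MAX_MAPPED_RANGES];
	int nr;
};

/*
 * Record [start, end) as mapped. Assumes callers add ranges in
 * ascending order, so only the last entry can be adjacent.
 * Returns false if the table is full.
 */
bool mapped_pfn_add(struct mapped_pfn_table *t,
		    unsigned long long start, unsigned long long end)
{
	assert(start < end);

	/* Extend the previous entry when the new range touches it. */
	if (t->nr > 0 && t->r[t->nr - 1].end == start) {
		t->r[t->nr - 1].end = end;
		return true;
	}
	if (t->nr == MAX_MAPPED_RANGES)
		return false;

	t->r[t->nr].start = start;
	t->r[t->nr].end = end;
	t->nr++;
	return true;
}

/* True only if [start, end) lies entirely inside one recorded range. */
bool mapped_pfn_range_ok(const struct mapped_pfn_table *t,
			 unsigned long long start, unsigned long long end)
{
	int i;

	for (i = 0; i < t->nr; i++)
		if (start >= t->r[i].start && end <= t->r[i].end)
			return true;
	return false;
}
```

A caller such as the efi.c check would then ask `mapped_pfn_range_ok()` about the exact region instead of comparing against two upper limits, so holes between mapped ranges are no longer reported as mapped.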
Thanks
Yinghai Lu