Message-ID: <4EEB8320.4010809@oracle.com>
Date: Fri, 16 Dec 2011 09:42:56 -0800
From: Yinghai Lu <yinghai.lu@...cle.com>
To: Jacob Shin <jacob.shin@....com>
CC: "H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"Herrmann3, Andreas" <Andreas.Herrmann3@....com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Joerg Roedel <joerg.roedel@....com>
Subject: Re: [PATCH 1/1] x86: Exclude E820_RESERVED regions and memory holes
above 4 GB from direct mapping.
On 12/16/2011 08:20 AM, Jacob Shin wrote:
> On Wed, Dec 14, 2011 at 05:14:25PM -0600, Jacob Shin wrote:
>> On Wed, Dec 14, 2011 at 02:42:50PM -0800, H. Peter Anvin wrote:
>>> On 10/20/2011 03:26 PM, Jacob Shin wrote:
>>>> On Thu, 2011-10-20 at 17:20 -0500, H. Peter Anvin wrote:
>>>>> On 10/20/2011 02:15 PM, Jacob Shin wrote:
>>>>>> On systems with very large memory (1 TB in our case), BIOS may report a
>>>>>> reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
>>>>>> these from the direct mapping.
>>>>>
>>>>>> + if (ei->type == E820_RESERVED)
>>>>>> + continue;
>>>>>
>>>>> This should probably be ei->type != E820_RAM or something similar. I
>>>>> haven't looked yet, what does the < 4 GiB code do?
>>>>
>>>> Hm, okay, it calls e820_end_of_low_ram_pfn() which effectively is !=
>>>> E820_RAM.
>>>>
>>>> I'll fix this, test, then resend.
>>>>
>>>
>>> I never got any kind of updated patch, did I?
>>
>> No, I never sent one out, because it would have still only covered > 4GB, and
>> in later emails, you said that you wanted a general one that covered all x86.
>>
>> I'll give it another shot at the generic patch, making a special case for the
>> < 1MB ISA region.
>>
>
> Here is the new patch, thanks!
>
> From dad99fe54eb26d4022a48f1f9b88c21f77809282 Mon Sep 17 00:00:00 2001
> From: Jacob Shin <jacob.shin@....com>
> Date: Thu, 15 Dec 2011 10:56:14 -0500
> Subject: [PATCH] x86: Only include address ranges marked as E820_RAM in kernel direct mapping
>
> Currently, 0 ~ max_low_pfn is first mapped, then 4GB ~ max_pfn is
> mapped. On some systems that have large memory holes that occur
> within those two regions, we end up with PATs that mark pages that
> are not backed by actual DRAM -- as cacheable.
>
> This patch first maps 0 ~ 1MB ISA region, then iterates over the
> E820 to map useable E820_RAM ranges.
>
> Cc: stable@...nel.org # > 2.6.32
> Signed-off-by: Jacob Shin <jacob.shin@....com>
> Reviewed-by: Andreas Herrmann <Andreas.Herrmann3@....com>
> ---
> arch/x86/kernel/setup.c | 29 ++++++++++++++++++++++++++---
> 1 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index cf0ef98..eae6b41 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -691,6 +691,8 @@ early_param("reservelow", parse_reservelow);
>
> void __init setup_arch(char **cmdline_p)
> {
> + int i;
> +
> #ifdef CONFIG_X86_32
> memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
> visws_early_detect();
> @@ -932,13 +934,34 @@ void __init setup_arch(char **cmdline_p)
> init_gbpages();
>
> /* max_pfn_mapped is updated here */
> - max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
> + max_low_pfn_mapped = init_memory_mapping(0, 0x100000);
> max_pfn_mapped = max_low_pfn_mapped;
>
> + for (i = 0; i < e820.nr_map; i++) {
> + struct e820entry *ei = &e820.map[i];
> + u64 start = ei->addr;
> + u64 end = ei->addr + ei->size;
> +
> + if (ei->type != E820_RAM)
> + continue;
> +
> + if (start < 0x100000)
> + continue;
> +#ifdef CONFIG_X86_32
> + if ((start >> PAGE_SHIFT) >= max_low_pfn)
> + continue;
> +
> + if ((end >> PAGE_SHIFT) > max_low_pfn)
> + end = max_low_pfn << PAGE_SHIFT;
> +#endif
> + max_pfn_mapped = init_memory_mapping(start, end);
> +
> + if ((end >> PAGE_SHIFT) == max_low_pfn)
> + max_low_pfn_mapped = max_pfn_mapped;
> + }
> +
> #ifdef CONFIG_X86_64
> if (max_pfn > max_low_pfn) {
> - max_pfn_mapped = init_memory_mapping(1UL<<32,
> - max_pfn<<PAGE_SHIFT);
> /* can we preseve max_low_pfn ?*/
> max_low_pfn = max_pfn;
> }
No, this changes the meaning of max_low_pfn_mapped and max_pfn_mapped, at least for x86_64.

Before your patch:
	max_low_pfn_mapped is the highest mapped pfn below 4G.
	max_pfn_mapped is the highest mapped pfn.

After your patch, those two variables no longer guarantee that the ranges
[0, max_low_pfn_mapped) and [4G >> PAGE_SHIFT, max_pfn_mapped) are really mapped.
So this check in arch/x86/platform/efi/efi.c:

	if (end_pfn <= max_low_pfn_mapped
	    || (end_pfn > (1UL << (32 - PAGE_SHIFT))
		&& end_pfn <= max_pfn_mapped))
		va = __va(md->phys_addr);
	else
		va = efi_ioremap(md->phys_addr, size, md->type);

and other users of these variables will have problems.
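
To make the concern concrete, here is a minimal userspace sketch of the efi.c test quoted above. It is not kernel code: the two globals are stand-ins for the kernel variables of the same name, and efi_check_says_mapped() is a hypothetical helper that exists only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Stand-ins for the kernel globals, in page frame numbers (assumption). */
static uint64_t max_low_pfn_mapped;
static uint64_t max_pfn_mapped;

/*
 * Hypothetical helper with the same shape as the efi.c test:
 * true means the caller would use __va() instead of efi_ioremap().
 */
static bool efi_check_says_mapped(uint64_t end_pfn)
{
	return end_pfn <= max_low_pfn_mapped ||
	       (end_pfn > (1ULL << (32 - PAGE_SHIFT)) &&
		end_pfn <= max_pfn_mapped);
}
```

With max_low_pfn_mapped = 0x80000 and max_pfn_mapped = 0x10000000 (1 TB), the helper returns true for end_pfn = 0xfd00100 whether or not that pfn was ever passed to init_memory_mapping(). The test only knows two upper bounds, so once holes above 4G are skipped it can no longer tell a mapped pfn from one inside a hole.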
Two ways to solve your problem:
1. Unmap the HT range?
2. Or introduce a mapped_pfn_range array?
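
A rough userspace sketch of what option 2 could look like follows. Only the name mapped_pfn_range comes from this mail; the struct layout, MAX_MAPPED_RANGES, add_mapped_pfn_range() and pfn_range_mapped() are assumptions made for illustration, not an existing kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPED_RANGES 32	/* hypothetical capacity */

struct pfn_range {
	uint64_t start;	/* first mapped pfn */
	uint64_t end;	/* one past the last mapped pfn */
};

static struct pfn_range mapped_pfn_range[MAX_MAPPED_RANGES];
static size_t nr_mapped_pfn_range;

/* Record [start, end); returns false if the range is empty or the table is full. */
static bool add_mapped_pfn_range(uint64_t start, uint64_t end)
{
	if (start >= end || nr_mapped_pfn_range >= MAX_MAPPED_RANGES)
		return false;

	mapped_pfn_range[nr_mapped_pfn_range].start = start;
	mapped_pfn_range[nr_mapped_pfn_range].end = end;
	nr_mapped_pfn_range++;
	return true;
}

/* True only if [start, end) lies entirely inside one recorded range. */
static bool pfn_range_mapped(uint64_t start, uint64_t end)
{
	size_t i;

	for (i = 0; i < nr_mapped_pfn_range; i++) {
		if (start >= mapped_pfn_range[i].start &&
		    end <= mapped_pfn_range[i].end)
			return true;
	}
	return false;
}
```

The idea would be to call add_mapped_pfn_range() after each init_memory_mapping() call and to have callers such as the efi.c test ask pfn_range_mapped() instead of comparing against the two upper bounds. This sketch does not merge adjacent entries, so a query that spans two neighbouring ranges is reported as unmapped; a real version would probably sort and merge.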
Thanks
Yinghai Lu