Message-ID: <51F0969C.8000001@jp.fujitsu.com>
Date:	Thu, 25 Jul 2013 12:08:12 +0900
From:	Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>
To:	Hush Bensen <hush.bensen@...il.com>
CC:	Toshi Kani <toshi.kani@...com>, Ingo Molnar <mingo@...nel.org>,
	<akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, <x86@...nel.org>, <dave@...1.net>,
	<kosaki.motohiro@...il.com>, <tangchen@...fujitsu.com>,
	<vasilis.liaskovitis@...fitbricks.com>
Subject: Re: [PATCH v2] mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default
(2013/07/25 9:56), Hush Bensen wrote:
> On 07/25/2013 12:02 AM, Toshi Kani wrote:
>> On Wed, 2013-07-24 at 08:18 +0800, Hush Bensen wrote:
>>> On 07/24/2013 04:45 AM, Toshi Kani wrote:
>>>> On Tue, 2013-07-23 at 10:01 +0200, Ingo Molnar wrote:
>>>>> * Toshi Kani <toshi.kani@...com> wrote:
>>>>>
>>>>>>> Could we please also fix it to never crash the kernel, even if stupid
>>>>>>> ranges are provided?
>>>>>> Yes, this probe interface can be enhanced to verify the firmware
>>>>>> information before adding a given memory address.  However, such change
>>>>>> would interfere with its test use of "fake" hotplug, which is the only known
>>>>>> use-case of this interface on x86.
>>>>> Not crashing the kernel is not a novel concept even for test interfaces...
>>>> Agreed.
>>>>
>>>>> Where does the possible crash come from - from using invalid RAM ranges,
>>>>> right? I.e. on x86 to fix the crash we need to check the RAM is present in
>>>>> the e820 maps, is marked RAM there, and is not already registered with the
>>>>> kernel, or so?
>>>> Yes, the crash comes from using invalid RAM ranges.  How to check if the
>>>> RAM is present is different if the system supports hotplug or not.
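
The three checks Ingo lists (present in e820, marked RAM, not already
registered) can be sketched roughly as below. This is only a toy model under
assumptions, not kernel code: the tuple layout and the names probe_is_safe,
covered_by_ram and overlaps_registered are made up for illustration, and it
assumes the requested range fits inside a single e820 entry. As noted in the
thread, such a check would only be meaningful on a system whose memory map
does not change after boot.

```python
from typing import List, Tuple

E820_RAM = 1  # e820 type value for usable RAM

# Hypothetical layouts: (start, end, type) for e820 entries and
# (start, end) for ranges the kernel already uses; end is exclusive.
E820Entry = Tuple[int, int, int]
Range = Tuple[int, int]


def covered_by_ram(start: int, end: int, e820_map: List[E820Entry]) -> bool:
    """True if [start, end) lies entirely inside one entry marked RAM."""
    return any(t == E820_RAM and s <= start and end <= e
               for s, e, t in e820_map)


def overlaps_registered(start: int, end: int, registered: List[Range]) -> bool:
    """True if [start, end) intersects any already-registered range."""
    return any(start < e and s < end for s, e in registered)


def probe_is_safe(start: int, size: int,
                  e820_map: List[E820Entry],
                  registered: List[Range]) -> bool:
    """Reject empty, non-RAM, or already-registered ranges."""
    end = start + size
    return (size > 0
            and covered_by_ram(start, end, e820_map)
            and not overlaps_registered(start, end, registered))
```

For example, after booting with mem=1g on a machine with 2 GiB of RAM, a
probe just above 1 GiB would pass, while a probe into a reserved region or
into memory that is already in use would be rejected.
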
>>> Could you explain different methods to check the RAM is present if the
>>> system supports hotplug or not?
>> e820 and UEFI memory descriptor tables are the boot-time interfaces.
>> These interfaces are not required to reflect any run-time changes.
>>
>> ACPI memory device objects can be used at both boot-time and run-time,
>> which reflect any run-time changes.  But they are optional to implement.
>> They typically are not implemented unless the system supports hotplug.
>>
>>>>>> In order to verify if a given memory address is enabled at run-time (as
>>>>>> opposed to boot-time), we need to check with ACPI memory device objects
>>>>>> on x86.  However, system vendors tend to not implement memory device
>>>>>> objects unless their systems support memory hotplug.  Dave Hansen is
>>>>>> using this interface for his testing as a way to fake a hotplug event on
>>>>>> a system that does not support memory hotplug.
>>>>> All vendors implement e820 maps for the memory present at boot time.
>>>> Yes for boot time.  At run-time, e820 is not guaranteed to represent a
>>>> new memory added.  Here is a quote from ACPI spec.
>>>>
>>>> ===
>>>> 15.1 INT 15H, E820H - Query System Address Map
>>>>    :
>>>> The memory map conveyed by this interface is not required to reflect any
>>>> changes in available physical memory that have occurred after the BIOS
>>>> has initially passed control to the operating system. For example, if
>>>> memory is added dynamically, this interface is not required to reflect
>>>> the new system memory configuration.
>>>> ===
>>>>
>>>> By definition, the "probe" interface is used for the kernel to recognize
>>>> a new memory added at run-time.  So, it should check ACPI memory device
>>>> objects (which represents run-time state) for the verification.  On x86,
>>>> however, ACPI also sends a hotplug event to the kernel, which triggers
>>>> the kernel to recognize the new physical memory properly.  Hence, users
>>>> do not need this "probe" interface.
>>>>
>>>>> How is the testing done by Dave Hansen? If it's done by booting with less
>>>>> RAM than available (via say the mem=1g boot parameter), and then
>>>>> hot-adding some of the missing RAM, then this could be made safe via the
>>>>> e820 maps and by consulting the physical memory maps (to avoid double
>>>>> registry), right?
>>>> If we focus on this test scenario on a system that does not support
>>>> hotplug, yes, I agree that we can check with e820 since it is safe to
>>>> assume that the system has no change after boot.  IOW, it is unsafe to
>>>> check with e820 if the system supports hotplug, but there is no use in
>>>> this interface for testing if the system supports hotplug.  So, this may
>>>> be a good idea.
>>>>
>>>> Dave, is this how you are testing?  Do you always specify a valid memory
>>>> address for your testing?
>>>>
>>>>> How does the hotplug event based approach solve double adds? Relies on the
>>>>> hardware not sending a hot-add event twice for the same memory area or for
>>>>> an invalid memory area, or does it include fail-safes and double checks as
>>>>> well to avoid double adds and adding invalid memory? If yes then that
>>>>> could be utilized here as well.
>>>> In high-level, here is how ACPI memory hotplug works:
>>>>
>>>> 1. ACPI sends a hotplug event to a new ACPI memory device object that is
>>>> hot-added.
>>>> 2. The kernel is notified, and verifies if the new memory device object
>>>> has not been attached by any handler yet.
>>>> 3. The memory handler is called, and obtains a new memory range from the
>>>> ACPI memory device object.
>>>> 4. The memory handler calls add_memory() with the new address range.
>>>>
>>>> The above step 1-4 proceeds automatically within the kernel.  No user
>>>> input (nor sysfs interface) is necessary.  Step 2 prevents double adds
>>>> and step 3 gets a valid address range from the firmware directly.  Step
>>>> 4 is basically the same as the "probe" interface, but with all the
>>>> verification up front, this step is safe.
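
Steps 2-4 above can be modelled roughly as follows. This is a simplified
sketch, not the actual ACPI code: the MemoryDevice and Kernel classes and
the on_hot_add_event name are hypothetical, and only add_memory() is a name
taken from the description. It shows why step 2 prevents a double add and
why the range never comes from user input.

```python
class MemoryDevice:
    """Stand-in for an ACPI memory device object (hypothetical model)."""

    def __init__(self, name, start, size):
        self.name = name
        self.start = start              # range reported by the firmware
        self.size = size
        self.handler_attached = False   # set once a handler claims it


class Kernel:
    """Toy kernel that records which ranges have been added."""

    def __init__(self):
        self.added = []

    def add_memory(self, start, size):
        # Step 4: register the new range.
        self.added.append((start, size))

    def on_hot_add_event(self, dev):
        # Step 2: refuse a device that a handler has already attached to.
        if dev.handler_attached:
            return False
        dev.handler_attached = True
        # Step 3: take the range from the device object, not from the user.
        start, size = dev.start, dev.size
        # Step 4: add it.
        self.add_memory(start, size)
        return True
```

Delivering the same event twice for one device adds its range only once,
because the second call stops at the step 2 check.
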
>>> This is hot-added part, could you also explain how ACPI memory hotplug
>>> works for hot-remove?
>> Sure.  Here is high-level.
>>
>> 1. ACPI sends a hotplug event to an ACPI memory device object that is
>> requested to hot-remove.
>> 2. The kernel is notified, and verifies if the memory device object is
>> attached by a handler.
>> 3. The memory handler is called (which is being attached), and obtains
>> its memory range.
>> 4. The memory handler calls remove_memory() with the address range.
>> 5. The kernel calls eject method of the ACPI memory device object.
>
> Will both hot-removing the memory device through the hardware and writing 1 to
> /sys/bus/acpi/devices/PNP0C80:XX/eject call the eject method?
Yes.
Both operations call the eject method.
> What's the difference between these two methods? I guess the former will send SCI and the latter won't.
The triggers are different. The former is triggered by an SCI, the latter by
writing to sysfs.
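
As a rough illustration, the two triggers can be modelled as two entry
points into one shared removal path. This is a toy sketch under assumptions,
not the kernel implementation: every function and class name here is
hypothetical except remove_memory(), which appears only as a label in the
log, and accepting only the value "1" is an assumption based on the
question above.

```python
class MemoryDevice:
    """Stand-in for an attached ACPI memory device (hypothetical model)."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.handler_attached = True
        self.ejected = False
        self.log = []


def hot_remove(dev):
    """Shared path: verify attachment, remove the memory, then eject."""
    if not dev.handler_attached:
        return False
    dev.log.append(f"remove_memory({dev.start:#x}, {dev.size:#x})")
    dev.handler_attached = False
    dev.log.append("eject")
    dev.ejected = True
    return True


def on_sci_eject_request(dev):
    """Trigger 1: the hardware raises an SCI for the device."""
    return hot_remove(dev)


def on_sysfs_eject_write(dev, value):
    """Trigger 2: user space writes to the device's eject file."""
    if value.strip() != "1":    # assumed: only "1" requests an eject
        return False
    return hot_remove(dev)
```

In this model both entry points leave the device in the same state, and
only the way the request arrives differs.
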
Thanks,
Yasuaki Ishimatsu
>
>>
>> Thanks,
>> -Toshi
>>
>>
>
