Message-ID: <59AD15B6.7080304@huawei.com>
Date: Mon, 4 Sep 2017 16:58:30 +0800
From: Xishi Qiu <qiuxishi@...wei.com>
To: Michal Hocko <mhocko@...nel.org>
CC: Andrew Morton <akpm@...ux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Reza Arbab <arbab@...ux.vnet.ibm.com>,
Yasuaki Ishimatsu <yasu.isimatu@...il.com>,
Igor Mammedov <imammedo@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>, <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Michal Hocko <mhocko@...e.com>
Subject: Re: [PATCH 2/2] mm, memory_hotplug: remove timeout from __offline_memory
On 2017/9/4 16:21, Michal Hocko wrote:
> From: Michal Hocko <mhocko@...e.com>
>
> We have had a hardcoded 120s timeout, after which memory offline fails,
> basically since hot remove was introduced. This is essentially
> a policy implemented in the kernel. Moreover there is no way to adjust
> the timeout and so we are sometimes facing memory offline failures if
> the system is under heavy memory pressure or a very CPU-intensive
> workload on large machines.
>
> It is not very clear what purpose the timeout actually serves. The
> offline operation is interruptible by a signal so if userspace wants
Hi Michal,
If the user knows what to do when migration takes a long time,
that is OK, but I don't think all users know about this operation
(e.g. ctrl + c) and its effect.
Thanks,
Xishi Qiu
> some timeout based termination this can be done trivially by sending a
> signal.
>
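For illustration only (not part of the patch): the userspace-side timeout described above could look roughly like the sketch below. It assumes the memory block is offlined by writing "offline" to its sysfs state file (e.g. /sys/devices/system/memory/memoryN/state) and that a pending signal makes the write fail with EINTR, as the signal_pending() check in __offline_pages suggests. The helper names write_with_timeout() and offline_memory_block() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt the blocking write(). */
static void on_alarm(int sig) { (void)sig; }

/*
 * Hypothetical helper: write buf to fd, giving up after `seconds`.
 * Returns 0 on success or a negative errno; -EINTR means the alarm fired.
 */
static int write_with_timeout(int fd, const char *buf, size_t len,
                              unsigned int seconds)
{
    struct sigaction sa, old;
    ssize_t n;
    int err = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART, so write() is not restarted */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -errno;

    alarm(seconds);             /* userspace decides the timeout policy */
    n = write(fd, buf, len);
    if (n < 0)
        err = -errno;
    else if ((size_t)n != len)
        err = -EIO;
    alarm(0);                   /* cancel the alarm if the write finished first */
    sigaction(SIGALRM, &old, NULL);
    return err;
}

/*
 * Hypothetical wrapper: state_path is assumed to be a memory block's
 * sysfs "state" file.
 */
static int offline_memory_block(const char *state_path, unsigned int seconds)
{
    int fd = open(state_path, O_WRONLY);
    int ret;

    if (fd < 0)
        return -errno;
    ret = write_with_timeout(fd, "offline", strlen("offline"), seconds);
    close(fd);
    return ret;
}
```

A caller could, for example, use offline_memory_block(path, 120) to approximate the old in-kernel limit and treat -EINTR as "timed out", retrying or reporting it as it sees fit.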
> If there is a strong usecase to do this from the kernel then we should
> do it properly and make it tunable from userspace, with the timeout
> disabled by default, along with an explanation of who uses it and for what
> purpose.
>
> Signed-off-by: Michal Hocko <mhocko@...e.com>
> ---
> mm/memory_hotplug.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c9dcbe6d2ac6..b8a85c11360e 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1593,9 +1593,9 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
> }
>
> static int __ref __offline_pages(unsigned long start_pfn,
> - unsigned long end_pfn, unsigned long timeout)
> + unsigned long end_pfn)
> {
> - unsigned long pfn, nr_pages, expire;
> + unsigned long pfn, nr_pages;
> long offlined_pages;
> int ret, node;
> unsigned long flags;
> @@ -1633,12 +1633,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
> goto failed_removal;
>
> pfn = start_pfn;
> - expire = jiffies + timeout;
> repeat:
> /* start memory hot removal */
> - ret = -EBUSY;
> - if (time_after(jiffies, expire))
> - goto failed_removal;
> ret = -EINTR;
> if (signal_pending(current))
> goto failed_removal;
> @@ -1711,7 +1707,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
> /* Must be protected by mem_hotplug_begin() or a device_lock */
> int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
> {
> - return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
> + return __offline_pages(start_pfn, start_pfn + nr_pages);
> }
> #endif /* CONFIG_MEMORY_HOTREMOVE */
>