Message-Id: <20230215223355.102508-1-sj@kernel.org>
Date: Wed, 15 Feb 2023 22:33:55 +0000
From: SeongJae Park <sj@...nel.org>
To: David Hildenbrand <david@...hat.com>
Cc: SeongJae Park <sj@...nel.org>, akpm@...ux-foundation.org,
osalvador@...e.de, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/memory_hotplug: return zero from do_migrate_range() for only success

On Wed, 15 Feb 2023 21:00:50 +0100 David Hildenbrand <david@...hat.com> wrote:
> On 15.02.23 19:03, SeongJae Park wrote:
> > On Wed, 15 Feb 2023 14:16:05 +0100 David Hildenbrand <david@...hat.com> wrote:
> >
> >> On 14.02.23 23:32, SeongJae Park wrote:
> >>> do_migrate_range() returns the migrate_pages() return value, where
> >>> zero means perfect success, in usual cases. If all pages fail to be
> >>> isolated, however, it returns the isolate_{lru,movable}_page() return
> >>> value, or zero if all pfns were invalid, hugetlb, or hwpoisoned. So
> >>> do_migrate_range() returning zero means either perfect success, or
> >>> special cases of total isolation failure.
> >>>
> >>> Actually, the return value is not checked by any caller, so it might be
> >>> better to simply make it a void function. However, there is a TODO for
> >>> checking the return value.
> >>
> >> I'd prefer to not add more dead code ;) Let's not return an error instead.
> >
> > Makes sense, I will send next spin soon.
> >
> >>
> >> It's still unclear which kind of fatal migration issues we actually care
> >> about and how to really detect them.
> >
> > What do you think about treating the isolation/migration rate limit
> > (migrate_rs) hit in do_migrate_range() as fatal? It warns for the event
> > already, so definitely a bad sign.
> >
> > If that's not bad enough to be treated as fatal, I think we could have yet
> > another rate limit to be considered fatal.
>
> IIRC, there are some setups where offlining might take several minutes
> (e.g., heavy O_DIRECT load) and that's to be expected.
>
> So the existing code warns for better debugging, but keeps trying. So
> the ratelimit is rather to not produce too much debug output, not to
> really indicate that something is fatal.

Thank you for the clarification, David!


Thanks,
SJ
>
> --
> Thanks,
>
> David / dhildenb
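
To illustrate the point about the ratelimit above, here is a minimal user-space sketch, not the kernel code. It assumes a simplified fixed-window limiter; `struct ratelimit`, `ratelimit_ok()` and `offline_retry()` are hypothetical names made up for this example, and the tick counter stands in for real time. The limiter only decides whether a warning is printed; the loop keeps retrying either way, so hitting the limit says nothing about whether the failure is fatal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for a ratelimit state. */
struct ratelimit {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages emitted in the current window */
};

/* Return true if the caller may print a message at tick 'now'. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	assert(rs->interval > 0);

	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the message budget. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/*
 * Hypothetical retry loop: migration fails for the first 'busy_ticks'
 * ticks and succeeds afterwards.  Returns the total number of attempts
 * and stores the number of warnings actually printed in '*warnings'.
 */
static long offline_retry(long busy_ticks, struct ratelimit *rs,
			  int *warnings)
{
	long now = 0;

	*warnings = 0;
	while (now < busy_ticks) {
		/* The limiter throttles the log output only ... */
		if (ratelimit_ok(rs, now)) {
			printf("tick %ld: migration failed, retrying\n", now);
			(*warnings)++;
		}
		/* ... the retry itself happens unconditionally. */
		now++;
	}
	return now + 1;	/* failed attempts plus the final successful one */
}
```

With a window of 5 ticks and a burst of 2, a range that stays busy for 20 ticks is attempted 21 times but produces only 8 warnings, which is the behavior described above: less debug output, same persistence.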