Message-Id: <20230215223355.102508-1-sj@kernel.org>
Date:   Wed, 15 Feb 2023 22:33:55 +0000
From:   SeongJae Park <sj@...nel.org>
To:     David Hildenbrand <david@...hat.com>
Cc:     SeongJae Park <sj@...nel.org>, akpm@...ux-foundation.org,
        osalvador@...e.de, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/memory_hotplug: return zero from do_migrate_range() for only success

On Wed, 15 Feb 2023 21:00:50 +0100 David Hildenbrand <david@...hat.com> wrote:

> On 15.02.23 19:03, SeongJae Park wrote:
> > On Wed, 15 Feb 2023 14:16:05 +0100 David Hildenbrand <david@...hat.com> wrote:
> > 
> >> On 14.02.23 23:32, SeongJae Park wrote:
> >>> do_migrate_range() returns the migrate_pages() return value, for which
> >>> zero means perfect success, in usual cases.  If all pages fail to be
> >>> isolated, however, it returns the isolate_{lru,movable}_page() return
> >>> value, or zero if all pfns were invalid, hugetlb, or hwpoisoned.  So
> >>> do_migrate_range() returning zero means either perfect success, or
> >>> one of the special cases of total isolation failure.
> >>>
> >>> Actually, the return value is not checked by any caller, so it might be
> >>> better to simply make it a void function.  However, there is a TODO for
> >>> checking the return value.
> >>
> >> I'd prefer to not add more dead code ;) Let's not return an error instead.
> > 
> > Makes sense, I will send next spin soon.
> > 
> >>
> >> It's still unclear which kind of fatal migration issues we actually care
> >> about and how to really detect them.
> > 
> > What do you think about treating the isolation/migration rate limit
> > (migrate_rs) hit in do_migrate_range() as fatal?  It warns for the event
> > already, so definitely a bad sign.
> > 
> > If that's not bad enough to be treated as fatal, I think we could have yet
> > another rate limit that is considered fatal.
> 
> IIRC, there are some setups where offlining might take several minutes 
> (e.g., heavy O_DIRECT load) and that's to be expected.
> 
> So the existing code warns for better debugging, but keeps trying. So 
> the ratelimit is rather to not produce too much debug output, not to 
> really indicate that something is fatal.

Thank you for the clarification, David!


Thanks,
SJ

> 
> -- 
> Thanks,
> 
> David / dhildenb
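
The warn-but-keep-trying behavior described above can be sketched in
userspace C as follows.  This is only an illustration under stated
assumptions, not the kernel code: struct sketch_ratelimit,
sketch_ratelimit_ok() and sketch_offline_retry() are hypothetical names,
time is modeled as abstract ticks, and the "page" simply stays busy until
a given tick.  The point is that hitting the rate limit only suppresses
output; the retry loop itself never gives up because of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a ratelimit state (not the kernel's). */
struct sketch_ratelimit {
	long interval;      /* window length, in abstract ticks */
	int burst;          /* messages allowed per window */
	long window_start;  /* tick at which the current window began */
	int printed;        /* messages printed in the current window */
};

/* Return true if a message may be printed at tick 'now'. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, long now)
{
	assert(rs->interval > 0 && rs->burst > 0);

	/* Start a fresh window once the old one has expired. */
	if (now - rs->window_start >= rs->interval) {
		rs->window_start = now;
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return false;	/* suppressed: only the output is limited */
	rs->printed++;
	return true;
}

/*
 * Retry migrating one "page" that stays busy until tick 'busy_until'.
 * Each failed attempt warns through the rate limit, but the limit is
 * never treated as fatal: the loop keeps trying until the page is free.
 * Returns the number of warnings actually printed.
 */
static int sketch_offline_retry(long busy_until, struct sketch_ratelimit *rs)
{
	int warnings = 0;

	for (long now = 0; now < busy_until; now++) {
		if (sketch_ratelimit_ok(rs, now)) {
			fprintf(stderr,
				"migrating page failed at tick %ld, retrying\n",
				now);
			warnings++;
		}
	}
	return warnings;
}
```

With interval = 10 and burst = 2, a page that stays busy for 25 ticks
produces two warnings in each of the three windows it spans, while all 25
attempts are still made.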
