Message-ID: <3636af63-8865-4ef5-a2f3-2a6538aca873@redhat.com>
Date: Thu, 1 Aug 2024 15:15:31 +0200
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>,
akpm@...ux-foundation.org, willy@...radead.org
Cc: anshuman.khandual@....com, catalin.marinas@....com, cl@...two.org,
vbabka@...e.cz, mhocko@...e.com, apopple@...dia.com, osalvador@...e.de,
baolin.wang@...ux.alibaba.com, dave.hansen@...ux.intel.com, will@...nel.org,
baohua@...nel.org, ioworker0@...il.com, gshan@...hat.com,
mark.rutland@....com, kirill.shutemov@...ux.intel.com, hughd@...gle.com,
aneesh.kumar@...nel.org, yang@...amperecomputing.com, peterx@...hat.com,
broonie@...nel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: Race condition observed between page migration and page fault
handling on arm64 machines
>> What I am still missing is why this is (a) arm64 only; and (b) if this is
>> something we should really worry about. There are other reasons (e.g.,
>> speculative references) why migration could temporarily fail, does it happen
>> that often that it is really something we have to worry about?
>
> The test fails consistently on arm64. It's my rough understanding that it's
> failing due to migration backing off because the fault handler has raised the
> ref count? (Dev correct me if I'm wrong).
>
> So the real question is, is it a valid test in the first place? Should we just
> delete the test or do we need to strengthen the kernel's guarantees around
> migration success?
I think the test should retry migration a number of times in case it
fails. But if it is a persistent migration failure, the test should fail.
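
A rough sketch of that retry idea, not the actual selftest code: the names
migrate_with_retries, migrate_once_fn and fake_attempt are made up for
illustration, and the callback stands in for whatever single migration
attempt the test performs. The retry limit is left to the caller, since no
specific number is proposed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: performs one migration attempt, 0 on success. */
typedef int (*migrate_once_fn)(void *ctx);

/*
 * Retry a migration attempt up to max_attempts times.
 * Returns 0 as soon as one attempt succeeds. If every attempt fails,
 * the failure is treated as persistent and the last error is returned,
 * so the caller can fail the test.
 */
static int migrate_with_retries(migrate_once_fn attempt, void *ctx,
				int max_attempts)
{
	int ret = -1;

	assert(attempt != NULL && max_attempts > 0);

	for (int i = 0; i < max_attempts; i++) {
		ret = attempt(ctx);
		if (ret == 0)
			return 0;	/* transient failure resolved */
	}
	return ret;			/* persistent failure */
}

/*
 * Stand-in for a real attempt (assumption, for demonstration only):
 * fails while the counter pointed to by ctx is still positive.
 */
static int fake_attempt(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -1;
	}
	return 0;
}
```

With a counter of 2 and a limit of 5, the helper succeeds on the third
attempt. With a counter larger than the limit it returns non-zero, which
is the case where the test should report a failure.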
--
Cheers,
David / dhildenb