linux-kernel - Re: [PATCH RFC v2 19/27] mm: mprotect: Introduce PAGE_FAULT_ON_ACCESS for mprotect(PROT

Message-ID: <92833873-cd70-44b0-9f34-f4ac11b9e498@redhat.com>
Date:   Thu, 30 Nov 2023 15:39:35 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Alexandru Elisei <alexandru.elisei@....com>
Cc:     Hyesoo Yu <hyesoo.yu@...sung.com>, catalin.marinas@....com,
        will@...nel.org, oliver.upton@...ux.dev, maz@...nel.org,
        james.morse@....com, suzuki.poulose@....com, yuzenghui@...wei.com,
        arnd@...db.de, akpm@...ux-foundation.org, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com, mhiramat@...nel.org,
        rppt@...nel.org, hughd@...gle.com, pcc@...gle.com,
        steven.price@....com, anshuman.khandual@....com,
        vincenzo.frascino@....com, eugenis@...gle.com, kcc@...gle.com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        kvmarm@...ts.linux.dev, linux-fsdevel@...r.kernel.org,
        linux-arch@...r.kernel.org, linux-mm@...ck.org,
        linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH RFC v2 19/27] mm: mprotect: Introduce PAGE_FAULT_ON_ACCESS
 for mprotect(PROT_MTE)

On 30.11.23 15:33, Alexandru Elisei wrote:
> On Thu, Nov 30, 2023 at 02:43:48PM +0100, David Hildenbrand wrote:
>> On 30.11.23 14:32, Alexandru Elisei wrote:
>>> Hi,
>>>
>>> On Thu, Nov 30, 2023 at 01:49:34PM +0100, David Hildenbrand wrote:
>>>>>>> +
>>>>>>> +out_retry:
>>>>>>> +	put_page(page);
>>>>>>> +	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
>>>>>>> +		vma_end_read(vma);
>>>>>>> +	if (fault_flag_allow_retry_first(vmf->flags)) {
>>>>>>> +		err = VM_FAULT_RETRY;
>>>>>>> +	} else {
>>>>>>> +		/* Replay the fault. */
>>>>>>> +		err = 0;
>>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> Unfortunately, if the page continues to be pinned, it seems like the fault will continue to occur.
>>>>>> I guess this causes a system stability issue. (But I'm not familiar with that, so please let me know if I'm mistaken!)
>>>>>>
>>>>>> How about migrating the page when the migration problem repeats?
>>>>>
>>>>> Yes, I had the same thought: in the previous iteration of the series, the
>>>>> page was migrated out of the VMA if tag storage couldn't be reserved.
>>>>>
>>>>> Only short term pins are allowed on MIGRATE_CMA pages, so I expect that the
>>>>> pin will be released before the fault is replayed. Because of this, and
>>>>> because it makes the code simpler, I chose not to migrate the page if tag
>>>>> storage couldn't be reserved.
>>>>
>>>> There are still some cases that are theoretically problematic: vmsplice()
>>>> can pin pages forever and doesn't use FOLL_LONGTERM yet.
>>>>
>>>> All these things also affect other users that rely on movability (e.g., CMA,
>>>> memory hotunplug).
>>>
>>> I wasn't aware of that, thank you for the information. Then to ensure that the
>>> process doesn't hang by replaying the fault indefinitely, I'll migrate the page if
>>> tag storage cannot be reserved. Looking over the code again, I think I can reuse
>>> the same function that migrates tag storage pages out of the MTE VMA (added in
>>> patch #21), so no major changes needed.
>>
>> It's going to be interesting if migrating that page fails because it is
>> pinned :/
> 
> I imagine that having both the page **and** its tag storage pinned longterm
> without FOLL_LONGTERM is going to be exceedingly rare.

Yes. I recall that the rule of thumb is that some O_DIRECT I/O can take
up to 10 seconds, although that is extremely rare (and maybe not
applicable on arm64).

> 
> Am I mistaken in believing that the problematic vmsplice() behaviour is
> recognized as something that needs to be fixed?

Yes, for a couple of years. I'm hoping this will actually get fixed now
that O_DIRECT mostly uses FOLL_PIN instead of FOLL_GET.
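
For readers following the thread, the fallback order discussed above (try to
reserve tag storage, migrate the page if that fails, and replay the fault only
when the page itself is pinned as well) can be modelled in user space roughly
as below. This is a toy sketch, not code from the series: `struct toy_page`,
`enum toy_result` and the `toy_*()` helpers are hypothetical names made up for
illustration, and the assumption that a pin is the only reason reservation or
migration fails is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes for this toy model; not kernel VM_FAULT_* values. */
enum toy_result {
	TOY_DONE,	/* tag storage reserved, the access can proceed */
	TOY_MIGRATED,	/* page moved elsewhere instead of reserving tag storage */
	TOY_RETRY,	/* neither worked, the fault has to be replayed */
};

/* Hypothetical page state: only the two pins mentioned in the thread. */
struct toy_page {
	bool tag_storage_pinned;	/* pin held on the tag storage page */
	bool page_pinned;		/* pin held on the data page itself */
};

/* Assumption: reserving tag storage fails only while it is pinned. */
static bool toy_reserve_tag_storage(const struct toy_page *p)
{
	return !p->tag_storage_pinned;
}

/* Assumption: migrating the data page fails only while it is pinned. */
static bool toy_migrate_page(const struct toy_page *p)
{
	return !p->page_pinned;
}

/* Fallback order: reserve first, then migrate, then replay the fault. */
enum toy_result toy_handle_fault(const struct toy_page *p)
{
	assert(p);

	if (toy_reserve_tag_storage(p))
		return TOY_DONE;
	if (toy_migrate_page(p))
		return TOY_MIGRATED;
	return TOY_RETRY;
}
```

In this model the fault keeps being replayed only when both `page_pinned` and
`tag_storage_pinned` stay set, which is the "both pinned long-term without
FOLL_LONGTERM" case described above as exceedingly rare.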

-- 
Cheers,

David / dhildenb