[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <bada8934-5b94-46f2-b44d-4d75e630a4d6@redhat.com>
Date: Mon, 23 Jun 2025 17:18:38 +0200
From: David Hildenbrand <david@...hat.com>
To: Tal Zussman <tz2294@...umbia.edu>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Peter Xu <peterx@...hat.com>,
"Jason A. Donenfeld" <Jason@...c4.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Andrea Arcangeli <aarcange@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v2 2/4] userfaultfd: remove (VM_)BUG_ON()s
On 18.06.25 01:10, Tal Zussman wrote:
> On Tue, Jun 10, 2025 at 3:26 AM David Hildenbrand <david@...hat.com> wrote:
>>
>> On 07.06.25 08:40, Tal Zussman wrote:
>>>
>>> if (ctx->features & UFFD_FEATURE_SIGBUS)
>>> goto out;
>>> @@ -411,12 +411,11 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>>> * to be sure not to return SIGBUS erroneously on
>>> * nowait invocations.
>>> */
>>> - BUG_ON(vmf->flags & FAULT_FLAG_RETRY_NOWAIT);
>>> + VM_WARN_ON_ONCE(vmf->flags & FAULT_FLAG_RETRY_NOWAIT);
>>> #ifdef CONFIG_DEBUG_VM
>>> if (printk_ratelimit()) {
>>> - printk(KERN_WARNING
>>> - "FAULT_FLAG_ALLOW_RETRY missing %x\n",
>>> - vmf->flags);
>>> + pr_warn("FAULT_FLAG_ALLOW_RETRY missing %x\n",
>>> + vmf->flags);
>>
>> You didn't cover that in the patch description.
>>
>> I do wonder if we really still want the dump_stack() here and could
>> simplify to
>>
>> pr_warn_ratelimited().
>>
>> But I recall that the stack was helpful at least once for me (well, I
>> was able to reproduce and could have figured it out differently.).
>
> I'll include this in the description as well. Given that you found the stack
> helpful before, I'll leave it as-is for now.
>
>> [...]
>>
>>> err = uffd_move_lock(mm, dst_start, src_start, &dst_vma, &src_vma);
>>> if (err)
>>> @@ -1867,9 +1865,9 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
>>> up_read(&ctx->map_changing_lock);
>>> uffd_move_unlock(dst_vma, src_vma);
>>> out:
>>> - VM_WARN_ON(moved < 0);
>>> - VM_WARN_ON(err > 0);
>>> - VM_WARN_ON(!moved && !err);
>>> + VM_WARN_ON_ONCE(moved < 0);
>>> + VM_WARN_ON_ONCE(err > 0);
>>> + VM_WARN_ON_ONCE(!moved && !err);
>>> return moved ? moved : err;
>>
>>
>> Here you convert VM_WARN_ON to VM_WARN_ON_ONCE without stating it in the
>> description (including the why).
>
> Will update in the description. These checks represent impossible conditions
> and are analogous to the BUG_ON()s in move_pages(), but were added later.
> So instead of BUG_ON(), they use VM_WARN_ON() as a replacement when
> VM_WARN_ON_ONCE() is likely a better fit, as per other conversions.
>
>>> @@ -1956,9 +1954,9 @@ int userfaultfd_register_range(struct userfaultfd_ctx *ctx,
>>> for_each_vma_range(vmi, vma, end) {
>>> cond_resched();
>>>
>>> - BUG_ON(!vma_can_userfault(vma, vm_flags, wp_async));
>>> - BUG_ON(vma->vm_userfaultfd_ctx.ctx &&
>>> - vma->vm_userfaultfd_ctx.ctx != ctx);
>>> + VM_WARN_ON_ONCE(!vma_can_userfault(vma, vm_flags, wp_async));
>>> + VM_WARN_ON_ONCE(vma->vm_userfaultfd_ctx.ctx &&
>>> + vma->vm_userfaultfd_ctx.ctx != ctx);
>>> WARN_ON(!(vma->vm_flags & VM_MAYWRITE));
>>
>> Which raises the question, why this here should still be a WARN
>
> After checking it, it looks like the relevant condition is enforced in the
> registration path, so this can be converted to a debug check. I'll update
> the patch accordingly.
Sorry for the late reply. Yeah, a lot of that probably needs a cleanup.
>
> However, I think the checks in userfaultfd_register() may be redundant.
> First, it checks !(cur->vm_flags & VM_MAYWRITE). Then, after a hugetlb
> check, it checks
> ((vm_flags & VM_UFFD_WP) && !(cur->vm_flags & VM_MAYWRITE)), which should
> never be hit.
Yes, likely.
>
> Am I missing something?
>
> Additionally, the comment above the first !(cur->vm_flags & VM_MAYWRITE)
> check is a bit confusing. It seems to be based on the fact that file-backed
> MAP_SHARED mappings without write permissions will not have VM_MAYWRITE set,
> while MAP_PRIVATE mappings will always(?) have it set, but doesn't say it as
> explicitly. Am I understanding this check correctly?
private anon mappings should always have VM_MAYWRITE. (there is no
driver to really change that).
For other mappings (MAP_SHARED), VM_MAYWRITE is cleared for R/O files I
think. VM_MAYWRITE really just tells you that you can do
mprotect(PROT_WRITE) later, to effectively set VM_WRITE.
So if VM_MAYWRITE is clear, we cannot possibly write to such mappings
later (not even after mprotect()).
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists