[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f5a65bf5-5105-4376-9c1c-164a15a4ab79@redhat.com>
Date: Mon, 9 Dec 2024 15:22:31 +0100
From: David Hildenbrand <david@...hat.com>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: yuzhao@...gle.com, akpm@...ux-foundation.org, willy@...radead.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] mm: remove an avoidable load of page refcount in
page_ref_add_unless
On 09.12.24 13:33, Mateusz Guzik wrote:
> On Mon, Dec 9, 2024 at 11:56 AM David Hildenbrand <david@...hat.com> wrote:
>>
>> On 09.12.24 11:25, Mateusz Guzik wrote:
>>> On Mon, Dec 9, 2024 at 10:28 AM David Hildenbrand <david@...hat.com> wrote:
>>>>
>>>> On 07.12.24 09:29, Mateusz Guzik wrote:
>>>>> Explicitly pre-checking the count adds nothing as atomic_add_unless
>>>>> starts with doing the same thing. iow no functional changes.
>>>>
>>>> I recall that we added that check because with the hugetlb vmemmap
>>>> optimization, some of the tail pages we don't ever expect to be modified
>>>> (because they are fake-duplicated) might be mapped R/O.
>>>>
>>>> If the arch implementation of atomic_add_unless() would trigger an
>>>> unconditional write fault, we'd be in trouble. That would likely only be
>>>> the case if the arch provides a dedicate instruction.
>>>>
>>>> atomic_add_unless()->raw_atomic_add_unless()
>>>>
>>>> Nobody currently defines arch_atomic_add_unless().
>>>>
>>>> raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined
>>>> on some architectures.
>>>>
>>>> I scanned some of the inline-asm, and I think most of them perform a
>>>> check first.
>>>>
>>>
>>> Huh.
>>>
>>> Some arch triggering a write fault despite not changing the value is
>>> not something I thought about. Sounds pretty broken to me if any arch
>>> was to do it, but then stranger things did happen.
>>
>> Yeah, it really depends on what the architecture defines. For example,
>> on s390x for "COMPARE AND SWAP" the spec states something like
>>
> [snip]
>
> Well in this context you need to do the initial load to even know what
> to CAS with, unless you want to blindly do it hoping to get lucky,
> which I'm assuming no arch is doing.
>
> Granted, if there was an architecture which had an actual "cas unless
> the value is x" then this would not hold, but I don't know of any.
> [such an extension would be most welcome fwiw]
Apparently, we prepared for that via arch_atomic_add_unless(), which has no users.
>
> Assuming you indeed want the patch after all, can you sort out adding
> a comment to atomic_add_unless yourself? ;) I presume you know the
> right people and whatnot, so this would cut down on back and forth.
>
> That is to say I think this thread just about exhausted the time
> warranted by this patch. No hard feelz if it gets dropped, but then I
> do strongly suggest adding a justification to the extra load.
Maybe it's sufficient for now to simply do your change with a comment:
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 8c236c651d1d6..1efc992ad5687 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -234,7 +234,13 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
rcu_read_lock();
/* avoid writing to the vmemmap area being remapped */
- if (!page_is_fake_head(page) && page_ref_count(page) != u)
+ if (!page_is_fake_head(page))
+ /*
+ * atomic_add_unless() will currently never modify the value
+ * if it already is u. If that ever changes, we'd have to have
+ * a separate check here, such that we won't be writing to
+ * write-protected vmemmap areas.
+ */
ret = atomic_add_unless(&page->_refcount, nr, u);
rcu_read_unlock();
It would bail out during testing ... hopefully, such that we can detect any such change.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists