[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20231127084217.13110-1-v-songbaohua@oppo.com>
Date: Mon, 27 Nov 2023 21:42:17 +1300
From: Barry Song <21cnbao@...il.com>
To: david@...hat.com
Cc: akpm@...ux-foundation.org, andreyknvl@...il.com,
anshuman.khandual@....com, ardb@...nel.org,
catalin.marinas@....com, dvyukov@...gle.com, glider@...gle.com,
james.morse@....com, jhubbard@...dia.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, mark.rutland@....com, maz@...nel.org,
oliver.upton@...ux.dev, ryabinin.a.a@...il.com,
ryan.roberts@....com, suzuki.poulose@....com,
vincenzo.frascino@....com, wangkefeng.wang@...wei.com,
will@...nel.org, willy@...radead.org, yuzenghui@...wei.com,
yuzhao@...gle.com, ziy@...dia.com
Subject: Re: Re: [PATCH v2 01/14] mm: Batch-copy PTE ranges during fork()
>> + for (i = 0; i < nr; i++, page++) {
>> + if (anon) {
>> + /*
>> + * If this page may have been pinned by the
>> + * parent process, copy the page immediately for
>> + * the child so that we'll always guarantee the
>> + * pinned page won't be randomly replaced in the
>> + * future.
>> + */
>> + if (unlikely(page_try_dup_anon_rmap(
>> + page, false, src_vma))) {
>> + if (i != 0)
>> + break;
>> + /* Page may be pinned, we have to copy. */
>> + return copy_present_page(
>> + dst_vma, src_vma, dst_pte,
>> + src_pte, addr, rss, prealloc,
>> + page);
>> + }
>> + rss[MM_ANONPAGES]++;
>> + VM_BUG_ON(PageAnonExclusive(page));
>> + } else {
>> + page_dup_file_rmap(page, false);
>> + rss[mm_counter_file(page)]++;
>> + }
>> }
>> - rss[MM_ANONPAGES]++;
>> - } else if (page) {
>> - folio_get(folio);
>> - page_dup_file_rmap(page, false);
>> - rss[mm_counter_file(page)]++;
>> +
>> + nr = i;
>> + folio_ref_add(folio, nr);
>
> You're changing the order of mapcount vs. refcount increment. Don't.
> Make sure your refcount >= mapcount.
>
> You can do that easily by doing the folio_ref_add(folio, nr) first and
> then decrementing in case of error accordingly. Errors due to pinned
> pages are the corner case.
>
> I'll note that it will make a lot of sense to have batch variants of
> page_try_dup_anon_rmap() and page_dup_file_rmap().
>
i still don't understand why it is not a entire map+1, but an increment
in each basepage.
as long as it is a CONTPTE large folio, there is no much difference with
PMD-mapped large folio. it has all the chance to be DoubleMap and need
split.
When A and B share a CONTPTE large folio, we do madvise(DONTNEED) or any
similar things on a part of the large folio in process A,
this large folio will have partially mapped subpage in A (all CONTPE bits
in all subpages need to be removed though we only unmap a part of the
large folioas HW requires consistent CONTPTEs); and it has entire map in
process B(all PTEs are still CONPTES in process B).
isn't it more sensible for this large folios to have entire_map = 0(for
process B), and subpages which are still mapped in process A has map_count
=0? (start from -1).
> Especially, the batch variant of page_try_dup_anon_rmap() would only
> check once if the folio maybe pinned, and in that case, you can simply
> drop all references again. So you either have all or no ptes to process,
> which makes that code easier.
>
> But that can be added on top, and I'll happily do that.
>
> --
> Cheers,
>
> David / dhildenb
Thanks
Barry
Powered by blists - more mailing lists