Message-ID: <40bad572-501d-e4cf-80e3-9a8daa98dc7e@redhat.com>
Date: Mon, 16 Aug 2021 16:10:28 +0200
From: David Hildenbrand <david@...hat.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Khalid Aziz <khalid.aziz@...cle.com>,
"Longpeng (Mike, Cloud Infrastructure Service Product Dept.)"
<longpeng2@...wei.com>, Steven Sistare <steven.sistare@...cle.com>,
Anthony Yznaga <anthony.yznaga@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"Gonglei (Arei)" <arei.gonglei@...wei.com>
Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC
>>> Until recently, the CPUs only had 4 1GB TLB entries. I'm sure we
>>> still have customers using that generation of CPUs. 2MB pages perform
>>> better than 1GB pages on the previous generation of hardware, and I
>>> haven't seen numbers for the next generation yet.
>>
>> I read that somewhere else before, yet we have heavy 1 GiB page users,
>> especially in the context of VMs and DPDK.
>
> I wonder if those users actually benchmarked. Or whether the memory
> savings worked out so well for them that the loss of TLB performance
> didn't matter.
These applications are extremely performance sensitive (e.g., RT
workloads), which is why I'm wondering. I recall that they are most
certainly using more than 4 GiB of memory in real applications.
E.g., the doc [1] even has a note that "For 64-bit applications, it is
recommended to use 1 GB hugepages if the platform supports them."
[1] https://doc.dpdk.org/guides-16.04/linux_gsg/sys_reqs.html
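To make the numbers concrete, here is a minimal sketch (not taken from DPDK or from any patch in this thread) of the TLB-reach arithmetic and of how an application can ask for 1 GiB pages on Linux. The helper names tlb_reach() and map_1g() are made up for illustration; the mmap() flags are the standard Linux ones, with fallback defines in case the libc headers lack them.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; the values match the Linux UAPI. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
#endif

#define GIB (1ULL << 30)

/* Hypothetical helper: memory covered by `entries` TLB entries of `page_size` bytes. */
static uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}

/*
 * Hypothetical helper: try to map `len` bytes backed by 1 GiB huge pages.
 * Returns MAP_FAILED if the platform lacks 1 GiB pages or the hugetlb
 * pool cannot satisfy the reservation.
 */
static void *map_1g(size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                -1, 0);
}
```

With 4 entries, tlb_reach(4, GIB) is exactly 4 GiB, so any working set larger than that has to take TLB misses on 1 GiB mappings. map_1g(GIB) only succeeds if 1 GiB pages were reserved beforehand (for example on the kernel command line), so callers need to check for MAP_FAILED.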
>
>> So, it only works for hugetlbfs in case uffd is not in place (-> no
>> per-process data in the page table) and we have an actual shared mapping.
>> When unsharing, we zap the PUD entry, which will result in allocating a
>> per-process page table on next fault.
>
> I think uffd was a huge mistake. It should have been a filesystem
> instead of a hack on the side of anonymous memory.
Yes, it was, especially when looking at all the special-casing, for
example, even in mm/pagewalk.c.
>
>> I will rephrase my previous statement "hugetlbfs just doesn't raise these
>> problems because we are special casing it all over the place already". For
>> example, not allowing to swap such pages. Disallowing MADV_DONTNEED. Special
>> hugetlbfs locking.
>
> Sure, that's why I want to drag this feature out of "oh this is a
> hugetlb special case" and into "this is something Linux supports".
I would have understood a move to optimize SHMEM internally, similar
to how we seem to optimize hugetlbfs SHMEM internally right now
(although sharing page tables for shmem can still be quite tricky).
I did not follow why we have to play games with MAP_PRIVATE, have
private anonymous pages shared between processes that don't COW,
introduce new syscalls, etc.
--
Thanks,
David / dhildenb