Message-ID: <89a1fa3f-2c9b-34c9-761c-d0f21d0ba233@oracle.com>
Date: Mon, 1 Nov 2021 16:19:34 -0700
From: Mike Kravetz <mike.kravetz@...cle.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Shuah Khan <shuah@...nel.org>,
Miaohe Lin <linmiaohe@...wei.com>,
Oscar Salvador <osalvador@...e.de>,
Michal Hocko <mhocko@...e.com>,
Muchun Song <songmuchun@...edance.com>,
David Rientjes <rientjes@...gle.com>,
Jue Wang <juew@...gle.com>, Yang Yao <ygyao@...gle.com>,
Joanna Li <joannali@...gle.com>,
Cannon Matthews <cannonmatthews@...gle.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] hugetlb: Add hugetlb.*.numa_stat file
On 10/31/21 1:39 PM, Mina Almasry wrote:
> On Wed, Oct 27, 2021 at 4:36 PM Mike Kravetz <mike.kravetz@...cle.com> wrote:
>>
>> On 10/20/21 12:09 PM, Mina Almasry wrote:
>>
>> I have no objections to adding this functionality, and do not see any
>> blocking issues in hugetlb code. However, it would be GREAT if someone
>> more familiar/experienced with cgroups would comment. My cgroup
>> experience is very limited.
>>
>
> I will send V2 to Shakeel as well from our team and ask him to take a
> look and he has more than enough experience to review. If anyone else
> reading with cgroups knowledge can Review/Ack that would be great.
>
> It's possible I'm a bit off the mark here, but FWIW I don't think
> there is much that is contentious or ambiguous here. For
> memory.numa_stat it's a bit nuanced because there are anon, rss, shmem,
> etc., but for hugetlb we just have hugetlb memory, and the only care
> needed is hierarchical vs. not in cgroups v1.
>
I agree. It is straightforward from a hugetlb POV. I would just like to
get an ack from someone more familiar with cgroups. To me it also looks
fine from a cgroups POV, but my cgroup knowledge is limited enough that
that does not mean much.
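
For anyone skimming along, cgroup v1's memory.numa_stat is roughly of
the form below (several counters, each broken down per node), whereas a
hugetlb numa_stat file only has one kind of memory to report, so the
per-node breakdown is much simpler. (Illustrative output only, not the
exact format proposed in this patch.)

  total=<pages> N0=<pages> N1=<pages> ...
  file=<pages> N0=<pages> N1=<pages> ...
  anon=<pages> N0=<pages> N1=<pages> ...
  unevictable=<pages> N0=<pages> N1=<pages> ...
  hierarchical_<counter>=<pages> N0=<pages> N1=<pages> ...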
>> alloc_mem_cgroup_per_node_info provides similar functionality and has
>> the following comment.
>>
>> * TODO: this routine can waste much memory for nodes which will
>> * never be onlined. It's better to use memory hotplug callback
>> * function.
>>
>
> So the memory allocated per node in total is (HUGE_MAX_HSTATE *
> sizeof(unsigned long) * num_css_on_the_system). HUGE_MAX_HSTATE is 2
> on x86. This is significant, but to address this comment I have to
> complicate the hugetlb_cgroup code quite a bit, so I thought I'd check
> with you whether you think it's still OK/worth it. slub.c implements
> this scheme (slab_memory_callback), and the changes would be:
>
> - When creating a new hugetlb_cgroup, I create per_node data for only
> online nodes.
> - On node online I need to loop over all existing hugetlb_cgroups and
> allocate per_node data. I need rcu_read_lock() here and below.
> - On node offline I need to loop over all existing hugetlb_cgroups and
> free per_node data.
> - If I follow the slub example, I need a lock to synchronize
> onlining/offlining with references to per_node data in commit_charge(),
> uncharge_page(), and show_numa_stats().
>
> I don't think it's worth it TBH, but I'm happy to oblige if you think so.
>
No need to do anything extra/special here. I just noted that you would
be allocating memory for offline nodes, and took a look to see whether
the memory cgroup code handles this case. It does not, and carries this
as a TODO. I think that is sufficient here as well.
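
For reference, the slub-style approach you outline above would look
roughly like the sketch below. It is only meant to illustrate the extra
machinery; hugetlb_cgroup_node_online() and hugetlb_cgroup_node_offline()
are hypothetical helpers (they would walk all existing hugetlb cgroups
and allocate/free the per-node data), not existing code.

/*
 * Sketch only: a slub-style memory hotplug callback for hugetlb_cgroup.
 * The *_node_online/_offline helpers below are hypothetical.
 */
static int hugetlb_cgroup_memory_callback(struct notifier_block *self,
					  unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;
	int ret = 0;

	switch (action) {
	case MEM_GOING_ONLINE:
		/* allocate per-node data in every existing hugetlb cgroup */
		ret = hugetlb_cgroup_node_online(mn->status_change_nid);
		break;
	case MEM_OFFLINE:
	case MEM_CANCEL_ONLINE:
		/* free per-node data in every existing hugetlb cgroup */
		hugetlb_cgroup_node_offline(mn->status_change_nid);
		break;
	default:
		break;
	}

	return ret ? notifier_from_errno(ret) : NOTIFY_OK;
}

/* registered once at init, like slub's slab_memory_callback */
hotplug_memory_notifier(hugetlb_cgroup_memory_callback, 0);

On top of that you would still need the lock (or RCU scheme) you mention
to keep commit_charge()/uncharge_page()/show_numa_stats() from racing
with node offline, which is exactly the complexity that does not seem
worth it for a few unsigned longs per offline node.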
Thanks,
--
Mike Kravetz