linux-kernel - Re: [RFC PATCH 1/8] hugetlb: add per-hstate mutex to synchronize user adjustments

Message-ID: <YFmdPMBKcc858fUg@dhcp22.suse.cz>
Date:   Tue, 23 Mar 2021 08:48:12 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Shakeel Butt <shakeelb@...gle.com>,
        Oscar Salvador <osalvador@...e.de>,
        David Hildenbrand <david@...hat.com>,
        Muchun Song <songmuchun@...edance.com>,
        David Rientjes <rientjes@...gle.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Matthew Wilcox <willy@...radead.org>,
        HORIGUCHI NAOYA <naoya.horiguchi@....com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
        Waiman Long <longman@...hat.com>, Peter Xu <peterx@...hat.com>,
        Mina Almasry <almasrymina@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH 1/8] hugetlb: add per-hstate mutex to synchronize
 user adjustments

On Mon 22-03-21 09:57:14, Mike Kravetz wrote:
> On 3/22/21 6:59 AM, Michal Hocko wrote:
> > On Fri 19-03-21 15:42:02, Mike Kravetz wrote:
> >> The number of hugetlb pages can be adjusted by writing to the
> >> sysfs/proc files nr_hugepages, nr_hugepages_mempolicy or
> >> nr_overcommit_hugepages.  There is nothing to prevent two
> >> concurrent modifications via these files.  The underlying routine
> >> set_max_huge_pages() makes assumptions that only one occurrence is
> >> running at a time.  Specifically, alloc_pool_huge_page uses a
> >> hstate specific variable without any synchronization.
> > 
> > From the above it is not really clear whether the unsynchronized nature
> > of set_max_huge_pages is really a problem or a mere annoyance. I suspect
> > the latter because counters are properly synchronized with the
> > hugetlb_lock. It would be great to clarify that.
> >  
> 
> It is a problem and an annoyance.
> 
> The problem is that alloc_pool_huge_page -> for_each_node_mask_to_alloc is
> called after dropping the hugetlb lock.  for_each_node_mask_to_alloc
> uses the helper hstate_next_node_to_alloc which uses and modifies
> h->next_nid_to_alloc.  Worst case would be two instances of set_max_huge_pages
> trying to allocate pages on different sets of nodes.  Pages could get
> allocated on the wrong nodes.

Yes, that is what I meant by the annoyance. On the other hand a parallel access
to a global knob maintaining a global resource should be expected to
have some side effects without an external synchronization unless it is
explicitly documented that such an access is synchronized internally.
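
To make the race described above concrete, here is a rough userspace sketch
of the shared round-robin cursor and of how a per-hstate mutex would
serialize writers. It is a model under stated assumptions, not the kernel
code: struct hstate_model, resize_lock, MODEL_MAX_NODES and the model_*
functions are hypothetical names for illustration, and a plain bitmask
stands in for nodemask_t. Only next_nid_to_alloc and the roles of
hstate_next_node_to_alloc and set_max_huge_pages are taken from the thread.

```c
#include <pthread.h>

#define MODEL_MAX_NODES 8	/* hypothetical node limit for this sketch */

/* Userspace stand-in for the parts of struct hstate discussed above. */
struct hstate_model {
	pthread_mutex_t resize_lock;	/* hypothetical per-hstate mutex */
	int next_nid_to_alloc;		/* round-robin cursor shared by all writers */
};

/* First allowed node at or after nid, wrapping around; -1 if mask is empty. */
static int next_allowed_node(int nid, unsigned int allowed)
{
	for (int i = 0; i < MODEL_MAX_NODES; i++) {
		int candidate = (nid + i) % MODEL_MAX_NODES;

		if (allowed & (1u << candidate))
			return candidate;
	}
	return -1;
}

/*
 * Model of hstate_next_node_to_alloc: read the cursor, then advance it.
 * This read-modify-write is the unsynchronized access in question.
 */
static int model_next_node_to_alloc(struct hstate_model *h, unsigned int allowed)
{
	int nid = next_allowed_node(h->next_nid_to_alloc, allowed);

	if (nid >= 0)
		h->next_nid_to_alloc =
			next_allowed_node((nid + 1) % MODEL_MAX_NODES, allowed);
	return nid;
}

/*
 * Model of one user adjustment: pick a node for each of count pages and
 * record it in out[]. Holding the mutex for the whole adjustment keeps a
 * second writer with a different node mask from moving the cursor midway.
 */
static void model_set_max_huge_pages(struct hstate_model *h,
				     unsigned int allowed, int count, int *out)
{
	pthread_mutex_lock(&h->resize_lock);
	for (int i = 0; i < count; i++)
		out[i] = model_next_node_to_alloc(h, allowed);
	pthread_mutex_unlock(&h->resize_lock);
}
```

Without the lock/unlock pair, two callers passing different masks would
interleave their updates of next_nid_to_alloc, so neither would see the
round-robin sequence it expects over its own set of nodes.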

> I really doubt this problem has ever been experienced in practice.
> However, when looking at the code it was a real annoyance. :)

IMHO it would be a bit of a stretch to consider it a real life problem.
 
> I'll update the commit message to be more clear.

Thanks! Clarification will definitely help.
-- 
Michal Hocko
SUSE Labs