[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20131210083244.GA24992@lge.com>
Date: Tue, 10 Dec 2013 17:32:44 +0900
From: Joonsoo Kim <iamjoonsoo.kim@....com>
To: Davidlohr Bueso <davidlohr@...com>
Cc: David Gibson <david@...son.dropbear.id.au>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
Michal Hocko <mhocko@...e.cz>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Hugh Dickins <hughd@...gle.com>,
Davidlohr Bueso <davidlohr.bueso@...com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Wanpeng Li <liwanp@...ux.vnet.ibm.com>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Hillf Danton <dhillf@...il.com>
Subject: Re: [PATCH v2 19/20] mm, hugetlb: retry if failed to allocate and
there is concurrent user
On Mon, Dec 09, 2013 at 08:36:23AM -0800, Davidlohr Bueso wrote:
> On Mon, 2013-09-30 at 16:47 +0900, Joonsoo Kim wrote:
> > On Mon, Sep 16, 2013 at 10:09:09PM +1000, David Gibson wrote:
> > > > >
> > > > > > + *do_dequeue = false;
> > > > > > spin_unlock(&hugetlb_lock);
> > > > > > page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
> > > > > > if (!page) {
> > > > >
> > > > > I think the counter also needs to be incremented in the case where we
> > > > > call alloc_buddy_huge_page() from alloc_huge_page(). Even though it's
> > > > > new, it gets added to the hugepage pool at this point and could still
> > > > > be a contended page for the last allocation, unless I'm missing
> > > > > something.
> > > >
> > > > Your comment has reasonable point to me, but I have a different opinion.
> > > >
> > > > As I already mentioned, the point is that we want to avoid the race
> > > > which kill the legitimate users of hugepages by out of resources.
> > > > I increase 'h->nr_dequeue_users' when the hugepage allocated by
> > > > administrator is dequeued. It is because what the hugepage I want to
> > > > protect from the race is the one allocated by administrator via
> > > > kernel param or /proc interface. Administrator may already know how many
> > > > hugepages are needed for their application so that he may set nr_hugepage
> > > > to reasonable value. I want to guarantee that these hugepages can be used
> > > > for his application without any race, since he assume that the application
> > > > would work fine with these hugepages.
> > > >
> > > > To protect hugepages returned from alloc_buddy_huge_page() from the race
> > > > is different for me. Although it will be added to the hugepage pool, this
> > > > doesn't guarantee certain application's success more. If certain
> > > > application's success depends on the race of this new hugepage, it's death
> > > > by the race doesn't matter, since nobody assume that it works fine.
> > >
> > > Hrm. I still think this path should be included. Although I'll agree
> > > that failing in this case is less bad.
> > >
> > > However, it can still lead to a situation where with two processes or
> > > threads, faulting on exactly the same shared page we have one succeed
> > > and the other fail. That's a strange behaviour and I think we want to
> > > avoid it in this case too.
> >
> > Hello, David.
> >
> > I don't think it is a strange behaviour. Similar situation can occur
> > even though we use the mutex. Hugepage allocation can be failed when
> > the first process try to allocate the hugepage while second process is blocked
> > by the mutex. And then, second process will go into the fault handler. And
> > at this time, it can succeed. So result is that we have one succeed and
> > the other fail.
> >
> > It is slightly different from the case you mentioned, but I think that
> > effect for user is same. We cannot avoid this kind of race completely and
> > I think that avoiding the race for administrator managed hugepage pool is
> > good enough to use.
>
> What was the final decision on this issue? Is Joonsoo's approach to
> removing this mutex viable, or are we stuck with it?
Hello.
After rebasing on current kernel, I will repost it soon.
Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists