Message-Id: <1375075929-6119-1-git-send-email-iamjoonsoo.kim@lge.com>
Date: Mon, 29 Jul 2013 14:31:51 +0900
From: Joonsoo Kim <iamjoonsoo.kim@....com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
Michal Hocko <mhocko@...e.cz>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Hugh Dickins <hughd@...gle.com>,
Davidlohr Bueso <davidlohr.bueso@...com>,
David Gibson <david@...son.dropbear.id.au>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Joonsoo Kim <js1304@...il.com>,
Wanpeng Li <liwanp@...ux.vnet.ibm.com>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Hillf Danton <dhillf@...il.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: [PATCH 00/18] mm, hugetlb: remove a hugetlb_instantiation_mutex
Without the hugetlb_instantiation_mutex, parallel faults can make
hugepage allocation fail, because several threads may each dequeue a
hugepage to handle a fault on the same address. This leaves the
reserved pool short for a brief moment, and a faulting thread that is
guaranteed to have enough reserved hugepages can then get a SIGBUS.
We already have a nice solution to this problem: the
hugetlb_instantiation_mutex. It keeps other threads from entering the
fault handler. This solves the problem cleanly, but it degrades
performance, because it serializes all fault handling.
Now, I try to remove the hugetlb_instantiation_mutex to get rid of
the performance problem reported by Davidlohr Bueso [1].
This is implemented in the following 3 steps.
Step 1. Protect region tracking with a per-region spin_lock.
Currently, region tracking is protected by the
hugetlb_instantiation_mutex, so before removing the mutex, we have to
replace that protection with another solution.
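
To make Step 1 concrete, here is a minimal user-space sketch of region
tracking guarded by its own lock. It is not the code from the patches:
the names resv_map_sketch, file_region_sketch, resv_region_add() and
resv_region_count() are made up for illustration, the fixed-size table
is an arbitrary simplification, and a C11 atomic_flag stands in for a
kernel spin_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_REGIONS 16              /* arbitrary bound for this sketch */

struct file_region_sketch {
    long from;                      /* first page index covered */
    long to;                        /* one past the last page index */
};

struct resv_map_sketch {
    atomic_flag lock;               /* stand-in for a kernel spin_lock */
    size_t nr;                      /* number of regions in use */
    struct file_region_sketch regions[MAX_REGIONS];
};

static void resv_map_init(struct resv_map_sketch *m)
{
    atomic_flag_clear(&m->lock);
    m->nr = 0;
}

static void resv_lock(struct resv_map_sketch *m)
{
    while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
        ;                           /* spin until the holder releases */
}

static void resv_unlock(struct resv_map_sketch *m)
{
    atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/*
 * Record [from, to) as reserved, merging it with every region it
 * overlaps or touches. Returns 0 on success, -1 if the table is full.
 */
static int resv_region_add(struct resv_map_sketch *m, long from, long to)
{
    size_t i, kept = 0;
    int ret = 0;

    assert(from < to);
    resv_lock(m);
    for (i = 0; i < m->nr; i++) {
        struct file_region_sketch r = m->regions[i];

        if (r.to < from || r.from > to) {
            m->regions[kept++] = r; /* disjoint: keep as is */
        } else {                    /* overlapping or adjacent: absorb */
            if (r.from < from)
                from = r.from;
            if (r.to > to)
                to = r.to;
        }
    }
    if (kept < MAX_REGIONS) {
        m->regions[kept].from = from;
        m->regions[kept].to = to;
        kept++;
    } else {
        ret = -1;                   /* nothing was absorbed in this case */
    }
    m->nr = kept;
    resv_unlock(m);
    return ret;
}

/* Count the reserved pages that fall inside [from, to). */
static long resv_region_count(struct resv_map_sketch *m, long from, long to)
{
    long pages = 0;
    size_t i;

    resv_lock(m);
    for (i = 0; i < m->nr; i++) {
        long lo = m->regions[i].from > from ? m->regions[i].from : from;
        long hi = m->regions[i].to < to ? m->regions[i].to : to;

        if (hi > lo)
            pages += hi - lo;
    }
    resv_unlock(m);
    return pages;
}
```

Because every update and lookup takes the map's own lock, callers in
this sketch need no global mutex to keep the region table consistent.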
Step 2. Decide whether to use the reserved page pool in a uniform way.
Without a lock like the hugetlb_instantiation_mutex, we need graceful
failure handling. To decide whether a failure has to be handled or
not, we need to know the current status accurately.
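
As an illustration of Step 2, the sketch below folds two inputs into a
single use_reserve decision and lets one dequeue helper act on it. This
is only a simplified model under my own assumptions: hstate_sketch,
decide_use_reserve() and dequeue_page_sketch() are hypothetical names,
and the rule "use the reserve only if one exists and the caller did not
ask to avoid it" is inferred from the patch titles, not copied from the
patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified pool accounting; field names are for this sketch only. */
struct hstate_sketch {
    long free_huge_pages;   /* pages currently in the free pool */
    long resv_huge_pages;   /* free pages promised to reservations */
};

/*
 * Assumption: a fault may consume a reserved page only when a
 * reservation covers the address and the caller did not ask to
 * avoid the reserve.
 */
static bool decide_use_reserve(bool has_reserve, bool avoid_reserve)
{
    return has_reserve && !avoid_reserve;
}

/* Take one page from the pool. Returns true on success. */
static bool dequeue_page_sketch(struct hstate_sketch *h, bool use_reserve)
{
    assert(h->resv_huge_pages >= 0);
    assert(h->free_huge_pages >= h->resv_huge_pages);

    if (use_reserve) {
        if (h->resv_huge_pages == 0)
            return false;   /* reserve is momentarily exhausted */
        h->resv_huge_pages--;
    } else if (h->free_huge_pages == h->resv_huge_pages) {
        return false;       /* only reserved pages are left */
    }
    h->free_huge_pages--;
    return true;
}
```

With the decision made in one place, the caller always knows whether a
failed dequeue happened on the reserved path, which is what the failure
handling needs to know.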
Step 3. Handle failure gracefully if we fail while holding a reserved
page or fail to allocate with use_reserve.
Failure handling consists of two cases. In the first, we fail while
holding a reserved page, and we return it to the reserved pool
properly. The current code doesn't restore the reserve count
properly, so we need to fix it. In the second, we fail to allocate a
new huge page with the use_reserve indicator set, and we return 0 to
the fault handler instead of SIGBUS. This makes the thread retry the
fault.
With the handling above, we can handle a fault successfully in any
situation without the hugetlb_instantiation_mutex.
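
The two failure cases of Step 3 can be modelled in user space roughly
as follows. This is a sketch under stated assumptions, not the patched
fault path: pool_sketch, take_page(), put_back_page() and
fault_sketch() are hypothetical, the later_step_fails flag merely
simulates a failure after the page was taken, and FAULT_RETRY stands
for "return 0 to the fault handler".

```c
#include <assert.h>
#include <stdbool.h>

enum fault_result_sketch { FAULT_HANDLED, FAULT_RETRY, FAULT_SIGBUS };

struct pool_sketch {
    long free_pages;    /* pages currently in the free pool */
    long resv_pages;    /* free pages promised to reservations */
};

/* Take one page, consuming a reservation when use_reserve is set. */
static bool take_page(struct pool_sketch *p, bool use_reserve)
{
    if (use_reserve) {
        if (p->resv_pages == 0)
            return false;           /* reserve is momentarily short */
        p->resv_pages--;
    } else if (p->free_pages == p->resv_pages) {
        return false;               /* only reserved pages are left */
    }
    p->free_pages--;
    return true;
}

/* Case 1: give the page back and restore the reserve count too. */
static void put_back_page(struct pool_sketch *p, bool use_reserve)
{
    p->free_pages++;
    if (use_reserve)
        p->resv_pages++;
}

static enum fault_result_sketch
fault_sketch(struct pool_sketch *p, bool use_reserve, bool later_step_fails)
{
    assert(p->resv_pages >= 0 && p->free_pages >= p->resv_pages);

    if (!take_page(p, use_reserve)) {
        /* Case 2: a reserved user retries instead of getting SIGBUS. */
        return use_reserve ? FAULT_RETRY : FAULT_SIGBUS;
    }
    if (later_step_fails) {
        put_back_page(p, use_reserve);
        return FAULT_RETRY;         /* simplification for this sketch */
    }
    return FAULT_HANDLED;
}
```

In this model, a thread that finds the reserve briefly empty because a
racing thread holds the page just retries, and it succeeds once the
other thread has either finished the fault or put the page back.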
Patch 1: Fix a minor problem
Patch 2-5: Implement Step 1.
Patch 6-11: Implement Step 2.
Patch 12-18: Implement Step 3.
These patches are based on my previous patchset [2].
[2] is based on v3.10.
With these patches applied, the libhugetlbfs test suite, which
includes allocation-instantiation race test cases, passes cleanly.
If there is something I should consider, please let me know!
Thanks.
[1] http://lwn.net/Articles/558863/
"[PATCH] mm/hugetlb: per-vma instantiation mutexes"
[2] https://lkml.org/lkml/2013/7/22/96
"[PATCH v2 00/10] mm, hugetlb: clean-up and possible bug fix"
Joonsoo Kim (18):
mm, hugetlb: protect reserved pages when softofflining requests the
pages
mm, hugetlb: change variable name reservations to resv
mm, hugetlb: unify region structure handling
mm, hugetlb: region manipulation functions take resv_map rather
list_head
mm, hugetlb: protect region tracking via newly introduced resv_map
lock
mm, hugetlb: remove vma_need_reservation()
mm, hugetlb: pass has_reserve to dequeue_huge_page_vma()
mm, hugetlb: do hugepage_subpool_get_pages() when avoid_reserve
mm, hugetlb: unify has_reserve and avoid_reserve to use_reserve
mm, hugetlb: call vma_has_reserve() before entering alloc_huge_page()
mm, hugetlb: move down outside_reserve check
mm, hugetlb: remove a check for return value of alloc_huge_page()
mm, hugetlb: grab a page_table_lock after page_cache_release
mm, hugetlb: clean-up error handling in hugetlb_cow()
mm, hugetlb: move up anon_vma_prepare()
mm, hugetlb: return a reserved page to a reserved pool if failed
mm, hugetlb: retry if we fail to allocate a hugepage with use_reserve
mm, hugetlb: remove a hugetlb_instantiation_mutex
fs/hugetlbfs/inode.c | 12 +-
include/linux/hugetlb.h | 10 ++
mm/hugetlb.c | 361 +++++++++++++++++++++++++----------------------
3 files changed, 217 insertions(+), 166 deletions(-)
--
1.7.9.5