[<prev] [next>] [day] [month] [year] [list]
Message-ID: <6aecc2e3-030b-5a3f-2fee-14ee90a47f5a@oracle.com>
Date: Thu, 7 Mar 2019 15:50:55 -0800
From: Mike Kravetz <mike.kravetz@...cle.com>
To: Yufen Yu <yuyufen@...wei.com>, linux-mm@...ck.org,
linux-kernel <linux-kernel@...r.kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Michal Hocko <mhocko@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v2] hugetlbfs: fix memory leak for resv_map
Adding others on Cc to see if they have comments or opinions.
On 3/6/19 3:52 PM, Mike Kravetz wrote:
> On 3/5/19 10:10 PM, Yufen Yu wrote:
>> When .mknod create a block device file in hugetlbfs, it will
>> allocate an inode, and kmalloc a 'struct resv_map' in resv_map_alloc().
>> For now, inode->i_mapping->private_data is used to point the resv_map.
>> However, when open the device, bd_acquire() will set i_mapping as
>> bd_inode->imapping, result in resv_map memory leak.
>>
>> We fix it by waiting until a call to hugetlb_reserve_pages() to allocate
>> the inode specific resv_map. We could then remove the resv_map allocation
>> at inode creation time.
>>
>> Programs to reproduce:
>> mount -t hugetlbfs nodev hugetlbfs
>> mknod hugetlbfs/dev b 0 0
>> exec 30<> hugetlbfs/dev
>> umount hugetlbfs/
>>
>> Signed-off-by: Yufen Yu <yuyufen@...wei.com>
>
> Thank you. That is the approach I had in mind.
>
> Unfortunately, this patch causes several regressions in the libhugetlbfs
> test suite. I have not debugged to determine exact cause.
>
> I was unsure about one thing with this approach. We set
> inode->i_mapping->private_data while holding the inode lock, so there
> should be no problem there. However, we access inode_resv_map() in the
> page fault path without the inode lock. The page fault path should get
> NULL or a resv_map. I just wonder if there may be some races where the
> fault path may still be seeing NULL.
>
> I can do more debug, but it will take a couple days as I am busy with
> other things right now.
My apologies. Calling resv_map_alloc() only from hugetlb_reserve_pages()
is not going to work. The reason why is that the reserv_map is used to track
page allocations even if there are not reservations. So, if reservations
are not created other huge page accounting is impacted.
Sorry for suggesting that approach.
As mentioned, I do not like your original approach as it adds an extra word
to every hugetlbfs inode. How about something like the following which
only adds the resv_map to inodes which can have associated page allocations?
I have only done limited regression testing with this.
From: Mike Kravetz <mike.kravetz@...cle.com>
Date: Thu, 7 Mar 2019 15:37:31 -0800
Subject: [PATCH] hugetlbfs: only allocate reserve map for inodes that can
allocate pages
Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>
---
fs/hugetlbfs/inode.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a7fa037b876b..a3a3d256fb0e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -741,11 +741,17 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
umode_t mode, dev_t dev)
{
struct inode *inode;
- struct resv_map *resv_map;
+ struct resv_map *resv_map = NULL;
- resv_map = resv_map_alloc();
- if (!resv_map)
- return NULL;
+ /*
+ * Reserve maps are only needed for inodes that can have associated
+ * page allocations.
+ */
+ if (S_ISREG(mode) || S_ISLNK(mode)) {
+ resv_map = resv_map_alloc();
+ if (!resv_map)
+ return NULL;
+ }
inode = new_inode(sb);
if (inode) {
@@ -780,8 +786,10 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
break;
}
lockdep_annotate_inode_mutex_key(inode);
- } else
- kref_put(&resv_map->refs, resv_map_release);
+ } else {
+ if (resv_map)
+ kref_put(&resv_map->refs, resv_map_release);
+ }
return inode;
}
--
2.17.2
Powered by blists - more mailing lists