Message-Id: <201101271413.01204.arnd@arndb.de>
Date: Thu, 27 Jan 2011 14:13:00 +0100
From: Arnd Bergmann <arnd@...db.de>
To: Nick Piggin <npiggin@...il.com>
Cc: linux-kernel@...r.kernel.org,
Nick Bowler <nbowler@...iptictech.com>,
Evgeniy Dushistov <dushistov@...l.ru>
Subject: Re: [PATCH 15/20] ufs: remove the BKL
On Thursday 27 January 2011, Nick Piggin wrote:
> On Wed, Jan 26, 2011 at 9:17 AM, Arnd Bergmann <arnd@...db.de> wrote:
> > This introduces a new per-superblock mutex in UFS to replace
> > the big kernel lock. I have been careful to avoid nested
> > calls to lock_ufs and to get the lock order right with
> > respect to other mutexes, in particular lock_super.
>
> When I looked at removing bkl from minix a long time ago,
> I was a bit worried about reclaim and fs/io recursion in some
> of the filesystems with bkl.
Thanks for looking at this, you certainly understand these interactions
much better than me!
> > @@ -436,7 +439,8 @@ int ufs_getfrag_block(struct inode *inode, sector_t fragment, struct buffer_head
> > ret = 0;
> > bh = NULL;
> >
> > - lock_kernel();
> > + if (needs_lock)
> > + lock_ufs(sb);
> >
> > UFSD("ENTER, ino %lu, fragment %llu\n", inode->i_ino, (unsigned long long)fragment);
> > if (fragment >
> So, get_block can be called for .writepage in page reclaim,
> which takes the lock. ufs_lookup takes the lock and winds
> up calling ufs_alloc_inode. And ufs_alloc_inode does
> GFP_KERNEL, which can enter reclaim with __GFP_FS
> set.
This is one path that I did not consider. My initial thought when you
mentioned it was that this is still fine because ufs_getfrag_block can be
called with the lock held by the current task, but you are of course
right that this won't work when we are called by the writeback task.
> I didn't look through all your filesystem conversions, but it is
> something tricky to watch out for I think.
The only other one where I added locks is isofs, and I'm pretty sure that's
fine because it's read-only and does not do writeback. HPFS with Andi's suggested
solution of making it UP-only is of course also fine.
> Changing everything to GFP_NOFS may be an option, for
> such crufty old filesystems...
Yes. All performance optimizations in UFS are pointless anyway as long as we
do most of the disk I/O under a global mutex. The patch below converts all
allocations that are done under a lock in UFS to GFP_NOFS. If nobody comes
up with a better solution, I'll fold that into my other patch.
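To make the recursion concrete, here is a minimal single-threaded userspace sketch, not kernel code. Every model_* name and MODEL_* flag value is hypothetical and chosen only for illustration; the real GFP bit values and reclaim logic differ. It assumes the reclaim path always tries to take the per-superblock lock, and it reports a would-be self-deadlock instead of actually blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; real kernel values differ. */
#define MODEL_GFP_IO      0x1u  /* reclaim may start block I/O */
#define MODEL_GFP_FS      0x2u  /* reclaim may call back into the filesystem */
#define MODEL_GFP_NOFS    (MODEL_GFP_IO)
#define MODEL_GFP_KERNEL  (MODEL_GFP_IO | MODEL_GFP_FS)

struct model_sb {
	bool ufs_locked;  /* stands in for the per-superblock mutex */
	int deadlocks;    /* times a non-recursive lock was re-taken */
};

/* Returns false where a real non-recursive mutex would block forever. */
static bool model_lock_ufs(struct model_sb *sb)
{
	if (sb->ufs_locked) {
		sb->deadlocks++;
		return false;
	}
	sb->ufs_locked = true;
	return true;
}

static void model_unlock_ufs(struct model_sb *sb)
{
	assert(sb->ufs_locked);  /* unlocking an unheld lock is a bug */
	sb->ufs_locked = false;
}

/* Stands in for get_block being called for .writepage from reclaim. */
static void model_writepage(struct model_sb *sb)
{
	if (model_lock_ufs(sb))
		model_unlock_ufs(sb);
}

/* Allocation under memory pressure: only enters the fs if the FS bit is set. */
static void model_alloc(struct model_sb *sb, unsigned int gfp)
{
	if (gfp & MODEL_GFP_FS)
		model_writepage(sb);
}

/* Stands in for ufs_lookup allocating an inode with the lock held.
 * Returns the number of would-be deadlocks observed. */
int model_lookup(unsigned int gfp)
{
	struct model_sb sb = { false, 0 };

	model_lock_ufs(&sb);
	model_alloc(&sb, gfp);
	model_unlock_ufs(&sb);
	return sb.deadlocks;
}
```

In this model, model_lookup(MODEL_GFP_KERNEL) returns 1 because the allocation re-enters the filesystem while the lock is held, and model_lookup(MODEL_GFP_NOFS) returns 0 because the FS bit is masked out. The sketch does not cover the needs_lock distinction or a separate writeback task.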
Arnd
---
fs/ufs/super.c | 8 ++++----
fs/ufs/util.c | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index e4e06ef..7693d62 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -539,7 +539,7 @@ static int ufs_read_cylinder_structures(struct super_block *sb)
*/
size = uspi->s_cssize;
blks = (size + uspi->s_fsize - 1) >> uspi->s_fshift;
- base = space = kmalloc(size, GFP_KERNEL);
+ base = space = kmalloc(size, GFP_NOFS);
if (!base)
goto failed;
sbi->s_csp = (struct ufs_csum *)space;
@@ -564,7 +564,7 @@ static int ufs_read_cylinder_structures(struct super_block *sb)
* Read cylinder group (we read only first fragment from block
* at this time) and prepare internal data structures for cg caching.
*/
- if (!(sbi->s_ucg = kmalloc (sizeof(struct buffer_head *) * uspi->s_ncg, GFP_KERNEL)))
+ if (!(sbi->s_ucg = kmalloc (sizeof(struct buffer_head *) * uspi->s_ncg, GFP_NOFS)))
goto failed;
for (i = 0; i < uspi->s_ncg; i++)
sbi->s_ucg[i] = NULL;
@@ -582,7 +582,7 @@ static int ufs_read_cylinder_structures(struct super_block *sb)
ufs_print_cylinder_stuff(sb, (struct ufs_cylinder_group *) sbi->s_ucg[i]->b_data);
}
for (i = 0; i < UFS_MAX_GROUP_LOADED; i++) {
- if (!(sbi->s_ucpi[i] = kmalloc (sizeof(struct ufs_cg_private_info), GFP_KERNEL)))
+ if (!(sbi->s_ucpi[i] = kmalloc (sizeof(struct ufs_cg_private_info), GFP_NOFS)))
goto failed;
sbi->s_cgno[i] = UFS_CGNO_EMPTY;
}
@@ -1415,7 +1415,7 @@ static struct kmem_cache * ufs_inode_cachep;
static struct inode *ufs_alloc_inode(struct super_block *sb)
{
struct ufs_inode_info *ei;
- ei = (struct ufs_inode_info *)kmem_cache_alloc(ufs_inode_cachep, GFP_KERNEL);
+ ei = (struct ufs_inode_info *)kmem_cache_alloc(ufs_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
ei->vfs_inode.i_version = 1;
diff --git a/fs/ufs/util.c b/fs/ufs/util.c
index d2c36d5..95425b5 100644
--- a/fs/ufs/util.c
+++ b/fs/ufs/util.c
@@ -27,7 +27,7 @@ struct ufs_buffer_head * _ubh_bread_ (struct ufs_sb_private_info * uspi,
if (count > UFS_MAXFRAG)
return NULL;
ubh = (struct ufs_buffer_head *)
- kmalloc (sizeof (struct ufs_buffer_head), GFP_KERNEL);
+ kmalloc (sizeof (struct ufs_buffer_head), GFP_NOFS);
if (!ubh)
return NULL;
ubh->fragment = fragment;