linux-kernel - Re: [PATCH 15/20] ufs: remove the BKL

Message-ID: <AANLkTinKO=HM1BqDdFkdcCAGz85co64Ma79qomw6PDPp@mail.gmail.com>
Date:	Thu, 27 Jan 2011 16:47:55 +1100
From:	Nick Piggin <npiggin@...il.com>
To:	Arnd Bergmann <arnd@...db.de>
Cc:	linux-kernel@...r.kernel.org,
	Nick Bowler <nbowler@...iptictech.com>,
	Evgeniy Dushistov <dushistov@...l.ru>
Subject: Re: [PATCH 15/20] ufs: remove the BKL

Really great work on removing the BKL, Arnd. I'm sure
a lot of it was pretty thankless along the way.

On Wed, Jan 26, 2011 at 9:17 AM, Arnd Bergmann <arnd@...db.de> wrote:
> This introduces a new per-superblock mutex in UFS to replace
> the big kernel lock. I have been careful to avoid nested
> calls to lock_ufs and to get the lock order right with
> respect to other mutexes, in particular lock_super.

When I looked at removing the BKL from minix a long time ago,
I was a bit worried about reclaim and fs/io recursion in some
of the filesystems that use the BKL.


> @@ -436,7 +439,8 @@ int ufs_getfrag_block(struct inode *inode, sector_t fragment, struct buffer_head
>        ret = 0;
>        bh = NULL;
>
> -       lock_kernel();
> +       if (needs_lock)
> +               lock_ufs(sb);
>
>        UFSD("ENTER, ino %lu, fragment %llu\n", inode->i_ino, (unsigned long long)fragment);
>        if (fragment >

[...]

> @@ -55,16 +54,16 @@ static struct dentry *ufs_lookup(struct inode * dir, struct dentry *dentry, stru
>        if (dentry->d_name.len > UFS_MAXNAMLEN)
>                return ERR_PTR(-ENAMETOOLONG);
>
> -       lock_kernel();
> +       lock_ufs(dir->i_sb);
>        ino = ufs_inode_by_name(dir, &dentry->d_name);
>        if (ino) {
>                inode = ufs_iget(dir->i_sb, ino);
>                if (IS_ERR(inode)) {
> -                       unlock_kernel();
> +                       unlock_ufs(dir->i_sb);
>                        return ERR_CAST(inode);
>                }
>        }
> -       unlock_kernel();
> +       unlock_ufs(dir->i_sb);
>        d_add(dentry, inode);
>        return NULL;
>  }

versus

static struct inode *ufs_alloc_inode(struct super_block *sb)
{
        struct ufs_inode_info *ei;
        ei = (struct ufs_inode_info *)kmem_cache_alloc(ufs_inode_cachep, GFP_KERNEL);
        if (!ei)
                return NULL;
        ei->vfs_inode.i_version = 1;
        return &ei->vfs_inode;
}

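So, get_block can be called from .writepage in page reclaim,
and it takes the lock. ufs_lookup takes the lock too and, via
ufs_iget, winds up calling ufs_alloc_inode. And ufs_alloc_inode
does a GFP_KERNEL allocation, which can enter reclaim with
__GFP_FS set. The BKL could be taken recursively, but a mutex
can't, so reclaim running under ufs_lookup could end up
trying to take lock_ufs a second time and deadlock.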

I didn't look through all your filesystem conversions, but it is
something tricky to watch out for I think.

Changing everything to GFP_NOFS may be an option, for
such crufty old filesystems...
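
To make the recursion concrete, here is a rough userspace sketch of the
pattern, not kernel code. Everything prefixed sim_ or SIM_ is a hypothetical
stand-in invented for illustration: the lock is a plain flag that reports a
second acquisition instead of hanging, only the "may call into the fs" bit of
the allocation flags is modelled, and reclaim is assumed to run on every
allocation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the kernel's GFP definitions. */
#define SIM_GFP_FS      0x1u            /* reclaim may call back into the fs */
#define SIM_GFP_NOFS    0x0u            /* reclaim must stay out of the fs   */
#define SIM_GFP_KERNEL  (SIM_GFP_FS)

struct sim_sb {
        bool ufs_locked;        /* models the non-recursive per-sb mutex   */
        int  self_deadlocks;    /* times the holder tried to lock it again */
};

/* Returns false where a real mutex_lock() would block forever. */
static bool sim_lock_ufs(struct sim_sb *sb)
{
        if (sb->ufs_locked) {
                sb->self_deadlocks++;
                return false;
        }
        sb->ufs_locked = true;
        return true;
}

static void sim_unlock_ufs(struct sim_sb *sb)
{
        sb->ufs_locked = false;
}

/* Models .writepage -> get_block, which takes the lock. */
static void sim_writepage(struct sim_sb *sb)
{
        if (sim_lock_ufs(sb))
                sim_unlock_ufs(sb);
}

/*
 * Models an allocation under memory pressure: reclaim only writes back
 * filesystem pages when the FS bit is set in the flags.
 */
static void *sim_alloc(struct sim_sb *sb, size_t size, unsigned int gfp)
{
        if (gfp & SIM_GFP_FS)
                sim_writepage(sb);
        return malloc(size);
}

/*
 * Models ufs_lookup -> ufs_alloc_inode with the lock held.
 * Returns the number of self-deadlocks seen so far, or -1 if the lock
 * was already held on entry.
 */
static int sim_lookup(struct sim_sb *sb, unsigned int gfp)
{
        void *inode;

        if (!sim_lock_ufs(sb))
                return -1;
        inode = sim_alloc(sb, 64, gfp);
        sim_unlock_ufs(sb);
        free(inode);
        return sb->self_deadlocks;
}
```

On a zero-initialised struct sim_sb, sim_lookup(&sb, SIM_GFP_KERNEL) counts
one attempted re-acquisition, while sim_lookup(&sb, SIM_GFP_NOFS) counts
none, which is the reasoning behind the GFP_NOFS suggestion.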