[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170119092236.GC2565@quack2.suse.cz>
Date: Thu, 19 Jan 2017 10:22:36 +0100
From: Jan Kara <jack@...e.cz>
To: Michal Hocko <mhocko@...nel.org>
Cc: Jan Kara <jack@...e.cz>, Theodore Ts'o <tytso@....edu>,
linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Chinner <david@...morbit.com>, djwong@...nel.org,
Chris Mason <clm@...com>, David Sterba <dsterba@...e.cz>,
ceph-devel@...r.kernel.org, cluster-devel@...hat.com,
linux-nfs@...r.kernel.org, logfs@...fs.org,
linux-xfs@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-btrfs@...r.kernel.org, linux-mtd@...ts.infradead.org,
reiserfs-devel@...r.kernel.org,
linux-ntfs-dev@...ts.sourceforge.net,
linux-f2fs-devel@...ts.sourceforge.net,
linux-afs@...ts.infradead.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 8/8] Revert "ext4: fix wrong gfp type under transaction"
On Thu 19-01-17 09:39:56, Michal Hocko wrote:
> On Tue 17-01-17 18:29:25, Jan Kara wrote:
> > On Tue 17-01-17 17:16:19, Michal Hocko wrote:
> > > > > But before going to play with that I am really wondering whether we need
> > > > > all this with no journal at all. AFAIU what Jack told me it is the
> > > > > journal lock(s) which is the biggest problem from the reclaim recursion
> > > > > point of view. What would cause a deadlock in no journal mode?
> > > >
> > > > We still have the original problem for why we need GFP_NOFS even in
> > > > ext2. If we are in a writeback path, and we need to allocate memory,
> > > > we don't want to recurse back into the file system's writeback path.
> > >
> > > But we do not enter the writeback path from the direct reclaim. Or do
> > > you mean something other than pageout()'s mapping->a_ops->writepage?
> > > There is only try_to_release_page where we get back to the filesystems
> > > but I do not see any NOFS protection in ext4_releasepage.
> >
> > Maybe to expand a bit: These days, direct reclaim can call ->releasepage()
> > callback, ->evict_inode() callback (and only for inodes with i_nlink > 0),
> > shrinkers. That's it. So the recursion possibilities are rather more limited
> > than they used to be several years ago and we likely do not need as much
> > GFP_NOFS protection as we used to.
>
> Thanks for making my remark more clear Jack! I would just want to add
> that I was playing with the patch below (it is basically
> GFP_NOFS->GFP_KERNEL for all allocations which trigger warning from the
> debugging patch which means they are called from within transaction) and
> it didn't hit the lockdep when running xfstests both with or without the
> enabled journal.
>
> So am I still missing something or the nojournal mode is safe and the
> current series is OK wrt. ext*?
I'm convinced the current series is OK, only real life will tell us whether
we missed something or not ;)
> The following patch in its current form is WIP and needs a proper review
> before I post it.
So jbd2 changes look confusing (although technically correct) to me - we
*always* should run in NOFS context in those place so having GFP_KERNEL
there looks like it is unnecessarily hiding what is going on. So in those
places I'd prefer to keep GFP_NOFS or somehow else make it very clear these
allocations are expected to be GFP_NOFS (and assert that). Otherwise the
changes look good to me.
Honza
> ---
> fs/ext4/inode.c | 4 ++--
> fs/ext4/mballoc.c | 14 +++++++-------
> fs/ext4/xattr.c | 2 +-
> fs/jbd2/journal.c | 4 ++--
> fs/jbd2/revoke.c | 2 +-
> fs/jbd2/transaction.c | 2 +-
> 6 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index b7d141c3b810..841cb8c4cb5e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2085,7 +2085,7 @@ static int ext4_writepage(struct page *page,
> return __ext4_journalled_writepage(page, len);
>
> ext4_io_submit_init(&io_submit, wbc);
> - io_submit.io_end = ext4_init_io_end(inode, GFP_NOFS);
> + io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
> if (!io_submit.io_end) {
> redirty_page_for_writepage(wbc, page);
> unlock_page(page);
> @@ -3794,7 +3794,7 @@ static int __ext4_block_zero_page_range(handle_t *handle,
> int err = 0;
>
> page = find_or_create_page(mapping, from >> PAGE_SHIFT,
> - mapping_gfp_constraint(mapping, ~__GFP_FS));
> + mapping_gfp_mask(mapping));
> if (!page)
> return -ENOMEM;
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index d9fd184b049e..67b97cd6e3d6 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1251,7 +1251,7 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
> static int ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
> struct ext4_buddy *e4b)
> {
> - return ext4_mb_load_buddy_gfp(sb, group, e4b, GFP_NOFS);
> + return ext4_mb_load_buddy_gfp(sb, group, e4b, GFP_KERNEL);
> }
>
> static void ext4_mb_unload_buddy(struct ext4_buddy *e4b)
> @@ -2054,7 +2054,7 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
>
> /* We only do this if the grp has never been initialized */
> if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
> - int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> + int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_KERNEL);
> if (ret)
> return ret;
> }
> @@ -3600,7 +3600,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
> BUG_ON(ac->ac_status != AC_STATUS_FOUND);
> BUG_ON(!S_ISREG(ac->ac_inode->i_mode));
>
> - pa = kmem_cache_alloc(ext4_pspace_cachep, GFP_NOFS);
> + pa = kmem_cache_alloc(ext4_pspace_cachep, GFP_KERNEL);
> if (pa == NULL)
> return -ENOMEM;
>
> @@ -3694,7 +3694,7 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
> BUG_ON(!S_ISREG(ac->ac_inode->i_mode));
>
> BUG_ON(ext4_pspace_cachep == NULL);
> - pa = kmem_cache_alloc(ext4_pspace_cachep, GFP_NOFS);
> + pa = kmem_cache_alloc(ext4_pspace_cachep, GFP_KERNEL);
> if (pa == NULL)
> return -ENOMEM;
>
> @@ -4479,7 +4479,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
> }
> }
>
> - ac = kmem_cache_zalloc(ext4_ac_cachep, GFP_NOFS);
> + ac = kmem_cache_zalloc(ext4_ac_cachep, GFP_KERNEL);
> if (!ac) {
> ar->len = 0;
> *errp = -ENOMEM;
> @@ -4813,7 +4813,7 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
>
> /* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */
> err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
> - GFP_NOFS|__GFP_NOFAIL);
> + GFP_KERNEL|__GFP_NOFAIL);
> if (err)
> goto error_return;
>
> @@ -4832,7 +4832,7 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
> * to fail.
> */
> new_entry = kmem_cache_alloc(ext4_free_data_cachep,
> - GFP_NOFS|__GFP_NOFAIL);
> + GFP_KERNEL|__GFP_NOFAIL);
> new_entry->efd_start_cluster = bit;
> new_entry->efd_group = block_group;
> new_entry->efd_count = count_clusters;
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 172317462238..f68e8c87f9f2 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1650,7 +1650,7 @@ ext4_xattr_cache_insert(struct mb_cache *ext4_mb_cache, struct buffer_head *bh)
> EXT4_XATTR_REFCOUNT_MAX;
> int error;
>
> - error = mb_cache_entry_create(ext4_mb_cache, GFP_NOFS, hash,
> + error = mb_cache_entry_create(ext4_mb_cache, GFP_KERNEL, hash,
> bh->b_blocknr, reusable);
> if (error) {
> if (error == -EBUSY)
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 3a449150f834..bd29daa975a5 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -379,7 +379,7 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
> */
> J_ASSERT_BH(bh_in, buffer_jbddirty(bh_in));
>
> - new_bh = alloc_buffer_head(GFP_NOFS|__GFP_NOFAIL);
> + new_bh = alloc_buffer_head(GFP_KERNEL|__GFP_NOFAIL);
>
> /* keep subsequent assertions sane */
> atomic_set(&new_bh->b_count, 1);
> @@ -2375,7 +2375,7 @@ static struct journal_head *journal_alloc_journal_head(void)
> #ifdef CONFIG_JBD2_DEBUG
> atomic_inc(&nr_journal_heads);
> #endif
> - ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_NOFS);
> + ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_KERNEL);
> if (!ret) {
> jbd_debug(1, "out of memory for journal_head\n");
> pr_notice_ratelimited("ENOMEM in %s, retrying.\n", __func__);
> diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
> index cfc38b552118..c9c347468c5b 100644
> --- a/fs/jbd2/revoke.c
> +++ b/fs/jbd2/revoke.c
> @@ -141,7 +141,7 @@ static int insert_revoke_hash(journal_t *journal, unsigned long long blocknr,
> {
> struct list_head *hash_list;
> struct jbd2_revoke_record_s *record;
> - gfp_t gfp_mask = GFP_NOFS;
> + gfp_t gfp_mask = GFP_KERNEL;
>
> if (journal_oom_retry)
> gfp_mask |= __GFP_NOFAIL;
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 35a5d3d76182..a7e50eb330a8 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -974,7 +974,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
> JBUFFER_TRACE(jh, "allocate memory for buffer");
> jbd_unlock_bh_state(bh);
> frozen_buffer = jbd2_alloc(jh2bh(jh)->b_size,
> - GFP_NOFS | __GFP_NOFAIL);
> + GFP_KERNEL | __GFP_NOFAIL);
> goto repeat;
> }
> jh->b_frozen_data = frozen_buffer;
> --
> 2.11.0
>
> --
> Michal Hocko
> SUSE Labs
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists