linux-kernel - Re: [PATCH 019/270] jbd: Fix assertion failure in commit code due to lacking transaction credits

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121205120240.GA5706@quack.suse.cz>
Date:	Wed, 5 Dec 2012 13:02:41 +0100
From:	Jan Kara <jack@...e.cz>
To:	Herton Ronaldo Krzesinski <herton.krzesinski@...onical.com>
Cc:	linux-kernel@...r.kernel.org, stable@...r.kernel.org,
	kernel-team@...ts.ubuntu.com, Jan Kara <jack@...e.cz>
Subject: Re: [PATCH 019/270] jbd: Fix assertion failure in commit code due
 to lacking transaction credits

On Mon 26-11-12 14:55:09, Herton Ronaldo Krzesinski wrote:
> 3.5.7u1 -stable review patch.  If anyone has any objections, please let me know.
  Note that this fix causes another problem which is fixed by
25389bb207987b5774182f763b9fb65ff08761c8 upstream.
									Honza

> 
> ------------------
> 
> From: Jan Kara <jack@...e.cz>
> 
> commit 09e05d4805e6c524c1af74e524e5d0528bb3fef3 upstream.
> 
> ext3 users of data=journal mode with blocksize < pagesize were occasionally
> hitting assertion failure in journal_commit_transaction() checking whether the
> transaction has at least as many credits reserved as buffers attached.  The
> core of the problem is that when a file gets truncated, buffers that still need
> checkpointing or that are attached to the committing transaction are left with
> buffer_mapped set. When this happens to buffers beyond i_size attached to a
> page stradding i_size, subsequent write extending the file will see these
> buffers and as they are mapped (but underlying blocks were freed) things go
> awry from here.
> 
> The assertion failure just coincidentally (and in this case luckily as we would
> start corrupting filesystem) triggers due to journal_head not being properly
> cleaned up as well.
> 
> Under some rare circumstances this bug could even hit data=ordered mode users.
> There the assertion won't trigger and we would end up corrupting the
> filesystem.
> 
> We fix the problem by unmapping buffers if possible (in lots of cases we just
> need a buffer attached to a transaction as a place holder but it must not be
> written out anyway). And in one case, we just have to bite the bullet and wait
> for transaction commit to finish.
> 
> Reviewed-by: Josef Bacik <jbacik@...ionio.com>
> Signed-off-by: Jan Kara <jack@...e.cz>
> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@...onical.com>
> ---
>  fs/jbd/commit.c      |   45 ++++++++++++++++++++++++++---------
>  fs/jbd/transaction.c |   64 ++++++++++++++++++++++++++++++++++----------------
>  2 files changed, 78 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
> index 52c15c7..86b39b1 100644
> --- a/fs/jbd/commit.c
> +++ b/fs/jbd/commit.c
> @@ -86,7 +86,12 @@ nope:
>  static void release_data_buffer(struct buffer_head *bh)
>  {
>  	if (buffer_freed(bh)) {
> +		WARN_ON_ONCE(buffer_dirty(bh));
>  		clear_buffer_freed(bh);
> +		clear_buffer_mapped(bh);
> +		clear_buffer_new(bh);
> +		clear_buffer_req(bh);
> +		bh->b_bdev = NULL;
>  		release_buffer_page(bh);
>  	} else
>  		put_bh(bh);
> @@ -866,17 +871,35 @@ restart_loop:
>  		 * there's no point in keeping a checkpoint record for
>  		 * it. */
>  
> -		/* A buffer which has been freed while still being
> -		 * journaled by a previous transaction may end up still
> -		 * being dirty here, but we want to avoid writing back
> -		 * that buffer in the future after the "add to orphan"
> -		 * operation been committed,  That's not only a performance
> -		 * gain, it also stops aliasing problems if the buffer is
> -		 * left behind for writeback and gets reallocated for another
> -		 * use in a different page. */
> -		if (buffer_freed(bh) && !jh->b_next_transaction) {
> -			clear_buffer_freed(bh);
> -			clear_buffer_jbddirty(bh);
> +		/*
> +		 * A buffer which has been freed while still being journaled by
> +		 * a previous transaction.
> +		 */
> +		if (buffer_freed(bh)) {
> +			/*
> +			 * If the running transaction is the one containing
> +			 * "add to orphan" operation (b_next_transaction !=
> +			 * NULL), we have to wait for that transaction to
> +			 * commit before we can really get rid of the buffer.
> +			 * So just clear b_modified to not confuse transaction
> +			 * credit accounting and refile the buffer to
> +			 * BJ_Forget of the running transaction. If the just
> +			 * committed transaction contains "add to orphan"
> +			 * operation, we can completely invalidate the buffer
> +			 * now. We are rather throughout in that since the
> +			 * buffer may be still accessible when blocksize <
> +			 * pagesize and it is attached to the last partial
> +			 * page.
> +			 */
> +			jh->b_modified = 0;
> +			if (!jh->b_next_transaction) {
> +				clear_buffer_freed(bh);
> +				clear_buffer_jbddirty(bh);
> +				clear_buffer_mapped(bh);
> +				clear_buffer_new(bh);
> +				clear_buffer_req(bh);
> +				bh->b_bdev = NULL;
> +			}
>  		}
>  
>  		if (buffer_jbddirty(bh)) {
> diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> index febc10d..78b7f84 100644
> --- a/fs/jbd/transaction.c
> +++ b/fs/jbd/transaction.c
> @@ -1843,15 +1843,16 @@ static int __dispose_buffer(struct journal_head *jh, transaction_t *transaction)
>   * We're outside-transaction here.  Either or both of j_running_transaction
>   * and j_committing_transaction may be NULL.
>   */
> -static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
> +static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
> +				int partial_page)
>  {
>  	transaction_t *transaction;
>  	struct journal_head *jh;
>  	int may_free = 1;
> -	int ret;
>  
>  	BUFFER_TRACE(bh, "entry");
>  
> +retry:
>  	/*
>  	 * It is safe to proceed here without the j_list_lock because the
>  	 * buffers cannot be stolen by try_to_free_buffers as long as we are
> @@ -1879,10 +1880,18 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
>  	 * clear the buffer dirty bit at latest at the moment when the
>  	 * transaction marking the buffer as freed in the filesystem
>  	 * structures is committed because from that moment on the
> -	 * buffer can be reallocated and used by a different page.
> +	 * block can be reallocated and used by a different page.
>  	 * Since the block hasn't been freed yet but the inode has
>  	 * already been added to orphan list, it is safe for us to add
>  	 * the buffer to BJ_Forget list of the newest transaction.
> +	 *
> +	 * Also we have to clear buffer_mapped flag of a truncated buffer
> +	 * because the buffer_head may be attached to the page straddling
> +	 * i_size (can happen only when blocksize < pagesize) and thus the
> +	 * buffer_head can be reused when the file is extended again. So we end
> +	 * up keeping around invalidated buffers attached to transactions'
> +	 * BJ_Forget list just to stop checkpointing code from cleaning up
> +	 * the transaction this buffer was modified in.
>  	 */
>  	transaction = jh->b_transaction;
>  	if (transaction == NULL) {
> @@ -1909,13 +1918,9 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
>  			 * committed, the buffer won't be needed any
>  			 * longer. */
>  			JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
> -			ret = __dispose_buffer(jh,
> +			may_free = __dispose_buffer(jh,
>  					journal->j_running_transaction);
> -			journal_put_journal_head(jh);
> -			spin_unlock(&journal->j_list_lock);
> -			jbd_unlock_bh_state(bh);
> -			spin_unlock(&journal->j_state_lock);
> -			return ret;
> +			goto zap_buffer;
>  		} else {
>  			/* There is no currently-running transaction. So the
>  			 * orphan record which we wrote for this file must have
> @@ -1923,13 +1928,9 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
>  			 * the committing transaction, if it exists. */
>  			if (journal->j_committing_transaction) {
>  				JBUFFER_TRACE(jh, "give to committing trans");
> -				ret = __dispose_buffer(jh,
> +				may_free = __dispose_buffer(jh,
>  					journal->j_committing_transaction);
> -				journal_put_journal_head(jh);
> -				spin_unlock(&journal->j_list_lock);
> -				jbd_unlock_bh_state(bh);
> -				spin_unlock(&journal->j_state_lock);
> -				return ret;
> +				goto zap_buffer;
>  			} else {
>  				/* The orphan record's transaction has
>  				 * committed.  We can cleanse this buffer */
> @@ -1950,10 +1951,24 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
>  		}
>  		/*
>  		 * The buffer is committing, we simply cannot touch
> -		 * it. So we just set j_next_transaction to the
> -		 * running transaction (if there is one) and mark
> -		 * buffer as freed so that commit code knows it should
> -		 * clear dirty bits when it is done with the buffer.
> +		 * it. If the page is straddling i_size we have to wait
> +		 * for commit and try again.
> +		 */
> +		if (partial_page) {
> +			tid_t tid = journal->j_committing_transaction->t_tid;
> +
> +			journal_put_journal_head(jh);
> +			spin_unlock(&journal->j_list_lock);
> +			jbd_unlock_bh_state(bh);
> +			spin_unlock(&journal->j_state_lock);
> +			log_wait_commit(journal, tid);
> +			goto retry;
> +		}
> +		/*
> +		 * OK, buffer won't be reachable after truncate. We just set
> +		 * j_next_transaction to the running transaction (if there is
> +		 * one) and mark buffer as freed so that commit code knows it
> +		 * should clear dirty bits when it is done with the buffer.
>  		 */
>  		set_buffer_freed(bh);
>  		if (journal->j_running_transaction && buffer_jbddirty(bh))
> @@ -1976,6 +1991,14 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
>  	}
>  
>  zap_buffer:
> +	/*
> +	 * This is tricky. Although the buffer is truncated, it may be reused
> +	 * if blocksize < pagesize and it is attached to the page straddling
> +	 * EOF. Since the buffer might have been added to BJ_Forget list of the
> +	 * running transaction, journal_get_write_access() won't clear
> +	 * b_modified and credit accounting gets confused. So clear b_modified
> +	 * here. */
> +	jh->b_modified = 0;
>  	journal_put_journal_head(jh);
>  zap_buffer_no_jh:
>  	spin_unlock(&journal->j_list_lock);
> @@ -2024,7 +2047,8 @@ void journal_invalidatepage(journal_t *journal,
>  		if (offset <= curr_off) {
>  			/* This block is wholly outside the truncation point */
>  			lock_buffer(bh);
> -			may_free &= journal_unmap_buffer(journal, bh);
> +			may_free &= journal_unmap_buffer(journal, bh,
> +							 offset > 0);
>  			unlock_buffer(bh);
>  		}
>  		curr_off = next_off;
> -- 
> 1.7.9.5
> 
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/