lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 13 May 2021 16:27:23 -0400
From:   Leah Rumancik <leah.rumancik@...il.com>
To:     Theodore Ts'o <tytso@....edu>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: [PATCH v4 1/3] ext4: add discard/zeroout flags to journal flush

On Thu, May 13, 2021 at 02:09:26PM -0400, Theodore Ts'o wrote:
> On Tue, May 11, 2021 at 06:04:26PM +0000, Leah Rumancik wrote:
> > @@ -3223,7 +3223,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
> >  		ext4_clear_inode_state(inode, EXT4_STATE_JDATA);
> >  		journal = EXT4_JOURNAL(inode);
> >  		jbd2_journal_lock_updates(journal);
> > -		err = jbd2_journal_flush(journal);
> > +		err = jbd2_journal_flush(journal, 0);
> 
> In the ocfs2 changes, I noticed you are using "false", instead of 0,
> in the second argument to jbd2_journal_flush.
> 
> When I looked more closely, the function signature of
> jbd2_journal_flush is also using an unsigned long long for flags,
> which struck me as strange:
> 
> > +extern int	 jbd2_journal_flush(journal_t *journal, unsigned long long flags);
> 
> I then noticed that later in the patch series, the ioctl argument is
> taking an unsigned long long and we're passing that straight through
> to jbd2_journal_flush().
> 
> First of all, unsigned long long is not very efficient on many
> platforms (especially 32-bit platforms), but also on platforms where
> int is 32 bits.  If we don't expect us to need more than 32 flag bits,
> I'd suggest explicit ly using __u32 in ioctl interface.  (__u32 is
> fine; it's the use of the base int type which can get us into trouble,
> since int can be either 32 or 64 bits depending on the architecture).
> 

Just to make sure I understand correctly, the explicit __u32 is critical
due to the size being read in by the ioctl, specifically through
copy_from_user? When do you switch from __u32 to unsigned long? I don't
see the __* types being carried throughout.

(Also, just got Darrick's reply about the 32 vs. 64. Yes, originally
went with 64 because there was an argument for it. I believe the 32
is likely sufficient, but I don't feel that strongly about this matter)

> Secondly, I'd suggest using a different set of flags for
> jbd2_journal_flush(), which is an internal kernel interface, and the
> EXT4_IOC_CHECKPOINT interface.  We might in the future want to add
> some internal flags to jbd2_journal_flush that we do *not* want to
> expose via EXT4_IOC_CHECKPOINT, and so it's best that we keep those
> two interfaces separate.
> 
> > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > index 2dc944442802..f86929dbca3c 100644
> > --- a/fs/jbd2/journal.c
> > +++ b/fs/jbd2/journal.c
> > @@ -1686,6 +1686,106 @@ static void jbd2_mark_journal_empty(journal_t *journal, int write_op)
> >  	write_unlock(&journal->j_state_lock);
> >  }
> >  
> > +#define JBD2_ERASE_FLAG_DISCARD	1
> > +#define JBD2_ERASE_FLAG_ZEROOUT	2
> 
> I'd suggest defining these in include/linux/jbd2.h, and giving them
> names like: JBD2_JOURNAL_FLUSH_DISCARD and JBD2_JOURNAL_FLUSH_ERASE...
> (and making the flags parameter an unsigned int).
> 
> > +	/* flags must be set to either discard or zeroout */
> > +	if ((flags & JBD2_ERASE_FLAG_DISCARD & JBD2_ERASE_FLAG_ZEROOUT) || !flags)
> > +		return -EINVAL;
> 
> The expression (flags & JBD2_ERASE_FLAG_DISCARD & JBD2_ERASE_FLAG_ZEROOUT)
> is always going to evaluate to zero, since (1 & 2) is 0.
> 
> What you probably want is something like:
> 
> #define JBD2_JOURNAL_FLUSH_DISCARD	0x0001
> #define JBD2_JOURNAL_FLUSH_ZEROOUT	0x0002
> #define JBD2_JOURNAL_FLUSH_VALID	0x0003
> 

Why call them JBD2_JOURNAL_FLUSH* instead of JBD2_JOURNAL_ERASE* since
they get passed directly through to the erase function? I feel like it
would be weird if someone wanted to use the erase function directly but
had to use JBD2_JOURNAL_FLUSH* flags.

>      if ((flags & ~JBD2_JOURNAL_FLUSH_VALID) ||
>          ((flags & JBD2_JOURNAL_FLUSH_DISCARD) &&
> 	  (flags & JBD2_JOURNAL_FLUSH_ZEROOUT)))
> 	return -EINVAL;
> 
Ah, great. Thanks!

> > +
> > +	err = jbd2_journal_bmap(journal, log_offset, &block_start);
> > +	if (err) {
> > +		printk(KERN_ERR "JBD2: bad block at offset %lu", log_offset);
> > +		return err;
> > +	}
> 
> We could get rid of this, and instead make sure block_start is initialized
> to ~((unsigned long long) 0).  Then in the loop we can do...
> 
> > +
> > +	/*
> > +	 * use block_start - 1 to meet check for contiguous with previous region:
> > +	 * phys_block == block_stop + 1
> > +	 */
> > +	block_stop = block_start - 1;
> > +
> > +	for (block = log_offset; block < journal->j_total_len; block++) {
> > +		err = jbd2_journal_bmap(journal, block, &phys_block);
> > +		if (err) {
> > +			printk(KERN_ERR "JBD2: bad block at offset %lu", block);
> > +			return err;
> > +		}
> 
> 		if (block_start == ~((unsigned long long) 0)) {
> 			block_start = phys_block;
> 			block_Stop = block_start - 1;
> 		}
> 
> > +
> > +		if (block == journal->j_total_len - 1) {
> > +			block_stop = phys_block;
> > +		} else if (phys_block == block_stop + 1) {
> > +			block_stop++;
> > +			continue;
> > +		}
> > +
> > +		/*
> > +		 * not contiguous with prior physical block or this is last
> > +		 * block of journal, take care of the region
> > +		 */
> > +		byte_start = block_start * journal->j_blocksize;
> > +		byte_stop = block_stop * journal->j_blocksize;
> > +		byte_count = (block_stop - block_start + 1) *
> > +				journal->j_blocksize;
> > +
> > +		truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
> > +				byte_start, byte_stop);
> > +
> > +		if (flags & JBD2_ERASE_FLAG_DISCARD) {
> > +			err = blkdev_issue_discard(journal->j_dev,
> > +					byte_start >> SECTOR_SHIFT,
> > +					byte_count >> SECTOR_SHIFT,
> > +					GFP_NOFS, 0);
> > +		} else if (flags & JBD2_ERASE_FLAG_ZEROOUT) {
> > +			err = blkdev_issue_zeroout(journal->j_dev,
> > +					byte_start >> SECTOR_SHIFT,
> > +					byte_count >> SECTOR_SHIFT,
> > +					GFP_NOFS, 0);
> > +		}
> > +
> > +		if (unlikely(err != 0)) {
> > +			printk(KERN_ERR "JBD2: (error %d) unable to wipe journal at physical blocks %llu - %llu",
> > +					err, block_start, block_stop);
> > +			return err;
> > +		}
> > +
> > +		block_start = phys_block;
> > +		block_stop = phys_block;
> 
> Is this right?  When we initialized the loop, above, block_stop was
> set to block_start-1 (where block_start == phys_block).  So I think it
> might be more correct to replace the above two lines with:
> 
> 		block_start = ~((unsigned long long) 0);
> 

I'll play around with this and see if I can get it to work. Seems like
it might simplify the code a bit.

> ... and then let block_start and block_stop be initialized in a single
> place.  Do you agree?  Does this make sense to you?
> 
> 	       	       	    	      	    - Ted


Thanks!
Leah

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ