lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 12 Oct 2020 17:28:01 -0700
From:   harshad shirwadkar <harshadshirwadkar@...il.com>
To:     Ritesh Harjani <riteshh@...ux.ibm.com>
Cc:     Ext4 Developers List <linux-ext4@...r.kernel.org>,
        "Theodore Y. Ts'o" <tytso@....edu>
Subject: Re: [PATCH v9 3/9] ext4 / jbd2: add fast commit initialization

Thanks Ritesh for taking a look at the patches! I know that a couple
of patches in this series are really big, I really appreciate you
taking a look at them!

On Fri, Oct 9, 2020 at 9:10 AM Ritesh Harjani <riteshh@...ux.ibm.com> wrote:
>
>
> Sorry about the delay. Few comments below.
>
> On 9/19/20 6:24 AM, Harshad Shirwadkar wrote:
> > This patch adds fast commit area trackers in the journal_t
> > structure. These are initialized via the jbd2_fc_init() routine that
> > this patch adds. This patch also adds ext4/fast_commit.c and
> > ext4/fast_commit.h files for fast commit code that will be added in
> > subsequent patches in this series.
> >
> > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@...il.com>
> > ---
> >   fs/ext4/Makefile      |  2 +-
> >   fs/ext4/ext4.h        |  4 ++++
> >   fs/ext4/fast_commit.c | 20 +++++++++++++++++
> >   fs/ext4/fast_commit.h |  9 ++++++++
> >   fs/ext4/super.c       |  1 +
> >   fs/jbd2/journal.c     | 52 ++++++++++++++++++++++++++++++++++++++-----
> >   include/linux/jbd2.h  | 39 ++++++++++++++++++++++++++++++++
> >   7 files changed, 121 insertions(+), 6 deletions(-)
> >   create mode 100644 fs/ext4/fast_commit.c
> >   create mode 100644 fs/ext4/fast_commit.h
> >
> > diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
> > index 2e42f47a7f98..49e7af6cc93f 100644
> > --- a/fs/ext4/Makefile
> > +++ b/fs/ext4/Makefile
> > @@ -10,7 +10,7 @@ ext4-y      := balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \
> >               indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \
> >               mmp.o move_extent.o namei.o page-io.o readpage.o resize.o \
> >               super.o symlink.o sysfs.o xattr.o xattr_hurd.o xattr_trusted.o \
> > -             xattr_user.o
> > +             xattr_user.o fast_commit.o
> >
> >   ext4-$(CONFIG_EXT4_FS_POSIX_ACL)    += acl.o
> >   ext4-$(CONFIG_EXT4_FS_SECURITY)             += xattr_security.o
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 82e889d5c2ed..9af3971dd12e 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -964,6 +964,7 @@ do {                                                                             \
> >   #endif /* defined(__KERNEL__) || defined(__linux__) */
> >
> >   #include "extents_status.h"
> > +#include "fast_commit.h"
> >
> >   /*
> >    * Lock subclasses for i_data_sem in the ext4_inode_info structure.
> > @@ -2679,6 +2680,9 @@ extern int ext4_init_inode_table(struct super_block *sb,
> >                                ext4_group_t group, int barrier);
> >   extern void ext4_end_bitmap_read(struct buffer_head *bh, int uptodate);
> >
> > +/* fast_commit.c */
> > +
> > +void ext4_fc_init(struct super_block *sb, journal_t *journal);
> >   /* mballoc.c */
> >   extern const struct seq_operations ext4_mb_seq_groups_ops;
> >   extern long ext4_mb_stats;
> > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> > new file mode 100644
> > index 000000000000..0dad8bdb1253
> > --- /dev/null
> > +++ b/fs/ext4/fast_commit.c
> > @@ -0,0 +1,20 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * fs/ext4/fast_commit.c
> > + *
> > + * Written by Harshad Shirwadkar <harshadshirwadkar@...il.com>
> > + *
> > + * Ext4 fast commits routines.
> > + */
> > +#include "ext4_jbd2.h"
> > +
> > +void ext4_fc_init(struct super_block *sb, journal_t *journal)
> > +{
> > +     if (!test_opt2(sb, JOURNAL_FAST_COMMIT))
> > +             return;
> > +     if (jbd2_fc_init(journal, EXT4_NUM_FC_BLKS)) {
> > +             pr_warn("Error while enabling fast commits, turning off.");
> > +             ext4_clear_feature_fast_commit(sb);
> > +     }
> > +}
> > diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
> > new file mode 100644
> > index 000000000000..8362bf5e6e00
> > --- /dev/null
> > +++ b/fs/ext4/fast_commit.h
> > @@ -0,0 +1,9 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef __FAST_COMMIT_H__
> > +#define __FAST_COMMIT_H__
> > +
> > +/* Number of blocks in journal area to allocate for fast commits */
> > +#define EXT4_NUM_FC_BLKS             256
>
> Just wanted to understand how is this value determined?
> Do you think this needs to be configurable?
> Just thinking since, on some platforms blksz could be of 64K.
I see, I chose this value experimentally. In my experiments with very
aggressive journal commits, (such as fs_mark and NFS), I found that
256 blocks was enough to guarantee that this space doesn't get filled
up before the mandatory periodic full commit (happening at default of
5 seconds). But I realize that it's probably better to make this
configurable. Another option is to have this value be statically
defined as a percentage of the total number of blocks available for
JBD2. The latter has the advantage that we don't need on-disk format
doesn't need to be updated. Performance gains with fast commits are
achieved by delaying full commits as much as possible.
>
> > +
> > +#endif /* __FAST_COMMIT_H__ */
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index b62858ee420b..94aaaf940449 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -4962,6 +4962,7 @@ static void ext4_init_journal_params(struct super_block *sb, journal_t *journal)
> >       journal->j_commit_interval = sbi->s_commit_interval;
> >       journal->j_min_batch_time = sbi->s_min_batch_time;
> >       journal->j_max_batch_time = sbi->s_max_batch_time;
> > +     ext4_fc_init(sb, journal);
> >
> >       write_lock(&journal->j_state_lock);
> >       if (test_opt(sb, BARRIER))
> > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > index 17fdc482f554..736a1736619f 100644
> > --- a/fs/jbd2/journal.c
> > +++ b/fs/jbd2/journal.c
> > @@ -1179,6 +1179,14 @@ static journal_t *journal_init_common(struct block_device *bdev,
> >       if (!journal->j_wbuf)
> >               goto err_cleanup;
> >
> > +     if (journal->j_fc_wbufsize > 0) {
> > +             journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> > +                                     sizeof(struct buffer_head *),
> > +                                     GFP_KERNEL);
> > +             if (!journal->j_fc_wbuf)
> > +                     goto err_cleanup;
> > +     }
> > +
> >       bh = getblk_unmovable(journal->j_dev, start, journal->j_blocksize);
> >       if (!bh) {
> >               pr_err("%s: Cannot get buffer for journal superblock\n",
> > @@ -1192,11 +1200,22 @@ static journal_t *journal_init_common(struct block_device *bdev,
> >
> >   err_cleanup:
> >       kfree(journal->j_wbuf);
> > +     kfree(journal->j_fc_wbuf);
> >       jbd2_journal_destroy_revoke(journal);
> >       kfree(journal);
> >       return NULL;
> >   }
> >
> > +int jbd2_fc_init(journal_t *journal, int num_fc_blks)
> > +{
> > +     journal->j_fc_wbufsize = num_fc_blks;
> > +     journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
> > +                             sizeof(struct buffer_head *), GFP_KERNEL);
> > +     if (!journal->j_fc_wbuf)
> > +             return -ENOMEM;
> > +     return 0;
> > +}
> > +
> >   /* jbd2_journal_init_dev and jbd2_journal_init_inode:
> >    *
> >    * Create a journal structure assigned some fixed set of disk blocks to
> > @@ -1314,11 +1333,20 @@ static int journal_reset(journal_t *journal)
> >       }
> >
> >       journal->j_first = first;
> > -     journal->j_last = last;
> >
> > -     journal->j_head = first;
> > -     journal->j_tail = first;
> > -     journal->j_free = last - first;
> > +     if (jbd2_has_feature_fast_commit(journal) &&
> > +         journal->j_fc_wbufsize > 0) {
> > +             journal->j_last_fc = last;
> > +             journal->j_last = last - journal->j_fc_wbufsize;
> > +             journal->j_first_fc = journal->j_last + 1;
> > +             journal->j_fc_off = 0;
> > +     } else {
> > +             journal->j_last = last;
> > +     }
> > +
> > +     journal->j_head = journal->j_first;
> > +     journal->j_tail = journal->j_first;
> > +     journal->j_free = journal->j_last - journal->j_first;
> >
> >       journal->j_tail_sequence = journal->j_transaction_sequence;
> >       journal->j_commit_sequence = journal->j_transaction_sequence - 1;
> > @@ -1663,9 +1691,18 @@ static int load_superblock(journal_t *journal)
> >       journal->j_tail_sequence = be32_to_cpu(sb->s_sequence);
> >       journal->j_tail = be32_to_cpu(sb->s_start);
> >       journal->j_first = be32_to_cpu(sb->s_first);
> > -     journal->j_last = be32_to_cpu(sb->s_maxlen);
> >       journal->j_errno = be32_to_cpu(sb->s_errno);
> >
> > +     if (jbd2_has_feature_fast_commit(journal) &&
> > +         journal->j_fc_wbufsize > 0) {
> > +             journal->j_last_fc = be32_to_cpu(sb->s_maxlen);
> > +             journal->j_last = journal->j_last_fc - journal->j_fc_wbufsize;
> > +             journal->j_first_fc = journal->j_last + 1;
> > +             journal->j_fc_off = 0;
> > +     } else {
> > +             journal->j_last = be32_to_cpu(sb->s_maxlen);
> > +     }
> > +
> >       return 0;
> >   }
> >
> > @@ -1726,6 +1763,9 @@ int jbd2_journal_load(journal_t *journal)
> >        */
> >       journal->j_flags &= ~JBD2_ABORT;
> >
> > +     if (journal->j_fc_wbufsize > 0)
> > +             jbd2_journal_set_features(journal, 0, 0,
> > +                                       JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
> >       /* OK, we've finished with the dynamic journal bits:
> >        * reinitialise the dynamic contents of the superblock in memory
> >        * and reset them on disk. */
> > @@ -1809,6 +1849,8 @@ int jbd2_journal_destroy(journal_t *journal)
> >               jbd2_journal_destroy_revoke(journal);
> >       if (journal->j_chksum_driver)
> >               crypto_free_shash(journal->j_chksum_driver);
> > +     if (journal->j_fc_wbufsize > 0)
> > +             kfree(journal->j_fc_wbuf);
> >       kfree(journal->j_wbuf);
> >       kfree(journal);
> >
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index f438257d7f31..36f65a818366 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -915,6 +915,30 @@ struct journal_s
> >        */
> >       unsigned long           j_last;
> >
> > +     /**
> > +      * @j_first_fc:
> > +      *
> > +      * The block number of the first fast commit block in the journal
> > +      * [j_state_lock].
> > +      */
> > +     unsigned long           j_first_fc;
> > +
> > +     /**
> > +      * @j_fc_off:
> > +      *
> > +      * Number of fast commit blocks currently allocated.
> > +      * [j_state_lock].
> > +      */
> > +     unsigned long           j_fc_off;
>
> I guess choosing a single naming convention for fast commit would be
> very helpful for grepping/searching.
> So for e.g. we could have everything using j_fc_**
> If you agree, then we may have to change other members of this structure
> accordingly.
That makes sense, I'll rename the variables / functions where this
convention is not followed.

Thanks,
Harshad

>
> -ritesh
>
> > +
> > +     /**
> > +      * @j_last_fc:
> > +      *
> > +      * The block number one beyond the last fast commit block in the journal
> > +      * [j_state_lock].
> > +      */
> > +     unsigned long           j_last_fc;
> > +
> >       /**
> >        * @j_dev: Device where we store the journal.
> >        */
> > @@ -1065,6 +1089,12 @@ struct journal_s
> >        */
> >       struct buffer_head      **j_wbuf;
> >
> > +     /**
> > +      * @j_fc_wbuf: Array of fast commit bhs for
> > +      * jbd2_journal_commit_transaction.
> > +      */
> > +     struct buffer_head      **j_fc_wbuf;
> > +
> >       /**
> >        * @j_wbufsize:
> >        *
> > @@ -1072,6 +1102,13 @@ struct journal_s
> >        */
> >       int                     j_wbufsize;
> >
> > +     /**
> > +      * @j_fc_wbufsize:
> > +      *
> > +      * Size of @j_fc_wbuf array.
> > +      */
> > +     int                     j_fc_wbufsize;
> > +
> >       /**
> >        * @j_last_sync_writer:
> >        *
> > @@ -1507,6 +1544,8 @@ void __jbd2_log_wait_for_space(journal_t *journal);
> >   extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *);
> >   extern int jbd2_cleanup_journal_tail(journal_t *);
> >
> > +/* Fast commit related APIs */
> > +int jbd2_fc_init(journal_t *journal, int num_fc_blks);
> >   /*
> >    * is_journal_abort
> >    *
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ