linux-ext4 - [gnehzuil.liu@...il.com: Re: [RFC][PATCH 3/3] ext4: add dio overwrite nolock]

Message-ID: <20120502160356.GA3970@gmail.com>
Date:	Thu, 3 May 2012 00:03:56 +0800
From:	Zheng Liu <gnehzuil.liu@...il.com>
To:	linux-ext4@...r.kernel.org
Subject: [gnehzuil.liu@...il.com: Re: [RFC][PATCH 3/3] ext4: add dio
 overwrite nolock]

[Sorry, when I replied to this mail on the mailing list, I got an error,
so I am resending it to the list.]

----- Forwarded message from Zheng Liu <gnehzuil.liu@...il.com> -----

Date: Wed, 2 May 2012 23:49:35 +0800
From: Zheng Liu <gnehzuil.liu@...il.com>
To: Eric Sandeen <sandeen@...hat.com>
Cc: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>, Zheng Liu <wenqing.lz@...bao.com>
Subject: Re: [RFC][PATCH 3/3] ext4: add dio overwrite nolock

On Wednesday, May 2, 2012, Eric Sandeen wrote:

> On 4/27/12 10:39 PM, Zheng Liu wrote:
> > From: Zheng Liu <wenqing.lz@...bao.com>
> >
> > Aligned and overwrite direct IO can be parallelized.  In
> > ext4_file_dio_write, we first check whether these conditions are
> > satisfied or not.  If so, we unlock the i_mutex and acquire i_data_sem
> > directly.  Meanwhile iocb->private is set to indicate that this is an
> > overwrite dio, and it will be processed in ext4_ext_direct_IO.
>
> This copies almost 100 lines of generic_file_aio_write() back into
> ext4.  Do we really need to do this?  Copying core code into the
> fs can be a maintenance nightmare...
>
> I'll have to think more about the big picture and whether or not it's
> possible, but my first reaction is to find a way to leverage or modify
> existing IO code rather than pasting it all into ext4 with changes...


The key issue is file_update_time().  This function starts a new journal
handle.  That can deadlock, because we normally start the journal handle
first and then take i_data_sem, but in the overwrite path this order is
reversed.  If we can avoid this deadlock, we won't need to copy core code
into ext4.
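
To make the ordering problem concrete, here is a minimal userspace sketch.
It is not kernel code and it is not how ext4 or jbd2 track locks.  It only
models the rule described above: the journal handle is started first and
i_data_sem is taken second.  All names in it (lock_rank, lock_state,
acquire_in_order and the two path functions) are hypothetical and chosen
for illustration only.  Treating the journal start as a plain ranked lock
is also a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a lock with a lower rank must be taken first. */
enum lock_rank {
	RANK_NONE = 0,
	RANK_JOURNAL = 1,	/* stands in for starting a journal handle */
	RANK_I_DATA_SEM = 2,	/* stands in for taking i_data_sem */
};

/* Per-path state: the highest rank taken so far on this path. */
struct lock_state {
	enum lock_rank held_max;
};

/*
 * Record that `want` is being taken.  Returns true if this respects the
 * ordering rule, false if it would invert it (the deadlock-prone case).
 */
bool acquire_in_order(struct lock_state *st, enum lock_rank want)
{
	if (want <= st->held_max)
		return false;	/* lower-ranked lock requested too late */
	st->held_max = want;
	return true;
}

/* Usual write path: journal handle first, then i_data_sem. */
bool normal_write_path(void)
{
	struct lock_state st = { RANK_NONE };

	return acquire_in_order(&st, RANK_JOURNAL) &&
	       acquire_in_order(&st, RANK_I_DATA_SEM);
}

/*
 * Overwrite path as described in the mail: i_data_sem is already held
 * when the timestamp update tries to start a journal handle.
 */
bool overwrite_path_with_time_update(void)
{
	struct lock_state st = { RANK_NONE };

	return acquire_in_order(&st, RANK_I_DATA_SEM) &&
	       acquire_in_order(&st, RANK_JOURNAL);
}

/* Small self-check tying the two paths to the expected outcome. */
void self_check(void)
{
	assert(normal_write_path());
	assert(!overwrite_path_with_time_update());
}
```

Calling self_check() from a test program passes: the normal path keeps the
order, while the overwrite path is flagged as an inversion.  When two
tasks take the same two locks in opposite orders like this, each can end
up waiting for the other.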

Regards,
Zheng


>
> -Eric
>
> > Signed-off-by: Zheng Liu <wenqing.lz@...bao.com>
> > ---
> >  fs/ext4/file.c |  140 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 files changed, 137 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> > index e5d6be3..8a5f713 100644
> > --- a/fs/ext4/file.c
> > +++ b/fs/ext4/file.c
> > @@ -100,9 +100,21 @@ static ssize_t
> >  ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
> >                   unsigned long nr_segs, loff_t pos)
> >  {
> > -     struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
> > -     int unaligned_aio = 0;
> > +     struct file *file = iocb->ki_filp;
> > +     struct address_space * mapping = file->f_mapping;
> > +     struct inode *inode = file->f_path.dentry->d_inode;
> > +     struct blk_plug plug;
> >       ssize_t ret;
> > +     ssize_t written, written_buffered;
> > +     size_t length = iov_length(iov, nr_segs);
> > +     size_t ocount;          /* original count */
> > +     size_t count;           /* after file limit checks */
> > +     int unaligned_aio = 0;
> > +     int overwrite = 0;
> > +     loff_t *ppos = &iocb->ki_pos;
> > +     loff_t endbyte;
> > +
> > +     BUG_ON(iocb->ki_pos != pos);
> >
> >       if (!is_sync_kiocb(iocb))
> >               unaligned_aio = ext4_unaligned_aio(inode, iov, nr_segs, pos);
> > @@ -121,7 +133,129 @@ ext4_file_dio_write(struct kiocb *iocb, const struct iovec *iov,
> >               ext4_aiodio_wait(inode);
> >       }
> >
> > -     ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
> > +     mutex_lock(&inode->i_mutex);
> > +     blk_start_plug(&plug);
> > +
> > +     ocount = 0;
> > +     ret = generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ);
> > +     if (ret)
> > +             goto unlock_out;
> > +
> > +     count = ocount;
> > +     pos = *ppos;
> > +
> > +     vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
> > +
> > +     /* We can write back this queue in page reclaim */
> > +     current->backing_dev_info = mapping->backing_dev_info;
> > +     written = 0;
> > +
> > +     ret = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
> > +     if (ret)
> > +             goto out;
> > +
> > +     if (count == 0)
> > +             goto out;
> > +
> > +     ret = file_remove_suid(file);
> > +     if (ret)
> > +             goto out;
> > +
> > +     file_update_time(file);
> > +
> > +     iocb->private = NULL;
> > +
> > +     if (!unaligned_aio && !file->f_mapping->nrpages &&
> > +         pos + length < i_size_read(inode) &&
> > +         ext4_should_dioread_nolock(inode)) {
> > +             struct ext4_map_blocks map;
> > +             unsigned int blkbits = inode->i_blkbits;
> > +             int err;
> > +             int len;
> > +
> > +             map.m_lblk = pos >> blkbits;
> > +             map.m_len = (EXT4_BLOCK_ALIGN(pos + length, blkbits) >> blkbits)
> > +                     - map.m_lblk;
> > +             len = map.m_len;
> > +
> > +             err = ext4_map_blocks(NULL, inode, &map, 0);
> > +             if (err == len && (!map.m_flags ||
> > +                 map.m_flags & EXT4_MAP_MAPPED)) {
> > +

----- End forwarded message -----