linux-ext4 - Re: [ext2/ext3] Re-allocation of blocks for an inode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <37d33d830903122035i722711f8qbae9e618d99217ac@mail.gmail.com>
Date:	Fri, 13 Mar 2009 09:05:56 +0530
From:	Sandeep K Sinha <sandeepksinha@...il.com>
To:	Greg Freemyer <greg.freemyer@...il.com>
Cc:	ext4 <linux-ext4@...r.kernel.org>,
	Kernel Newbies <kernelnewbies@...linux.org>,
	Manish Katiyar <mkatiyar@...il.com>
Subject: Re: [ext2/ext3] Re-allocation of blocks for an inode

On Fri, Mar 13, 2009 at 12:07 AM, Greg Freemyer <greg.freemyer@...il.com> wrote:
> On Thu, Mar 12, 2009 at 1:59 PM, Sandeep K Sinha
> <sandeepksinha@...il.com> wrote:
>> On Thu, Mar 12, 2009 at 9:43 PM, Greg Freemyer <greg.freemyer@...il.com> wrote:
>>> On Thu, Mar 12, 2009 at 11:16 AM, Sandeep K Sinha
>>> <sandeepksinha@...il.com> wrote:
>>>> Hi all,
>>>>
>>>> I am listing above two codes for re-allocating new blocks for a given inode.
>>>>
>>>> These code snippets are specific to  ext2.
>>>> The problem here is that it is performing slower than cp and dd.
>>>>
>>>> Can anyone help in determining what can be the possible reason(s) ?
>>>
>>> Excessive use of sync_dirty_buffer();
>>>
>>> You're calling it for every single block you copy. Pretty sure it is a
>>> very aggressive call that forces the data out to the disk immediately,
>>> thus the benefits of caching and elevators are lost.  And those are
>>> big benefits.
>>>
>>
>> probably the reason being that the ext2 has some issues with sync.
>> http://kerneltrap.org/index.php?q=mailarchive/linux-fsdevel/2007/2/14/316836/thread
>>
>> Hence, if we don't do this we see a difference in the original and
>> reallocated file.
>
> Sandeep,
>
> As I read that bug report, the issue would be resolved by doing:
>
> truncate(a)
> fsync(a)
> allocate_additional_blocks_for_b()
> fsync(b)
>
> But for the reporter the issue is that the code manipulating file A is
> independent of the code manipulating file B thus it needs to be fixed
> in a different way.
>
> Since you have total control of both the files (inodes & data blocks)
> in question, you should be able to find a better way than calling
> sync_dirty_buffer() for every block you copy.
>
 We are looking over it and expect to find a better way to do it.

> Have you actually seen data corruption if you leave out the
> sync_dirty_buffer() call, or is it a theoretical issue?
Yes, It has been practically. We can see a data corruption.

>
> Have you tried calling ext2_sync_inode() at the end of the copy loop
> instead of sync_dirty_buffer() on every iteration.  I really can't see
> why that would be any more dangerous.  Yet it would at least allow the
> elevators and caches to work for an entire file at a time.
>

Let me do this for you once again and get back to you. If I can
recall, we did try that but were able to see data corruption.
Will get back to you soon with the reply.

> Greg
>
>>> Greg
>>>
>>>
>>>>
>>>> 1. MEMCPY
>>>> =======
>>>>
>>>>
>>>>
>>>>                src_ind = ext2_iget (sb, inum);
>>>>        if (!src_ind) {
>>>>                printk (KERN_DEBUG "\n OHSM Source Inode Not Found ");
>>>>                goto out_err;
>>>>        }
>>>>
>>>>        /* Getting the ghost inode */
>>>>        dest_ind = ext2_new_inode (nd.path.dentry->d_inode, src_ind->i_mode);
>>>>        if (IS_ERR(dest_ind)) {
>>>>                printk (KERN_ALERT "\n OHSM destination inode unable to allocate.
>>>> err = %ld", PTR_ERR(dest_ind));
>>>>                goto out_err;
>>>>        }
>>>>
>>>>        /* Locking the source inode using mutex */
>>>>        mutex_lock (&src_ind->i_mutex);
>>>>
>>>>        /* Get the source inode size and number of blocks */
>>>>        i_size = i_size_read(src_ind);
>>>>
>>>>        /* Calculating number of block to allocate for ghost inode */
>>>>        nblocks = (i_size >> EXT2_BLOCK_SIZE_BITS(sb)) +
>>>>                   ((i_size & (sb->s_blocksize - 1)) ? 1 : 0);
>>>>
>>>>        memset(&dest_bh, 0, sizeof(struct buffer_head));
>>>>        memset(&src_bh, 0, sizeof(struct buffer_head));
>>>>        for (done = 0; done < nblocks; done++) {
>>>>
>>>>             err = ext2_get_block (src_ind, done, &src_bh, 0);
>>>>                if (err < 0) {
>>>>                        printk (KERN_DEBUG "\n error getting blocks ret_val = %d",err);
>>>>                        goto unlock;
>>>>                }
>>>>
>>>>                dest_bh.b_state = 0;
>>>>                err = ext2_get_block (dest_ind, done, &dest_bh, 1);
>>>>                if (err < 0) {
>>>>                        printk (KERN_DEBUG "\n error getting blocks ret_val = %d",err);
>>>>                        goto unlock;
>>>>                }
>>>>
>>>>                if (!buffer_mapped(&src_bh)){
>>>>                        printk (KERN_DEBUG "\nHOLE ");
>>>>                        continue;
>>>>                }
>>>>
>>>>                src_bhptr = sb_bread(sb, src_bh.b_blocknr);
>>>>                if (!src_bhptr)
>>>>                        goto unlock;
>>>>
>>>>                dst_bhptr = sb_bread(sb, dest_bh.b_blocknr);
>>>>                if (!dst_bhptr)
>>>>                        goto unlock;
>>>>
>>>>                lock_buffer(dst_bhptr);
>>>>                memcpy(dst_bhptr->b_data,src_bhptr->b_data,src_bhptr->b_size);
>>>>                unlock_buffer(dst_bhptr);
>>>>
>>>>                mark_buffer_dirty(dst_bhptr);
>>>>                sync_dirty_buffer(dst_bhptr);
>>>>
>>>>                brelse(src_bhptr);
>>>>                brelse(dst_bhptr);
>>>>
>>>>        }
>>>>
>>>>        for (blk_index = 0; blk_index < 15; blk_index++) {
>>>>                if (EXT2_I (dest_ind)->i_data[blk_index]){
>>>>                        swap = EXT2_I (src_ind)->i_data[blk_index];
>>>>                        EXT2_I (src_ind)->i_data[blk_index] =
>>>>                                EXT2_I (dest_ind)->i_data[blk_index];
>>>>                        EXT2_I (dest_ind)->i_data[blk_index] = swap;
>>>>                }
>>>>        }
>>>>
>>>>        dest_ind->i_blocks = src_ind->i_blocks ;
>>>>        mark_inode_dirty (src_ind);
>>>>        sync_mapping_buffers(src_ind->i_mapping);
>>>>        ohsm_sync_inode(src_ind);
>>>>        iput(src_ind);
>>>>
>>>>        dest_ind->i_nlink = 0;
>>>>        iput(dest_ind);
>>>>        mutex_unlock (&src_ind->i_mutex);
>>>>        goto out_err;
>>>>
>>>> unlock:
>>>>        brelse(src_bhptr);
>>>>        brelse(dst_bhptr);
>>>>        mutex_unlock (&src_ind->i_mutex);
>>>>
>>>> out_err:
>>>>        path_put (&nd.path);
>>>>        return;
>>>>
>>>>
>>>>
>>>> 2. PAGEFLIP
>>>> ========
>>>>
>>>>
>>>> /* Get the source inode */
>>>>
>>>>        src_ind = ext2_iget (sb, inum);
>>>>        if (!src_ind) {
>>>>                printk (KERN_DEBUG "\n OHSM Source Inode Not Found ");
>>>>                goto out_err;
>>>>        }
>>>>
>>>>        /* Getting the ghost inode */
>>>>        dest_ind = ext2_new_inode (nd.path.dentry->d_inode, src_ind->i_mode);
>>>>        if (IS_ERR(dest_ind)) {
>>>>                printk (KERN_ALERT "\n destination inode unable to allocate. err =
>>>> %ld", PTR_ERR(dest_ind));
>>>>                goto out_err;
>>>>        }
>>>>
>>>>        /* Locking the source inode using mutex */
>>>>        mutex_lock (&src_ind->i_mutex);
>>>>
>>>>        /* Get the source inode size and number of blocks */
>>>>        i_size = i_size_read(src_ind);
>>>>
>>>>        /* Calculating number of block to allocate for ghost inode */
>>>>        nblocks = (i_size >> EXT2_BLOCK_SIZE_BITS(sb)) +
>>>>                   ((i_size & (sb->s_blocksize - 1)) ? 1 : 0);
>>>>
>>>>
>>>>        memset(&dest_bh, 0, sizeof(struct buffer_head));
>>>>        memset(&src_bh, 0, sizeof(struct buffer_head));
>>>>        for (done = 0; done < nblocks; done++) {
>>>>
>>>>             err = ext2_get_block (src_ind, done, &src_bh, 0);
>>>>                if (err < 0) {
>>>>                        printk (KERN_DEBUG "\n error getting blocks ret_val = %d",err);
>>>>                        goto unlock;
>>>>                }
>>>>
>>>>                dest_bh.b_state = 0;
>>>>                err = ext2_get_block (dest_ind, done, &dest_bh, 1);
>>>>                if (err < 0) {
>>>>                        printk (KERN_DEBUG "\n error getting blocks ret_val = %d",err);
>>>>                        goto unlock;
>>>>                }
>>>>
>>>>                if (!buffer_mapped(&src_bh)){
>>>>                        printk (KERN_DEBUG "\nHOLE ");
>>>>                        continue;
>>>>                }
>>>>
>>>>                src_bhptr = sb_bread(sb, src_bh.b_blocknr);
>>>>                if (!src_bhptr)
>>>>                        goto unlock;
>>>>
>>>>                dst_bhptr = sb_bread(sb, dest_bh.b_blocknr);
>>>>                if (!dst_bhptr)
>>>>                        goto unlock;
>>>>
>>>>               lock_buffer(dst_bhptr);
>>>>               oldpage = dst_bhptr->b_page;
>>>>               olddata = dst_bhptr->b_data;
>>>>               dst_bhptr->b_data = src_bhptr->b_data;
>>>>               dst_bhptr->b_page = src_bhptr->b_page;
>>>>               flush_dcache_page(dst_bhptr->b_page);
>>>>               unlock_buffer(dst_bhptr);
>>>>
>>>>               mark_buffer_dirty(dst_bhptr);
>>>>               sync_dirty_buffer(dst_bhptr);
>>>>
>>>>               if(buffer_uptodate(dst_bhptr)) {
>>>>                dst_bhptr->b_page  = oldpage;
>>>>                dst_bhptr->b_data = olddata;
>>>>               }
>>>>
>>>>             brelse(src_bhptr);
>>>>             brelse(dst_bhptr);
>>>>        }
>>>>
>>>>        for (blk_index = 0; blk_index < 15; blk_index++) {
>>>>                if (EXT2_I (dest_ind)->i_data[blk_index]){
>>>>                        swap = EXT2_I (src_ind)->i_data[blk_index];
>>>>                        EXT2_I (src_ind)->i_data[blk_index] =
>>>>                                EXT2_I (dest_ind)->i_data[blk_index];
>>>>                        EXT2_I (dest_ind)->i_data[blk_index] = swap;
>>>>                }
>>>>        }
>>>>
>>>>        dest_ind->i_blocks = src_ind->i_blocks ;
>>>>        mark_inode_dirty (src_ind);
>>>>        sync_mapping_buffers(src_ind->i_mapping);
>>>>        ohsm_sync_inode(src_ind);
>>>>        iput(src_ind);
>>>>
>>>>        dest_ind->i_nlink = 0;
>>>>        iput(dest_ind);
>>>>        mutex_unlock (&src_ind->i_mutex);
>>>>        goto out_err;
>>>>
>>>> unlock:
>>>>        brelse(src_bhptr);
>>>>        brelse(dst_bhptr);
>>>>        mutex_unlock (&src_ind->i_mutex);
>>>>
>>>> out_err:
>>>>        path_put (&nd.path);
>>>>        return;
>>>> }
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Sandeep.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> “To learn is to change. Education is a process that changes the learner.”
>>>>
>>>
>>>
>>>
>>> --
>>> Greg Freemyer
>>> Litigation Triage Solutions Specialist
>>> http://www.linkedin.com/in/gregfreemyer
>>> First 99 Days Litigation White Paper -
>>> http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
>>>
>>> The Norcross Group
>>> The Intersection of Evidence & Technology
>>> http://www.norcrossgroup.com
>>>
>>
>>
>>
>> --
>> Regards,
>> Sandeep.
>>
>>
>>
>>
>>
>>
>> “To learn is to change. Education is a process that changes the learner.”
>>
>
>
>
> --
> Greg Freemyer
> Head of EDD Tape Extraction and Processing team
> Litigation Triage Solutions Specialist
> http://www.linkedin.com/in/gregfreemyer
> First 99 Days Litigation White Paper -
> http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
>
> The Norcross Group
> The Intersection of Evidence & Technology
> http://www.norcrossgroup.com
>



-- 
Regards,
Sandeep.





 	
“To learn is to change. Education is a process that changes the learner.”
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html