lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 14 Feb 2012 16:01:36 +0800
From:	Robin Dong <hao.bigrat@...il.com>
To:	"Ted Ts'o" <tytso@....edu>, Aditya Kali <adityakali@...gle.com>
Cc:	linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: s_dirtyclusters_counter should tranform to unit of
 cluster before assigning to "dirty_clusters" in ext4_has_free_clusters()

在 2012年2月10日 上午10:55,Ted Ts'o <tytso@....edu> 写道:
> On Thu, Feb 02, 2012 at 03:48:49PM +0800, Robin Dong wrote:
>>
>> The reason is ext4_has_free_clusters reporting wrong
>> result. Actually, the unit of argument "dirty_clusters" is block, so
>> we should tranform the sbi->s_dirtyclusters_counter to block , just
>> like "free_clusters".
>
> I've been looking at the the delayed allocation reservation code, and
> I think it's a lot more confused that this.  Some places we are
> playing with fields as if they are clusters, and some times as if they
> are blocks, and I've definitely found places where we're not handling
> quota correctly with bigalloc (ext4_map_blocks is calling
> ext4_da_update_reserve_space() with a value which is clearly in units
> of blocks, but we are treating that vale as if it is clusters, etc.)
>
> So I think the problem is a lot larger than what you pointed out, and
> we shouldn't fix it just by throwing in an EXT4_C2B call.  If
> s_dirtyclusters_counter should be denominated in blocks, then we need
> to rename it.  And there there's this comment which made me wince:
>
>           * Note that in case of bigalloc, i_reserved_meta_blocks,
>           * i_reserved_data_blocks, etc. refer to number of clusters.
>
> Argh, no.  If something is denominated in clusters, then it should be
> appropriately named as such.
>
> Actually, I think the problem is a lot bigger than this, and it's a
> conceptual one.  When we try writing to a block which does not have a
> mapping, and bigalloc is enabled, there are three cases:
>
> 1) This is the first time the cluster has ever been written to; hence,
> even though we are dirtying a block, we need to reserve room for the
> entire cluster.
>
> 2) While the block has not been mapped (and so the data on disk is
> uninitialized) the cluster has indeed been allocated, so we don't need
> to reserve any additional room for the block.
>
> 3) One or more blocks in the cluster have already been written to via
> delayed allocation, but the cluster itself has not been selected
> (allocated) yet.  But since the space for the entire cluster was
> reserved when the first block in the cluster was dirtied (case #1), we
> don't need to reserve any more space.
>
> We aren't handling these cases correctly today, since we don't have
> this logic here.  These cases don't matter if you're only using
> fallocate() and direct I/O, but if you are using buffered writes and
> delayed allocation, they obviously matter a huge amount.
>
> And I don't think we're handling it correctly.   Argh...
>
>                                                 - Ted

Hi, Ted and Aditya

I am trying to fix this "bigalloc quota reservation in
delay-allocation" problem recently, and I feel confusion about these
code below in ext4_ext_map_blocks():

        if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
                unsigned int reserved_clusters;
                /*
                 * Check how many clusters we had reserved this allocated range
                 */
                reserved_clusters = get_reserved_cluster_alloc(inode,
                                                map->m_lblk, allocated);
                if (map->m_flags & EXT4_MAP_FROM_CLUSTER) {
                        if (reserved_clusters) {
                                ......
                                ext4_da_update_reserve_space(inode,
                                                reserved_clusters, 0);
                        }
                } else {
                        BUG_ON(allocated_clusters < reserved_clusters);
                        /* We will claim quota for all newly allocated blocks.*/
                        ext4_da_update_reserve_space(inode, allocated_clusters,
                                                        1);
                        if (reserved_clusters < allocated_clusters) {
                                struct ext4_inode_info *ei = EXT4_I(inode);
                                int reservation = allocated_clusters -
                                                  reserved_clusters;
                                dquot_reserve_block(inode,
                                                EXT4_C2B(sbi, reservation));
                                spin_lock(&ei->i_block_reservation_lock);
                                ei->i_reserved_data_blocks += reservation;
                                spin_unlock(&ei->i_block_reservation_lock);
                        }
                }
        }

if we have allocated 3 clusters, why should we add "reservation" back
to i_reservd_data_blocks ? (Aditya Kali 's long comment dose not make
sense to me) Maybe we don't need to do ext4_da_update_reserve_space
when the new block is allocated from "implied_cluster"? Why not just
change them to:

        if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
                if ( !(map->m_flags & EXT4_MAP_FROM_CLUSTER) )
                        /* We will claim quota for all newly allocated blocks.*/
                        ext4_da_update_reserve_space(inode, allocated_clusters,
                                                        1);
        }

-- 
--
Best Regard
Robin Dong
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ