lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230214085008.gix3dvvyg7wcf3bz@quack3>
Date:   Tue, 14 Feb 2023 09:50:08 +0100
From:   Jan Kara <jack@...e.cz>
To:     Ojaswin Mujoo <ojaswin@...ux.ibm.com>
Cc:     Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org,
        Theodore Ts'o <tytso@....edu>,
        Ritesh Harjani <riteshh@...ux.ibm.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        rookxu <brookxu.cn@...il.com>,
        Ritesh Harjani <ritesh.list@...il.com>
Subject: Re: [PATCH v3 7/8] ext4: Use rbtrees to manage PAs instead of inode
 i_prealloc_list


Hi Ojaswin!

On Mon 13-02-23 23:28:41, Ojaswin Mujoo wrote:
> On Fri, Feb 10, 2023 at 03:37:53PM +0100, Jan Kara wrote:
> > > See my comments at the end for more info.
> > > 
> > > > 
> > > > > Also, another thing I noticed is that after ext4_mb_normalize_request(),
> > > > > sometimes the original range can also exceed the normalized goal range,
> > > > > which is again was a bit surprising to me since my understanding was
> > > > > that normalized range would always encompass the orignal range.
> > > > 
> > > > Well, isn't that because (part of) the requested original range is already
> > > > preallocated? Or what causes the goal range to be shortened?
> > > > 
> > > Yes I think that pre existing PAs could be one of the cases.
> > > 
> > > Other than that, I'm also seeing some cases of sparse writes which can cause
> > > ext4_mb_normalize_request() to result in having an original range that
> > > overflows out of the goal range. For example, I observed these values just
> > > after the if else if else conditions in the function, before we check if range
> > > overlaps pre existing PAs:
> > > 
> > > orig_ex:2045/2055(len:10) normalized_range:0/2048, orig_isize:8417280
> > > 
> > > Basically, since isize is large and we are doing a sparse write, we end
> > > up in the following if condition:
> > > 
> > > 	} else if (NRL_CHECK_SIZE(ac->ac_o_ex.fe_len,
> > > 								(8<<20)>>bsbits, max, 8 * 1024)) {
> > > 		start_off = ((loff_t)ac->ac_o_ex.fe_logical >> (23 - bsbits)) << 23;
> > > 		size = 8 * 1024 * 1024;
> > >  }
> > > 
> > > Resulting in normalized range less than original range.
> > 
> > I see.
> > 
> > > Now, in any case, once we get such an overflow, if we try to enter the PA
> > > adjustment window in ext4_mb_new_inode_pa() function, we will again end
> > > up with a best extent overflowing out of goal extent since we would try
> > > to cover the original extent. 
> > > 
> > > So yeah, seems like there are 2 cases where we could result in
> > > overlapping PAs:
> > > 
> > > 1. Due to off calculation in PA adjustment window, as we discussed.  2.
> > > Due to original extent overflowing out of goal extent.
> > > 
> > > I think the 3 step solution you proposed works well to counter 1 but not
> > > 2, so we probably need some more logic on top of your solution to take
> > > care of that.  I'll think some more on how to fix this but I think this
> > > will be as a separate patch.
> > 
> > Well, my algorithm will still result in preallocation being within the goal
> > range AFAICS. In the example above we had:
> > 
> > Orig 2045/2055 Goal 0/2048
> > 
> > Suppose we found 200 blocks. So we try placing preallocation like:
> > 
> > 1848/2048, it covers the original starting block 2045 so we are fine and
> > create preallocation 1848/2048. Nothing has overflown the goal window...
> 
> Ahh okay, I though you meant checking if we covered the complete
> original range instead of just the original start. In that case we
> should be good.
> 
> However, I still feel that the example where we unnecessarily end up with 
> a lesser goal len than original len should not happen. Maybe in
> ext4_mb_normalize_request(), instead of hardcording the goal lengths for
> different i_sizes, we can select the next power of 2 greater than our
> original length or some similar heuristic. What do you think?

I agree that seems suboptimal but I'd leave tweaking this heuristic for a
separate patchset. In this one I'd just fix the minimum to make ranges not
overlap. The current heuristic makes sense as an anti-fragmentation measure
when there's enough free space. We can tweak the goal window heuristic of
mballoc but it would require some deeper thought and measurements, how it
affects fragmentation for various workloads so it is not just about
changing those several lines of code...

								Honza

-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ