Message-Id: <20241112084539.702485-1-jimzhao.ai@gmail.com>
Date: Tue, 12 Nov 2024 16:45:39 +0800
From: Jim Zhao <jimzhao.ai@...il.com>
To: jack@...e.cz
Cc: akpm@...ux-foundation.org,
	jimzhao.ai@...il.com,
	linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	willy@...radead.org
Subject: Re: [PATCH] mm/page-writeback: Raise wb_thresh to prevent write blocking with strictlimit

> On Fri 08-11-24 11:19:49, Jim Zhao wrote:
> > > On Wed 23-10-24 18:00:32, Jim Zhao wrote:
> > > > With the strictlimit flag, wb_thresh acts as a hard limit in
> > > > balance_dirty_pages() and wb_position_ratio(). When device write
> > > > operations are inactive, wb_thresh can drop to 0, causing writes to
> > > > be blocked. The issue occasionally occurs in fuse fs, particularly
> > > > with network backends, where the write thread is frequently blocked
> > > > for a period. To address it, this patch raises the minimum wb_thresh to a
> > > > controllable level, similar to the non-strictlimit case.
> > > >
> > > > Signed-off-by: Jim Zhao <jimzhao.ai@...il.com>
> > >
> > > ...
> > >
> > > > +       /*
> > > > +        * With strictlimit flag, the wb_thresh is treated as
> > > > +        * a hard limit in balance_dirty_pages() and wb_position_ratio().
> > > > +        * It's possible that wb_thresh is close to zero, not because
> > > > +        * the device is slow, but because it has been inactive.
> > > > +        * To prevent occasional writes from being blocked, we raise wb_thresh.
> > > > +        */
> > > > +       if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> > > > +               unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
> > > > +               u64 wb_scale_thresh = 0;
> > > > +
> > > > +               if (limit > dtc->dirty)
> > > > +                       wb_scale_thresh = (limit - dtc->dirty) / 100;
> > > > +               wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh / 4));
> > > > +       }
> > >
> > > What you propose makes sense in principle although I'd say this is mostly a
> > > userspace setup issue - with strictlimit enabled, you're kind of expected
> > > to set min_ratio exactly if you want to avoid these startup issues. But I
> > > tend to agree that we can provide a bit of a slack for a bdi without
> > > min_ratio configured to ramp up.
> > >
> > > But I'd rather pick the logic like:
> > >
> > >   /*
> > >    * If bdi does not have min_ratio configured and it was inactive,
> > >    * bump its min_ratio to 0.1% to provide it some room to ramp up.
> > >    */
> > >   if (!wb_min_ratio && !numerator)
> > >           wb_min_ratio = min(BDI_RATIO_SCALE / 10, wb_max_ratio / 2);
> > >
> > > That would seem like a bit more systematic way than the formula you propose
> > > above...
> >
> > Thanks for the advice.
> > Here's the explanation of the formula:
> > 1. When writes are small and intermittent, wb_thresh can be close to 0
> > rather than exactly 0, so a check on the numerator value is hard to get right.
>
> I see, ok.
>
> > 2. The ramp-up margin, whether 0.1% or another value, needs
> > consideration.
> > I based this on the logic of wb_position_ratio() in the non-strictlimit
> > scenario: wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8); It seems to
> > provide more room and to keep the ramp-up within a controllable range.
>
> I see, thanks for explanation. So I was thinking how to make the code more
> consistent instead of adding another special constant and workaround. What
> I'd suggest is:
>
> 1) There's already code that's supposed to handle ramping up with
> strictlimit in wb_update_dirty_ratelimit():
>
>         /*
>          * For strictlimit case, calculations above were based on wb counters
>          * and limits (starting from pos_ratio = wb_position_ratio() and up to
>          * balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate).
>          * Hence, to calculate "step" properly, we have to use wb_dirty as
>          * "dirty" and wb_setpoint as "setpoint".
>          *
>          * We rampup dirty_ratelimit forcibly if wb_dirty is low because
>          * it's possible that wb_thresh is close to zero due to inactivity
>          * of backing device.
>          */
>         if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
>                 dirty = dtc->wb_dirty;
>                 if (dtc->wb_dirty < 8)
>                         setpoint = dtc->wb_dirty + 1;
>                 else
>                         setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
>         }
>
> Now I agree that increasing wb_thresh directly is more understandable and
> transparent so I'd just drop this special case.

Yes, I agree.

> 2) I'd just handle all the bumping of wb_thresh in a single place instead
> of having it spread over multiple places. So __wb_calc_thresh() could have
> code like:
>
>         wb_thresh = (thresh * (100 * BDI_RATIO_SCALE - bdi_min_ratio)) / (100 * BDI_RATIO_SCALE);
>         wb_thresh *= numerator;
>         wb_thresh = div64_ul(wb_thresh, denominator);
>
>         wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);
>
>         wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE);
>         limit = hard_dirty_limit(dtc_dom(dtc), dtc->thresh);
>         /*
>          * It's very possible that wb_thresh is close to 0 not because the
>          * device is slow, but that it has remained inactive for long time.
>          * Honour such devices a reasonable good (hopefully IO efficient)
>          * threshold, so that the occasional writes won't be blocked and active
>          * writes can rampup the threshold quickly.
>          */
>         if (limit > dtc->dirty)
>                 wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);
>         if (wb_thresh > (thresh * wb_max_ratio) / (100 * BDI_RATIO_SCALE))
>                 wb_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
>
> and we can drop the bumping from wb_position_ratio(). This way we have the
> wb_thresh bumping in a single logical place. Since we still limit wb_thresh
> with max_ratio, untrusted bdis for which max_ratio should be configured
> (otherwise they can grow the amount of dirty pages up to the global threshold
> anyway) are still under control.
>
> If we really wanted, we could introduce a different bumping in case of
> strictlimit, but at this point I don't think it is warranted so I'd leave
> that as an option if someone comes with a situation where this bumping
> proves to be too aggressive.

Thank you, this is very helpful. I have two concerns:

1. In the current non-strictlimit logic, wb_thresh is only bumped within wb_position_ratio() for calculating pos_ratio, and this bump isn’t restricted by max_ratio.
I’m unsure whether moving this adjustment to __wb_calc_thresh() would affect existing behavior.
Would it be possible to keep the current logic for the non-strictlimit case?
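
To make this concern concrete, here is a minimal userspace sketch (not kernel code) of the two behaviours being compared. The helper names bump_uncapped() and bump_capped() are hypothetical, the value of BDI_RATIO_SCALE is assumed, and other clamping done in the real functions is left out:

```c
#include <stdint.h>

/* Assumption: stands in for the kernel's BDI_RATIO_SCALE; the value is assumed. */
#define BDI_RATIO_SCALE 10000ULL

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Hypothetical model of the current non-strictlimit bump in
 * wb_position_ratio(): the floor is (limit - dirty) / 8 and
 * max_ratio is not consulted.
 */
static uint64_t bump_uncapped(uint64_t wb_thresh, uint64_t limit,
			      uint64_t dirty)
{
	if (limit > dirty)
		wb_thresh = max_u64(wb_thresh, (limit - dirty) / 8);
	return wb_thresh;
}

/*
 * Hypothetical model of the bump moved into __wb_calc_thresh():
 * same floor, but the result is then limited by max_ratio.
 */
static uint64_t bump_capped(uint64_t wb_thresh, uint64_t thresh,
			    uint64_t limit, uint64_t dirty,
			    uint64_t wb_max_ratio)
{
	uint64_t cap = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);

	return min_u64(bump_uncapped(wb_thresh, limit, dirty), cap);
}
```

With made-up numbers thresh = limit = 100000 pages, dirty = 20000 pages, and an idle wb (wb_thresh = 0) whose max_ratio is 1%, bump_uncapped() gives 10000 pages while bump_capped() gives 1000 pages. So for a non-strictlimit bdi with a small max_ratio, the value feeding pos_ratio would shrink after the move.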

2. Regarding the formula:
wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);

Consider this case:
With 100 fuse devices (each with a high max_ratio) experiencing high writeback delays, the pages being written back are accounted in NR_WRITEBACK_TEMP, not in dtc->dirty.
As a result, the bumped wb_thresh may remain high. While each individual device is under control, the total could exceed expectations.

Although lowering max_ratio can avoid this issue, how about reducing the bumped wb_thresh instead?

The formula in my patch:
wb_scale_thresh = (limit - dtc->dirty) / 100;
The intention is to use the default fuse max_ratio (1%) as the multiplier.
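
As a rough illustration of the difference between the two divisors, here is a small userspace sketch (not kernel code). The helper names idle_floor() and total_floor() and the sample numbers are hypothetical, and the sketch assumes max_ratio is high enough that it does not limit the bump; the additional wb_max_thresh / 4 cap from my patch is not modelled:

```c
#include <stdint.h>

/*
 * Hypothetical helper: floor (in pages) given to one idle wb,
 * i.e. (limit - dirty) / divisor, or 0 when dirty >= limit.
 */
static uint64_t idle_floor(uint64_t limit, uint64_t dirty, uint64_t divisor)
{
	return limit > dirty ? (limit - dirty) / divisor : 0;
}

/*
 * Hypothetical helper: sum of the floors over ndev idle devices,
 * assuming dtc->dirty stays the same for all of them because the
 * pages under writeback are accounted elsewhere.
 */
static uint64_t total_floor(uint64_t limit, uint64_t dirty,
			    uint64_t divisor, uint64_t ndev)
{
	return ndev * idle_floor(limit, dirty, divisor);
}
```

For example, with limit = 100000 pages, dirty = 20000 pages and 100 devices, total_floor(..., 8, 100) is 1000000 pages, ten times the limit, while total_floor(..., 100, 100) is 80000 pages, which equals limit - dirty.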


Thanks
Jim Zhao
