Message-ID: <CAH9Oa-ZcG0+08d=D5-rbzY-v1cdUcuW0E7D_GcwjDoC1Phf+0g@mail.gmail.com>
Date:   Fri, 18 Jun 2021 10:31:35 +0200
From:   Michael Stapelberg <stapelberg+linux@...gle.com>
To:     Miklos Szeredi <miklos@...redi.hu>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-mm <linux-mm@...ck.org>,
        linux-fsdevel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
        Dennis Zhou <dennis@...nel.org>, Jens Axboe <axboe@...nel.dk>,
        Roman Gushchin <guro@...com>,
        Johannes Thumshirn <johannes.thumshirn@....com>,
        Jan Kara <jack@...e.cz>, Song Liu <song@...nel.org>,
        David Sterba <dsterba@...e.com>
Subject: Re: [PATCH] backing_dev_info: introduce min_bw/max_bw limits

Hey Miklos

Thanks for taking a look!

On Fri, 18 Jun 2021 at 10:18, Miklos Szeredi <miklos@...redi.hu> wrote:
>
> On Thu, 17 Jun 2021 at 11:53, Michael Stapelberg
> <stapelberg+linux@...gle.com> wrote:
> >
> > These new knobs allow e.g. FUSE file systems to guide kernel memory
> > writeback bandwidth throttling.
> >
> > Background:
> >
> > When using mmap(2) to read/write files, the page-writeback code tries to
> > measure how quick file system backing devices (BDI) are able to write data,
> > so that it can throttle processes accordingly.
> >
> > Unfortunately, certain usage patterns, such as linkers (tested with GCC,
> > but also the Go linker) seem to hit an unfortunate corner case when writing
> > their large executable output files: the kernel only ever measures
> > the (non-representative) rising slope of the starting bulk write, but the
> > whole file write is already over before the kernel could possibly measure
> > the representative steady-state.
> >
> > As a consequence, with each program invocation hitting this corner case,
> > the FUSE write bandwidth steadily sinks in a downward spiral, until it
> > eventually reaches 0 (!). This results in the kernel heavily throttling
> > page dirtying in programs trying to write to FUSE, which in turn manifests
> > itself in slow or even entirely stalled linker processes.
> >
> > Change:
> >
> > This commit adds 2 knobs which allow avoiding this situation entirely on a
> > per-file-system basis by restricting the minimum/maximum bandwidth.
>
>
> This looks like a bug in the dirty throttling heuristics, that may be
> affecting multiple fuse based filesystems.
>
> Ideally the solution should be a fix to those heuristics, not adding more knobs.


Agreed.

>
>
> Is there a fundamental reason why that can't be done? Maybe the
> heuristics need to detect the fact that steady state has not been
> reached, and not modify the bandwidth in that case, or something along
> those lines.

Maybe, but I don’t have the expertise, motivation or time to
investigate this any further, let alone commit to getting it done.
During our previous discussion I got the impression that nobody else
had any cycles for this either:
https://lore.kernel.org/linux-fsdevel/CANnVG6n=ySfe1gOr=0ituQidp56idGARDKHzP0hv=ERedeMrMA@mail.gmail.com/

Have you had a look at the China LSF report?
http://bardofschool.blogspot.com/2011/
The author of the heuristic has spent significant effort and time
coming up with what we currently have in the kernel:

"""
Fengguang said he draw more than 10K performance graphs and read even
more in the past year.
"""

This implies that making changes to the heuristic will not be a quick fix.

I think adding these limit knobs could be useful regardless of the
specific heuristic behavior.
The knobs are certainly easy to understand, safe to introduce (no regressions),
and can be used to fix the issue at hand as well as other issues (if
any, now or in the future).
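
For illustration, the effect of such limits can be thought of as a plain
clamp applied to whatever the heuristic estimates. The following is a
minimal userspace sketch, not the code from the patch: the struct
bdi_limits type, the clamp_write_bw() helper and the unit-less numbers
are all hypothetical names and values chosen for this example.

```c
#include <assert.h>

/* Hypothetical illustration only; not taken from the patch. */
struct bdi_limits {
	unsigned long min_bw; /* lower bound for the bandwidth estimate */
	unsigned long max_bw; /* upper bound for the bandwidth estimate */
};

/*
 * Clamp a freshly computed write bandwidth estimate into
 * [min_bw, max_bw]. With min_bw > 0 the estimate can no longer
 * sink all the way to 0, whatever the heuristic measured.
 */
static unsigned long clamp_write_bw(const struct bdi_limits *lim,
				    unsigned long estimate)
{
	assert(lim->min_bw <= lim->max_bw);

	if (estimate < lim->min_bw)
		return lim->min_bw;
	if (estimate > lim->max_bw)
		return lim->max_bw;
	return estimate;
}
```

With min_bw = 100 and max_bw = 1000 in this sketch, an estimate of 0
comes back as 100, an estimate of 500 is left alone, and an estimate of
5000 comes back as 1000.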

Thanks
Best regards
Michael
