Date:   Fri, 14 Aug 2020 15:17:15 -0700
From:   Minchan Kim <minchan@...nel.org>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Zhaoyang Huang <huangzhaoyang@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Roman Gushchin <klamm@...dex-team.ru>,
        Zhaoyang Huang <zhaoyang.huang@...soc.com>,
        "open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, liumartin@...gle.com,
        fengguang.wu@...el.com
Subject: Re: [PATCH] mm : update ra->ra_pages if it's NOT equal to
 bdi->ra_pages

On Fri, Aug 14, 2020 at 04:19:29AM +0100, Matthew Wilcox wrote:
> On Fri, Aug 14, 2020 at 10:45:37AM +0800, Zhaoyang Huang wrote:
> > On Fri, Aug 14, 2020 at 10:33 AM Andrew Morton
> > <akpm@...ux-foundation.org> wrote:
> > >
> > > On Fri, 14 Aug 2020 10:20:11 +0800 Zhaoyang Huang <huangzhaoyang@...il.com> wrote:
> > >
> > > > On Fri, Aug 14, 2020 at 10:07 AM Matthew Wilcox <willy@...radead.org> wrote:
> > > > >
> > > > > On Fri, Aug 14, 2020 at 02:43:55AM +0100, Matthew Wilcox wrote:
> > > > > > On Fri, Aug 14, 2020 at 09:30:11AM +0800, Zhaoyang Huang wrote:
> > > > > > > file->f_ra.ra_pages keeps the value it was initialized with when the
> > > > > > > file was opened, which may NOT equal bdi->ra_pages if the latter has
> > > > > > > since been updated (e.g. echo xxx > /sys/block/dm/queue/read_ahead_kb).
> > > > > > > So sync ra->ra_pages to the updated value on sync read.
> > > > > >
> > > > > > It still ignores the work done by shrink_readahead_size_eio()
> > > > > > and fadvise(POSIX_FADV_SEQUENTIAL).
> > > > >
> > > > > ... by the way, if you're trying to update one particular file's readahead
> > > > > state, you can just call fadvise(POSIX_FADV_NORMAL) on it.
> > > > >
> > > > > If you want to update every open file's ra_pages by writing to sysfs,
> > > > > then just no.  We don't do that.
> > > > No. What I want to fix is that a file opened within one process's
> > > > context keeps using the value initialized at open time and does not
> > > > sync with the new value when bdi->ra_pages changes.
> > >
> > > So you're saying that
> > >
> > >         echo xxx > /sys/block/dm/queue/read_ahead_kb
> > >
> > > does not affect presently-open files, and you believe that it should do
> > > so?
> > >
> > > I guess that could be a reasonable thing to want - it's reasonable for
> > > a user to expect that writing to a global tunable will take immediate
> > > global effect.  I guess.
> > >
> > > But as Matthew says, it would help if you were to explain why this is
> > > needed.  In full detail.  What operational problems is the present
> > > implementation causing?
> > The real scenario is that some systems (like Android) turbo-read during
> > startup by expanding the readahead window, and then set it back to
> > normal (usually 128KB). However, some files in the system process
> > context stay open from the moment they are opened and never get a
> > chance to sync with the updated value, since it is practically
> > impossible to change the files attached to the inode (the processes
> > are unaware of any of this). We have to fix it from the kernel side.
> 
> OK, this is a much more useful description of the problem, thank you!
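To make the startup-boost scenario concrete, here is a small user-space model in C. It assumes 4 KiB pages and mirrors how a write to read_ahead_kb is converted from KB to pages; the struct and function names, and the 2048 KB boost value, are hypothetical and not kernel APIs. A file opened while the boost is active keeps the boosted window after the device is set back to 128 KB.

```c
/* Assumption: 4 KiB pages, so KB -> pages is a right shift by 2. */
#define MODEL_PAGE_SHIFT 12

/* Hypothetical stand-ins for bdi->ra_pages and file->f_ra.ra_pages. */
struct bdi_model { unsigned long ra_pages; };
struct file_ra_model { unsigned long ra_pages; };

/* Model of "echo N > /sys/block/<dev>/queue/read_ahead_kb". */
void model_set_read_ahead_kb(struct bdi_model *bdi, unsigned long kb)
{
	bdi->ra_pages = kb >> (MODEL_PAGE_SHIFT - 10);
}

/* Model of open(): the device default is copied into the file once. */
void model_open(struct file_ra_model *ra, const struct bdi_model *bdi)
{
	ra->ra_pages = bdi->ra_pages;
}

/* Boost, open a long-lived file, then restore the usual default. */
void model_boot_sequence(struct bdi_model *bdi, struct file_ra_model *f)
{
	model_set_read_ahead_kb(bdi, 2048);	/* hypothetical boost value */
	model_open(f, bdi);			/* file stays open after boot */
	model_set_read_ahead_kb(bdi, 128);	/* back to the usual 128 KB */
}
```

After the sequence the device reports 32 pages while the long-lived file still uses 512, which is the mismatch described above.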

This is not the first time we have brought up this issue:
https://patchwork.kernel.org/patch/10866161/
Hopefully we can find a solution this time.
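Matthew's objection that the patch "ignores the work done by shrink_readahead_size_eio() and fadvise(POSIX_FADV_SEQUENTIAL)" can be illustrated with a similar model. The sketch assumes that, in kernels of this era, POSIX_FADV_SEQUENTIAL sets the file's window to twice the device value and shrink_readahead_size_eio() divides it by 4 after a read error; all names are hypothetical. A sync that resets any value differing from bdi->ra_pages undoes both per-file adjustments.

```c
/* Hypothetical stand-ins for bdi->ra_pages and file->f_ra.ra_pages. */
struct bdi_model { unsigned long ra_pages; };
struct file_ra_model { unsigned long ra_pages; };

/* open(): copy the device default once. */
void model_open(struct file_ra_model *ra, const struct bdi_model *bdi)
{
	ra->ra_pages = bdi->ra_pages;
}

/* Assumed effect of fadvise(POSIX_FADV_SEQUENTIAL): per-file doubled window. */
void model_fadv_sequential(struct file_ra_model *ra, const struct bdi_model *bdi)
{
	ra->ra_pages = bdi->ra_pages * 2;
}

/* Assumed effect of shrink_readahead_size_eio(): back off after an I/O error. */
void model_shrink_eio(struct file_ra_model *ra)
{
	ra->ra_pages /= 4;
}

/* The proposed patch: on a sync read, overwrite any value that differs. */
void model_patched_sync_read(struct file_ra_model *ra, const struct bdi_model *bdi)
{
	if (ra->ra_pages != bdi->ra_pages)
		ra->ra_pages = bdi->ra_pages;
}
```

With bdi at 32 pages, the sequential hint (64 pages) and the error back-off (8 pages) are both reset to 32 on the next sync read, which is why a blanket sync is a poor fit.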

> 
> I can think of two possibilities here.  One is that maybe our readahead
> heuristics just don't work on modern phone hardware.  Perhaps we need
> to ramp up more aggressively by default.
> 
> The other is that maybe it really is just a "boost at startup" kind
> of situation and so we should support _that_.  Some interface where
> we can set a ra_boost, and then do:
> 
> 	if (ra_boost)
> 		newsize *= 2;
> 
> in get_init_ra_size().
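Matthew's ra_boost idea can be sketched against a user-space copy of get_init_ra_size(). The scaling rules below follow mm/readahead.c around v5.8 and may differ in other versions; ra_boost is the hypothetical knob from the thread (no such symbol exists in the kernel), and clamping the boosted size to max is an assumption, since the snippet above does not say where the doubling sits relative to the cap.

```c
#include <stdbool.h>

/* Hypothetical knob, e.g. set from a boot parameter; not a kernel symbol. */
static bool ra_boost;

/* User-space stand-in for the kernel's roundup_pow_of_two(). */
static unsigned long model_roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Initial readahead window in pages for a first read of 'size' pages. */
unsigned long model_get_init_ra_size(unsigned long size, unsigned long max)
{
	unsigned long newsize = model_roundup_pow_of_two(size);

	if (newsize <= max / 32)
		newsize = newsize * 4;
	else if (newsize <= max / 4)
		newsize = newsize * 2;
	else
		newsize = max;

	if (ra_boost) {
		newsize *= 2;
		if (newsize > max)	/* assumption: never exceed ra_pages */
			newsize = max;
	}
	return newsize;
}
```

With max = 32 pages (128 KB at 4 KiB pages), a 1-page initial read gets a 4-page window normally and 8 pages with ra_boost set; a 16-page read already hits the cap either way.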

With a kernel boot parameter, that sounds like a good idea to me.
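For a process that does know about its long-lived files, Matthew's earlier per-file suggestion, fadvise(POSIX_FADV_NORMAL), can be issued from user space with posix_fadvise(). On Linux this resets the file's readahead window to the backing device's current ra_pages, so such a process could pick up a changed read_ahead_kb; it does not help the unaware processes described above. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reset this file's readahead state to the device
 * default. offset 0 and len 0 mean "the whole file". posix_fadvise()
 * returns an error number directly instead of setting errno. */
int reset_readahead(int fd)
{
	int err = posix_fadvise(fd, 0, 0, POSIX_FADV_NORMAL);

	if (err != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
	return err;
}
```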
