linux-kernel - Re: [PATCH 3/3] mm: add vmstat statistics for madvise

Message-ID: <Y8hhAslIzgNH3hzv@dhcp22.suse.cz>
Date:   Wed, 18 Jan 2023 22:13:38 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Minchan Kim <minchan@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        SeongJae Park <sj@...nel.org>
Subject: Re: [PATCH 3/3] mm: add vmstat statistics for madvise_[cold|pageout]

On Wed 18-01-23 09:55:38, Minchan Kim wrote:
> On Wed, Jan 18, 2023 at 06:27:02PM +0100, Michal Hocko wrote:
> > On Wed 18-01-23 09:15:34, Minchan Kim wrote:
> > > On Wed, Jan 18, 2023 at 10:11:46AM +0100, Michal Hocko wrote:
> > > > On Tue 17-01-23 15:16:32, Minchan Kim wrote:
> > > > > > madvise LRU manipulation APIs need to scan address ranges to find
> > > > > > present pages in the page table and provide advice hints for them.
> > > > > > 
> > > > > > Like the pg[scan/steal] counts in vmstat, madvise_pg[scanned/hinted]
> > > > > > shows the proactive reclaim efficiency, so this patch adds those
> > > > > > two statistics to vmstat.
> > > > 
> > > > Please describe the usecase for those new counters.
> > > 
> > > > I wanted to know the proactive reclaim efficiency of MADV_COLD/MADV_PAGEOUT.
> > > > Userspace has several policies for when and which vmas get hinted by the call,
> > > > and they are evolving. I needed to know how effectively their policies work
> > > > (i.e., nr_hinted/nr_scanned), since the vma ranges are huge.
> > 
> > > I can see how that can be interesting information, but is there
> > > anything actionable about it beyond debugging purposes? In other words,
> > > isn't this something that could be done by tracing instead?
> 
> > These are statistics for telemetry. We are collecting
> > various vmstat fields (i.e., pgsteal/pgscan) from real field devices
> > and thought those two stats would be a good fit alongside the other reclaim
> > statistics in vmstat, since they tell us how much a proactive madvise policy
> > could make the system healthier (e.g., less kswapd scanning, fewer allocstalls,
> > and so on).
> 
> > 
> > Also how are you going to identify specific madvise calls when they can
> > interleave arbitrarily?
> 
> > I guess you are talking about how we could separate MADV_PAGEOUT and
> > MADV_COLD in vmstat. That's a valid question. I thought that, for a start,
> > we would add just an umbrella stat like this, and if we want to break it
> > down, we would need to introduce sysfs entries, as slab does.

No, not really. MADV_COLD is about aging. There is no actual reclaim
going on, so pgscan/steal metrics do not make any sense. I am asking
about potentially different concurrent MADV_PAGEOUT calls. From what
you've said earlier (how effectively the policy works) I have understood
you want to find out how effective a specific MADV_PAGEOUT is. But there
may be different callers of this, applied to all sorts of different memory
mappings, and therefore the efficiency might be really different. As
there is no clear way to tell one from the other, I am really questioning
whether this global stat is actually useful.
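
To make the concern concrete, here is a minimal sketch of how the
nr_hinted/nr_scanned ratio could be derived from two /proc/vmstat
snapshots. It assumes the counters are exposed as madvise_pgscanned and
madvise_pghinted, as the patch description suggests; they are proposed
names and may not exist on any given kernel. The helper names and the
per-caller numbers are made up for illustration only.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def hint_efficiency(before, after):
    """Return hinted/scanned over the interval, or None if nothing was scanned.

    The counter names are assumptions taken from the patch description.
    """
    scanned = after.get("madvise_pgscanned", 0) - before.get("madvise_pgscanned", 0)
    hinted = after.get("madvise_pghinted", 0) - before.get("madvise_pghinted", 0)
    if scanned <= 0:
        return None
    return hinted / scanned


# Hypothetical workload: two concurrent callers with very different behaviour.
# Each tuple is (pages scanned, pages hinted) during the interval.
caller_a = (1000, 900)   # small, well-targeted range
caller_b = (9000, 100)   # huge range, mostly non-present pages

# The global counters only ever see the sum of both callers.
before = parse_vmstat("madvise_pgscanned 0\nmadvise_pghinted 0\n")
after = parse_vmstat(
    "madvise_pgscanned %d\nmadvise_pghinted %d\n"
    % (caller_a[0] + caller_b[0], caller_a[1] + caller_b[1])
)

global_ratio = hint_efficiency(before, after)
per_caller = [hinted / scanned for scanned, hinted in (caller_a, caller_b)]

print("global:", global_ratio)
print("per caller:", per_caller)
```

With these made-up numbers the single global ratio sits far from either
caller's own ratio, and nothing in the aggregate says which caller
contributed what.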

-- 
Michal Hocko
SUSE Labs