lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGsJ_4wXO2Hjs0HZBGsGegBAeE8YxJbCF6ZXQQ6ZnVxgR82AuQ@mail.gmail.com>
Date: Tue, 29 Oct 2024 05:15:36 +0800
From: Barry Song <21cnbao@...il.com>
To: Usama Arif <usamaarif642@...il.com>
Cc: Yosry Ahmed <yosryahmed@...gle.com>, Nhat Pham <nphamcs@...il.com>, akpm@...ux-foundation.org, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	Barry Song <v-songbaohua@...o.com>, Chengming Zhou <chengming.zhou@...ux.dev>, 
	Johannes Weiner <hannes@...xchg.org>, David Hildenbrand <david@...hat.com>, Hugh Dickins <hughd@...gle.com>, 
	Matthew Wilcox <willy@...radead.org>, Shakeel Butt <shakeel.butt@...ux.dev>, 
	Andi Kleen <ak@...ux.intel.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>, 
	Chris Li <chrisl@...nel.org>, "Huang, Ying" <ying.huang@...el.com>, 
	Kairui Song <kasong@...cent.com>, Ryan Roberts <ryan.roberts@....com>, joshua.hahnjy@...il.com
Subject: Re: [PATCH RFC] mm: count zeromap read and set for swapout and swapin

On Tue, Oct 29, 2024 at 4:51 AM Usama Arif <usamaarif642@...il.com> wrote:
>
>
>
> On 28/10/2024 20:42, Barry Song wrote:
> > On Tue, Oct 29, 2024 at 4:00 AM Usama Arif <usamaarif642@...il.com> wrote:
> >>
> >>
> >>
> >> On 28/10/2024 19:54, Barry Song wrote:
> >>> On Tue, Oct 29, 2024 at 1:20 AM Usama Arif <usamaarif642@...il.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 28/10/2024 17:08, Yosry Ahmed wrote:
> >>>>> On Mon, Oct 28, 2024 at 10:00 AM Usama Arif <usamaarif642@...il.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 28/10/2024 16:33, Nhat Pham wrote:
> >>>>>>> On Mon, Oct 28, 2024 at 5:23 AM Usama Arif <usamaarif642@...il.com> wrote:
> >>>>>>>>
> >>>>>>>> I wonder if instead of having counters, it might be better to keep track
> >>>>>>>> of the number of zeropages currently stored in zeromap, similar to how
> >>>>>>>> zswap_same_filled_pages did it. It will be more complicated then this
> >>>>>>>> patch, but would give more insight of the current state of the system.
> >>>>>>>>
> >>>>>>>> Joshua (in CC) was going to have a look at that.
> >>>>>>>
> >>>>>>> I don't think one can substitute for the other.
> >>>>>>
> >>>>>> Yes agreed, they have separate uses and provide different information, but
> >>>>>> maybe wasteful to have both types of counters? They are counters so maybe
> >>>>>> dont consume too much resources but I think we should still think about
> >>>>>> it..
> >>>>>
> >>>>> Not for or against here, but I would say that statement is debatable
> >>>>> at best for memcg stats :)
> >>>>>
> >>>>> Each new counter consumes 2 longs per-memcg per-CPU (see
> >>>>> memcg_vmstats_percpu), about 16 bytes, which is not a lot but it can
> >>>>> quickly add up with a large number of CPUs/memcgs/stats.
> >>>>>
> >>>>> Also, when flushing the stats we iterate all of them to propagate
> >>>>> updates from per-CPU counters. This is already a slowpath so adding
> >>>>> one stat is not a big deal, but again because we iterate all stats on
> >>>>> multiple CPUs (and sometimes on each node as well), the overall flush
> >>>>> latency becomes a concern sometimes.
> >>>>>
> >>>>> All of that is not to say we shouldn't add more memcg stats, but we
> >>>>> have to be mindful of the resources.
> >>>>
> >>>> Yes agreed! Plus the cost of incrementing similar counters (which ofcourse is
> >>>> also not much).
> >>>>
> >>>> Not trying to block this patch in anyway. Just think its a good point
> >>>> to discuss here if we are ok with both types of counters. If its too wasteful
> >>>> then which one we should have.
> >>>
> >>> Hi Usama,
> >>> my point is that with all the below three counters:
> >>> 1. PSWPIN/PSWPOUT
> >>> 2. ZSWPIN/ZSWPOUT
> >>> 3. SWAPIN_SKIP/SWAPOUT_SKIP or (ZEROSWPIN, ZEROSWPOUT what ever)
> >>>
> >>> Shouldn't we have been able to determine the portion of zeromap
> >>> swap indirectly?
> >>>
> >>
> >> Hmm, I might be wrong, but I would have thought no?
> >>
> >> What if you swapout a zero folio, but then discard it?
> >> zeromap_swpout would be incremented, but zeromap_swapin would not.
> >
> > I understand. It looks like we have two issues to tackle:
> > 1. We shouldn't let zeromap swap in or out anything that vanishes into
> > a black hole
> > 2. We want to find out how much I/O/memory has been saved due to zeromap so far
> >
> > From my perspective, issue 1 requires a "fix", while issue 2 is more
> > of an optimization.
>
> Hmm I dont understand why point 1 would be an issue.
>
> If its discarded thats fine as far as I can see.

it is fine to you and probably me who knows zeromap as well :-) but
any userspace code
as below might be entirely confused:

p = malloc(1G);
write p to 0; or write part of p to 0
madv_pageout(p, 1g)
read p to swapin.

The entire procedure used to involve 1GB of swap out and 1GB of swap in by any
means. Now, it has recorded 0 swaps counted.

I don't expect userspace is as smart as you :-)

>
> As a reference, memory.stat.zswapped != memory.stat.zswapout - memory.stat.zswapin.
> Because zswapped would take into account swapped out anon memory freed, MADV_FREE,
> shmem truncate, etc as Yosry said about zeromap, But zswapout and zswapin dont.

I understand. However, I believe what we really need to focus on is
this: if we’ve
swapped out, for instance, 100GB in the past hour, how much of that 100GB is
zero? This information can help us assess the proportion of zero data in the
workload, along with the potential benefits that zeromap can provide for memory,
I/O space, or read/write operations. Additionally, having the second count
can enhance accuracy when considering MADV_DONTNEED, FREE, TRUNCATE,
and so on.

>
>
> >
> > I consider issue 1 to be more critical because, after observing a phone
> > running for some time, I've been able to roughly estimate the portion
> > zeromap can
> > help save using only PSWPOUT, ZSWPOUT, and SWAPOUT_SKIP, even without a
> > SWPIN counter. However, I agree that issue 2 still holds significant value
> > as a separate patch.
> >

Thanks
 Barry

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ