lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJD7tkbCDe1Y__0vUKt9q0dz_sXM74fKGQo2Zgq9CJ8=FEjH3w@mail.gmail.com>
Date: Wed, 25 Sep 2024 14:06:26 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@...el.com>, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, nphamcs@...il.com, chengming.zhou@...ux.dev, 
	usamaarif642@...il.com, shakeel.butt@...ux.dev, ryan.roberts@....com, 
	ying.huang@...el.com, 21cnbao@...il.com, akpm@...ux-foundation.org, 
	nanhai.zou@...el.com, wajdi.k.feghali@...el.com, vinodh.gopal@...el.com
Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().

On Wed, Sep 25, 2024 at 1:13 PM Johannes Weiner <hannes@...xchg.org> wrote:
>
> On Wed, Sep 25, 2024 at 12:39:02PM -0700, Yosry Ahmed wrote:
> > On Wed, Sep 25, 2024 at 12:20 PM Johannes Weiner <hannes@...xchg.org> wrote:
> > >
> > > On Wed, Sep 25, 2024 at 11:30:34AM -0700, Yosry Ahmed wrote:
> > > > Johannes wrote:
> > > > > If this ever becomes an issue, we can handle it in a fastpath-slowpath
> > > > > scheme: check the limit up front for fast-path failure if we're
> > > > > already maxed out, just like now; then make obj_cgroup_charge_zswap()
> > > > > atomically charge against zswap.max and unwind the store if we raced.
> > > > >
> > > > > For now, I would just keep the simple version we currently have: check
> > > > > once in zswap_store() and then just go ahead for the whole folio.
> > > >
> > > > I am not totally against this but I feel like this is too optimistic.
> > > > I think we can keep it simple-ish by maintaining an ewma for the
> > > > compression ratio, we already have primitives for this (see
> > > > DECLARE_EWMA).
> > > >
> > > > Then in zswap_store(), we can use the ewma to estimate the compressed
> > > > size and use it to do the memcg and global limit checks once, like we
> > > > do today. Instead of just checking if we are below the limits, we
> > > > check if we have enough headroom for the estimated compressed size.
> > > > Then we call zswap_store_page() to do the per-page stuff, then do
> > > > batched charging and stats updates.
> > >
> > > I'm not sure what you gain from making a non-atomic check precise. You
> > > can get a hundred threads determining down precisely that *their*
> > > store will fit exactly into the last 800kB before the limit.
> >
> > We just get to avoid overshooting in cases where we know we probably
> > can't fit it anyway. If we have 4KB left and we are trying to compress
> > a 2MB THP, for example. It just makes the upfront check to avoid
> > pointless compression a little bit more meaningful.
>
> I think I'm missing something. It's not just an upfront check, it's
> the only check. The charge down the line doesn't limit anything, it
> just counts. So if this check passes, we WILL store the folio. There
> is no pointless compression.

I got confused by what you said about the fast-slow path, I thought
you were suggesting we do this now, so I was saying it's better to use
an estimate of the compressed size in the fast path to avoid pointless
compression.

I missed the second paragraph.

>
> We might overshoot the limit by about one folio in a single-threaded
> scenario. But that is negligible in comparison to the overshoot we can
> get due to race conditions.
>
> Again, I see no no practical, meaningful difference in outcome by
> making that limit check any more precise. Just keep it as-is.

> Sorry to be blunt, but "precision" in a non-atomic check like this?
> makes no sense. The fact that it's not too expensive is irrelevant.
> This discussion around this honestly has gone off the rails.

Yeah I thought we were talking about the version where we rollback
compressions if we overshoot, my bad. We discussed quite a few things
and I managed to confuse myself.

> Just leave the limit checks exactly as they are. Check limits and
> cgroup_may_zswap() once up front. Compress the subpages. Acquire
> references and bump all stats in batches of folio_nr_pages(). You can
> add up the subpage compressed bytes in the for-loop and do the
> obj_cgroup_charge_zswap() in a single call at the end as well.

We can keep the limit checks as they are for now, and revisit as needed.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ