linux-kernel - Re: [RFC PATCH 5/6] zsmalloc: introduce handle mapping API

Message-ID: <ajg3gij4v2kzdmgpy3qjutfxsispuxjqkj7u7digdjqi5pst5i@3v2g2lra3ay3>
Date: Tue, 28 Jan 2025 09:59:55 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>, 
	Andrew Morton <akpm@...ux-foundation.org>, Minchan Kim <minchan@...nel.org>, 
	Johannes Weiner <hannes@...xchg.org>, Nhat Pham <nphamcs@...il.com>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 5/6] zsmalloc: introduce handle mapping API

On (25/01/27 21:58), Yosry Ahmed wrote:
> On Mon, Jan 27, 2025 at 04:59:30PM +0900, Sergey Senozhatsky wrote:
> > Introduce new API to map/unmap zsmalloc handle/object.  The key
> > difference is that this API does not impose atomicity restrictions
> > on its users, unlike zs_map_object() which returns with page-faults
> > and preemption disabled - handle mapping API does not need a per-CPU
> > vm-area because the users are required to provide an aux buffer for
> > objects that span several physical pages.
> 
> I like the idea of supplying the buffer directly to zsmalloc, and zswap
> already has per-CPU buffers allocated. This will help remove the special
> case to handle not being able to sleep in zswap_decompress().

The interface is basically what we currently have, but the state
is moved out of zsmalloc's internal per-CPU vm-area.

> That being said, I am not a big fan of the new API for several reasons:
> - The interface seems complicated, why do we need struct
> zs_handle_mapping? Can't the user just pass an extra parameter to
> zs_map_object/zs_unmap_object() to supply the buffer, and the return
> value is the pointer to the data within the buffer?

At least now we need to save some state - e.g. direction of the map()
so that during unmap zsmalloc determines if it needs to perform copy-out
or not.  It also needs that state in order to know if the buffer needs
to be unmapped.

zsmalloc MAP has two cases:
a) the object spans several physical non-contig pages: copy-in object into
  aux buffer and return (linear) pointer to that buffer
b) the object is contained within a physical page: kmap that page and
  return (linear) pointer to that mapping, unmap in zs_unmap_object().
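A minimal user-space sketch of the two MAP cases, and of why the direction
has to be remembered until unmap. Everything in it is an assumption made for
illustration: the sim_* names, the two-page object model and the 4096-byte
page size are hypothetical and are not the API from the patch. The model also
conservatively requires the aux buffer whenever the object spans pages, even
for write-only maps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u	/* assumed page size of the model */

enum sim_mode { SIM_RO, SIM_WO, SIM_RW };

/* Hypothetical object: starts in pages[0] at off, may spill into pages[1]. */
struct sim_object {
	unsigned char *pages[2];
	size_t off;
	size_t len;
};

/* Hypothetical per-mapping state that has to survive from map to unmap. */
struct sim_mapping {
	enum sim_mode mode;	/* direction: decides copy-in and copy-out */
	unsigned char *aux;	/* caller-supplied buffer for case (a) */
	unsigned char *ptr;	/* linear pointer returned to the caller */
	int spans;		/* set by map: object crosses a page boundary */
};

static int sim_map(struct sim_object *obj, struct sim_mapping *map)
{
	size_t head;

	assert(obj->off < SIM_PAGE_SIZE && obj->len <= SIM_PAGE_SIZE);
	map->spans = obj->off + obj->len > SIM_PAGE_SIZE;
	if (!map->spans) {
		/* case (b): the kernel would kmap the page here */
		map->ptr = obj->pages[0] + obj->off;
		return 0;
	}
	if (!map->aux)
		return -1;	/* model assumption: case (a) needs a buffer */
	if (map->mode != SIM_WO) {
		/* case (a): copy-in both halves unless write-only */
		head = SIM_PAGE_SIZE - obj->off;
		memcpy(map->aux, obj->pages[0] + obj->off, head);
		memcpy(map->aux + head, obj->pages[1], obj->len - head);
	}
	map->ptr = map->aux;
	return 0;
}

static void sim_unmap(struct sim_object *obj, struct sim_mapping *map)
{
	size_t head;

	if (map->spans && map->mode != SIM_RO) {
		/* case (a): copy-out unless the mapping was read-only */
		head = SIM_PAGE_SIZE - obj->off;
		memcpy(obj->pages[0] + obj->off, map->aux, head);
		memcpy(obj->pages[1], map->aux + head, obj->len - head);
	}
	/* case (b): the kernel would kunmap the page here */
	map->ptr = NULL;
}
```

In this model an in-page object is accessed in place with no copy at all,
while a spanning object costs one copy-in and/or one copy-out depending on
the saved mode.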

> - This seems to require an additional buffer on the compress side. Right
> now, zswap compresses the page into its own buffer, maps the handle,
> and copies to it. Now the map operation will require an extra buffer.

Yes, for (a) mentioned above.

> I guess in the WO case the buffer is not needed and we can just pass
> NULL?

Yes.

> Taking a step back, it actually seems to me that the mapping interface
> may not be the best, at least from a zswap perspective. In both cases,
> we map, copy from/to the handle, then unmap. The special casing here is
> essentially handling the copy direction. Zram looks fairly similar but I
> didn't look too closely.
> 
> I wonder if the API should store/load instead. You either pass a buffer
> to be stored (equivalent to today's alloc + map + copy), or pass a
> buffer to load into (equivalent to today's map + copy). What we really
> need on the zswap side is zs_store() and zs_load(), not zs_map() with
> different mapping types and an optional buffer if we are going to
> eventually store. I guess that's part of a larger overhaul and we'd need
> to update other zpool allocators (or remove them, z3fold should be
> coming soon).

So I thought about it: load and store.

zs_obj_load()
{
	zspage->page kmap, etc.
	memcpy(buf, page)   # if direction is not WO
	unmap
}

zs_obj_store()
{
	zspage->page kmap, etc.
	memcpy(page, buf)   # if direction is not RO
	unmap
}

load+store would not require zsmalloc to be preemptible internally; we
could just keep the existing atomic locks, and it would make things a
little simpler on the zram side (slot-free-notification is called from
an atomic section).
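A rough user-space model of that load/store shape, under stated assumptions:
the ls_* names, the two-page object layout and the 4096-byte page size are
hypothetical, and the kmap/unmap steps are reduced to plain pointer access.
The point it illustrates is that both operations always go through memcpy(),
whether or not the object spans pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LS_PAGE_SIZE 4096u	/* assumed page size of the model */

/* Hypothetical object: starts in pages[0] at off, may spill into pages[1]. */
struct ls_object {
	unsigned char *pages[2];
	size_t off;
	size_t len;
};

/* Number of object bytes that live in the first page. */
static size_t ls_head(const struct ls_object *obj)
{
	assert(obj->off < LS_PAGE_SIZE && obj->len <= LS_PAGE_SIZE);
	if (obj->off + obj->len > LS_PAGE_SIZE)
		return LS_PAGE_SIZE - obj->off;
	return obj->len;
}

/* Load: always copy the object out into the caller's buffer. */
static void ls_obj_load(const struct ls_object *obj, void *dst)
{
	size_t head = ls_head(obj);

	memcpy(dst, obj->pages[0] + obj->off, head);
	if (obj->len > head)
		memcpy((unsigned char *)dst + head, obj->pages[1],
		       obj->len - head);
}

/* Store: always copy the caller's buffer into the object. */
static void ls_obj_store(struct ls_object *obj, const void *src)
{
	size_t head = ls_head(obj);

	memcpy(obj->pages[0] + obj->off, src, head);
	if (obj->len > head)
		memcpy(obj->pages[1], (const unsigned char *)src + head,
		       obj->len - head);
}
```

Neither function hands a pointer back to the caller, so nothing stays mapped
after it returns and no state needs to be carried between calls.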

But, and it's a big but: it's (b) from the above.  I wasn't brave
enough to just drop the (b) optimization and replace it with memcpy(),
especially when we work with relatively large objects (say size-class
3600 bytes and above).  This certainly would not make battery-powered
devices happier.  Maybe in zswap the page is only read once (is that
correct?), but in zram a page can be read multiple times (e.g. when zram
is used as a raw block-dev, or has a mounted fs on it), which means
multiple extra memcpy()-s.
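A back-of-the-envelope sketch of that cost, counting only bytes copied on
read-only access. The function names and the counting model are assumptions
for illustration, not measurements: a map-style read copies only when the
object spans pages, while a load-style read copies every time.

```c
#include <assert.h>
#include <stddef.h>

/* Map-style read: copy-in happens only for objects that span pages. */
static size_t bytes_copied_map(size_t obj_size, int spans_pages, size_t reads)
{
	return spans_pages ? obj_size * reads : 0;
}

/* Load-style read: every read copies the whole object. */
static size_t bytes_copied_load(size_t obj_size, size_t reads)
{
	return obj_size * reads;
}
```

For an in-page object of 3600 bytes read three times, the load-style count is
3 * 3600 = 10800 bytes, against zero for the map-style count.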