Message-ID: <lxottj72e7jcqw634qwudpsyqckfrvpmlhra43en4zlrlz4cip@erufv6w4n5j6>
Date: Tue, 4 Feb 2025 15:59:42 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>, 
	Andrew Morton <akpm@...ux-foundation.org>, Minchan Kim <minchan@...nel.org>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible

On (25/02/03 21:11), Yosry Ahmed wrote:
> > > We also lose some debugging capabilities as Hilf pointed out in another
> > > patch.
> > 
> > So that zspage lock should not have been a lock, I think; it's a ref-counter
> > and it's being used as one.
> > 
> > map()
> > {
> > 	page->users++;
> > }
> > 
> > unmap()
> > {
> > 	page->users--;
> > }
> > 
> > migrate()
> > {
> > 	if (!page->users)
> > 		migrate_page();
> > }
> 
> Hmm, but in this case we want migration to block new map/unmap
> operations. So a vanilla refcount won't work.

Yeah, correct - migration needs negative values so that map() would
wait until the counter goes back to zero (or positive).
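
(For illustration only - a minimal userspace sketch of that counter scheme,
with made-up names, not the actual zsmalloc code: map() bumps the counter,
migration only succeeds when it is zero and parks it at a negative sentinel,
and map() spins while it is negative:)

#include <stdatomic.h>
#include <stdbool.h>

#define ZS_MIGRATING	-1	/* sentinel: migration holds the page exclusively */

struct zspage_ctr {
	atomic_int users;
};

/* map(): wait out a migration, then register ourselves */
static void zspage_map(struct zspage_ctr *z)
{
	int v;

	for (;;) {
		v = atomic_load(&z->users);
		if (v < 0)
			continue;	/* migration in progress, spin */
		if (atomic_compare_exchange_weak(&z->users, &v, v + 1))
			return;
	}
}

/* unmap(): drop our reference */
static void zspage_unmap(struct zspage_ctr *z)
{
	atomic_fetch_sub(&z->users, 1);
}

/* migration: only proceeds when nobody has the page mapped */
static bool zspage_try_migrate(struct zspage_ctr *z)
{
	int zero = 0;

	if (!atomic_compare_exchange_strong(&z->users, &zero, ZS_MIGRATING))
		return false;		/* mapped by someone, caller retries */
	/* ... migrate_page() ... */
	atomic_store(&z->users, 0);	/* let map() proceed again */
	return true;
}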

> > > Just my 2c.
> > 
> > Perhaps we can sprinkle some lockdep on it.  For instance:
> 
> Honestly this looks like more reason to use existing lock primitives to
> me. What are the candidates? I assume rw_semaphore, anything else?

Right, rwsem "was" the first choice.

> I guess the main reason you didn't use a rw_semaphore is the extra
> memory usage.

The sizeof(struct zspage) change is one thing.  Another is that
zspage->lock is taken from atomic sections, pretty much everywhere.
Compaction/migration write-lock it under the pool rwlock and the class
spinlock, but both compaction and migration now bail out with -EAGAIN
if the lock is already taken, so that part is sorted out.
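
(Roughly what the write side looks like - a sketch with pthread primitives
and invented struct layouts, not the actual zsmalloc code; the point is
only that the outer locks forbid sleeping, so the zspage lock gets
trylocked:)

#include <errno.h>
#include <pthread.h>

/* stand-in structures, not the real zsmalloc ones; pool and class only
 * illustrate the locks the caller is assumed to already hold */
struct zs_pool    { pthread_rwlock_t migrate_lock; };
struct size_class { pthread_spinlock_t lock; };
struct zspage     { pthread_rwlock_t lock; };

/*
 * Called with pool->migrate_lock held for write and class->lock held,
 * i.e. from an atomic section: sleeping on zspage->lock is not an
 * option, so trylock and let the caller retry on -EAGAIN.
 */
static int migrate_zspage_sketch(struct zspage *zspage)
{
	if (pthread_rwlock_trywrlock(&zspage->lock) != 0)
		return -EAGAIN;		/* mapped by someone, retry later */
	/* ... move objects to the destination page ... */
	pthread_rwlock_unlock(&zspage->lock);
	return 0;
}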

The remaining problem is map(), which takes the zspage read-lock under the
pool rwlock.  The RFC series (which you hated with passion :P) converted all
zsmalloc locks into preemptible ones because of this - zspage->lock is a
nested leaf lock, so it cannot schedule unless the locks it's nested under
permit it (needless to say, neither rwlock nor spinlock does).
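
(And the read side, sketched in kernel style - the zs_pool/zspage layout and
the zspage_read_lock()/zspage_read_unlock() helpers are stand-ins, not the
actual code; it only shows why the leaf lock must not sleep:)

#include <linux/spinlock.h>

/* stand-in structures, not the real zsmalloc ones */
struct zs_pool { rwlock_t migrate_lock; };
struct zspage  { int lock_val; /* the per-zspage lock from this series */ };

static void zspage_read_lock(struct zspage *zspage)   { /* must not sleep */ }
static void zspage_read_unlock(struct zspage *zspage) { }

static void zs_map_object_sketch(struct zs_pool *pool, struct zspage *zspage)
{
	read_lock(&pool->migrate_lock);	/* rwlock: preemption is off now */
	zspage_read_lock(zspage);	/* leaf lock: sleeping here would be a
					 * sleep-in-atomic bug, so it cannot
					 * simply be an rwsem */
	/* ... map the object ... */
	zspage_read_unlock(zspage);
	read_unlock(&pool->migrate_lock);
}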

> Seems like it uses ~32 bytes more than rwlock_t on x86_64.
> That's per zspage. Depending on how many compressed pages we have
> per-zspage this may not be too bad.

So on a 16GB laptop our memory pressure test used approx 1M zspages at peak.
That is 32 bytes * 1M ~ 32MB of extra memory use.  Not alarmingly much -
less than what a single browser tab needs nowadays.  I suppose on 4GB/8GB
devices it will be even smaller (because those devices generate fewer
zspages).  The numbers are not the main issue, however.
