Message-ID: <u6dy6oa6ztghy7ozficimubhb2mwppcq6gosupepnn63uu6oq7@qyph3nyq7las>
Date: Wed, 7 Jan 2026 05:19:16 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sergey Senozhatsky <senozhatsky@...omium.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	Nhat Pham <nphamcs@...il.com>, Minchan Kim <minchan@...nel.org>, 
	Johannes Weiner <hannes@...xchg.org>, Brian Geffon <bgeffon@...gle.com>, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org
Subject: Re: [PATCH] zsmalloc: use actual object size to detect spans

On Wed, Jan 07, 2026 at 11:20:20AM +0900, Sergey Senozhatsky wrote:
> On (26/01/07 02:10), Yosry Ahmed wrote:
> > On Wed, Jan 07, 2026 at 11:06:09AM +0900, Sergey Senozhatsky wrote:
> > > On (26/01/07 01:56), Yosry Ahmed wrote:
> > > > > I recall us having exactly this idea when we first introduced
> > > > > zs_obj_{read,write}_end() functions, and I do recall that it
> > > > > did not work.  Somehow this panics in __memcpy+0xc/0x44.  Let
> > > > > me dig into it again.
> > > > 
> > > > Maybe because at this point we are trying to memcpy() class->size, which
> > > > already includes ZS_HANDLE_SIZE. So reading after increasing the offset
> > > > reads ZS_HANDLE_SIZE after class->size.
> > > 
> > > Yeah, I guess that falsely hits the spanning path because of extra
> > > sizeof(unsigned long).
> > 
> > Or the object could be spanning two pages indeed, but we're copying
> > extra sizeof(unsigned long), that shouldn't crash tho.
> 
> It seems there is no second page, it's a pow-of-two size class.  So
> we mis-detect spanning.
> 
> [   51.406310] zsmalloc: :: size class 48, orig offt 16336, page size 16384, memcpy sizes 40, 8
> [   51.407571] Unable to handle kernel paging request at virtual address ffffc04000000000
> [   51.420816] pc : __memcpy+0xc/0x44
> 
> Second memcpy() of sizeof(unsigned long) traps.

I think this case is exactly what you expected earlier (not sure what
you mean by the pow-of-2 reply). We increase the offset by 8 bytes
(ZS_HANDLE_SIZE), but we still copy 48 bytes, even though those 48
bytes already include ZS_HANDLE_SIZE. So we end up copying 8 bytes
beyond the end of the object, which lands on a next page that we
should not be touching at all.

I think to fix the bug at this point we need to subtract ZS_HANDLE_SIZE
from class->size before we use it for copying or spanning detection.

Something like (untested):

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5bf832f9c05c..894783d2526c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1072,6 +1072,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
        unsigned long obj, off;
        unsigned int obj_idx;
        struct size_class *class;
+       unsigned long size;
        void *addr;

        /* Guarantee we can get zspage from handle safely */
@@ -1087,7 +1088,13 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
        class = zspage_class(pool, zspage);
        off = offset_in_page(class->size * obj_idx);

-       if (off + class->size <= PAGE_SIZE) {
+       size = class->size;
+       if (!ZsHugePage(zspage)) {
+               off += ZS_HANDLE_SIZE;
+               size -= ZS_HANDLE_SIZE;
+       }
+
+       if (off + size <= PAGE_SIZE) {
                /* this object is contained entirely within a page */
                addr = kmap_local_zpdesc(zpdesc);
                addr += off;
@@ -1096,7 +1103,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,

                /* this object spans two pages */
                sizes[0] = PAGE_SIZE - off;
-               sizes[1] = class->size - sizes[0];
+               sizes[1] = size - sizes[0];
                addr = local_copy;

                memcpy_from_page(addr, zpdesc_page(zpdesc),
@@ -1107,9 +1114,6 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
                                 0, sizes[1]);
        }

-       if (!ZsHugePage(zspage))
-               addr += ZS_HANDLE_SIZE;
-
        return addr;
 }
 EXPORT_SYMBOL_GPL(zs_obj_read_begin);
@@ -1121,6 +1125,7 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
        struct zpdesc *zpdesc;
        unsigned long obj, off;
        unsigned int obj_idx;
+       unsigned long size;
        struct size_class *class;

        obj = handle_to_obj(handle);
@@ -1129,9 +1134,13 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
        class = zspage_class(pool, zspage);
        off = offset_in_page(class->size * obj_idx);

-       if (off + class->size <= PAGE_SIZE) {
-               if (!ZsHugePage(zspage))
-                       off += ZS_HANDLE_SIZE;
+       size = class->size;
+       if (!ZsHugePage(zspage)) {
+               off += ZS_HANDLE_SIZE;
+               size -= ZS_HANDLE_SIZE;
+       }
+
+       if (off + size <= PAGE_SIZE) {
                handle_mem -= off;
                kunmap_local(handle_mem);
        }

> 
> > I think the changes need to be shuffled around to avoid this, or just
> > have a combined patch, which would be less pretty.
> 
> I think I prefer a shuffle.
> 
> There is another possible improvement point (UNTESTED): if the first
> page holds only ZS_HANDLE bytes, then we can avoid memcpy() path and
> instead just kmap the second page + offset.

Yeah good point.
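
Something like this maybe (untested sketch on top of the diff above,
assuming the get_next_zpdesc() helper that the memcpy path uses, which
is elided from the hunk):

	if (off + size <= PAGE_SIZE) {
		/* this object is contained entirely within a page */
		addr = kmap_local_zpdesc(zpdesc);
		addr += off;
	} else if (off >= PAGE_SIZE) {
		/*
		 * The first page holds only the handle; the payload
		 * starts on the next page, so map that page directly
		 * instead of bouncing through local_copy.
		 */
		zpdesc = get_next_zpdesc(zpdesc);
		addr = kmap_local_zpdesc(zpdesc);
		addr += off - PAGE_SIZE;
	} else {
		/* this object spans two pages */
		...
	}

zs_obj_read_end() would need the matching unmap for that case, of
course.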
