Message-ID: <u6dy6oa6ztghy7ozficimubhb2mwppcq6gosupepnn63uu6oq7@qyph3nyq7las>
Date: Wed, 7 Jan 2026 05:19:16 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sergey Senozhatsky <senozhatsky@...omium.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Nhat Pham <nphamcs@...il.com>, Minchan Kim <minchan@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Brian Geffon <bgeffon@...gle.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH] zsmalloc: use actual object size to detect spans
On Wed, Jan 07, 2026 at 11:20:20AM +0900, Sergey Senozhatsky wrote:
> On (26/01/07 02:10), Yosry Ahmed wrote:
> > On Wed, Jan 07, 2026 at 11:06:09AM +0900, Sergey Senozhatsky wrote:
> > > On (26/01/07 01:56), Yosry Ahmed wrote:
> > > > > I recall us having exactly this idea when we first introduced
> > > > > zs_obj_{read,write}_end() functions, and I do recall that it
> > > > > did not work. Somehow this panics in __memcpy+0xc/0x44. Let
> > > > > me dig into it again.
> > > >
> > > > Maybe because at this point we are trying to memcpy() class->size, which
> > > > already includes ZS_HANDLE_SIZE. So reading after increasing the offset
> > > > reads ZS_HANDLE_SIZE after class->size.
> > >
> > > Yeah, I guess that falsely hits the spanning path because of extra
> > > sizeof(unsigned long).
> >
> > Or the object could be spanning two pages indeed, but we're copying
> > extra sizeof(unsigned long), that shouldn't crash tho.
>
> It seems there is no second page, it's a pow-of-two size class. So
> we mis-detect spanning.
>
> [ 51.406310] zsmalloc: :: size class 48, orig offt 16336, page size 16384, memcpy sizes 40, 8
> [ 51.407571] Unable to handle kernel paging request at virtual address ffffc04000000000
> [ 51.420816] pc : __memcpy+0xc/0x44
>
> Second memcpy() of sizeof(unsigned long) traps.
I think this case is exactly what you expected earlier (not sure what
you mean by the pow of 2 reply). We increase the offset by 8 bytes
(ZS_HANDLE_SIZE), but we still copy 48 bytes, even though 48 bytes
includes both the object and ZS_HANDLE_SIZE. So we end up copying 8
bytes beyond the end of the object, which puts us in the next page,
which we should not be copying from.
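To spell out the arithmetic from your log (quick userspace toy, not kernel
code, names are just for illustration):

#include <stdio.h>
#define PAGE_SZ        16384UL                 /* page size from the log */
#define ZS_HANDLE_SIZE sizeof(unsigned long)
int main(void)
{
	unsigned long class_size = 48;         /* size class from the log */
	unsigned long off = 16336;             /* "orig offt" from the log */
	/* current code: skip the handle in off, but still copy class->size */
	off += ZS_HANDLE_SIZE;
	if (off + class_size > PAGE_SZ)
		printf("buggy: treated as spanning, memcpy sizes %lu, %lu\n",
		       PAGE_SZ - off, class_size - (PAGE_SZ - off));
	/* with ZS_HANDLE_SIZE subtracted, the payload fits the page exactly */
	if (off + class_size - ZS_HANDLE_SIZE <= PAGE_SZ)
		printf("contained, payload ends at %lu\n",
		       off + class_size - ZS_HANDLE_SIZE);
	return 0;
}

That prints "memcpy sizes 40, 8" for the buggy check, matching the log, and
the second 8-byte copy is what runs off into a page that isn't there.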
I think to fix the bug at this point we need to subtract ZS_HANDLE_SIZE
from class->size before we use it for copying or spanning detection.
Something like (untested):
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5bf832f9c05c..894783d2526c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1072,6 +1072,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
unsigned long obj, off;
unsigned int obj_idx;
struct size_class *class;
+ unsigned long size;
void *addr;
/* Guarantee we can get zspage from handle safely */
@@ -1087,7 +1088,13 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
class = zspage_class(pool, zspage);
off = offset_in_page(class->size * obj_idx);
- if (off + class->size <= PAGE_SIZE) {
+ size = class->size;
+ if (!ZsHugePage(zspage)) {
+ off += ZS_HANDLE_SIZE;
+ size -= ZS_HANDLE_SIZE;
+ }
+
+ if (off + size <= PAGE_SIZE) {
/* this object is contained entirely within a page */
addr = kmap_local_zpdesc(zpdesc);
addr += off;
@@ -1096,7 +1103,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
/* this object spans two pages */
sizes[0] = PAGE_SIZE - off;
- sizes[1] = class->size - sizes[0];
+ sizes[1] = size - sizes[0];
addr = local_copy;
memcpy_from_page(addr, zpdesc_page(zpdesc),
@@ -1107,9 +1114,6 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
0, sizes[1]);
}
- if (!ZsHugePage(zspage))
- addr += ZS_HANDLE_SIZE;
-
return addr;
}
EXPORT_SYMBOL_GPL(zs_obj_read_begin);
@@ -1121,6 +1125,7 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
struct zpdesc *zpdesc;
unsigned long obj, off;
unsigned int obj_idx;
+ unsigned long size;
struct size_class *class;
obj = handle_to_obj(handle);
@@ -1129,9 +1134,13 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
class = zspage_class(pool, zspage);
off = offset_in_page(class->size * obj_idx);
- if (off + class->size <= PAGE_SIZE) {
- if (!ZsHugePage(zspage))
- off += ZS_HANDLE_SIZE;
+ size = class->size;
+ if (!ZsHugePage(zspage)) {
+ off += ZS_HANDLE_SIZE;
+ size -= ZS_HANDLE_SIZE;
+ }
+
+ if (off + size <= PAGE_SIZE) {
handle_mem -= off;
kunmap_local(handle_mem);
}
>
> > I think the changes need to be shuffled around to avoid this, or just
> > have a combined patch, which would be less pretty.
>
> I think I prefer a shuffle.
>
> There is another possible improvement point (UNTESTED): if the first
> page holds only ZS_HANDLE bytes, then we can avoid memcpy() path and
> instead just kmap the second page + offset.
Yeah good point.
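Completely untested, but on top of the diff above I imagine the read side
could look roughly like this (zs_obj_read_end() would need the matching
special case for the kunmap):

	if (off == PAGE_SIZE) {
		/*
		 * The first page holds only the handle, so the payload
		 * starts at offset 0 of the next page and we can map it
		 * directly instead of copying into local_copy.
		 */
		zpdesc = get_next_zpdesc(zpdesc);
		addr = kmap_local_zpdesc(zpdesc);
	} else if (off + size <= PAGE_SIZE) {
		/* this object is contained entirely within a page */
		addr = kmap_local_zpdesc(zpdesc);
		addr += off;
	} else {
		/* spanning two pages, copy into local_copy as before */
		...
	}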