[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z72WiUlhxOGnrXFb@harry>
Date: Tue, 25 Feb 2025 19:08:09 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Lilith Gkini <lilithpgkini@...il.com>
Cc: Christoph Lameter <cl@...ux.com>, Pekka Enberg <penberg@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Hyeonggon Yoo <42.hyeyoo@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] slub: Fix Off-By-One in the While condition in
on_freelist()
On Mon, Feb 24, 2025 at 02:12:13PM +0200, Lilith Gkini wrote:
> On Mon, Feb 24, 2025 at 09:00:14AM +0900, Harry Yoo wrote:
> > On second thought (after reading your email),
> > I think it should be (fp == NULL) && (search == NULL)?
> >
> > When fp is NULL after the loop, it means (the end of the freelist
> > is NULL) AND (there were equal to or less than (slab->objects - nr) objects
> > on the freelist).
> >
> > If the loop has ended after observing fp == NULL,
> > on_freelist() should return true only when search == NULL.
> >
> > If fp != NULL, it means there were more objects than it should have on
> > the free list. In that case, we can't take risk of iterating the loop
> > forever and it's reasonable to assume that the freelist does not
> > end with NULL.
> >
> > > The reason I m saying this is cause the While loop is looking through
> > > every non-NULL freelist pointer for the "search" pattern. If someone
> > > wants to search for NULL in the freelist that return 1 will never
> > > trigger, even for normal freelists. If "fp" was ever null it wouldn't re
> > > enter the loop. Thus the result of the function would be search == NULL
> > > because the original writers assumed the freelist will always end with
> > > NULL.
> >
> > Agreed.
> >
> > > As you correctly pointed out there might be a case where the freelist
> > > is corrupted and it doesnt end in NULL. In that case two things could happen:
> > >
> > > 1) The check_valid_pointer() catches it and fixes it, restoring the
> > > corrupted freelist.
> > >
> > > 2) The check_valid_pointer() fails to catch it and there isn't any NULL.
> > >
> > >
> > > In the first case the problem fixes itself and thats probably why
> > > validate_slab() worked fine all this time.
> > >
> > > The second case is pretty rare, but thats the case that validate_slab()
> > > wants to rout out when it checks the `!on_freelist(s, slab, NULL)`,
> > > right?
> >
> > Yes.
> >
> > > Also to make my added suggestion of `return fp == NULL` work in the first
> > > case (since it does corrrect the freelist we want it to now return TRUE)
> > > we could also add a line that nulls out the fp right after the
> > > `set_freepointer(s, object, NULL);`.
> >
> > Why?
> > fp = get_freepionter() will observe NULL anyway.
> >
> > > But words are cheap, I should test it out dynamically rather than just
> > > reading the code to see how it behaves. Feel free to also test it
> > > yourself.
> >
> > Yes, testing is important, and you should test
> > to some extent before submitting a patch.
> >
> > > I know I am supposed to send a proper Patch with the multiple added
> > > lines, but for now we are mostly brainstorming solutions. It will be
> > > better to see its behavior first in a debugger I think.
> >
> > I think it's generally considered good practice to discuss before
> > writing any code.
> >
> > > I hope I am making sense with the thought process I outlined for the
> > > return thing. I probably shouldn't be writing emails ealry Saturday
> > > morning, haha.
> > >
> > > Also I really appreciate the kind feedback! The Linux Kernel is infamously
> > > known for how rude and unhinged people can be, which can make it a bit
> > > stressful and intimidating sending any emails, especially for someone
> > > starting out such as myself.
> >
> > You're welcome ;-)
> >
> > --
> > Cheers,
> > Harry
Hi, Lilith.
If you don't mind, could you please avoid bottom posting and
reply with inline comments like how I reply to you?
It makes it easier to follow the conversation.
> Alright, I managed to run some tests through a debugger.
>
> You are right, my code isn't correct, `return fp == search` should be
> more appropriate.
>
> So I think the right way would be to do these changes:
> - while (fp && nr < slab->objects) {
>
> - fp = NULL;
>
> - return fp == search;
>
> The first line removes the "=" so it doesnt iterate more times than
> slab->objects.
Yes.
> The second line sets fp to NULL for the case where the code caught a
> corrupted freelist (check_valid_pointer) and fixes it by setting
> the freepointer to NULL (set_freepointer). Now NULL will be in the
> freelist.
Yes. but do we care about the return value of on_freelist()
when the freelist is corrupted? (we don't know if it was null-terminated
or not because it's corrupted.)
> The third line is the return value:
> TRUE if the final fp we got happens to be the search value the caller
> was looking for in the freelist.
> FALSE if the final fp we got was not the same as the search value.
>
> This should make the validate_slab() work right and if anyone ever uses
> the on_freelist() again with some other search value other than NULL it
> should also work as intended.
Yes! that makes sense to me.
> For my tests I wrote a kernel module that creates an isolated cache with
> 8 objects per slab, allocated all 8 of them and freed them. Then called
> validate_slab_cache() with my cache and set a breakpoint at on_freelist().
> From there I could set any value I wanted and observe its behavior
> through GDB (I used QEMU and GDB-ed remotely from my host).
> This way I didn't have to set a bunch of EXPORT_SYMBOL() and change
> stuff to not static; It made testing a lot easier.
>
> Here are the tests I did with this new change I just mentioned.
>
> Note: By "full slab" I mean that I allocated every object in the slab
> and freed them.
FYI in slab terms (slab->inuse == slab->objects) means full slab,
You probably meant empty slab?
> By "partial" I mean that I only allocated and freed some
> objects, but not every object in the slab, ie if the total objects can
> be 8 I only alloc-ed and freed 7.
>
> search == NULL
> - full slab
> - correct tail true
> - corrupted tail with garbage false
> - corrupted tail with valid address false
Two falses because when the freepointer of the tail object
is corrupted, the loop stops when nr equals slab->objects?
> - partial slab
> - correct tail true
> - corrupted tail with garbage true
This is true because for partial slabs, the number of objects
plus 1 for the garbage, does not exceed slab->objects?
> - corrupted tail with valid address false
>
> search == some fake address
> - full slab
> - correct tail false
> - corrupted tail with garbage false
> - corrupted tail with valid address false
>
> - partial slab
> - correct tail false
> - corrupted tail with garbage false
> - corrupted tail with valid address false
>
>
> search == some address in the freelist
> - full slab
> - correct tail true
> - corrupted tail with garbage true
> - corrupted tail with valid address true
>
> - partial slab
> - correct tail true
> - corrupted tail with garbage true
> - corrupted tail with valid address true
The result seems valid to me. At least, this is the best SLUB can do
while avoiding the risk of infinite loop iteration.
> I apologize if am going into too many details with my testing, I just
> wanna make sure I didn't miss anything.
No, it's ok ;-)
> If my proposed changes look confusing I can send a proper patch.
> Should I send it in this chain as a reply, or send a new email
> and add you as well to the cc?
You can either send a new email or reply to this email with
In-Reply-To header. And please cc SLAB ALLOCATOR folks in MAINTAINERS file
including me—I recently changed my name and email (previously Hyeonggon Yoo).
--
Cheers,
Harry
Powered by blists - more mailing lists