Message-ID: <in2dqgf4y3npivzl3vkby6pbjp2dv7f7geeqqsmrfy2pb3rptu@luoquhvf23gg>
Date: Fri, 11 Jul 2025 14:14:28 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Casey Chen <cachen@...estorage.com>
Cc: akpm@...ux-foundation.org, surenb@...gle.com, corbet@....net,
dennis@...nel.org, tj@...nel.org, cl@...two.org, vbabka@...e.cz, mhocko@...e.com,
jackmanb@...gle.com, hannes@...xchg.org, ziy@...dia.com, rientjes@...gle.com,
roman.gushchin@...ux.dev, harry.yoo@...cle.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, yzhong@...estorage.com,
souravpanda@...gle.com, 00107082@....com
Subject: Re: [PATCH v3] alloc_tag: add per-NUMA node stats
On Fri, Jul 11, 2025 at 10:41:36AM -0700, Casey Chen wrote:
> On Thu, Jul 10, 2025 at 8:09 PM Kent Overstreet
> <kent.overstreet@...ux.dev> wrote:
> >
> > On Thu, Jul 10, 2025 at 06:07:13PM -0700, Casey Chen wrote:
> > > On Thu, Jul 10, 2025 at 5:54 PM Kent Overstreet
> > > <kent.overstreet@...ux.dev> wrote:
> > > >
> > > > On Thu, Jul 10, 2025 at 05:42:05PM -0700, Casey Chen wrote:
> > > > > Hi All,
> > > > >
> > > > > Thanks for reviewing my previous patches. I am replying to some
> > > > > comments from our previous discussion:
> > > > > https://lore.kernel.org/all/CAJuCfpHhSUhxer-6MP3503w6520YLfgBTGp7Q9Qm9kgN4TNsfw@mail.gmail.com/T/#u
> > > > >
> > > > > Most people care about the motivations and usages of this feature.
> > > > > Internally, we have had systems with memory usage that is
> > > > > asymmetric across NUMA nodes: node 0 uses a lot of memory while
> > > > > node 1 is nearly empty, so requests to allocate memory on node 0
> > > > > always fail. With this patch, we can find the imbalance and
> > > > > optimize the memory usage. Also, David
> > > > > Rientjes and Sourav Panda provide their scenarios in which this patch
> > > > > would be very useful. It is easy to turn on and off, so I think
> > > > > it is nice to have, enabling more scenarios in the future.
> > > > >
> > > > > Andrew / Kent,
> > > > > * I agree with Kent on using for_each_possible_cpu rather than
> > > > > for_each_online_cpu, considering CPU online/offline.
> > > > > * When failing to allocate counters for an in-kernel alloc_tag,
> > > > > panic() is better than WARN(), since the kernel would eventually
> > > > > panic on an invalid memory access anyway.
> > > > > * percpu stats would bloat data structures quite a bit.
> > > > >
> > > > > David Wang,
> > > > > I don't really understand what 'granularity of calling sites'
> > > > > means. If a NUMA imbalance is found, the calling site could
> > > > > request memory allocation from different nodes. Other factors
> > > > > that can affect NUMA balance can be covered in a separate patch.
> > > >
> > > > Let's get this functionality in.
> > > >
> > > > We've already got userspace parsing and consuming /proc/allocinfo, so we
> > > > just need to do it without changing that format.
> > >
> > > You mean keep the format without per-NUMA info the same as before?
> > > My v3 patch changed the header and the alignment of bytes and calls;
> > > I can restore them.
> >
> > I mean an ioctl interface - so we can have a userspace program with
> > different switches for getting different types of output.
> >
> > Otherwise the existing programs people have already written for
> > consuming /proc/allocinfo are going to break.
>
> What does this ioctl interface do? Get bytes/calls per allocating
> site? Total bytes/calls per module? Or per-NUMA bytes/calls for each
> allocating site or module?
> Would it be too much work for this patch? If you can show me an
> example, that would be helpful; I can try implementing it.
Since we're adding optional features, the ioctl needs to take a flags
argument selecting which features we want - per-NUMA-node stats for now,
but I suspect more will come up (maybe we'll want to revisit number of
calls per callsite).

Return -EINVAL if userspace asks for something the running kernel
doesn't support...