Message-ID: <CAGXJAmxPzrnve-LKKhVNnHCpTeYV=MkuBu0qaAu_YmQP5CSXhg@mail.gmail.com>
Date: Mon, 27 Jan 2025 09:34:27 -0800
From: John Ousterhout <ouster@...stanford.edu>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, edumazet@...gle.com, horms@...nel.org, 
	kuba@...nel.org
Subject: Re: [PATCH net-next v6 04/12] net: homa: create homa_pool.h and homa_pool.c

On Mon, Jan 27, 2025 at 1:41 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 1/25/25 12:53 AM, John Ousterhout wrote:
> > On Thu, Jan 23, 2025 at 4:06 AM Paolo Abeni <pabeni@...hat.com> wrote:
> > ...
> >>> +     pool->descriptors = kmalloc_array(pool->num_bpages,
> >>> +                                       sizeof(struct homa_bpage),
> >>> +                                       GFP_ATOMIC);
> >>
> >> Possibly worth adding '| __GFP_ZERO' to avoid zeroing some fields later.
> >
> > I prefer to do all the initialization explicitly (this makes it
> > totally clear that a zero value is intended, as opposed to accidental
> > omission of an initializer). If you still think I should use
> > __GFP_ZERO, let me know and I'll add it.
>
> Indeed, __GFP_ZERO is the preferred flag for such an allocation; at
> the very least it reduces the generated code size.

OK, I have added __GFP_ZERO and removed explicit zero initializers,
both here and in similar situations elsewhere in the code.
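For reference, the resolved pattern looks roughly like the following kernel-style sketch (the error-handling path shown is assumed, not quoted from the patch):

```c
/* Allocate the bpage descriptor array pre-zeroed, so no explicit
 * zero initializers are needed afterward. GFP_ATOMIC because the
 * caller may not be able to sleep, per the original patch.
 */
pool->descriptors = kmalloc_array(pool->num_bpages,
                                  sizeof(struct homa_bpage),
                                  GFP_ATOMIC | __GFP_ZERO);
if (!pool->descriptors)
        return -ENOMEM;
```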

> >>> +int homa_pool_get_pages(struct homa_pool *pool, int num_pages, __u32 *pages,
> >>> +                     int set_owner)
> >>> +{
> >>> +     int core_num = raw_smp_processor_id();
> >>
> >> Why the 'raw' variant? If this code is pre-emptible it means another
> >> process could be scheduled on the same core...
> >
> > My understanding is that raw_smp_processor_id is faster.
> > homa_pool_get_pages is invoked with a spinlock held, so there is no
> > risk of a core switch while it is executing. Is there some other
> > problem I have missed?
>
> raw_* variants, like __* ones, fall under the 'use at your own risk'
> category.
>
> In this specific case raw_smp_processor_id() is meant to be used when
> you don't care whether the process is moved to another core while you
> are using the 'id' value.
>
> If you use raw_smp_processor_id() and build with the
> CONFIG_DEBUG_PREEMPT knob, the generated code will miss the run-time
> check that preemption is actually disabled at invocation time. That
> check is added when using smp_processor_id(), with no performance
> cost for non-debug builds.

I'm pretty confident that the raw variant is safe. However, are you
saying that there is no performance advantage of the raw version in
production builds? If so, then I might as well switch to the non-raw
version.
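In other words, since the function already runs under a spinlock, the non-raw variant costs nothing in production and adds a debug check. A sketch (lock and variable names are illustrative, not Homa's actual identifiers):

```c
/* With the pool lock held, preemption is disabled, so
 * smp_processor_id() is legal here. In non-debug builds it compiles
 * to the same code as raw_smp_processor_id(); with
 * CONFIG_DEBUG_PREEMPT it additionally verifies that preemption
 * really is disabled at the call site.
 */
spin_lock_bh(&pool->lock);
core_num = smp_processor_id();
/* ... per-core page allocation work ... */
spin_unlock_bh(&pool->lock);
```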

> >> ____cacheline_aligned instead of inserting the struct into a union
> >> should suffice.
> >
> > Done (but now that alloc_percpu_gfp is being used I'm not sure this is
> > needed to ensure alignment?).
>
> Yep, cacheline alignment should not be needed for percpu data.

OK, I've removed the alignment directive for the percpu data.
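The reasoning: the percpu allocator already gives each CPU its own copy inside that CPU's percpu chunk, so per-CPU data from different cores never shares a cache line and the explicit alignment attribute is redundant. A sketch (struct and field names are illustrative):

```c
/* Per-core state for the pool; no ____cacheline_aligned needed,
 * since each CPU's instance lives in its own percpu area.
 */
struct homa_pool_core {
        int next_candidate;
};

pool->cores = alloc_percpu_gfp(struct homa_pool_core, GFP_ATOMIC);
if (!pool->cores)
        return -ENOMEM;
```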

-John-
