linux-kernel - Re: [PATCH v1] neighbour: Don't let neigh_forced

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAD=FV=XvczUqrYVzDYy=uQ7-LMcRgpJj3iiehzfJNAq=UvJXbg@mail.gmail.com>
Date:   Fri, 1 Dec 2023 10:40:41 -0800
From:   Doug Anderson <dianders@...omium.org>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Judy Hsiao <judyhsiao@...omium.org>,
        David Ahern <dsahern@...nel.org>,
        Simon Horman <horms@...nel.org>,
        Brian Haley <haleyb.dev@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Joel Granados <joel.granados@...il.com>,
        Julian Anastasov <ja@....bg>,
        Leon Romanovsky <leon@...nel.org>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, linux-kernel@...r.kernel.org,
        netdev@...r.kernel.org
Subject: Re: [PATCH v1] neighbour: Don't let neigh_forced_gc() disable
 preemption for long

Hi,

On Fri, Dec 1, 2023 at 9:35 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> > > > > @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > > > >  {
> > > > >         int max_clean = atomic_read(&tbl->gc_entries) -
> > > > >                         READ_ONCE(tbl->gc_thresh2);
> > > > > +       u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
> > > >
> > > > It might be nice to make the above timeout based on jiffies. On a
> > > > HZ=100 system it's probably OK to keep preemption disabled for 10 ms
> > > > but on a HZ=1000 system you'd want 1 ms. ...so maybe you'd want to use
> > > > jiffies_to_nsecs(1)?
> > >
> > > I do not think so. 10ms would be awfully long.
> > >
> > > We have nsec based time service, why downgrading to jiffies resolution ???
> >
> > Well, the whole issue is that we're disabling preemption, right?
> > Unless I'm mistaken, on a 1000 HZ system then a task's timeslice is
> > 1ms and on a 100 HZ system then a task's timeslice is 10ms. When we
> > disable preemption then the problem is that we can keep going past the
> > end of our timeslice. This is bad for whatever task the system is
> > trying to schedule instead of us since it will be blocked waiting for
> > us to re-enable preemption.
> >
> > So essentially the problem here is really tied to jiffies resolution,
> > right? Specifically, if jiffies is 100 Hz then it's actually
> > inefficient to timeout every 1 ms--I think it would be better to use
> > up our whole timeslice.
>
> It is not because a kernel is built with HZ=100 that each thread has
> to consume cpu times in 10ms slices.
>
> Process scheduler does not use jiffies at all, but high resolution time service.
>
> Keep in mind this code is run from soft-interrupt, not a dedicated processus.

Fair enough. I guess my mental model is wrong here. Please disregard
my suggestion about using something based on how long "jiffies" is
then. Using a fixed 1 ms timeout sounds great, then.


> > > Can you tell us in which scenario this gc_list can be so big, other
> > > than fuzzers ?
> >
> > The place we hit this wasn't actually with fuzzers but with normal
> > usage in our labs. The only case where it was a really big problem was
> > when neigh_forced_gc() was scheduled on a "little" CPU (in a
> > big.LITTLE system) and that little CPU happened to be running at the
> > lowest CPU frequency. Specifically Judy was testing on sc7180-trogdor
> > and the lowest CPU Frequency of the "little" CPUs was 300 MHz. Since
> > the littles are less powerful than the bigs, this is roughly the
> > equivalent processing power of a big core running at 120 MHz.
> >
> > FWIW, we are apparently no longer seeing the bad latency after
> > <https://crrev.com/c/4914309>, which does this:
> >
> > # Increase kernel neighbor table size.
> > echo 1024 > /proc/sys/net/ipv4/neigh/default/gc_thresh1
> > echo 4096 > /proc/sys/net/ipv4/neigh/default/gc_thresh2
> > echo 8192 > /proc/sys/net/ipv4/neigh/default/gc_thresh3
> > echo 1024 > /proc/sys/net/ipv6/neigh/default/gc_thresh1
> > echo 4096 > /proc/sys/net/ipv6/neigh/default/gc_thresh2
> > echo 8192 > /proc/sys/net/ipv6/neigh/default/gc_thresh3
> >
> > However, I still believe that we should land something like Judy's
> > patch because, no matter what kernel tunings we have, the kernel
> > shouldn't be disabling preemption for so long.
>
> Sure, and I suggested a refinement, because as I said jiffies can
> stick to a value.
>
> Not sure why a refinement given by a network maintainer is not an option ?
>
> I must be missing something.

Your refinement was good. I was merely trying to answer the question
you asked about how we got into it as completely as possible. Sorry if
I caused confusion.