netdev - Re: [net-next PATCH v2 5/8] net: Track start of busy loop instead of when it should end

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1490329655.9687.33.camel@edumazet-glaptop3.roam.corp.google.com>
Date:   Thu, 23 Mar 2017 21:27:35 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     Netdev <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
        Eric Dumazet <edumazet@...gle.com>,
        David Miller <davem@...emloft.net>,
        Linux API <linux-api@...r.kernel.org>
Subject: Re: [net-next PATCH v2 5/8] net: Track start of busy loop instead
 of when it should end

On Thu, 2017-03-23 at 20:42 -0700, Alexander Duyck wrote:
> On Thu, Mar 23, 2017 at 6:24 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > On Thu, 2017-03-23 at 14:37 -0700, Alexander Duyck wrote:
> >> From: Alexander Duyck <alexander.h.duyck@...el.com>
> >>
> >
> >> The last bit I changed is to move from using a shift by 10 to just using
> >> NSEC_PER_USEC and using multiplication for any run time calculations and
> >> division for a few compile time ones.  This should be more accurate and
> >> perform about the same on most architectures since modern CPUs typically
> >> handle multiplication without too much overhead.
> >
> >
> > busy polling thread can be preempted for more than 2 seconds.
> 
> If it is preempted is the timing value even valid anymore?  I was
> wondering about that.  Also when preemption is enabled is there
> anything to prevent us from being migrated to a CPU?  If so what do we
> do about architectures that allow drift between the clocks?
> 
> > Using usec instead of nanoseconds gave us 3 orders of magnitude cushion.
> 
> Yes, but the problem is we also opened up an issue where if the clock
> was approaching a roll-over we could add a value to it that would put
> us in a state where we would never time out.

If you believe there is a bug, send a fix for net tree ?

I really do not see a bug, given we use time_after(now, end_time) which
handles roll-over just fine.



> 
> > We do not need nsec accuracy for busy polling users, if this restricts
> > range and usability under stress.
> 
> Yes and no.  So the standard use cases suggest using values of 50 to
> 100 microseconds.  I suspect that for most people that is probably
> what they are using.  The addition of preemption kind of threw a
> wrench in the works because now instead of spending that time busy
> polling you can get preempted and then are off doing something else
> for the entire period of time.

That is fine. Busy polling heavy users are pinning threads to cpu, and
the re-schedule check is basically a way to make sure ksoftirqd will get
a chance to service BH.

> 
> What would you think of changing this so that instead of tracking the
> total time this function is active, instead we tracked the total time
> we spent with preemption disabled?  What I would do is move the
> start_time configuration to just after the preempt_disable() call.
> Then if we end up yielding to another thread we would just reset the
> start_time when we restarted instead of trying to deal with all the
> extra clock nonsense that we would have to deal with otherwise since I
> don't know if we actually want to count time where we aren't actually
> doing anything.  In addition this would bring us closer to how NAPI
> already works since it essentially will either find an event, or if we
> time out we hand it off to the softirq which in turn can handle it or
> hand it off to softirqd.  The only item that might be a bit more
> difficult to deal with then would be the way the times are used in
> fs/select.c but I don't know if that is really the right way to go
> anyway. With the preemption changes and such it might just make sense
> to drop those bits and rely on just the socket polling alone.
> 
> The other option is to switch over everything from using unsigned long
> to using uint64_t and time_after64.  Then we can guarantee the range
> needed and then some, but then we are playing with a u64 time value on
> 32b architectures which might be a bit more expensive.  Even with that
> though I still need to clean up the sysctl since it doesn't make sense
> to allow negative values for the busy_poll usec to be used which is
> currently the case.
> 
> Anyway let me know what you think and I can probably spin out a new
> set tomorrow.


Leave usec resolution, I see no point switching to nsec, as long we
support 32bit kernels.

If you believe min/max values should be added to the sysctls, because we
do not trust root anymore, please send patches only addressing that. 

Thanks.