Message-ID: <f7til71dy42.fsf@redhat.com>
Date: Fri, 20 Oct 2023 13:04:13 -0400
From: Aaron Conole <aconole@...hat.com>
To: "Nicholas Piggin" <npiggin@...il.com>
Cc: <netdev@...r.kernel.org>, <dev@...nvswitch.org>,
 "Pravin B Shelar" <pshelar@....org>, "Eelco Chaudron" <echaudro@...hat.com>,
 "Ilya Maximets" <imaximet@...hat.com>, "Flavio Leitner" <fbl@...hat.com>,
 "Paolo Abeni" <pabeni@...hat.com>, "Jakub Kicinski" <kuba@...nel.org>,
 "David S. Miller" <davem@...emloft.net>, "Eric Dumazet" <edumazet@...gle.com>
Subject: Re: [PATCH 0/7] net: openvswitch: Reduce stack usage
"Nicholas Piggin" <npiggin@...il.com> writes:
> On Wed Oct 11, 2023 at 11:23 PM AEST, Aaron Conole wrote:
>> Nicholas Piggin <npiggin@...il.com> writes:
>>
>> > Hi,
>> >
>> > I'll post this out again to keep discussion going. Thanks all for the
>> > testing and comments so far.
>>
>> Thanks for the update - did you mean for this to be tagged RFC as well?
>
> Yeah, it wasn't intended for merge with no RB or tests of course.
> I intended to tag it RFC v2.
I only ran a basic test with this because of some other work, and I
only tested patches 1, 2, and 3. I didn't see any real performance
change, but that was only with a simple port-to-port test. I plan to do
some additional testing with some recursive calls, which will also help
to understand the limits a bit.

That said, I'm very nervous about the key allocator, especially if it
is possible for it to run out. We will probably need the limit to be
bigger, but I want to get a worst-case flow from the OVN side to
understand.
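To make that concern concrete, the pattern I have in mind looks roughly
like this (a simplified sketch only, not the actual patch;
OVS_KEY_STACK_DEPTH, ovs_key_get() and ovs_key_put() are made-up names):

#include <linux/percpu.h>

struct sw_flow_key;                             /* ~1 KB in the real datapath */

#define OVS_KEY_STACK_DEPTH     8               /* placeholder for whatever limit the series picks */

struct ovs_key_stack {
        int level;
        struct sw_flow_key *slot[OVS_KEY_STACK_DEPTH];  /* populated at init, omitted here */
};
static DEFINE_PER_CPU(struct ovs_key_stack, key_stack);

/* Hand out a scratch key for the current nesting level.  Assumes BH
 * context, so the per-CPU pointer is stable for the whole call.
 */
static struct sw_flow_key *ovs_key_get(void)
{
        struct ovs_key_stack *s = this_cpu_ptr(&key_stack);

        if (s->level < OVS_KEY_STACK_DEPTH)
                return s->slot[s->level++];

        return NULL;    /* pool exhausted: slab fallback or drop */
}

static void ovs_key_put(void)
{
        this_cpu_ptr(&key_stack)->level--;
}

If the worst-case OVN pipeline nests deeper than whatever depth we pick,
every extra level lands in the NULL path, i.e. either a much slower slab
allocation or a drop, and that is the part I would like numbers for.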
>>
>> I don't see any performance data with the deployments on x86_64 and
>> ppc64le that cause these stack overflows. Are you able to provide the
>> impact on ppc64le and x86_64?
>
> Don't think it'll be easy, but they are not pushing such rates so it
> wouldn't say much. If you want to show the worst case, those tput and
> latency microbenchmarks should do it.
>
> It's the same tradeoff, and the same reasons the per-cpu key allocator
> was added in the first place, presumably. Probably more expensive than
> stack, but a similar order of magnitude, O(cycles), vs slab which is
> probably O(100s of cycles).
>
>> I guess the change probably should be tagged as -next since it doesn't
>> really have a specific set of commits it is "fixing." It's really like
>> a major change and shouldn't really go through stable trees, but I'll
>> let the maintainers tell me off if I got it wrong.
>
> It should go upstream first if anything. I thought it was relatively
> simple and elegant to reuse the per-cpu key allocator though :(
>
> It is a kernel crash, so we need something for stable. But in a case
> like this there's not one single problem. Linux kernel stack use has
> always been pretty dumb - "don't use too much", for some value of
> "too much" - and just cross your fingers that the config, compiler,
> and workload don't hit some overflow case.
>
> And powerpc has always used more stack than x86, so it should probably
> stay one power-of-two larger to be safe. And that may be the best fix
> for -stable.
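If I read the powerpc side right, "one power-of-two larger" amounts to
a THREAD_SHIFT bump, since the stack size is derived from it roughly
like this (schematic, values illustrative rather than the current
defaults):

#define THREAD_SHIFT    14                      /* 16 KB per-task kernel stack */
#define THREAD_SIZE     (1UL << THREAD_SHIFT)   /* a shift of 15 would give 32 KB */

So the -stable change could plausibly be as small as raising the default
shift for 64-bit.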
Given the reply from David (with msg-id:
<ff6cd12e28894f158d9a6c9f7157487f@...MS.aculab.com>), are there other
things we can look at with respect to the compiler as well?
> But also, ovs uses too much stack. Look at the stack sizes in the first
> RFC patch: ovs takes the 5 largest. That's because the practice has
> always been to not put large variables on the stack, and when you're
> introducing significant recursion that puts extra onus on you to be
> lean. Even if it costs a percent. There are probably lots of places in
> the kernel that could save a few cycles by sticking large structures on
> the stack, but unfortunately we can't all do that.
Well, OVS has operated this way for at least 6 years, so it isn't a
recent thing. But we should look at it.

I also wonder whether we need to recurse into the internal devices, or
whether we should just push the skb into the packet queue instead. That
would cut out 1/3 of the stack usage you reported originally, and when
doing the xmit it would cut out 2/3rds. I have no idea what the
performance hit there might be. Maybe it looks more like a latency hit
rather than a throughput hit, but I'm just speculating.
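Something like the sketch below is what I'm imagining: a per-CPU
backlog that the internal-dev xmit path appends to, drained by the
outermost datapath call, so nesting becomes iteration instead of
recursion. This is purely illustrative; ovs_process_one() stands in for
the existing per-packet path and all of the names are made up:

#include <linux/skbuff.h>
#include <linux/percpu.h>

struct ovs_backlog {
        struct sk_buff_head queue;      /* __skb_queue_head_init() at init time, omitted */
        bool active;                    /* true while the outermost call is on the stack */
};
static DEFINE_PER_CPU(struct ovs_backlog, ovs_backlog);

void ovs_process_one(struct sk_buff *skb);      /* stand-in for the per-packet path */

/* Called from the internal-dev xmit path instead of re-entering the
 * datapath directly.
 */
static void ovs_defer_rx(struct sk_buff *skb)
{
        __skb_queue_tail(&this_cpu_ptr(&ovs_backlog)->queue, skb);
}

/* Only the outermost invocation drains the backlog, so stack depth no
 * longer grows with the number of chained bridges.  Assumes BH context
 * throughout, like the existing per-CPU action FIFO.
 */
static void ovs_process(struct sk_buff *skb)
{
        struct ovs_backlog *b = this_cpu_ptr(&ovs_backlog);
        bool outermost = !b->active;

        b->active = true;
        ovs_process_one(skb);

        if (outermost) {
                while ((skb = __skb_dequeue(&b->queue)) != NULL)
                        ovs_process_one(skb);
                b->active = false;
        }
}

Whether the extra queueing shows up as added latency rather than lost
throughput is exactly the thing I'd want to measure.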
> Thanks,
> Nick