lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <CAF=yD-KcX-zCgZFVVVMU7JFy+gJwRpUoViA_mWdM4QtHNr685g@mail.gmail.com> Date: Tue, 28 May 2019 13:07:02 -0400 From: Willem de Bruijn <willemdebruijn.kernel@...il.com> To: Fred Klassen <fklassen@...neta.com> Cc: "David S. Miller" <davem@...emloft.net>, Alexey Kuznetsov <kuznet@....inr.ac.ru>, Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>, Shuah Khan <shuah@...nel.org>, Network Development <netdev@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, linux-kselftest@...r.kernel.org, Willem de Bruijn <willemb@...gle.com> Subject: Re: [PATCH net 4/4] net/udpgso_bench_tx: audit error queue On Tue, May 28, 2019 at 12:57 PM Fred Klassen <fklassen@...neta.com> wrote: > > > > > On May 28, 2019, at 8:08 AM, Willem de Bruijn <willemdebruijn.kernel@...il.com> wrote: > > > > I will push up latest patches soon. > > I did some testing and discovered that only TCP audit tests failed. They > failed much less often when enabling poll. Once in about 20 runs > still failed. Therefore I commented out the TCP audit tests. Sounds good, thanks. > You may be interested that I reduced test lengths from 4 to 3 seconds, > but I am still getting 3 reports per test. I picked up the extra report by > changing 'if (tnow > treport)’ to 'if (tnow >= treport)’ Nice! > > The only issue specific to GSO is that xmit_more can forego this > > doorbell until the last segment. We want to complicate this logic with > > a special case based on tx_flags. A process that cares should either > > not use GSO, or the timestamp should be associated with the last > > segment as I've been arguing so far. > > This is the area I was thinking of looking into. I’m not sure it will work > or that it will be too messy. It may be worth a little bit of digging to > see if there is anything there. That will be down the road a bu Sorry, I meant we [do not (!)] want to complicate this logic. And definitely don't want to read skb_shinfo where that cacheline isn't accessed already. Given xmit_more, do you still find the first segment the right one to move the timestamp tx_flags to in __udp_gso_segment? > > >> > >> I’ll get back to you when I have tested this more thoroughly. Early results > >> suggest that adding the -P poll() option has fixed it without any appreciable > >> performance hit. I’ll share raw results with you, and we can make a final > >> decision together. > > > > In the main loop? It still is peculiar that notifications appear to go > > missing unless the process blocks waiting for them. Nothing in > > sock_zerocopy_callback or the queueing onto the error queue should > > cause drops, as far as I know. > > > > Now that I know the issue is only in TCP, I can speculate that all bytes are > being reported, but done with fewer messages. It may warrant some > investigation in case there is some kind of bug. This would definitely still be a bug and should not happen. We have quite a bit of experience with TCP zerocopy and I have not run into this in practice, so I do think that it is somehow a test artifact. > > Indeed. Ideally even run all tests, but return error if any failed, > > like this recent patch > > > > selftests/bpf: fail test_tunnel.sh if subtests fail > > https://patchwork.ozlabs.org/patch/1105221/ > > > > but that may be a lot of code churn and better left to a separate patch. > > I like it. I have it coded up, and it seems to work well. I’ll make a > separate commit in the patch set so we can yank it out if you feel > it is too much Great. Yes, it sounds like an independent improvement, in which case it is easier to review as a separate patch. If you already have it, by all means send it as part of the larger patchset.
Powered by blists - more mailing lists