netdev - Re: [PATCH net-next 1/3] net-timestamp: add strict check when setting tx flags

Message-ID: <CAL+tcoA+rHS_i4yWowK-cefOtjavfZSkifYsTEqB7Bu8i3MusA@mail.gmail.com>
Date: Tue, 1 Oct 2024 08:42:59 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, dsahern@...nel.org, shuah@...nel.org, willemb@...gle.com, 
	linux-kselftest@...r.kernel.org, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next 1/3] net-timestamp: add strict check when setting
 tx flags

On Tue, Oct 1, 2024 at 2:15 AM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Jason Xing wrote:
> > On Tue, Oct 1, 2024 at 1:14 AM Willem de Bruijn
> > <willemdebruijn.kernel@...il.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > On Mon, Sep 30, 2024 at 7:49 PM Willem de Bruijn
> > > > <willemdebruijn.kernel@...il.com> wrote:
> > > > >
> > > > > Jason Xing wrote:
> > > > > > On Mon, Sep 30, 2024 at 6:39 PM Willem de Bruijn
> > > > > > <willemdebruijn.kernel@...il.com> wrote:
> > > > > > >
> > > > > > > Jason Xing wrote:
> > > > > > > > From: Jason Xing <kernelxing@...cent.com>
> > > > > > > >
> > > > > > > > Even though this case is unlikely to happen, we have to avoid such
> > > > > > > > a case occurring at an earlier point: the sk_rmem_alloc could get
> > > > > > > > increased because of inserting more and more skbs into the errqueue
> > > > > > > > when calling __skb_complete_tx_timestamp(). This bad case would stop
> > > > > > > > the socket transmitting soon.
> > > > > > >
> > > > > > > It is up to the application to read from the error queue frequently
> > > > > > > enough and/or increase SO_RCVBUF.
> > > > > >
> > > > > > Sure thing. If we test it without setting SOF_TIMESTAMPING_SOFTWARE on
> > > > > > the loopback, it will soon stop. That's the reason why I tried to add
> > > > > > the restriction just in case.
> > > > >
> > > > > I don't follow at all.
> > > > >
> > > > > That bit does not affect the core issue: that the application is not
> > > > > clearing its error queue quickly enough.
> > > > >
> > > > > > >
> > > > > > > > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > > > > > > > ---
> > > > > > > >  net/core/sock.c | 4 ++++
> > > > > > > >  1 file changed, 4 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > > > > > index fe87f9bd8f16..4bddd6f62e4f 100644
> > > > > > > > --- a/net/core/sock.c
> > > > > > > > +++ b/net/core/sock.c
> > > > > > > > @@ -905,6 +905,10 @@ int sock_set_timestamping(struct sock *sk, int optname,
> > > > > > > >       if (val & ~SOF_TIMESTAMPING_MASK)
> > > > > > > >               return -EINVAL;
> > > > > > > >
> > > > > > > > +     if (val & SOF_TIMESTAMPING_TX_RECORD_MASK &&
> > > > > > > > +         !(val & SOF_TIMESTAMPING_SOFTWARE))
> > > > > > > > +             return -EINVAL;
> > > > > > > > +
> > > > > > >
> > > > > > > This breaks hardware timestamping
> > > > > >
> > > > > > Yes, and sorry about that. I'll fix this.
> > > > >
> > > > > As is I don't understand the purpose of this patch. Please do not
> > > > > just resubmit with a change, but explain the problem and suggested
> > > > > solution first.
> > > > >
> > > >
> > > > I will drop this patch because I just tested with my program in the
> > > > local machine and found there is one mistake I made about calculating
> > > > the diff between those two. Sorry for the noise.
> > > >
> > > > Well, I only need to send a V2 patch of patch [3/3] in the next few days.
> > > >
> > > > BTW, please allow me to ask one question unrelated to this patch
> > > > again. I do wonder: if we batch the recv skbs from the errqueue when
> > > > calling tcp_recvmsg() -> inet_recv_error(), it could break users,
> > > > right?
> > >
> > > Analogous to __msg_zerocopy_callback with __msg_zerocopy_callback.
> > >
> > > Only here we cannot return range-based results and thus cannot just
> > > expand the range of the one outstanding notification.
> > >
> > > This would mean in ip(v6)_recv_error calling sock_dequeue_err_skb,
> > > sock_recv_timestamp and put_cmsg IP_RECVERR multiple times. And
> > > ip_cmsg_recv if needed.
> > >
> > > Existing applications do not have to expect multiple results per
> > > single recvmsg call. So enabling that unconditionally could break
> > > them.
> >
> > Thanks for your explanation! I was unsure because the use cases I
> > read on GitHub and in txtimestamp.c can only handle one err skb at
> > a time.
> >
> > >
> > > Adding this will require a new flag. An sk_tsflag is the obvious
> > > approach.
> > >
> > > Interpreting a MSG_* flag passed to recvmsg would be
> > > another option. If there is a bit that can be set with MSG_ERRQUEUE
> > > and cannot already be set currently. But I don't think that's the
> > > case. We allow all bits and ignore any undefined ones.
> >
> > Do you feel it is necessary that we can implement this idea to
> > optimize it, saving 2 or 3 syscalls at most at one time? IIRC, it's
> > you who proposed that we can batch them when applying the tracepoints
> > mechanism after I gave a presentation at netconf :) It's really good.
> > That inspires me a lot and makes me keep wondering if we can do this
> > these days.
> >
> > Since I've already finished the bpf for timestamping feature locally
> > which bypasses receiving skbs from errqueue,
>
> That's great!
>
> > I believe it could be
> > helpful for those applications that still tend to use the
> > "traditional way" to trace.
> >
> > What are your thoughts on this? If you agree, do you want to do this
> > on your own or allow me to give it a try?
>
> I'd focus on the workload that you care about most, which is the
> administrator driven interface, which will use BPF.
>
> This micro optimization would need some benchmarks that show that it
> has a measurable effect.

Got it. I will post that series soon.

Thanks,
Jason
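
For reference, the objection that the proposed check "breaks hardware
timestamping" can be reproduced in userspace with a minimal sketch. The
function name proposed_strict_check() and the TS_ prefixed constants are
hypothetical; the bit values are assumed to mirror the uapi header
<linux/net_tstamp.h>, and the condition is copied from the diff quoted
above.

```c
#include <errno.h>

/* Bit values assumed to mirror <linux/net_tstamp.h>; redefined under a
 * hypothetical TS_ prefix so the sketch builds without kernel headers. */
enum {
	TS_TX_HARDWARE  = 1 << 0, /* generate hardware tx timestamps */
	TS_TX_SOFTWARE  = 1 << 1, /* generate software tx timestamps */
	TS_SOFTWARE     = 1 << 4, /* report software timestamps */
	TS_RAW_HARDWARE = 1 << 6, /* report hardware timestamps */
	TS_TX_SCHED     = 1 << 8,
	TS_TX_ACK       = 1 << 9,
};

#define TS_TX_RECORD_MASK \
	(TS_TX_HARDWARE | TS_TX_SOFTWARE | TS_TX_SCHED | TS_TX_ACK)

/* Hypothetical userspace copy of the condition added by the patch:
 * reject any tx record flag unless software reporting is also set. */
int proposed_strict_check(int val)
{
	if ((val & TS_TX_RECORD_MASK) && !(val & TS_SOFTWARE))
		return -EINVAL;
	return 0;
}
```

A hardware-only request such as TS_TX_HARDWARE | TS_RAW_HARDWARE sets a
tx record flag without the software reporting flag, so this check
rejects it even though it is a legitimate configuration.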
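
The advice in the thread, that the application should read from the
error queue frequently enough, might look like the following sketch. It
assumes a Linux socket; drain_errqueue() is a hypothetical helper name
and the buffer sizes are arbitrary. Each recvmsg() call returns at most
one queued message, which matches the one-result-per-call behaviour
that existing applications expect.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: read every pending error-queue message from fd
 * without blocking. Returns the number of messages read, or -1 on an
 * unexpected error (errno is left as set by recvmsg). */
int drain_errqueue(int fd)
{
	char data[256];    /* looped-back payload; may be truncated */
	char control[512]; /* room for timestamp and IP_RECVERR cmsgs */
	int count = 0;

	for (;;) {
		struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
		struct msghdr msg;
		ssize_t ret;

		memset(&msg, 0, sizeof(msg));
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;
		msg.msg_control = control;
		msg.msg_controllen = sizeof(control);

		ret = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
		if (ret < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return count; /* error queue is empty */
			if (errno == EINTR)
				continue;     /* interrupted, retry */
			return -1;
		}
		/* A real application would walk CMSG_FIRSTHDR(&msg) here
		 * to extract the timestamp and the extended error. */
		count++;
	}
}
```

Calling such a helper after each send, or whenever poll() reports
POLLERR, keeps sk_rmem_alloc from growing; raising SO_RCVBUF is the
other option mentioned above.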