linux-kernel - Re: [PATCH net v2 2/2] net: clarify SO_DEVMEM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHS8izNK6xJHDB+W80iCr2CEwbo=OecG-1UW5dB9X_0aowo8Bw@mail.gmail.com>
Date: Fri, 8 Nov 2024 10:45:07 -0800
From: Mina Almasry <almasrymina@...gle.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>, 
	Eric Dumazet <edumazet@...gle.com>, Willem de Bruijn <willemb@...gle.com>, 
	"David S. Miller" <davem@...emloft.net>, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Paolo Abeni <pabeni@...hat.com>, 
	Simon Horman <horms@...nel.org>, Jonathan Corbet <corbet@....net>, Yi Lai <yi1.lai@...ux.intel.com>, 
	Stanislav Fomichev <sdf@...ichev.me>
Subject: Re: [PATCH net v2 2/2] net: clarify SO_DEVMEM_DONTNEED behavior in documentation

On Fri, Nov 8, 2024 at 10:07 AM Stanislav Fomichev <stfomichev@...il.com> wrote:
>
> On 11/08, Mina Almasry wrote:
> > On Thu, Nov 7, 2024 at 7:01 PM Stanislav Fomichev <stfomichev@...il.com> wrote:
> > >
> > > On 11/07, Mina Almasry wrote:
> > > > On Thu, Nov 7, 2024 at 5:30 PM Stanislav Fomichev <stfomichev@...il.com> wrote:
> > > > >
> > > > > On 11/07, Mina Almasry wrote:
> > > > > > Document new behavior when the number of frags passed is too big.
> > > > > >
> > > > > > Signed-off-by: Mina Almasry <almasrymina@...gle.com>
> > > > > > ---
> > > > > >  Documentation/networking/devmem.rst | 9 +++++++++
> > > > > >  1 file changed, 9 insertions(+)
> > > > > >
> > > > > > diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
> > > > > > index a55bf21f671c..d95363645331 100644
> > > > > > --- a/Documentation/networking/devmem.rst
> > > > > > +++ b/Documentation/networking/devmem.rst
> > > > > > @@ -225,6 +225,15 @@ The user must ensure the tokens are returned to the kernel in a timely manner.
> > > > > >  Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
> > > > > >  and will lead to packet drops.
> > > > > >
> > > > > > +The user must pass no more than 128 tokens, with no more than 1024 total frags
> > > > > > +among the token->token_count across all the tokens. If the user provides more
> > > > > > +than 1024 frags, the kernel will free up to 1024 frags and return early.
> > > > > > +
> > > > > > +The kernel returns the number of actual frags freed. The number of frags freed
> > > > > > +can be less than the tokens provided by the user in case of:
> > > > > > +
> > > > >
> > > > > [..]
> > > > >
> > > > > > +(a) an internal kernel leak bug.
> > > > >
> > > > > If you're gonna respin, might be worth mentioning that the dmesg
> > > > > will contain a warning in case of a leak?
> > > >
> > > > We will not actually warn in the likely cases of leak.
> > > >
> > > > We warn when we find an entry in the xarray that is not a net_iov, or
> > > > if napi_pp_put_page fails on that net_iov. Both are very unlikely to
> > > > happen honestly.
> > > >
> > > > The likely 'leaks' are when we don't find the frag_id in the xarray.
> > > > We do not warn on that because the user can intentionally trigger the
> > > > warning with invalid input. If the user is actually giving valid input
> > > > and the warn still happens, likely a kernel bug like I mentioned in
> > > > another thread, but we still don't warn.
> > >
> > > In this case, maybe don't mention the leaks at all? If it's not
> > > actionable, not sure how it helps?
> >
> > It's good to explain what the return code of the setsockopt means, and
> > when it would be less than the number of passed in tokens.
> >
> > Also it's not really 'not actionable'. I expect serious users of
> > devmem tcp to log such leaks in metrics and try to root cause the
> > userspace or kernel bug causing them if they happen.
>
> Right now it reads like both (a) and (b) have a similar probability. Maybe
> even (a) is more probable because you mention it first? In theory, any syscall
> can have a bug in it where it returns something bogus, so maybe at least
> downplay the 'leak' part a bit? "In the extremely rare cases, kernel
> might free less frags than requested .... "
>
> Imagine a situation where the user inadvertently tries to free the same token
> twice or something and gets the unexpected return value. Why? Might be
> the kernel leak, right?
>
> From the POW of the kernel, the most probable cases where we return
> less tokens are:
> 1. user gave us more than 1024
> 2. user gave us incorrect tokens
> ...
> 99. kernel is full of bugs and we lost the frag

The current wording doesn't make any comment about probability. More
information is better than less. I don't see a strong reason to omit
information. I think the docs are better now and will be improved
further in the future. Lets not bike shed too much on docs wording.
It's painfully obvious invalid input is more likely not subtle kernel
bugs IMO without calling out.

-- 
Thanks,
Mina