linux-kernel - Re: [PATCH] net/skbuff: silence warnings under memory pressure


Message-ID: <1567692555.5576.91.camel@lca.pw>
Date:   Thu, 05 Sep 2019 10:09:15 -0400
From:   Qian Cai <cai@....pw>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Michal Hocko <mhocko@...nel.org>, davem@...emloft.net,
        netdev@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Petr Mladek <pmladek@...e.com>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH] net/skbuff: silence warnings under memory pressure

On Thu, 2019-09-05 at 10:32 +0200, Eric Dumazet wrote:
> 
> On 9/4/19 10:42 PM, Qian Cai wrote:
> 
> > To summarize, those all look to me like good long-term improvements that
> > would reduce the likelihood of this kind of livelock in general, especially
> > for other unknown allocations that happen while processing softirqs, but it
> > is still up in the air whether they fix it 100% in all situations, as
> > printk() is going to take more time and could deal with console hardware
> > that involves irq_exit() anyway.
> > 
> > On the other hand, adding __GFP_NOWARN in the build_skb() allocation will
> > fix this known NET_TX_SOFTIRQ case, which is common when softirqd is
> > involved, at least in the short term. It even has the benefit of reducing
> > the overall warn_alloc() noise out there.
> > 
> > I can resubmit with an updated changelog. Does it make any sense?
> 
> It does not make sense.
> 
> We have thousands other GFP_ATOMIC allocations in the networking stacks.

Instead of repeatedly making generalized statements, could you enlighten me with
some concrete examples that have similar properties and would trigger a
livelock:

- guaranteed GFP_ATOMIC allocations when processing softirq batches.
- the allocation has a fallback mechanism, so warning about a failure is
  unnecessary.

I thought "skb" was a special case here, as every packet sent or received is
handled using this data structure.

> 
> Soon you will have to send more and more patches adding __GFP_NOWARN once
> your workloads/tests can hit all these various points.

I doubt it.

> 
> It is really time to fix this problem generically, instead of having
> to review hundreds of patches.
> 
> This was my initial feedback really, nothing really has changed since.

I feel like you may not have followed the thread closely. More details have been
uncovered in the last few days, and the problem has been narrowed down to the
culprits.

> 
> The ability to send a warning with a stack trace, holding the cpu
> for many milliseconds should not be decided case by case, otherwise
> every call point will decide to opt out of the harmful warnings.

That is not really the reason anymore why I asked to add __GFP_NOWARN here.