netdev - Re: [PATCH net V2 1/2] net: stmmac: Premature loop termination check was ignored on rx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230318190125.175b0fea@kernel.org>
Date:   Sat, 18 Mar 2023 19:01:25 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Jochen Henneberg <jh@...neberg-systemdesign.com>
Cc:     netdev@...r.kernel.org,
        Giuseppe Cavallaro <peppe.cavallaro@...com>,
        Alexandre Torgue <alexandre.torgue@...s.st.com>,
        Jose Abreu <joabreu@...opsys.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Maxime Coquelin <mcoquelin.stm32@...il.com>,
        Ong Boon Leong <boon.leong.ong@...el.com>,
        linux-stm32@...md-mailman.stormreply.com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net V2 1/2] net: stmmac: Premature loop termination
 check was ignored on rx

On Sat, 18 Mar 2023 09:38:12 +0100 Jochen Henneberg wrote:
> > Are you sure? Can you provide more detailed analysis?
> > Do you observe a problem / error in real life or is this theoretical?  
> 
> This is theoretical, I was hunting another bug and just stumbled over
> the check which is, I think you agree, pointless right now. I did not
> try to force execute that code path.

If you have the HW it's definitely worth doing. There is a fault
injection infra in Linus which allows to fail memory allocations.
Or you can just make a little patch to the driver to fake failing
every 1000th allocation.

> > As far as I can tell only path which jumps to read_again after doing
> > count++ is via the drain_data jump, but I can't tell how it's
> > discarding subsequent segments in that case..
> >  
> >> -read_again:
> >>  		buf1_len = 0;
> >>  		buf2_len = 0;
> >>  		entry = next_entry;  
> 
> Correct. The read_again is triggered in case that the segment is not the
> last segment of the frame:
> 
> 		if (likely(status & rx_not_ls))
> 			goto read_again;
> 
> So in case there is no skb (queue error) it will keep increasing count
> until the last segment has been found with released device DMA
> ownership. So skb will not change while the goto loop is running, the
> only thing that will change is that subsequent segments release device
> DMA ownership. The dirty buffers are then cleaned up from
> stmmac_rx_refill().

To be clear - I'm only looking at stmmac_rx(), that ZC one is even more
confusing.

Your patch makes sense, but I think it's not enough to make this code
work in case of memory allocation failure. AFAIU the device supports
scatter - i.e. receiving a single frame in multiple chunks. Each time
thru the loop we process one (or two?) chunks. But the code uses 
skb == NULL to decide whether it's the first chunk or not. So in case
of memory allocation error it will treat the second chunk as the first
(since skb will be NULL) and we'll get a malformed frame with missing
chunks sent to the stack. The driver should discard the entire frame
on failure..

> I think the driver code is really hard to read I have planned to cleanup
> things later, however, this patch simply tries to prevent us from
> returning a value greater than limit which could happen and would
> definitely be wrong.