linux-kernel - Re: [PATCH net V2 1/2] net: stmmac: Premature loop termination check was ignored on rx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87r0tj23eh.fsf@henneberg-systemdesign.com>
Date:   Mon, 20 Mar 2023 10:04:54 +0100
From:   Jochen Henneberg <jh@...neberg-systemdesign.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     netdev@...r.kernel.org,
        Giuseppe Cavallaro <peppe.cavallaro@...com>,
        Alexandre Torgue <alexandre.torgue@...s.st.com>,
        Jose Abreu <joabreu@...opsys.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Maxime Coquelin <mcoquelin.stm32@...il.com>,
        Ong Boon Leong <boon.leong.ong@...el.com>,
        linux-stm32@...md-mailman.stormreply.com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net V2 1/2] net: stmmac: Premature loop termination
 check was ignored on rx


Jakub Kicinski <kuba@...nel.org> writes:

> On Sat, 18 Mar 2023 09:38:12 +0100 Jochen Henneberg wrote:
>> > Are you sure? Can you provide more detailed analysis?
>> > Do you observe a problem / error in real life or is this theoretical?  
>> 
>> This is theoretical, I was hunting another bug and just stumbled over
>> the check which is, I think you agree, pointless right now. I did not
>> try to force execute that code path.
>
> If you have the HW it's definitely worth doing. There is a fault
> injection infra in Linus which allows to fail memory allocations.
> Or you can just make a little patch to the driver to fake failing
> every 1000th allocation.
>

I have the hardware available and will do the check.

>> > As far as I can tell only path which jumps to read_again after doing
>> > count++ is via the drain_data jump, but I can't tell how it's
>> > discarding subsequent segments in that case..
>> >  
>> >> -read_again:
>> >>  		buf1_len = 0;
>> >>  		buf2_len = 0;
>> >>  		entry = next_entry;  
>> 
>> Correct. The read_again is triggered in case that the segment is not the
>> last segment of the frame:
>> 
>> 		if (likely(status & rx_not_ls))
>> 			goto read_again;
>> 
>> So in case there is no skb (queue error) it will keep increasing count
>> until the last segment has been found with released device DMA
>> ownership. So skb will not change while the goto loop is running, the
>> only thing that will change is that subsequent segments release device
>> DMA ownership. The dirty buffers are then cleaned up from
>> stmmac_rx_refill().
>
> To be clear - I'm only looking at stmmac_rx(), that ZC one is even more
> confusing.
>
> Your patch makes sense, but I think it's not enough to make this code
> work in case of memory allocation failure. AFAIU the device supports
> scatter - i.e. receiving a single frame in multiple chunks. Each time
> thru the loop we process one (or two?) chunks. But the code uses 
> skb == NULL to decide whether it's the first chunk or not. So in case
> of memory allocation error it will treat the second chunk as the first
> (since skb will be NULL) and we'll get a malformed frame with missing
> chunks sent to the stack. The driver should discard the entire frame
> on failure..
>

Understood. However, this forces me to read the code and datahseet very
carefully to understand the details. I will do that, however, it will
take me some time.

For the ST and Synopsys people:
I could imagine that you would be able to fix this much faster than
I can, so if they want to work on this please let me know so I don't
waste my time on doing double work.

>> I think the driver code is really hard to read I have planned to cleanup
>> things later, however, this patch simply tries to prevent us from
>> returning a value greater than limit which could happen and would
>> definitely be wrong.