netdev - Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes

Message-ID: <CANEJEGtVFE8awJz3j9j7T2BseJ5qMd_7er7WbdPQNgrdz9F5dg@mail.gmail.com>
Date:   Tue, 26 Apr 2022 10:20:35 -0700
From:   Grant Grundler <grundler@...omium.org>
To:     Igor Russkikh <irusskikh@...vell.com>
Cc:     Grant Grundler <grundler@...omium.org>,
        Dmitry Bezrukov <dbezrukov@...vell.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        netdev <netdev@...r.kernel.org>,
        "David S . Miller" <davem@...emloft.net>,
        LKML <linux-kernel@...r.kernel.org>,
        Aashay Shringarpure <aashay@...gle.com>,
        Yi Chou <yich@...gle.com>,
        Shervin Oloumi <enlightened@...gle.com>
Subject: Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes

[reply-all again since I forgot to tell gmail to post this as "plain
text"...grrh... so much for AI figuring this stuff out.]


On Tue, Apr 26, 2022 at 9:00 AM Igor Russkikh <irusskikh@...vell.com> wrote:
>
> Hi Grant,
>
> Sorry for the delay, I was on vacation.
> Thanks for working on this.

Hi Igor!
Very welcome! And yes, I was starting to wonder... but I'm now glad
that you didn't review them before you got back. These patches are no
reason to ruin a perfectly good vacation. :)

> I'm adding here Dmitrii, to help me review the patches.
> Dmitrii, here is a full series:
>
> https://patchwork.kernel.org/project/netdevbpf/cover/20220418231746.2464800-1-grundler@chromium.org/
>
> Grant, I've reviewed and also quite OK with patches 1-4.

Excellent! \o/


> For patch 5 - why do you think we need these extra comparisons with software head/tail?

The ChromeOS security team (CC'd) believes the driver needs to verify
"expected behavior". In other words, the driver expects the device to
provide new tail index values that fall within [tail, head)
("available to fill").

Your question makes me chuckle because I asked exactly the same
question. :D Everyone agrees it is a minimum requirement to verify the
index was "in bounds". And I agree it's prudent to verify the device
is "well behaved" where we can. I haven't looked at the code enough to
know what could go wrong if, for example, the tail index is
decremented instead of incremented or a "next fragment" index falls in
the "available to fill" range.
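
To make the "in bounds" minimum concrete, here is a rough sketch of walking
a fragment chain while checking every device-supplied index and capping the
number of iterations. This is not the atlantic driver code: struct demo_buff,
demo_walk_frags() and max_frags are hypothetical names used only for
illustration, and the ring layout is simplified.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for one RX ring entry. */
struct demo_buff {
	unsigned int next;	/* device-supplied index of the next fragment */
	bool is_eop;		/* true on the last fragment of a packet */
};

/*
 * Walk a fragment chain starting at 'start'.
 * Returns the number of fragments visited, or -1 if an index is out of
 * bounds or the chain is longer than max_frags (for example, a loop).
 */
static int demo_walk_frags(const struct demo_buff *ring,
			   unsigned int ring_size,
			   unsigned int start, unsigned int max_frags)
{
	unsigned int idx = start;
	unsigned int count = 0;

	while (count < max_frags) {
		/* Never dereference an index the device could have corrupted. */
		if (idx >= ring_size)
			return -1;

		count++;
		if (ring[idx].is_eop)
			return (int)count;

		idx = ring[idx].next;
	}

	/* Iteration cap reached without seeing end-of-packet. */
	return -1;
}
```

The cap guards against a chain that loops back on itself, which a bounds
check alone would not catch.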

However, I didn't run the fuzzer and, for now, I'm taking the ChromeOS
security team's word that this check is needed. If you (or Dmitrii)
feel strongly the driver can handle malicious devices or firmware bugs in
other ways, I'm not offended if you decline this patch. However, I
would be curious what those other mechanisms are.

> From what I see in logic, only the size limiting check is enough there..
>
> Other extra checks are tricky and non intuitive..

Yes, somewhat tricky in the code but conceptually simple: For the RX
buffer ring, IIUC, [head,tail) is "CPU to process" and [tail, head) is
"available to fill". New tail values should always be in the latter
range.

The trickiness comes in because this is a ring buffer: for [tail, head),
it is equally likely that head <= tail or head > tail numerically.
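
A minimal sketch of that wrap-around check, assuming a half-open circular
range and that start == end means "empty" (a real driver has to decide
whether equal indices mean empty or full). demo_idx_in_range() is a
hypothetical helper, not a function from the driver:

```c
#include <stdbool.h>

/*
 * Return true if idx lies in the half-open circular range [start, end)
 * of a ring with ring_size entries. Any index >= ring_size is rejected.
 * start == end is treated as an empty range in this sketch.
 */
static bool demo_idx_in_range(unsigned int idx, unsigned int start,
			      unsigned int end, unsigned int ring_size)
{
	if (idx >= ring_size || start >= ring_size || end >= ring_size)
		return false;

	if (start <= end)
		return idx >= start && idx < end;	/* range does not wrap */

	return idx >= start || idx < end;		/* range wraps past the ring end */
}
```

With something like this, "the new tail must be in [tail, head)" would read
as demo_idx_in_range(new_tail, tail, head, ring_size), whichever of head or
tail happens to be numerically larger.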

If you like, feel free to add comments explaining the ring behavior or
ask me to add such a comment (and repost #5). I'm a big fan of
documenting non-intuitive things in the code. That way the next person
to look at the code can verify the code and the IO device do what the
comment claims.

On the RX buffer ring, I'm also wondering if there is a race condition
such that the driver uses stale values of the tail pointer when
walking the RX fragment lists and validating index values. Aashay
assures me this race condition is not possible and I am convinced this
is true for the TX buffer ring where the driver is the "producer"
(tells the device what is in the TX ring). I still have to review the
RX buffer handling code more and will continue the conversation with
him until we agree.

cheers,
grant

>
> Regards,
>   Igor
>
> On 4/21/2022 9:53 PM, Grant Grundler wrote:
> > Igor,
> > Will you have a chance to comment on this in the near future?
> > Should someone else review/integrate these patches?
> >
> > I'm asking since I've seen no comments in the past three days.
> >
> > cheers,
> > grant
> >
> >
> > On Mon, Apr 18, 2022 at 4:17 PM Grant Grundler <grundler@...omium.org>
> > wrote:
> >>
> >> The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic driver
> >> in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters
> >> Thunderbolt 3 to 10 Gb Ethernet" (b0 version):
> >>
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.com_document_d_e_2PACX-2D1vT4oCGNhhy-5FAuUqpu6NGnW0N9HF-5Fjxf2kS7raOpOlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK_pub&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=620jqeSvQrGg6aotI35cWwQpjaL94s7TFeFh2cYSyvA&e=
> >>
> >> It essentially describes four problems:
> >> 1) validate rxd_wb->next_desc_ptr before populating buff->next
> >> 2) "frag[0] not initialized" case in aq_ring_rx_clean()
> >> 3) limit iterations handling fragments in aq_ring_rx_clean()
> >> 4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update()
> >>
> >> I've added one "clean up" contribution:
> >>     "net: atlantic: reduce scope of is_rsc_complete"
> >>
> >> I tested the "original" patches using chromeos-v5.4 kernel branch:
> >>
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__chromium-2Dreview.googlesource.com_q_hashtag-3Apcinet-2Datlantic-2D2022q1-2B-28status-3Aopen-2520OR-2520status-3Amerged-29&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=1a1YwJqrY-be2oDgGAG5oOyZDnqIok_2p5G-N8djo2I&e=
> >>
> >> The fuzzing team will retest using the chromeos-v5.4 patches and the b0
> >> HW.
> >>
> >> I've forward ported those patches to 5.18-rc2 and compiled them but am
> >> currently unable to test them on 5.18-rc2 kernel (logistics problems).
> >>
> >> I'm confident in all but the last patch:
> >>    "net: atlantic: verify hw_head_ is reasonable"
> >>
> >> Please verify I'm not confusing how ring->sw_head and ring->sw_tail
> >> are used in hw_atl_b0_hw_ring_tx_head_update().
> >>
> >> Credit largely goes to Chrome OS Fuzzing team members:
> >>     Aashay Shringarpure, Yi Chou, Shervin Oloumi
> >>
> >> cheers,
> >> grant