netdev - Re: BUG: KCSAN: data-race in rtl8169

Message-ID: <CANpmjNNYGax0BfjA98ViGsM4rVrcaNx_SKdetgt+-SzFqB-7zg@mail.gmail.com>
Date: Mon, 18 Sep 2023 13:44:02 +0200
From: Marco Elver <elver@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Mirsad Todorovac <mirsad.todorovac@....unizg.hr>, nic_swsd@...ltek.com, 
	Heiner Kallweit <hkallweit1@...il.com>, "David S. Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: BUG: KCSAN: data-race in rtl8169_poll

On Mon, 18 Sept 2023 at 12:39, Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Mon, Sep 18, 2023 at 11:43 AM Mirsad Todorovac
> <mirsad.todorovac@....unizg.hr> wrote:
> >
> > On 9/18/23 09:41, Eric Dumazet wrote:
> > > On Mon, Sep 18, 2023 at 8:15 AM Mirsad Todorovac
> > > <mirsad.todorovac@....unizg.hr> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> In the vanilla torvalds tree kernel on Ubuntu 22.04, commit 6.6.0-rc1-kcsan-00269-ge789286468a9,
> > >> KCSAN discovered a data-race in rtl8169_poll():
> > >>
> > >> [ 9591.740976] ==================================================================
> > >> [ 9591.740990] BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
> > >>
> > >> [ 9591.741060] race at unknown origin, with read to 0xffff888109773130 of 4 bytes by interrupt on cpu 21:
> > >> [ 9591.741073] rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
> > >> [ 9591.741135] __napi_poll (net/core/dev.c:6527)
> > >> [ 9591.741149] net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
> > >> [ 9591.741161] __do_softirq (kernel/softirq.c:553)
> > >> [ 9591.741175] __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
> > >> [ 9591.741185] irq_exit_rcu (kernel/softirq.c:647)
> > >> [ 9591.741194] common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
> > >> [ 9591.741206] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:636)
> > >> [ 9591.741217] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
> > >> [ 9591.741227] cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
> > >> [ 9591.741237] call_cpuidle (kernel/sched/idle.c:135)
> > >> [ 9591.741249] do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
> > >> [ 9591.741259] cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
> > >> [ 9591.741268] start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
> > >> [ 9591.741281] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
> > >>
> > >> [ 9591.741300] value changed: 0x80003fff -> 0x34044510
> > >>
> > >> [ 9591.741314] Reported by Kernel Concurrency Sanitizer on:
> > >> [ 9591.741322] CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc1-kcsan-00269-ge789286468a9-dirty #4
> > >> [ 9591.741334] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
> > >> [ 9591.741343] ==================================================================
> > >>
> > >> (The taint is not from the proprietary module, but was triggered by a previously reported and unfixed bug.)
> > >>
> > >> Apparently, it is this code:
> > >>
> > >> static int rtl8169_poll(struct napi_struct *napi, int budget)
> > >> {
> > >>          struct rtl8169_private *tp = container_of(napi, struct rtl8169_private, napi);
> > >>          struct net_device *dev = tp->dev;
> > >>          int work_done;
> > >>
> > >>          rtl_tx(dev, tp, budget);
> > >>
> > >> →       work_done = rtl_rx(dev, tp, budget);
> > >>
> > >>          if (work_done < budget && napi_complete_done(napi, work_done))
> > >>                  rtl_irq_enable(tp);
> > >>
> > >>          return work_done;
> > >> }
> > >>
> > >> and
> > >>
> > >> static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget)
> > >> {
> > >>          struct device *d = tp_to_dev(tp);
> > >>          int count;
> > >>
> > >>          for (count = 0; count < budget; count++, tp->cur_rx++) {
> > >>                  unsigned int pkt_size, entry = tp->cur_rx % NUM_RX_DESC;
> > >>                  struct RxDesc *desc = tp->RxDescArray + entry;
> > >>                  struct sk_buff *skb;
> > >>                  const void *rx_buf;
> > >>                  dma_addr_t addr;
> > >>                  u32 status;
> > >>
> > >> →               status = le32_to_cpu(desc->opts1);
> > >>                  if (status & DescOwn)
> > >>                          break;
> > >>
> > >>                  /* This barrier is needed to keep us from reading
> > >>                   * any other fields out of the Rx descriptor until
> > >>                   * we know the status of DescOwn
> > >>                   */
> > >>                  dma_rmb();
> > >>
> > >>                  if (unlikely(status & RxRES)) {
> > >> .
> > >> .
> > >> .
> > >>
> > >> The reason isn't obvious, so it would be interesting to know whether this is a valid report and whether it could cause spurious corruption
> > >> of network data on Realtek 8169-compatible cards ...
> > >>
> > >
> > > I think this is pretty much expected.
> > >
> > > Driver reads a piece of memory that the hardware can modify.
> > >
> > > Adding data_race() annotations could avoid these false positives.
> > >
> > >> Hope this helps.
> > >>
> > >> Best regards,
> > >> Mirsad Todorovac
> >
> > Well, another approach was this quick fix that eliminated all those rtl8169_poll() KCSAN warnings.
> >
> > If READ_ONCE(desc->opts1) fixed it, then maybe there is more to this than meets the eye?
> >
> > -------------------------------------------------
> >   drivers/net/ethernet/realtek/r8169_main.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> > index 6351a2dc13bc..051551ee2a15 100644
> > --- a/drivers/net/ethernet/realtek/r8169_main.c
> > +++ b/drivers/net/ethernet/realtek/r8169_main.c
> > @@ -4427,7 +4427,7 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
> >                  dma_addr_t addr;
> >                  u32 status;
> >
> > -               status = le32_to_cpu(desc->opts1);
> > +               status = le32_to_cpu(READ_ONCE(desc->opts1));
> >                  if (status & DescOwn)
> >                          break;
> >
>
> This also works, but in this case we already have barriers
> (dma_rmb() here) to synchronize the host side and the hardware
> (when flipping the DescOwn bit).

READ_ONCE() does not imply any (strong) barriers. It does imply
address-dependency barriers, i.e. it orders dependent reads/writes,
but I don't know whether that can be relied upon when the concurrent
writer is a device rather than a CPU.

So in this case pairing READ_ONCE() with dma_rmb() is perfectly
reasonable: writes to desc->opts1 can happen concurrently, and the
READ_ONCE() ensures the compiler doesn't mess up that access; later
reads must be ordered by dma_rmb().

The data race here is a consequence of a write happening concurrently
with the read of desc->opts1. The dma_rmb() does not prevent that from
happening, and therefore we still have to mark the racing access.
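
To make the pairing concrete, here is a minimal userspace sketch of the
pattern discussed above: a marked read of the ownership word, followed by
a read barrier before any other descriptor field is touched. Everything
in it is an assumption for illustration. READ_ONCE() and dma_rmb() are
simplified stand-ins (a volatile access and a C11 acquire fence), not the
kernel definitions; struct rx_desc, DESC_OWN and rx_poll() are
hypothetical names, not the r8169 code. The le32_to_cpu() conversion is
omitted for brevity, and __typeof__ is a GCC/Clang extension.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-ins for the kernel macros (assumptions, not the real definitions). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#define dma_rmb()    atomic_thread_fence(memory_order_acquire)

/* Hypothetical ownership bit: set while the device still owns the descriptor. */
#define DESC_OWN 0x80000000u

/* Hypothetical descriptor layout; opts1 is written concurrently by the device. */
struct rx_desc {
	uint32_t opts1;
	uint32_t opts2;
	uint64_t addr;
};

/*
 * Process up to 'budget' descriptors that the device has handed back.
 * Returns the number processed and adds each processed descriptor's
 * opts2 to *sum, standing in for "reading the other fields".
 */
static int rx_poll(struct rx_desc *ring, unsigned int ring_size,
		   unsigned int *cur_rx, int budget, uint32_t *sum)
{
	int count;

	for (count = 0; count < budget; count++, (*cur_rx)++) {
		struct rx_desc *desc = &ring[*cur_rx % ring_size];

		/* Marked access: exactly one load, no tearing or refetching. */
		uint32_t status = READ_ONCE(desc->opts1);

		if (status & DESC_OWN)
			break;

		/* Order the loads below after the ownership check. */
		dma_rmb();

		*sum += desc->opts2;
	}

	return count;
}
```

The marked read addresses the race itself (the compiler may not tear,
fuse, or repeat the load), while the barrier addresses ordering of the
later reads. Neither one substitutes for the other.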