Message-ID: <CANn89iJv8VRPwQBAE=5-oKHGMs9JVCvCiCBwL+3QW9sJDxo5cQ@mail.gmail.com>
Date:   Mon, 18 Sep 2023 09:41:15 +0200
From:   Eric Dumazet <edumazet@...gle.com>
To:     Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
Cc:     nic_swsd@...ltek.com, Heiner Kallweit <hkallweit1@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: BUG: KCSAN: data-race in rtl8169_poll

On Mon, Sep 18, 2023 at 8:15 AM Mirsad Todorovac
<mirsad.todorovac@....unizg.hr> wrote:
>
> Hi all,
>
> In a vanilla torvalds tree kernel on Ubuntu 22.04, version 6.6.0-rc1-kcsan-00269-ge789286468a9,
> KCSAN discovered a data-race in rtl8169_poll():
>
> [ 9591.740976] ==================================================================
> [ 9591.740990] BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
>
> [ 9591.741060] race at unknown origin, with read to 0xffff888109773130 of 4 bytes by interrupt on cpu 21:
> [ 9591.741073] rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
> [ 9591.741135] __napi_poll (net/core/dev.c:6527)
> [ 9591.741149] net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
> [ 9591.741161] __do_softirq (kernel/softirq.c:553)
> [ 9591.741175] __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
> [ 9591.741185] irq_exit_rcu (kernel/softirq.c:647)
> [ 9591.741194] common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
> [ 9591.741206] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:636)
> [ 9591.741217] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
> [ 9591.741227] cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
> [ 9591.741237] call_cpuidle (kernel/sched/idle.c:135)
> [ 9591.741249] do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
> [ 9591.741259] cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
> [ 9591.741268] start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
> [ 9591.741281] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
>
> [ 9591.741300] value changed: 0x80003fff -> 0x34044510
>
> [ 9591.741314] Reported by Kernel Concurrency Sanitizer on:
> [ 9591.741322] CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc1-kcsan-00269-ge789286468a9-dirty #4
> [ 9591.741334] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
> [ 9591.741343] ==================================================================
>
> (The taint is not from a proprietary module; it was triggered by a previously reported and still unfixed bug.)
>
> Apparently, it is this code:
>
> static int rtl8169_poll(struct napi_struct *napi, int budget)
> {
>         struct rtl8169_private *tp = container_of(napi, struct rtl8169_private, napi);
>         struct net_device *dev = tp->dev;
>         int work_done;
>
>         rtl_tx(dev, tp, budget);
>
> →       work_done = rtl_rx(dev, tp, budget);
>
>         if (work_done < budget && napi_complete_done(napi, work_done))
>                 rtl_irq_enable(tp);
>
>         return work_done;
> }
>
> and
>
> static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget)
> {
>         struct device *d = tp_to_dev(tp);
>         int count;
>
>         for (count = 0; count < budget; count++, tp->cur_rx++) {
>                 unsigned int pkt_size, entry = tp->cur_rx % NUM_RX_DESC;
>                 struct RxDesc *desc = tp->RxDescArray + entry;
>                 struct sk_buff *skb;
>                 const void *rx_buf;
>                 dma_addr_t addr;
>                 u32 status;
>
> →               status = le32_to_cpu(desc->opts1);
>                 if (status & DescOwn)
>                         break;
>
>                 /* This barrier is needed to keep us from reading
>                  * any other fields out of the Rx descriptor until
>                  * we know the status of DescOwn
>                  */
>                 dma_rmb();
>
>                 if (unlikely(status & RxRES)) {
> .
> .
> .
>
> The cause isn't obvious, so it would be interesting to know whether this is a valid report and whether it could have
> caused spurious corruption of network data on Realtek 8169 compatible cards ...
>

I think this is pretty much expected.

The driver reads a piece of memory that the hardware can modify at any time.

Adding data_race() annotations could avoid these false positives.
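
To make the suggestion concrete, here is a minimal userspace sketch of what annotating the descriptor read might look like. It is not a patch against r8169_main.c. The names rx_desc, DESC_OWN, and count_ready() are hypothetical simplifications of RxDesc, DescOwn, and the rtl_rx() loop quoted above, and the data_race() macro is a local stand-in for the kernel's macro of the same name so that the example compiles outside the kernel. The sketch assumes DescOwn is bit 31, which matches the logged change 0x80003fff -> 0x34044510 (the bit is set in the old value and clear in the new one). The le32_to_cpu() conversion and the dma_rmb() barrier are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's data_race() macro (assumption: outside the
 * kernel it simply evaluates the expression). In the kernel it tells
 * KCSAN that a race on this access is intentional; it adds no barrier.
 */
#define data_race(expr) (expr)

#define DESC_OWN    (1u << 31) /* hypothetical mirror of DescOwn (assumed bit 31) */
#define NUM_RX_DESC 4u         /* tiny ring, for illustration only */

/* Hypothetical, simplified stand-in for struct RxDesc. */
struct rx_desc {
	uint32_t opts1;
};

/*
 * Walk the ring from cur_rx and count descriptors the device has handed
 * back, stopping at the first one it still owns or when budget is used up.
 */
static int count_ready(const struct rx_desc *ring, unsigned int cur_rx,
		       int budget)
{
	int count;

	assert(budget >= 0);

	for (count = 0; count < budget; count++, cur_rx++) {
		const struct rx_desc *desc = &ring[cur_rx % NUM_RX_DESC];
		/* The device may rewrite opts1 concurrently; mark the read. */
		uint32_t status = data_race(desc->opts1);

		if (status & DESC_OWN)
			break;
	}

	return count;
}
```

In the real driver the annotation would only silence the KCSAN report for this read. The existing dma_rmb() after the DescOwn check would still be needed to order the reads of the other descriptor fields.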

> Hope this helps.
>
> Best regards,
> Mirsad Todorovac
