Message-ID: <5f60813c-c52b-5c08-27c7-490b7d28c598@alu.unizg.hr>
Date:   Wed, 30 Aug 2023 13:43:50 +0200
From:   Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
        Christoph Hellwig <hch@....de>,
        Sagi Grimberg <sagi@...mberg.me>,
        linux-nvme@...ts.infradead.org
Subject: Re: BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

Hi, Mr. Matthew,

On 8/29/23 21:13, Matthew Wilcox wrote:
> On Mon, Aug 28, 2023 at 11:14:23PM +0200, Mirsad Todorovac wrote:
>> In the vanilla torvalds tree 6.5 kernel on the Ubuntu 22.04 system, KCSAN found another data race:
> 
> KCSAN is wrong.

Thank you for evaluating this bug report in such detail.

Well, I am not giving up on KCSAN anyway, because it has found some real-life data races.

To put it more graphically, it is very unpleasant when another core changes the data
out from under you, or when it magically and unexpectedly changes in the middle of some work ...

:-(

>> [   34.102069] write (marked) to 0xffffef9a44978bc0 of 8 bytes by interrupt on cpu 28:
>> [   34.108569] mpage_read_end_io (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/bitops.h:55 /home/marvin/linux/kernel/linux_torvalds/./include/asm-generic/bitops/instrumented-atomic.h:29 /home/marvin/linux/kernel/linux_torvalds/./include/linux/page-flags.h:739 /home/marvin/linux/kernel/linux_torvalds/fs/mpage.c:55)
> 
>          bio_for_each_folio_all(fi, bio) {
>                  if (err)
>                          folio_set_error(fi.folio);
>                  else
>                          folio_mark_uptodate(fi.folio);
>                  folio_unlock(fi.folio);
>          }

> It's noting the write to folio->flags in folio_mark_uptodate().  You can
> see it's locked.  Also, the folio is under I/O.

Yes, the folio_unlock(fi.folio) call implies that the folio was locked somewhere earlier,
but finding where it was locked is beyond my understanding at the moment.

I see folio_put() in other places, but it seems to affect the refcount only. I did not find
where the folio is locked, but this is probably just me ...

>> [   34.115221] read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
>> [   34.121702] folio_batch_move_lru (/home/marvin/linux/kernel/linux_torvalds/./include/linux/mm.h:1814 /home/marvin/linux/kernel/linux_torvalds/./include/linux/mm.h:1824 /home/marvin/linux/kernel/linux_torvalds/./include/linux/memcontrol.h:1636 /home/marvin/linux/kernel/linux_torvalds/./include/linux/memcontrol.h:1659 /home/marvin/linux/kernel/linux_torvalds/mm/swap.c:216)
> 
> Here, it's noting the read to folio->flags that's part of page_to_nid().
> 
>> [   34.121713] folio_batch_add_and_move (/home/marvin/linux/kernel/linux_torvalds/mm/swap.c:235)
>> [   34.121724] folio_add_lru (/home/marvin/linux/kernel/linux_torvalds/./arch/x86/include/asm/preempt.h:95 /home/marvin/linux/kernel/linux_torvalds/mm/swap.c:518)
>> [   34.121735] folio_add_lru_vma (/home/marvin/linux/kernel/linux_torvalds/mm/swap.c:538)
>> [   34.121746] do_anonymous_page (/home/marvin/linux/kernel/linux_torvalds/mm/memory.c:4146)
> 
> Here we can see the page is freshly allocated.
> 
> So KCSAN has three things wrong here.  One is that the write in
> folio_mark_uptodate() is setting a bit that is nowhere near the bits
> that are used for the node ID.  It can't know that; it doesn't track
> writes at that granularity.
> 
> The second thing is that the node bits in folio->flags are immutable.
> They're set at boot (or memory hotplug).  There is never a race risk when
> reading them.  Presumably there needs to be some kind of annotation to
> tell KCSAN that this is always safe.
> 
> The third thing is that these two accesses cannot race.  The write is
> to a folio which is under I/O, so cannot be freed.  The read is to a
> folio which has just been allocated, so cannot be under I/O.  This is
> some kind of failure of KCSAN.
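
The first two points can be illustrated with a minimal userspace sketch in C.
This is not kernel code: the bit positions and all demo_* names below are
assumptions made up for illustration, and the real layout of folio->flags
depends on the kernel configuration. It only shows that atomically setting one
low flag bit leaves a node ID stored in the high bits of the same word unchanged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: the node ID occupies the
 * top DEMO_NODE_BITS bits of a 64-bit flags word, and the "uptodate"
 * flag is a single low bit.  None of these values come from the kernel.
 */
#define DEMO_NODE_BITS   6u
#define DEMO_NODE_SHIFT  (64u - DEMO_NODE_BITS)
#define DEMO_NODE_MASK   ((UINT64_C(1) << DEMO_NODE_BITS) - 1)
#define DEMO_PG_UPTODATE 2u

/* Extract the node ID from a snapshot of the flags word. */
static unsigned demo_flags_to_nid(uint64_t flags)
{
        return (unsigned)((flags >> DEMO_NODE_SHIFT) & DEMO_NODE_MASK);
}

/* Atomically set the uptodate bit; no other bit in the word is modified. */
static void demo_mark_uptodate(_Atomic uint64_t *flags)
{
        atomic_fetch_or(flags, UINT64_C(1) << DEMO_PG_UPTODATE);
}
```

With a flags word initialised to node 3, demo_flags_to_nid() returns 3 both
before and after demo_mark_uptodate() runs, whichever order a concurrent reader
and writer happen to execute in. A checker that watches the whole 8-byte word
still sees a write and a read to the same address, which is consistent with the
report above.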

Based on your insight, I will assume that the bug report is resolved.

Thank you again for your time.

Best regards,
Mirsad Todorovac

-- 
Mirsad Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
