Message-ID: <b8422ee0-974c-43d8-9c1a-3e5e715fbd7d@amd.com>
Date: Tue, 3 Dec 2024 10:31:48 +0530
From: Bharata B Rao <bharata@....com>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, nikunj@....com,
 willy@...radead.org, vbabka@...e.cz, david@...hat.com,
 akpm@...ux-foundation.org, yuzhao@...gle.com, axboe@...nel.dk,
 viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz,
 joshdon@...gle.com, clm@...a.com
Subject: Re: [RFC PATCH 0/1] Large folios in block buffered IO path

On 02-Dec-24 3:38 PM, Mateusz Guzik wrote:
> On Mon, Dec 2, 2024 at 10:37 AM Bharata B Rao <bharata@....com> wrote:
>>
>> On 28-Nov-24 10:01 AM, Mateusz Guzik wrote:
>>
>>> Willy mentioned the folio wait queue hash table could be grown; you
>>> can find it in mm/filemap.c:
>>>     #define PAGE_WAIT_TABLE_BITS 8
>>>     #define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
>>>     static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
>>>
>>>     static wait_queue_head_t *folio_waitqueue(struct folio *folio)
>>>     {
>>>             return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)];
>>>     }
>>>
>>> Can you collect off-CPU time? offcputime-bpfcc -K > /tmp/out
>>
>> Flamegraph for "perf record --off-cpu -F 99 -a -g --all-kernel
>> --kernel-callchains -- sleep 120" is attached.
>>
>> Off-CPU samples were collected for 120 s at around the 45th minute of
>> the FIO benchmark, which runs for 1 hr in total. This run was with a
>> kernel that had your inode_lock fix but no changes to PAGE_WAIT_TABLE_BITS.
>>
>> Hopefully this captures a representative sample of the scalability
>> issue with the folio lock.

Here is the data from an offcputime-bpfcc -K run with the inode_lock fix and
no change to PAGE_WAIT_TABLE_BITS. This data was captured for the entire
duration of the FIO run (1 hr). Since the output is huge, I am pasting only a
few relevant entries.

The first entry in the offcputime records:

     finish_task_switch.isra.0
     schedule
     irqentry_exit_to_user_mode
     irqentry_exit
     sysvec_reschedule_ipi
     asm_sysvec_reschedule_ipi
     -                fio (33790)
         2

There are thousands of entries for the FIO read and write paths; I have
shown only the first and last entry for each path here.

First entry for the FIO read path that waits on folio_lock:

     finish_task_switch.isra.0
     schedule
     io_schedule
     folio_wait_bit_common
     filemap_get_pages
     filemap_read
     blkdev_read_iter
     vfs_read
     ksys_read
     __x64_sys_read
     x64_sys_call
     do_syscall_64
     entry_SYSCALL_64_after_hwframe
     -                fio (34143)
         3381769535

Last entry for the FIO read path that waits on folio_lock:

     finish_task_switch.isra.0
     schedule
     io_schedule
     folio_wait_bit_common
     filemap_get_pages
     filemap_read
     blkdev_read_iter
     vfs_read
     ksys_read
     __x64_sys_read
     x64_sys_call
     do_syscall_64
     entry_SYSCALL_64_after_hwframe
     -                fio (34171)
         3516224519

First entry for the FIO write path that waits on folio_lock:

     finish_task_switch.isra.0
     schedule
     io_schedule
     folio_wait_bit_common
     __filemap_get_folio
     iomap_get_folio
     iomap_write_begin
     iomap_file_buffered_write
     blkdev_write_iter
     vfs_write
     ksys_write
     __x64_sys_write
     x64_sys_call
     do_syscall_64
     entry_SYSCALL_64_after_hwframe
     -                fio (33842)
         48900

Last entry for the FIO write path that waits on folio_lock:

     finish_task_switch.isra.0
     schedule
     io_schedule
     folio_wait_bit_common
     __filemap_get_folio
     iomap_get_folio
     iomap_write_begin
     iomap_file_buffered_write
     blkdev_write_iter
     vfs_write
     ksys_write
     __x64_sys_write
     x64_sys_call
     do_syscall_64
     entry_SYSCALL_64_after_hwframe
     -                fio (34187)
         1815993
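Assuming the totals are in microseconds (the bcc offcputime default) and comparing against the 1 hr (3600 s) run, the four folio_wait_bit_common entries convert as follows:

```latex
\begin{aligned}
t_{\text{read,first}}  &= 3381769535\,\mu\text{s} \approx 3381.8\,\text{s} \approx 56.4\,\text{min} \approx 0.939 \times 3600\,\text{s} \\
t_{\text{read,last}}   &= 3516224519\,\mu\text{s} \approx 3516.2\,\text{s} \approx 58.6\,\text{min} \approx 0.977 \times 3600\,\text{s} \\
t_{\text{write,first}} &= 48900\,\mu\text{s} \approx 0.049\,\text{s} \\
t_{\text{write,last}}  &= 1815993\,\mu\text{s} \approx 1.8\,\text{s}
\end{aligned}
```

If the entries are printed in ascending order of total time, as bcc's offcputime does, the first and last entries bound each path: every read-path stack shown was blocked for roughly 94 to 98% of the run, while write-path waits stayed under about 2 s, roughly three orders of magnitude less.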

The last entry in the offcputime records:

     finish_task_switch.isra.0
     schedule
     futex_wait_queue
     __futex_wait
     futex_wait
     do_futex
     __x64_sys_futex
     x64_sys_call
     do_syscall_64
     entry_SYSCALL_64_after_hwframe
     -                multipathd (6308)
         3698877753

Regards,
Bharata.
