Date: Thu, 16 May 2024 19:44:56 -0600
From: Wu Bo <bo.wu@...o.com>
To: bvanassche@....org
Cc: axboe@...nel.dk,
	bo.wu@...o.com,
	dlemoal@...nel.org,
	linux-block@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	stable@...r.kernel.org,
	wubo.oduw@...il.com
Subject: Re: [PATCH stable] block/mq-deadline: fix different priority request on the same zone

On Thu, May 16, 2024 at 07:45:21AM -0600, Bart Van Assche wrote:
> On 5/16/24 03:28, Wu Bo wrote:
> > Zoned devices require sequential writes within the same zone. That means
> > if there are 2 requests on the same zone, the request with the lower pos
> > needs to be dispatched to the device first.
> > Since each priority has its own tree & list, a request with higher
> > priority will be dispatched first.
> > So if requestA & requestB are on the same zone, requestA is BE with pos
> > X+0, and requestB is RT with pos X+1, then requestB will be dispatched
> > before requestA, which triggers an ERROR from the zoned device.
> > 
> > This was found in a practical scenario when using F2FS on a zoned
> > device. And it is very easy to reproduce:
> > 1. Use fsstress to run 8 test processes
> > 2. Use ionice to change 4 of the 8 processes to RT priority
> 
> Hi Wu,
> 
> I agree that there is a problem related to the interaction of I/O
> priority and zoned storage. A solution with a lower runtime overhead
> is available here:
> https://lore.kernel.org/linux-block/20231218211342.2179689-1-bvanassche@acm.org/T/#me97b088c535278fe3d1dc5846b388ed58aa53f46
Hi Bart,

I have tried setting all sequential write requests to the same priority:

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 6a05dd86e8ca..b560846c63cb 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -841,7 +841,10 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	 */
 	blk_req_zone_write_unlock(rq);
 
-	prio = ioprio_class_to_prio[ioprio_class];
+	if (blk_rq_is_seq_zoned_write(rq))
+		prio = DD_BE_PRIO;
+	else
+		prio = ioprio_class_to_prio[ioprio_class];
 	per_prio = &dd->per_prio[prio];
 	if (!rq->elv.priv[0]) {
 		per_prio->stats.inserted++;

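For context, here is a trimmed excerpt of the per-priority bookkeeping in
block/mq-deadline.c (simplified here for illustration, not the complete
structures): each priority level keeps its own sort tree & FIFO, which is
why two writes to the same zone can be ordered independently of each other:

/*
 * Simplified excerpt of mq-deadline's per-priority state.  Each
 * dd_prio level has its own RB-tree and FIFO lists, so a same-zone
 * write inserted at DD_RT_PRIO is sorted separately from one
 * inserted at DD_BE_PRIO.
 */
enum dd_data_dir { DD_READ = READ, DD_WRITE = WRITE };
enum { DD_DIR_COUNT = 2 };

enum dd_prio {
	DD_RT_PRIO	= 0,
	DD_BE_PRIO	= 1,
	DD_IDLE_PRIO	= 2,
};

struct dd_per_prio {
	struct list_head dispatch;
	struct rb_root sort_list[DD_DIR_COUNT];	/* one tree per direction */
	struct list_head fifo_list[DD_DIR_COUNT];
	/* ... stats and latest_pos omitted ... */
};

The diff above collapses every sequential zoned write into DD_BE_PRIO, so
they all land in a single sort tree and keep their LBA order.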
I think this has the same effect as the patch you mentioned. Unfortunately,
this fix causes another issue: all write requests now get the same priority
while read requests still have different priorities. This makes f2fs prone
to hangs under stress testing:

[129412.105440][T1100129] vkhungtaskd: INFO: task "f2fs_ckpt-254:5":769 blocked for more than 193 seconds.
[129412.106629][T1100129] vkhungtaskd:       6.1.25-android14-11-maybe-dirty #1
[129412.107624][T1100129] vkhungtaskd: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129412.108873][T1100129] vkhungtaskd: task:f2fs_ckpt-254:5 state:D stack:10496 pid:769   ppid:2      flags:0x00000408
[129412.110194][T1100129] vkhungtaskd: Call trace:
[129412.110769][T1100129] vkhungtaskd:  __switch_to+0x174/0x338
[129412.111566][T1100129] vkhungtaskd:  __schedule+0x604/0x9e4
[129412.112275][T1100129] vkhungtaskd:  schedule+0x7c/0xe8
[129412.112938][T1100129] vkhungtaskd:  rwsem_down_write_slowpath+0x4cc/0xf98
[129412.113813][T1100129] vkhungtaskd:  down_write+0x38/0x40
[129412.114500][T1100129] vkhungtaskd:  __write_checkpoint_sync+0x8c/0x11c
[129412.115409][T1100129] vkhungtaskd:  __checkpoint_and_complete_reqs+0x54/0x1dc
[129412.116323][T1100129] vkhungtaskd:  issue_checkpoint_thread+0x8c/0xec
[129412.117148][T1100129] vkhungtaskd:  kthread+0x110/0x224
[129412.117826][T1100129] vkhungtaskd:  ret_from_fork+0x10/0x20
[129412.484027][T1700129] vkhungtaskd: task:f2fs_gc-254:55  state:D stack:10832 pid:771   ppid:2      flags:0x00000408
[129412.485337][T1700129] vkhungtaskd: Call trace:
[129412.485906][T1700129] vkhungtaskd:  __switch_to+0x174/0x338
[129412.486618][T1700129] vkhungtaskd:  __schedule+0x604/0x9e4
[129412.487327][T1700129] vkhungtaskd:  schedule+0x7c/0xe8
[129412.487985][T1700129] vkhungtaskd:  io_schedule+0x38/0xc4
[129412.488675][T1700129] vkhungtaskd:  folio_wait_bit_common+0x3d8/0x4f8
[129412.489496][T1700129] vkhungtaskd:  __folio_lock+0x1c/0x2c
[129412.490196][T1700129] vkhungtaskd:  __folio_lock_io+0x24/0x44
[129412.490936][T1700129] vkhungtaskd:  __filemap_get_folio+0x190/0x400
[129412.491736][T1700129] vkhungtaskd:  pagecache_get_page+0x1c/0x5c
[129412.492501][T1700129] vkhungtaskd:  f2fs_wait_on_block_writeback+0x60/0xf8
[129412.493376][T1700129] vkhungtaskd:  do_garbage_collect+0x1100/0x223c
[129412.494185][T1700129] vkhungtaskd:  f2fs_gc+0x284/0x778
[129412.494858][T1700129] vkhungtaskd:  gc_thread_func+0x304/0x838
[129412.495603][T1700129] vkhungtaskd:  kthread+0x110/0x224
[129412.496271][T1700129] vkhungtaskd:  ret_from_fork+0x10/0x20

I think this is because f2fs is a CoW filesystem: some threads do a lot of
reading & writing at the same time while holding locks. When the reads and
writes of such a thread have different priorities, the whole operation takes
very long, and other FS operations are blocked behind it.
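
To make the split concrete, here is a hypothetical snippet (illustration
only, not part of the actual test) doing what ionice does in step 2 of the
reproducer: pinning a process to the RT I/O class. Normally all of that
process's reads and writes then carry RT priority; with the experimental
diff above, its sequential zoned writes drop to BE while its reads stay RT,
which is exactly the split that stalls the f2fs checkpoint/GC threads:

/* Hypothetical reproducer helper: set this process to the RT I/O
 * class, as "ionice -c 1" does.  glibc does not wrap ioprio_set(),
 * so the raw syscall is used.  The constants match
 * include/uapi/linux/ioprio.h.
 */
#include <sys/syscall.h>
#include <unistd.h>

#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_PRIO_VALUE(class, data) \
	(((class) << IOPRIO_CLASS_SHIFT) | (data))
#define IOPRIO_CLASS_RT		1
#define IOPRIO_WHO_PROCESS	1

int main(void)
{
	/* All reads and writes issued by this process now carry RT
	 * priority at the I/O scheduler level.
	 */
	return syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, getpid(),
		       IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0));
}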

So I came up with this solution to fix the priority issue on zoned devices.
It does raise the overhead, but it fixes the problem.
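
As a rough sketch only (this is not the actual patch from the Subject line,
just an illustration of where the extra runtime cost comes from), a fix
along these lines has to look across every priority level before
dispatching a sequential zoned write:

/* Hypothetical sketch, not the posted patch: before dispatching a
 * sequential zoned write, check every priority level for a write to
 * the same zone at a lower sector.  This cross-priority scan is the
 * overhead mentioned above.
 */
static bool dd_earlier_write_on_zone(struct deadline_data *dd,
				     struct request *rq)
{
	enum dd_prio prio;
	struct rb_node *node;

	for (prio = DD_RT_PRIO; prio <= DD_IDLE_PRIO; prio++) {
		for (node = rb_first(&dd->per_prio[prio].sort_list[DD_WRITE]);
		     node; node = rb_next(node)) {
			struct request *pos = rb_entry_rq(node);

			if (blk_rq_zone_no(pos) == blk_rq_zone_no(rq) &&
			    blk_rq_pos(pos) < blk_rq_pos(rq))
				return true;
		}
	}
	return false;
}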

Thanks,
Wu Bo
> 
> Are you OK with that alternative solution?
> 
> Thanks,
> 
> Bart.
