linux-kernel - Re: Bisected: Kernel 4.14 + has 3 times higher write IO latency than Kernel 4.4 with raid1

Message-ID: <87h86vjhv0.fsf@notabene.neil.brown.name>
Date:   Tue, 06 Aug 2019 09:46:27 +1000
From:   NeilBrown <neilb@...e.com>
To:     Jinpu Wang <jinpu.wang@...ud.ionos.com>,
        linux-raid <linux-raid@...r.kernel.org>
Cc:     Alexandr Iarygin <alexandr.iarygin@...ud.ionos.com>,
        Guoqing Jiang <guoqing.jiang@...ud.ionos.com>,
        Paul Menzel <pmenzel@...gen.mpg.de>,
        linux-kernel@...r.kernel.org
Subject: Re: Bisected: Kernel 4.14 + has 3 times higher write IO latency than Kernel 4.4 with raid1

On Mon, Aug 05 2019, Jinpu Wang wrote:

> Hi Neil,
>
> For the md higher write IO latency problem, I bisected it to these commits:
>
> 4ad23a97 MD: use per-cpu counter for writes_pending
> 210f7cd percpu-refcount: support synchronous switch to atomic mode.
>
> Do you maybe have an idea? How can we fix it?

Hmmm.... not sure.

My guess is that the set_in_sync() call from md_check_recovery()
is taking a long time, and is being called too often.

Could you try two experiments, please?

1/ set  /sys/block/md0/md/safe_mode_delay 
   to 20 or more.  It defaults to about 0.2.

2/ comment out the call to set_in_sync() in md_check_recovery().

Then run the test separately after each of these changes.
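
For the first experiment, a plain write to the sysfs attribute is all
that is needed.  A minimal sketch in C, in case it is more convenient to
drive this from a test harness than from a shell; the helper names
write_attr() and read_attr() are hypothetical, and only the sysfs path
itself comes from the text above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `value` to the attribute file at `path`
 * (for example a sysfs attribute). Returns 0 on success, -1 on failure. */
static int write_attr(const char *path, const char *value)
{
    assert(path && value);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = (fputs(value, f) >= 0);
    if (fclose(f) != 0)   /* buffered data is flushed here, so errors can surface late */
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read the first line of the attribute into `buf`,
 * without the trailing newline. Returns 0 on success, -1 on failure. */
static int read_attr(const char *path, char *buf, size_t len)
{
    assert(path && buf && len > 0);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    if (!line)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Run as root, write_attr("/sys/block/md0/md/safe_mode_delay", "20")
would apply the setting, and read_attr() on the same path can confirm
what the kernel accepted.  Substitute your own array name for md0.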

If the second one makes a difference, I'd like to know how often it gets
called - and why.  The test
	if ( ! (
		(mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
		test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
		test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
		(mddev->external == 0 && mddev->safemode == 1) ||
		(mddev->safemode == 2
		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
		))
		return;

should normally return when doing lots of IO - I'd like to know
which condition causes it to not return.
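
One way to find out is to evaluate the clauses one at a time and record
the first that is true.  A rough user-space sketch of that idea follows;
struct mock_mddev, the bit numbers and MAX_SECTOR are stand-ins invented
for illustration (the real definitions live in drivers/md/md.h), so treat
this as a model of the logic rather than a patch:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit numbers, assumed for this sketch only. */
enum { SB_CHANGE_PENDING = 2 };
enum { RECOVERY_DONE = 4, RECOVERY_NEEDED = 5 };
#define MAX_SECTOR UINT64_MAX

/* Stand-in for the few struct mddev fields the test looks at. */
struct mock_mddev {
    unsigned long sb_flags;
    unsigned long recovery;
    int external;
    int safemode;
    int in_sync;
    uint64_t recovery_cp;
};

/* Returns 0 if the early "return" would be taken (no clause true),
 * otherwise the 1-based index of the first clause that is true. */
static int first_true_clause(const struct mock_mddev *m)
{
    assert(m);
    if (m->sb_flags & ~(1UL << SB_CHANGE_PENDING))
        return 1;
    if (m->recovery & (1UL << RECOVERY_NEEDED))
        return 2;
    if (m->recovery & (1UL << RECOVERY_DONE))
        return 3;
    if (m->external == 0 && m->safemode == 1)
        return 4;
    if (m->safemode == 2 && !m->in_sync && m->recovery_cp == MAX_SECTOR)
        return 5;
    return 0;
}

/* Human-readable label for each clause index, for log output. */
static const char *clause_name(int idx)
{
    static const char *const names[] = {
        "none (early return)",
        "sb_flags other than CHANGE_PENDING",
        "MD_RECOVERY_NEEDED",
        "MD_RECOVERY_DONE",
        "external == 0 && safemode == 1",
        "safemode == 2 && !in_sync && recovery_cp == MaxSector",
    };
    return (idx >= 0 && idx <= 5) ? names[idx] : "unknown";
}
```

In the kernel itself the same split could feed a rate-limited printk or
a per-clause counter, so that the answer comes from the real workload.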

Thanks,
NeilBrown
