Message-Id: <999DF2B3-4EE8-4BDF-89C5-EB0C2D8BF69E@linaro.org>
Date:   Mon, 7 May 2018 16:03:34 +0200
From:   Paolo Valente <paolo.valente@...aro.org>
To:     Mike Galbraith <efault@....de>, Jens Axboe <axboe@...nel.dk>,
        Christoph Hellwig <hch@....de>
Cc:     linux-block <linux-block@...r.kernel.org>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Walleij <linus.walleij@...aro.org>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        Oleksandr Natalenko <oleksandr@...alenko.name>
Subject: bug in tag handling in blk-mq?

Hi Jens, Christoph, all,
Mike Galbraith has been experiencing hangs in blk_mq_get_tag, but only
with bfq [1].  The symptoms seem to point clearly to a problem in
I/O-tag handling, triggered by bfq because it limits the number of tags
available to async requests and sync write requests (in bfq_limit_depth).

Fortunately, I just happened to find a way to apparently confirm this.
With the following one-liner applied to block/bfq-iosched.c:

@@ -554,8 +554,7 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
        if (unlikely(bfqd->sb_shift != bt->sb.shift))
                bfq_update_depths(bfqd, bt);
 
-       data->shallow_depth =
-               bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
+       data->shallow_depth = 1;
 
        bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
                        __func__, bfqd->wr_busy_queues, op_is_sync(op),

Mike's machine now crashes quickly and reproducibly, while nothing bad
happens on my machines, even under heavy workloads (apart from the
expected throughput drop).

This change simply caps at 1 the total number of async requests plus
sync write requests that can be allocated at the same time.

This email is basically a request for help to knowledgeable people.  To
start, here are my first doubts/questions:
1) Just to be certain: I assume it is not normal for blk-mq to hang when
the total number of async and sync write requests is capped at one, right?
2) Do you have any hints about where I should look to chase this bug?
Of course, the bug may be in bfq, i.e., it may be a seemingly unrelated
bfq bug that indirectly causes this hang in blk-mq.  But it is hard
for me to see how.

Looking forward to some help.

Thanks,
Paolo

[1] https://www.spinics.net/lists/stable/msg215036.html
