Message-Id: <57BC81ED020000A100022558@gwsmtp1.uni-regensburg.de>
Date:   Tue, 23 Aug 2016 17:03:41 +0200
From:   "Ulrich Windl" <Ulrich.Windl@...uni-regensburg.de>
To:     <linux-kernel@...r.kernel.org>
Cc:     <axboe@...nel.dk>,
        "Ulrich Windl" <Ulrich.Windl@...uni-regensburg.de>
Subject: 3.0.101: "blk_rq_check_limits: over max size limit."

Hello!

While performance-testing a 3PARdata StoreServ 8400 with SLES11 SP4, I noticed that I/O throughput dropped until everything more or less stood still. Looking into the syslog, I found that multipath's TUR checker considered the paths (FC, BTW) dead. Amazingly, I did not have this problem during read-only tests.

The start looks like this:
Aug 23 14:44:58 h10 multipathd: 8:32: mark as failed
Aug 23 14:44:58 h10 multipathd: FirstTest-32: remaining active paths: 3
Aug 23 14:44:58 h10 kernel: [  880.159425] blk_rq_check_limits: over max size limit.
Aug 23 14:44:58 h10 kernel: [  880.159611] blk_rq_check_limits: over max size limit.
Aug 23 14:44:58 h10 kernel: [  880.159615] blk_rq_check_limits: over max size limit.
Aug 23 14:44:58 h10 kernel: [  880.159623] device-mapper: multipath: Failing path 8:32.
Aug 23 14:44:58 h10 kernel: [  880.186609] blk_rq_check_limits: over max size limit.
Aug 23 14:44:58 h10 kernel: [  880.186626] blk_rq_check_limits: over max size limit.
Aug 23 14:44:58 h10 kernel: [  880.186628] blk_rq_check_limits: over max size limit.
Aug 23 14:44:58 h10 kernel: [  880.186631] device-mapper: multipath: Failing path 129:112.
[...]
It seems the TUR checker plays some ping-pong-like game: paths go up and down.

Now for the Linux part: I found the relevant message in blk-core.c (blk_rq_check_limits()).
First, s/agaist/against/ in " *    Such request stacking drivers should check those requests agaist". Then there's the problem that the message outputs neither blk_rq_sectors(), nor blk_queue_get_max_sectors(), nor the underlying device (a sketch of a more useful message follows the log below). That makes debugging somewhat difficult if you customize the block queue settings per device as I did:

Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/rotational for FirstTest-31 (0)
Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/add_random for FirstTest-31 (0)
Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/scheduler for FirstTest-31 (noop)
Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/max_sectors_kb for FirstTest-31 (128)
Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/rotational for FirstTest-32 (0)
Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/add_random for FirstTest-32 (0)
Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/scheduler for FirstTest-32 (noop)
Aug 23 14:32:34 h10 blocktune: (notice) start: activated tuning of queue/max_sectors_kb for FirstTest-32 (128)
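
As to the message itself, something like the following would make it actually useful. This is only an untested sketch against the check as it appears in my copy of blk-core.c; rq->rq_disk may be unset for cloned requests, hence the guard:

	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
		/* print the offending size, the limit, and the disk name */
		printk(KERN_ERR "%s: over max size limit. (%u > %u) on %s\n",
		       __func__, blk_rq_sectors(rq),
		       blk_queue_get_max_sectors(q, rq->cmd_flags),
		       rq->rq_disk ? rq->rq_disk->disk_name : "?");
		return -EIO;
	}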

I suspect the "queue/max_sectors_kb=128" setting is the culprit:
# multipath -ll FirstTest-32
FirstTest-32 (360002ac000000000000000040001b383) dm-7 3PARdata,VV
size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 2:0:0:1  sdet 129:80  failed ready running
  |- 2:0:2:1  sdev 129:112 failed ready running
  |- 1:0:0:1  sdb  8:16    failed ready running
  `- 1:0:1:1  sdc  8:32    failed ready running
# cat /sys/block/{dm-7,sd{b,c},sde{t,v}}/queue/max_sectors_kb
128
128
128
128
128

While writing this message, I noticed that I had created a primary partition on dm-7:
# dmsetup ls |grep Fi
FirstTest-32_part1      (253:8)
FirstTest-32    (253:7)
# cat /sys/block/dm-8/queue/max_sectors_kb
1024

Even after removing that mismatch with "# echo 128 >/sys/block/dm-8/queue/max_sectors_kb", things still did not get better.

Can't blk_rq_check_limits() do anything more clever than returning -EIO?
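
For example (just a rough idea, not a tested patch): returning a distinct error code would let a request-stacking driver like dm-multipath tell "clone violates this queue's limits" apart from a genuine I/O error, and requeue the request instead of failing the path:

	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
		printk(KERN_ERR "%s: over max size limit.\n", __func__);
		/* hypothetical: -EREMOTEIO as "too big for this queue;
		 * resubmit in smaller pieces" instead of a hard -EIO */
		return -EREMOTEIO;
	}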

Regards,
Ulrich
P.S.: Keep me in CC, please!


