[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4c85df85-58f7-4e44-8201-2f0562f93439@linux.ibm.com>
Date: Fri, 9 Jan 2026 19:53:00 +0530
From: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
To: Ming Lei <ming.lei@...hat.com>
Cc: Christoph Hellwig <hch@...radead.org>, linux-block@...r.kernel.org,
linux-scsi@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
James.Bottomley@...senpartnership.com, leonro@...dia.com,
kch@...dia.com, LKML <linux-kernel@...r.kernel.org>,
Madhavan Srinivasan <maddy@...ux.ibm.com>, riteshh@...ux.ibm.com,
ojaswin@...ux.ibm.com
Subject: Re: [next-20260108]kernel BUG at drivers/scsi/scsi_lib.c:1173!
On 09/01/26 7:35 pm, Ming Lei wrote:
> On Fri, Jan 09, 2026 at 07:26:01PM +0530, Venkat Rao Bagalkote wrote:
>> On 09/01/26 6:28 pm, Ming Lei wrote:
>>> On Fri, Jan 09, 2026 at 05:51:15PM +0530, Venkat Rao Bagalkote wrote:
>>>> On 09/01/26 5:25 pm, Ming Lei wrote:
>>>>> On Fri, Jan 09, 2026 at 05:14:36PM +0530, Venkat Rao Bagalkote wrote:
>>>>>> On 09/01/26 12:19 pm, Ming Lei wrote:
>>>>>>> On Thu, Jan 08, 2026 at 09:56:39PM -0800, Christoph Hellwig wrote:
>>>>>>>> I've seen the same when running xfstests on xfs, and bisected it to:
>>>>>>>>
>>>>>>>> commit ee623c892aa59003fca173de0041abc2ccc2c72d
>>>>>>>> Author: Ming Lei <ming.lei@...hat.com>
>>>>>>>> Date: Wed Dec 31 11:00:55 2025 +0800
>>>>>>>>
>>>>>>>> block: use bvec iterator helper for bio_may_need_split()
>>>>>>>>
>>>>>>> Hi Christoph and Venkat Rao Bagalkote,
>>>>>>>
>>>>>>> Unfortunately I can't duplicate the issue in my environment, can you test
>>>>>>> the following patch?
>>>>>>>
>>>>>>> diff --git a/block/blk.h b/block/blk.h
>>>>>>> index 98f4dfd4ec75..980eef1f5690 100644
>>>>>>> --- a/block/blk.h
>>>>>>> +++ b/block/blk.h
>>>>>>> @@ -380,7 +380,7 @@ static inline bool bio_may_need_split(struct bio *bio,
>>>>>>> return true;
>>>>>>> bv = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>>>>>>> - if (bio->bi_iter.bi_size > bv->bv_len)
>>>>>>> + if (bio->bi_iter.bi_size > bv->bv_len - bio->bi_iter.bi_bvec_done)
>>>>>>> return true;
>>>>>>> return bv->bv_len + bv->bv_offset > lim->max_fast_segment_size;
>>>>>>> }
>>>>>> Hello Ming,
>>>>>>
>>>>>>
>>>>>> This is not helping. I am hitting this issue, during kernel build itself.
>>>>> Can you confirm if it can fix the blktests ext4/056 first?
>>>>>
>>>>> If kernel building is running over new patched kernel, please provide the
>>>>> dmesg log. And if it is reproduciable, can you confirm if it can be fixed
>>>>> by reverting ee623c892aa59003 (block: use bvec iterator helper for bio_may_need_split())?
>>>> Unfortunately, even with revert, build fails.
>>>>
>>>>
>>>>
>>>> commit c64b2ee9cddcb31546c8622ef018d344544a9388 (HEAD)
>>>> Author: Super User <root@...-zzci-1.ltc.tadn.ibm.com>
>>>> Date: Fri Jan 9 06:51:19 2026 -0600
>>>>
>>>> Revert "block: use bvec iterator helper for bio_may_need_split()"
>>>>
>>>> This reverts commit ee623c892aa59003fca173de0041abc2ccc2c72d.
>>> OK, then your issue isn't related with the above change.
>>>
>>> Can you reproduce & collect dmesg log with the bad sg/rq/bio/bvec info by
>>> applying the attached debug patch?
>>>
>>> Also if possible, please collect your scsi queue's limit info before
>>> reproducing the issue:
>>>
>>> (cd /sys/block/$SD/queue && find . -type f -exec grep -aH . {} \;)
>> Hello Ming,
>>
>> After applying the patch shared via attachment also, I see build failure.
>>
>> I have attached the kernel config file.
>>
>>
>> git diff
>> diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
>> index 752060d7261c..33c1b6a0a738 100644
>> --- a/block/blk-mq-dma.c
>> +++ b/block/blk-mq-dma.c
>> @@ -4,8 +4,75 @@
>> */
>> #include <linux/blk-integrity.h>
>> #include <linux/blk-mq-dma.h>
>> +#include <linux/scatterlist.h>
>> #include "blk.h"
> Hi Venkat,
>
> Thanks for your test.
>
> But you didn't apply the whole debug patch in the following link:
>
> https://lore.kernel.org/linux-block/aWD7j3NR_m6EyZv1@fedora/
>
> otherwise something like "=== __blk_rq_map_sg DEBUG DUMP ===" will be
> dumped in dmesg log.
>
>> make -j 48 -s && make modules_install && make install
>> [ 5625.770436] ------------[ cut here ]------------
>> [ 5625.770476] WARNING: block/blk-mq-dma.c:309 at
> If the whole debug patch is applied correctly, the above line number should
> have become 378 instead of original 309.
>
> Please re-apply the debug patch & reproduce again.
>
Hello Ming,
Apologies for back and forth. But I did apply the whole patch. Below is
the git diff from my machine. Let me know, if I am missing anything.
git diff
diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index 752060d7261c..33c1b6a0a738 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -4,8 +4,75 @@
*/
#include <linux/blk-integrity.h>
#include <linux/blk-mq-dma.h>
+#include <linux/scatterlist.h>
#include "blk.h"
+static void dump_rq_mapping_debug(struct request *rq, struct
scatterlist *sglist,
+ int nsegs)
+{
+ struct scatterlist *sg;
+ struct bio *bio;
+ struct bvec_iter iter;
+ struct bio_vec bv;
+ int i;
+
+ pr_err("=== __blk_rq_map_sg DEBUG DUMP ===\n");
+ pr_err("DISK: %s\n", rq->q->disk ? rq->q->disk->disk_name :
"(null)");
+
+ /* Dump nsegs vs expected */
+ pr_err("nsegs=%d nr_phys_segments=%u\n",
+ nsegs, blk_rq_nr_phys_segments(rq));
+
+ /* Dump request info */
+ pr_err("REQUEST: __data_len=%u __sector=%llu cmd_flags=0x%x "
+ "rq_flags=0x%x nr_phys_segments=%u phys_gap_bit=%u\n",
+ rq->__data_len, (unsigned long long)rq->__sector,
+ rq->cmd_flags, (__force unsigned int)rq->rq_flags,
+ rq->nr_phys_segments, rq->phys_gap_bit);
+
+ /* Dump each SG element */
+ pr_err("--- SG LIST (%d entries) ---\n", nsegs);
+ for_each_sg(sglist, sg, nsegs, i) {
+ pr_err(" sg[%d]: pfn=0x%lx offset=%u len=%u
dma_addr=0x%llx\n",
+ i, page_to_pfn(sg_page(sg)), sg->offset, sg->length,
+ (unsigned long long)sg_dma_address(sg));
+ }
+
+ /* Dump each bio */
+ pr_err("--- BIO LIST ---\n");
+ for (bio = rq->bio; bio; bio = bio->bi_next) {
+ pr_err(" BIO %p: bi_iter={sector=%llu size=%u idx=%u
bvec_done=%u} "
+ "bi_flags=0x%x bi_opf=0x%x bi_vcnt=%u
bi_bvec_gap_bit=%u\n",
+ bio,
+ (unsigned long long)bio->bi_iter.bi_sector,
+ bio->bi_iter.bi_size, bio->bi_iter.bi_idx,
+ bio->bi_iter.bi_bvec_done,
+ bio->bi_flags, bio->bi_opf, bio->bi_vcnt,
+ bio->bi_bvec_gap_bit);
+
+ /* Dump each bvec in this bio */
+ pr_err(" --- BVECS (bi_vcnt=%u) ---\n", bio->bi_vcnt);
+ for (i = 0; i < bio->bi_vcnt; i++) {
+ struct bio_vec *bvp = &bio->bi_io_vec[i];
+
+ pr_err(" bvec[%d]: pfn=0x%lx len=%u
offset=%u\n",
+ i, page_to_pfn(bvp->bv_page), bvp->bv_len,
+ bvp->bv_offset);
+ }
+
+ /* Also dump effective bvecs via iterator */
+ pr_err(" --- EFFECTIVE BVECS (via iter) ---\n");
+ i = 0;
+ bio_for_each_bvec(bv, bio, iter) {
+ pr_err(" eff_bvec[%d]: pfn=0x%lx len=%u
offset=%u\n",
+ i++, page_to_pfn(bv.bv_page), bv.bv_len,
+ bv.bv_offset);
+ }
+ }
+
+ pr_err("=== END DEBUG DUMP ===\n");
+}
+
static bool __blk_map_iter_next(struct blk_map_iter *iter)
{
if (iter->iter.bi_size)
@@ -306,6 +373,8 @@ int __blk_rq_map_sg(struct request *rq, struct
scatterlist *sglist,
* Something must have been wrong if the figured number of
* segment is bigger than number of req's physical segments
*/
+ if (nsegs > blk_rq_nr_phys_segments(rq))
+ dump_rq_mapping_debug(rq, sglist, nsegs);
WARN_ON(nsegs > blk_rq_nr_phys_segments(rq));
return nsegs;
diff --git a/block/blk.h b/block/blk.h
index 98f4dfd4ec75..980eef1f5690 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -380,7 +380,7 @@ static inline bool bio_may_need_split(struct bio *bio,
return true;
bv = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
- if (bio->bi_iter.bi_size > bv->bv_len)
+ if (bio->bi_iter.bi_size > bv->bv_len - bio->bi_iter.bi_bvec_done)
return true;
return bv->bv_len + bv->bv_offset > lim->max_fast_segment_size;
}
(END)
Regards,
Venkat.
> Thanks,
> Ming
>
>
Powered by blists - more mailing lists