Message-Id: <FF69D584-EEF9-4B5A-BE30-24EEBF354780@gmail.com>
Date: Wed, 17 Sep 2025 18:04:42 +0800
From: 陈华昭(Lyican) <lyican53@...il.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@....com>,
"seanjc@...gle.com" <seanjc@...gle.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
Xiubo Li <xiubli@...hat.com>,
"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
"sboyd@...nel.org" <sboyd@...nel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
"idryomov@...il.com" <idryomov@...il.com>,
"martin.petersen@...cle.com" <martin.petersen@...cle.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"mturquette@...libre.com" <mturquette@...libre.com>,
"linux-clk@...r.kernel.org" <linux-clk@...r.kernel.org>
Subject: Re: [RFC] Fix potential undefined behavior in __builtin_clz usage
with GCC 11.1.0
Hi Slava and Sean,
Thank you for the valuable feedback!
CEPH FORMAL PATCH:
=================
As requested by Slava, I've prepared a formal patch for the Ceph case.
The patch adds proper zero checking before __builtin_clz() to prevent
undefined behavior. Please find it attached as ceph_patch.patch.
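The attachment itself is not reproduced in this message. As a rough userspace illustration of the kind of zero check described above (not the attached patch), the sketch below refuses to call __builtin_clz() when the masked value is zero. The helper name guarded_norm_bits() and the -1 error convention are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT the attached patch: compute the left shift
 * needed to normalize x, but never pass zero to __builtin_clz(), whose
 * result is undefined for a zero argument.
 *
 * Returns 0 and stores the shift in *bits on success, or -1 (leaving
 * *bits untouched) when (x & 0x1FFFF) is zero.
 * Assumes unsigned int is 32 bits wide.
 */
static int guarded_norm_bits(uint32_t x, int *bits)
{
	uint32_t masked = x & 0x1FFFF;

	assert(bits);

	if (masked == 0)
		return -1;	/* caller decides how to handle this input */

	*bits = __builtin_clz(masked) - 16;
	return 0;
}
```

How the zero case should actually be handled (error return, fixed fallback, or early exit) is a design decision for the real patch; the sketch only shows where the check sits relative to the builtin.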
PROOF-OF-CONCEPT TEST CASE:
==========================
I've also created a proof-of-concept test case that demonstrates the
problematic input values that could trigger this bug. The test identifies
specific input values where (x & 0x1FFFF) becomes zero after the increment
and condition check.
Key findings from the test:
- Inputs like 0x7FFFF, 0x9FFFF, 0xBFFFF, 0xDFFFF, 0xFFFFF can trigger the bug
- These correspond to x+1 values where ((x+1) & 0x18000) == 0 and ((x+1) & 0x1FFFF) == 0
The test can be integrated into Ceph's existing test framework or adapted
for KUnit testing as you suggested. Please find it as ceph_poc_test.c.
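The two conditions listed in the key findings can be checked mechanically. The following is a minimal userspace sketch, separate from ceph_poc_test.c, that encodes only those conditions; the function names hits_clz_zero() and count_hits() are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when input xin would reach __builtin_clz() with a zero
 * argument, using only the conditions stated above: after the
 * increment, (x & 0x18000) == 0 selects the clz branch, and
 * (x & 0x1FFFF) == 0 makes the argument zero.
 */
static bool hits_clz_zero(uint32_t xin)
{
	uint32_t x = xin + 1;	/* wraps to 0 for xin == 0xFFFFFFFF */

	if (x & 0x18000)
		return false;	/* clz branch is not taken */

	return (x & 0x1FFFF) == 0;
}

/* Counts the triggering inputs in the inclusive range [lo, hi]. */
static unsigned int count_hits(uint32_t lo, uint32_t hi)
{
	unsigned int n = 0;

	assert(lo <= hi);

	/* 64-bit counter so the loop terminates even for hi == UINT32_MAX */
	for (uint64_t v = lo; v <= hi; v++)
		if (hits_clz_zero((uint32_t)v))
			n++;

	return n;
}
```

With these conditions, hits_clz_zero() is true exactly when x+1 is a multiple of 0x20000 (including the wrap to zero), which covers the five inputs listed above. A KUnit adaptation could assert the same predicate over a table of inputs.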
KVM CASE CLARIFICATION:
======================
Thank you Sean for the detailed explanation about the KVM case. You're
absolutely right that pages and test_dirty_ring_count are guaranteed to
be non-zero in practice. I'll remove this from my analysis and focus on
the genuine issues.
BITOPS WRAPPER DISCUSSION:
=========================
I appreciate you bringing Yuri into the discussion. The idea of using
existing fls()/fls64() functions or creating new fls8()/fls16() variants
sounds promising. Many __builtin_clz() calls in the kernel could indeed
benefit from these safer alternatives.
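To make the fls() idea concrete, here is a small userspace sketch of how an fls-style helper avoids the undefined zero case. fls_sketch() only mimics the documented behaviour of the kernel's fls() (0 for a zero input, otherwise the 1-based position of the most significant set bit); it is not the kernel implementation, and norm_bits_via_fls() is a hypothetical name used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for fls(): returns 0 for x == 0, otherwise the
 * 1-based index of the most significant set bit.
 * Assumes unsigned int is 32 bits wide.
 */
static int fls_sketch(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/*
 * Hypothetical replacement for "__builtin_clz(x & 0x1FFFF) - 16".
 * For a non-zero masked value, clz(m) == 32 - fls(m), so the shift is
 * 16 - fls(m). For a zero masked value the result is a well-defined 16
 * rather than undefined behavior; whether 16 is a meaningful shift is
 * still up to the caller.
 */
static int norm_bits_via_fls(uint32_t x)
{
	/* mirrors the guard under which the clz branch is entered */
	assert(!(x & 0x18000));

	return 16 - fls_sketch(x & 0x1FFFF);
}
```

The point of the sketch is only that the zero case becomes defined at the helper level, so individual call sites no longer need to reason about the builtin's precondition.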
STATUS UPDATE:
=============
1. Ceph: Formal patch and test case ready for review
2. KVM: Confirmed not an issue in practice (thanks Sean)
3. SCSI: Still investigating the drivers/scsi/elx/libefc_sli/sli4.h case
4. Bitops: Awaiting input from Yuri on kernel-wide improvements
NEXT STEPS:
==========
1. Please review the Ceph patch and test case (Slava)
2. Happy to work with Yuri on bitops improvements if there's interest
3. For SCSI maintainers: would you like me to prepare a similar analysis for the sli_convert_mask_to_count() function?
4. Can prepare additional patches for any other confirmed cases
Questions for maintainers:
- Slava: Should the Ceph patch go through ceph-devel first, or directly to you?
- Any specific requirements for the test case integration?
- SCSI maintainers: Is the drivers/scsi/elx/libefc_sli/sli4.h case worth investigating further?
Best regards,
Huazhao Chen
lyican53@...il.com
---
Attachments:
- ceph_patch.patch: Formal patch for net/ceph/crush/mapper.c
- ceph_poc_test.c: Proof-of-concept test case demonstrating the issue
Download attachment "ceph_poc_test.c" of type "application/octet-stream" (5477 bytes)
Download attachment "ceph_patch.patch" of type "application/octet-stream" (1490 bytes)