Message-ID: <80e107f13c239f5a8f9953dad634c7419c34e31b.camel@ibm.com>
Date: Mon, 15 Sep 2025 18:46:49 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "lyican53@...il.com" <lyican53@...il.com>
CC: "jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
        "seanjc@...gle.com"
	<seanjc@...gle.com>, Xiubo Li <xiubli@...hat.com>,
        "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
        "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
        "sboyd@...nel.org"
	<sboyd@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        "idryomov@...il.com"
	<idryomov@...il.com>,
        "martin.petersen@...cle.com"
	<martin.petersen@...cle.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "mturquette@...libre.com" <mturquette@...libre.com>,
        "linux-clk@...r.kernel.org" <linux-clk@...r.kernel.org>
Subject: Re:  [RFC] Fix potential undefined behavior in __builtin_clz usage
 with GCC 11.1.0

On Mon, 2025-09-15 at 10:51 +0800, 陈华昭 wrote:
> Hi all,
> 
> I've identified several instances in the Linux kernel where __builtin_clz()
> is used without proper zero-value checking, which may trigger undefined
> behavior when compiled with GCC 11.1.0 using -march=x86-64-v3 -O1 optimization.
> 
> PROBLEM DESCRIPTION:
> ===================
> 
> GCC bug 101175 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101175) causes
> __builtin_clz() to generate BSR instructions without proper zero handling when
> compiled with specific optimization flags. The BSR instruction has undefined
> behavior when the source operand is zero, potentially causing incorrect results.
> 
> The issue manifests when:
> - GCC version: 11.1.0 (potentially other versions)
> - Compilation flags: -march=x86-64-v3 -O1
> - Code pattern: __builtin_clz(value) where value might be 0
> 
> AFFECTED LOCATIONS:
> ==================
> 
> 1. HIGH RISK: net/ceph/crush/mapper.c:265
> Problem: __builtin_clz(x & 0x1FFFF) when (x & 0x1FFFF) could be 0
> Impact: CRUSH hash algorithm corruption in Ceph storage
> 
> 2. HIGH RISK: drivers/scsi/elx/libefc_sli/sli4.h:3796
> Problem: __builtin_clz(mask) in sli_convert_mask_to_count() with no zero check
> Impact: Incorrect count calculations in SCSI operations
> 
> 3. HIGH RISK: tools/testing/selftests/kvm/dirty_log_test.c:314
> Problem: Two __builtin_clz() calls without zero validation
> Impact: KVM selftest framework reliability
> 
> 4. MEDIUM RISK: drivers/clk/clk-versaclock7.c:322
> Problem: __builtin_clzll(den) but prior checks likely prevent den=0
> Impact: Clock driver calculations (lower risk due to existing checks)
> 
> COMPARISON WITH SAFE PATTERNS:
> =============================
> 
> The kernel already implements safe patterns in many places:
> 
> // Safe pattern from include/asm-generic/bitops/builtin-fls.h
> return x ? sizeof(x) * 8 - __builtin_clz(x) : 0;
> 
> // Safe pattern from arch/powerpc/lib/sstep.c
> op->val = (val ? __builtin_clz(val) : 32);
> 
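
For reference, the two safe patterns quoted above can be folded into one small helper. The following is only a minimal userspace sketch: the names safe_clz32(), safe_clz64() and safe_fls32() are hypothetical and do not exist in the kernel tree, and the code assumes that unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not an existing kernel API).
 * __builtin_clz(0) has an undefined result, so zero is handled
 * explicitly and mapped to the full bit width.
 */
static inline unsigned int safe_clz32(uint32_t x)
{
	return x ? (unsigned int)__builtin_clz(x) : 32u;
}

static inline unsigned int safe_clz64(uint64_t x)
{
	return x ? (unsigned int)__builtin_clzll(x) : 64u;
}

/* fls-style helper built on top: position of the highest set bit, 0 for x == 0 */
static inline unsigned int safe_fls32(uint32_t x)
{
	return 32u - safe_clz32(x);
}

/* Small self-check covering the zero case and both ends of the range */
static inline void safe_clz_selfcheck(void)
{
	assert(safe_clz32(0) == 32u);
	assert(safe_clz32(1) == 31u);
	assert(safe_clz32(UINT32_C(0x80000000)) == 0u);
	assert(safe_clz64(0) == 64u);
	assert(safe_fls32(0) == 0u);
}
```

Whether 32 (or 64) is the right value for a zero input depends on the call site, so a shared wrapper like this would still need to be checked against each caller.
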
> PROPOSED FIXES:
> ==============
> 
> 1. net/ceph/crush/mapper.c:
> - int bits = __builtin_clz(x & 0x1FFFF) - 16;
> + u32 masked = x & 0x1FFFF;
> + int bits = masked ? __builtin_clz(masked) - 16 : 16;
> 
> 2. drivers/scsi/elx/libefc_sli/sli4.h:
> if (method) {
> - count = 1 << (31 - __builtin_clz(mask));
> + count = mask ? 1 << (31 - __builtin_clz(mask)) : 0;
> count *= 16;
> 
> 3. tools/testing/selftests/kvm/dirty_log_test.c:
> - limit = 1 << (31 - __builtin_clz(pages));
> - test_dirty_ring_count = 1 << (31 - __builtin_clz(test_dirty_ring_count));
> + limit = pages ? 1 << (31 - __builtin_clz(pages)) : 1;
> + test_dirty_ring_count = test_dirty_ring_count ?
> + 1 << (31 - __builtin_clz(test_dirty_ring_count)) : 1;
> 
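
Fixes 2 and 3 share one idiom: round a value down to a power of two, with an explicit result for zero. A minimal sketch of that idiom follows, assuming a hypothetical helper name rounddown_pow2_or() and a caller-chosen fallback (0 in the sli4.h fix, 1 in the selftest fix). It uses an unsigned constant for the shift, because shifting a signed 1 into bit 31 is itself undefined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: largest power of two <= x.
 * Returns `fallback` when x == 0, so __builtin_clz() is never
 * called with a zero argument.
 */
static inline uint32_t rounddown_pow2_or(uint32_t x, uint32_t fallback)
{
	if (x == 0)
		return fallback;
	return UINT32_C(1) << (31 - __builtin_clz(x));
}

/* Self-check mirroring the two fallback choices used in the proposed fixes */
static inline void rounddown_pow2_selfcheck(void)
{
	assert(rounddown_pow2_or(0, 0) == 0);	/* sli4.h-style fallback */
	assert(rounddown_pow2_or(0, 1) == 1);	/* selftest-style fallback */
	assert(rounddown_pow2_or(5, 1) == 4);
	assert(rounddown_pow2_or(4096, 1) == 4096);
}
```
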
> REPRODUCTION:
> ============
> 
> Based on the GCC bug report and analysis of the kernel code patterns, this
> issue can be reproduced by:
> 
> 1. Compiling affected code with: gcc -march=x86-64-v3 -O1
> 2. Examining generated assembly for BSR instructions
> 3. Triggering code paths where the __builtin_clz argument could be zero
> 
> QUESTIONS:
> =========
> 
> 1. Should I prepare formal patches for each affected subsystem?

Yes, please send a formal patch for the Ceph case.

> 2. Are there other instances I should investigate?
> 3. Would adding a kernel-wide safe wrapper for __builtin_clz be appropriate?
> 4. Would the maintainers like me to create a proof-of-concept test case?

Yes, it would be great to have a proof-of-concept test case for the Ceph case. I
am still trying to imagine a real use case in which we could hit such an issue. I
believe it would be very useful to have some KUnit-based unit test(s) for this
subsystem in Ceph.

Thanks,
Slava.
