linux-kernel - Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200709101516.03234.vda.linux@googlemail.com>
Date:	Mon, 10 Sep 2007 15:16:02 +0100
From:	Denys Vlasenko <vda.linux@...glemail.com>
To:	Kyle Moffett <mrmacman_g4@....com>
Cc:	Arjan van de Ven <arjan@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Nick Piggin <piggin@...erone.com.au>,
	Satyam Sharma <satyam@...radead.org>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Paul Mackerras <paulus@...ba.org>,
	Christoph Lameter <clameter@....com>,
	Chris Snook <csnook@...hat.com>,
	Ilpo Jarvinen <ilpo.jarvinen@...sinki.fi>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Stefan Richter <stefanr@...6.in-berlin.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-arch@...r.kernel.org, Netdev <netdev@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>, ak@...e.de,
	heiko.carstens@...ibm.com, David Miller <davem@...emloft.net>,
	schwidefsky@...ibm.com, wensong@...ux-vs.org, horms@...ge.net.au,
	wjiang@...ilience.com, cfriesen@...tel.com, zlynx@....org,
	rpjday@...dspring.com, jesper.juhl@...il.com,
	segher@...nel.crashing.org
Subject: Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

On Monday 10 September 2007 14:38, Denys Vlasenko wrote:
> You are basically trying to educate me how to use atomic properly.
> You don't need to do it, as I am (currently) not a driver author.
> 
> I am saying that people who are already using atomic_read()
> (and who unfortunately did not read your explanation above)
> will still sometimes use atomic_read() as a way to read atomic value
> *from memory*, and will create nasty heisenbugs for you to debug.

static inline int
qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
{
        int      return_status = QLA_SUCCESS;
        unsigned long loop_timeout ;
        scsi_qla_host_t *pha = to_qla_parent(ha);

        /* wait for 5 min at the max for loop to be ready */
        loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);

        while ((!atomic_read(&pha->loop_down_timer) &&
            atomic_read(&pha->loop_state) == LOOP_DOWN) ||
            atomic_read(&pha->loop_state) != LOOP_READY) {
                if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
                        return_status = QLA_FUNCTION_FAILED;
                        break;
                }
                msleep(1000);
                if (time_after_eq(jiffies, loop_timeout)) {
                        return_status = QLA_FUNCTION_FAILED;
                        break;
                }
        }
        return (return_status);
}

Is above correct or buggy? Correct, because msleep is a barrier.
Is it obvious? No.

static void
qla2x00_rst_aen(scsi_qla_host_t *ha)
{
        if (ha->flags.online && !ha->flags.reset_active &&
            !atomic_read(&ha->loop_down_timer) &&
            !(test_bit(ABORT_ISP_ACTIVE, &ha->dpc_flags))) {
                do {
                        clear_bit(RESET_MARKER_NEEDED, &ha->dpc_flags);

                        /*
                         * Issue marker command only when we are going to start
                         * the I/O.
                         */
                        ha->marker_needed = 1;
                } while (!atomic_read(&ha->loop_down_timer) &&
                    (test_bit(RESET_MARKER_NEEDED, &ha->dpc_flags)));
        }
}

Is above correct? I honestly don't know. Correct, because set_bit is
a barrier on _all _memory_? Will it break if set_bit will be changed
to be a barrier only on its operand? Probably yes.

drivers/kvm/kvm_main.c

        while (atomic_read(&completed) != needed) {
                cpu_relax();
                barrier();
        }

Obviously author did not know that cpu_relax is already a barrier.
See why I think driver authors will be confused?

arch/x86_64/kernel/crash.c

static void nmi_shootdown_cpus(void)
{
...
        msecs = 1000; /* Wait at most a second for the other cpus to stop */
        while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
                mdelay(1);
                msecs--;
        }
...
}

Is mdelay(1) a barrier? Yes, because it is a function on x86_64.
Absolutely the same code will be buggy on an arch where
mdelay(1) == udelay(1000), and udelay is implemented
as inline busy-wait.

arch/sparc64/kernel/smp.c

        /* Wait for response */
        while (atomic_read(&data.finished) != cpus)
                cpu_relax();
...later in the same file...
                while (atomic_read(&smp_capture_registry) != ncpus)
                        rmb();

I'm confused. Do we need cpu_relax() or rmb()? Does cpu_relax() imply rmb()?
(No it doesn't). Which of those two while loops needs correcting?
--
vda
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/