linux-kernel - Re: [PATCH arm/aspeed/ast2500 v2] ipmi: add an Aspeed KCS IPMI BMC driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4c0b8170-231b-4d8f-6238-8ab1404060bf@linux.intel.com>
Date:   Wed, 31 Jan 2018 10:01:34 +0800
From:   "Wang, Haiyue" <haiyue.wang@...ux.intel.com>
To:     minyard@....org, joel@....id.au, avifishman70@...il.com,
        openbmc@...ts.ozlabs.org, openipmi-developer@...ts.sourceforge.net,
        linux-kernel@...r.kernel.org
Cc:     andriy.shevchenko@...el.com
Subject: Re: [PATCH arm/aspeed/ast2500 v2] ipmi: add an Aspeed KCS IPMI BMC
 driver



On 2018-01-31 09:52, Corey Minyard wrote:
> On 01/30/2018 07:37 PM, Wang, Haiyue wrote:
>>
>>
>> On 2018-01-31 09:25, Corey Minyard wrote:
>>> On 01/30/2018 07:02 PM, Wang, Haiyue wrote:
>>>>
>>>>
>>>> On 2018-01-31 08:52, Corey Minyard wrote:
>>>>> On 01/30/2018 06:02 PM, Wang, Haiyue wrote:
>>>>>>
>>>>>>
>>>>>> On 2018-01-30 21:49, Corey Minyard wrote:
>>>>>>> On 01/29/2018 07:57 AM, Wang, Haiyue wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2018-01-26 22:48, Corey Minyard wrote:
>>>>>>>>> On 01/26/2018 12:08 AM, Wang, Haiyue wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2018-01-25 01:48, Corey Minyard wrote:
>>>>>>>>>>> On 01/24/2018 10:06 AM, Haiyue Wang wrote:
>>>>>>>>>>>> The KCS (Keyboard Controller Style) interface is used to 
>>>>>>>>>>>> perform in-band
>>>>>>>>>>>> IPMI communication between a server host and its BMC 
>>>>>>>>>>>> (BaseBoard Management
>>>>>>>>>>>> Controllers).
>>>>>>>>>>>>
>>>>>>>>>>>> This driver exposes the KCS interface on ASpeed SOCs 
>>>>>>>>>>>> (AST2400 and AST2500)
>>>>>>>>>>>> as a character device. Such SOCs are commonly used as BMCs 
>>>>>>>>>>>> and this driver
>>>>>>>>>>>> implements the BMC side of the KCS interface.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Haiyue Wang <haiyue.wang@...ux.intel.com>
>>>>>>>>>>>>
>>>>>>>>>>>> ---
>>>>>>>>>>>> v1->v2
>>>>>>>>>>>>
>>>>>>>>>>>> - Divide the driver into two parts, one handles the BMC KCS 
>>>>>>>>>>>> IPMI 2.0 state;
>>>>>>>>>>>>    the other handles the BMC KCS controller such as AST2500 
>>>>>>>>>>>> IO accessing.
>>>>>>>>>>>> - Use the spin lock APIs to handle the device file 
>>>>>>>>>>>> operations and BMC chip
>>>>>>>>>>>>    IRQ inferface for accessing the same KCS BMC data 
>>>>>>>>>>>> structure.
>>>>>>>>>>>> - Enhanced the phases handling of the KCS BMC.
>>>>>>>>>>>> - Unified the IOCTL definition for IPMI BMC, it will be 
>>>>>>>>>>>> used by KCS and BT.
>>>>>>>>>>>>
>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void kcs_bmc_handle_data(struct kcs_bmc *kcs_bmc)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    u8 data;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    switch (kcs_bmc->phase) {
>>>>>>>>>>>> +    case KCS_PHASE_WRITE:
>>>>>>>>>>>> +        set_state(kcs_bmc, WRITE_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        /* set OBF before reading data */
>>>>>>>>>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (kcs_bmc->data_in_idx < KCS_MSG_BUFSIZ)
>>>>>>>>>>>> + kcs_bmc->data_in[kcs_bmc->data_in_idx++] =
>>>>>>>>>>>> +                        read_data(kcs_bmc);
>>>>>>>>>
>>>>>>>>> I missed this earlier, you need to issue a length error if the 
>>>>>>>>> data is too large.
>>>>>>>>>
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_PHASE_WRITE_END:
>>>>>>>>>>>> +        set_state(kcs_bmc, READ_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (kcs_bmc->data_in_idx < KCS_MSG_BUFSIZ)
>>>>>>>>>>>> + kcs_bmc->data_in[kcs_bmc->data_in_idx++] =
>>>>>>>>>>>> +                        read_data(kcs_bmc);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_WAIT_READ;
>>>>>>>>>>>> +        if (kcs_bmc->running) {
>>>>>>>>>>>
>>>>>>>>>>> Why do you only do this when running is set? It won't hurt 
>>>>>>>>>>> anything if it's not
>>>>>>>>>>> set.  As it is, you have a race if something opens the 
>>>>>>>>>>> device while this code
>>>>>>>>>>> runs.
>>>>>>>>>>>
>>>>>>>>>>> Also, don't set the state to wait read until the "write" has 
>>>>>>>>>>> finished (userland has
>>>>>>>>>>> read the data out of the buffer.  More on that later.
>>>>>>>>>>>
>>>>>>>>>> Understood.
>>>>>>>>>>>> + kcs_bmc->data_in_avail = true;
>>>>>>>>>>>> + wake_up_interruptible(&kcs_bmc->queue);
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_PHASE_READ:
>>>>>>>>>>>> +        if (kcs_bmc->data_out_idx == kcs_bmc->data_out_len)
>>>>>>>>>>>> +            set_state(kcs_bmc, IDLE_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        data = read_data(kcs_bmc);
>>>>>>>>>>>> +        if (data != KCS_CMD_READ_BYTE) {
>>>>>>>>>>>> +            set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>>> +            write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +            break;
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (kcs_bmc->data_out_idx == kcs_bmc->data_out_len) {
>>>>>>>>>>>> +            write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +            kcs_bmc->phase = KCS_PHASE_IDLE;
>>>>>>>>>>>> +            break;
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        write_data(kcs_bmc,
>>>>>>>>>>>> + kcs_bmc->data_out[kcs_bmc->data_out_idx++]);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_PHASE_ABORT_ERROR1:
>>>>>>>>>>>> +        set_state(kcs_bmc, READ_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        /* Read the Dummy byte */
>>>>>>>>>>>> +        read_data(kcs_bmc);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        write_data(kcs_bmc, kcs_bmc->error);
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_ABORT_ERROR2;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_PHASE_ABORT_ERROR2:
>>>>>>>>>>>> +        set_state(kcs_bmc, IDLE_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        /* Read the Dummy byte */
>>>>>>>>>>>> +        read_data(kcs_bmc);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_IDLE;
>>>>>>>>>>>> +
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    default:
>>>>>>>>>>>> +        set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        /* Read the Dummy byte */
>>>>>>>>>>>> +        read_data(kcs_bmc);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void kcs_bmc_handle_command(struct kcs_bmc *kcs_bmc)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    u8 cmd;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    set_state(kcs_bmc, WRITE_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    /* Dummy data to generate OBF */
>>>>>>>>>>>> +    write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    cmd = read_data(kcs_bmc);
>>>>>>>>>>>
>>>>>>>>>>> Shouldn't you check the phase in all the cases below and do 
>>>>>>>>>>> error
>>>>>>>>>>> handling if the phase isn't correct?
>>>>>>>>>>>
>>>>>>>>>>> Similar thing if the device here isn't open. You need to handle
>>>>>>>>>>> that gracefully.
>>>>>>>>>>>
>>>>>>>>>>> Also, you should remove data_in_avail and data_in_idx 
>>>>>>>>>>> setting from
>>>>>>>>>>> here, for reasons I will explain later.
>>>>>>>>>>>
>>>>>>>>>> If host software sends the data twice such as a retry before 
>>>>>>>>>> the BMC's IPMI service starts,
>>>>>>>>>> then the two IPMI requests will be merged into one, if not 
>>>>>>>>>> clear data_in_idx after receving
>>>>>>>>>> KCS_CMD_WRITE_START. Most of the states are driven by host 
>>>>>>>>>> software (SMS). :(
>>>>>>>>>
>>>>>>>>> True, but what if the host issues WRITE_START or a WRITE_END 
>>>>>>>>> while this driver is in read
>>>>>>>>> state?  The spec is unclear on this, but it really only makes 
>>>>>>>>> sense for the host to issue
>>>>>>>>> WRITE_START in idle stat and WRITE_END in write state. IMHO it 
>>>>>>>>> should go to error
>>>>>>>>> state.  You might make the case that a WRITE_START anywhere 
>>>>>>>>> restarts the transaction,
>>>>>>>>> but the feel of the error state machine kind of goes against 
>>>>>>>>> that. WRITE_END is definitely
>>>>>>>>> wrong anywhere but write state.
>>>>>>>>>
>>>>>>>>> I just found the following in the spec (section 9.12):
>>>>>>>>>
>>>>>>>>>    Thus, since the interface will allow a command transfer to be
>>>>>>>>>    started or restarted
>>>>>>>>>    at any time when the input buffer is empty, software could 
>>>>>>>>> elect to
>>>>>>>>>    simply retry
>>>>>>>>>    the command upon detecting an error condition, or issue a 
>>>>>>>>> ‘known good’
>>>>>>>>>    command in order to clear ERROR_STATE
>>>>>>>>>
>>>>>>>>> So a WRITE_START anywhere is ok.  A WRITE_END in the wrong 
>>>>>>>>> state should probably
>>>>>>>>> still go to error state.  This means the user needs to be able 
>>>>>>>>> to handle a write error at
>>>>>>>>> any time.  It also means it's very important to make sure the 
>>>>>>>>> user does a read before
>>>>>>>>> doing a write.  If the host re-issues a WRITE_START and writes 
>>>>>>>>> a new command
>>>>>>>>> between the time the use reads the data and writes the 
>>>>>>>>> response, the response would
>>>>>>>>> be for the wrong command.
>>>>>>>>>
>>>>>>>>>>>> +    switch (cmd) {
>>>>>>>>>>>> +    case KCS_CMD_WRITE_START:
>>>>>>>>>>>> +        kcs_bmc->data_in_avail = false;
>>>>>>>>>>>> +        kcs_bmc->data_in_idx   = 0;
>>>>>>>>>>>> +        kcs_bmc->phase         = KCS_PHASE_WRITE;
>>>>>>>>>>>> +        kcs_bmc->error         = KCS_NO_ERROR;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_CMD_WRITE_END:
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_WRITE_END;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_CMD_ABORT:
>>>>>>>>>>>> +        if (kcs_bmc->error == KCS_NO_ERROR)
>>>>>>>>>>>> +            kcs_bmc->error = KCS_ABORTED_BY_COMMAND;
>>>>>>>>>>>> +
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_ABORT_ERROR1;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    default:
>>>>>>>>>>>> +        kcs_bmc->error = KCS_ILLEGAL_CONTROL_CODE;
>>>>>>>>>>>> +        set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>>> +        write_data(kcs_bmc, kcs_bmc->error);
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_ERROR;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +int kcs_bmc_handle_event(struct kcs_bmc *kcs_bmc)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    unsigned long flags;
>>>>>>>>>>>> +    int ret = 0;
>>>>>>>>>>>> +    u8 status;
>>>>>>>>>>>> +
>>>>>>>>>>>> + spin_lock_irqsave(&kcs_bmc->lock, flags);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    status = read_status(kcs_bmc) & (KCS_STATUS_IBF | 
>>>>>>>>>>>> KCS_STATUS_CMD_DAT);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    switch (status) {
>>>>>>>>>>>> +    case KCS_STATUS_IBF | KCS_STATUS_CMD_DAT:
>>>>>>>>>>>> +        kcs_bmc_handle_command(kcs_bmc);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_STATUS_IBF:
>>>>>>>>>>>> +        kcs_bmc_handle_data(kcs_bmc);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    default:
>>>>>>>>>>>> +        ret = -1;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> + spin_unlock_irqrestore(&kcs_bmc->lock, flags);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL(kcs_bmc_handle_event);
>>>>>>>>>>>> +
>>>>>>>>>>>> +static inline struct kcs_bmc *file_kcs_bmc(struct file *filp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    return container_of(filp->private_data, struct 
>>>>>>>>>>>> kcs_bmc, miscdev);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int kcs_bmc_open(struct inode *inode, struct file 
>>>>>>>>>>>> *filp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +    int ret = 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (!kcs_bmc->running) {
>>>>>>>>>>>> +        kcs_bmc->running       = 1;
>>>>>>>>>>>> +        kcs_bmc->phase         = KCS_PHASE_IDLE;
>>>>>>>>>>>> +        kcs_bmc->data_in_avail = false;
>>>>>>>>>>>
>>>>>>>>>>> If you do everything right, setting the phase and 
>>>>>>>>>>> data_in_avail should not
>>>>>>>>>>> be necessary here.
>>>>>>>>>>>
>>>>>>>>>>>> +    } else {
>>>>>>>>>>>> +        ret = -EBUSY;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static unsigned int kcs_bmc_poll(struct file *filp, 
>>>>>>>>>>>> poll_table *wait)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +    unsigned int mask = 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    poll_wait(filp, &kcs_bmc->queue, wait);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (kcs_bmc->data_in_avail)
>>>>>>>>>>>> +        mask |= POLLIN;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return mask;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static ssize_t kcs_bmc_read(struct file *filp, char *buf,
>>>>>>>>>>>> +                size_t count, loff_t *offset)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +    ssize_t ret = -EAGAIN;
>>>>>>>>>>>> +
>>>>>>>>>>>
>>>>>>>>>>> This function still has some issues.
>>>>>>>>>>>
>>>>>>>>>>> You can't call copy_to_user() with a spinlock held or 
>>>>>>>>>>> interrupts disabled.
>>>>>>>>>>> To handle readers, you probably need a separate mutex.
>>>>>>>>>>>
>>>>>>>>>>> Also, this function can return -EAGAIN even if O_NONBLOCK is 
>>>>>>>>>>> not set if
>>>>>>>>>>> kcs_bmc->data_in_avail changes between when you wait on the 
>>>>>>>>>>> event
>>>>>>>>>>> and when you check it under the lock.
>>>>>>>>>>>
>>>>>>>>>>> You also clear data_in_avail even if the copy_to_user() 
>>>>>>>>>>> fails, which is
>>>>>>>>>>> wrong.
>>>>>>>>>>>
>>>>>>>>>>> I believe the best way to handle this would be to have the 
>>>>>>>>>>> spinlock
>>>>>>>>>>> protect the inner workings of the state machine and a mutex 
>>>>>>>>>>> handle
>>>>>>>>>>> copying data out, setting/clearing the running flag (thus a 
>>>>>>>>>>> mutex
>>>>>>>>>>> instead of spinlock in open and release) and the ioctl 
>>>>>>>>>>> settings (except
>>>>>>>>>>> for abort where you will need to grab the spinlock).
>>>>>>>>>>>
>>>>>>>>>>> After the wait event below, grab the mutex. If data is not 
>>>>>>>>>>> available
>>>>>>>>>>> and O_NONBLOCK is not set, drop the mutex and retry. Otherwise
>>>>>>>>>>> this is the only place (besides release) that sets 
>>>>>>>>>>> data_in_avail to false.
>>>>>>>>>>> Do the copy_to_user(), grab the spinlock, clear 
>>>>>>>>>>> data_in_avail and
>>>>>>>>>>> data_in_idx, then release the lock and mutex. If you are really
>>>>>>>>>>> adventurous you can do this without grabbing the lock using
>>>>>>>>>>> barriers, but it's probably not necessary here.
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> With the state machine being able to be restarted at any time, 
>>>>>>>>> you need
>>>>>>>>> something a little different here.  You still need the mutex 
>>>>>>>>> to handle
>>>>>>>>> multiple readers and the copy.  I think the function should be 
>>>>>>>>> something
>>>>>>>>> like:
>>>>>>>>>
>>>>>>>> Since KCS is not a multi-reader protocol from BMC's view, you 
>>>>>>>> makes things complex. :-)
>>>>>>>
>>>>>>> No, I don't think you understand.  The primary purpose of the 
>>>>>>> complexity
>>>>>>> here is to protect the driver from the host system (on the other 
>>>>>>> side of
>>>>>>> the KCS interface).  Without this protection, it is possible for 
>>>>>>> the host
>>>>>>> system to start a new write while the user on the BMC side is 
>>>>>>> reading
>>>>>>> data out, resulting in corrupt data being read.
>>>>>>>
>>>>>>> I haven't thought too much about this.  There may be a simpler way,
>>>>>>> but the protection needs to be there.
>>>>>>>
>>>>>>> And you may not think you need to protect the driver against a
>>>>>>> malicious BMC side user code, but you would be wrong. You can
>>>>>>> only have one opener, but with threads or a fork you can have
>>>>>>> multiple readers.  And you don't know if a malicious piece of
>>>>>>> code has taken over userland.  You always need to protect the
>>>>>>> kernel.
>>>>>>>
>>>>>> Sure, the read/write have protected the critical data area with 
>>>>>> IRQ, and also, these
>>>>>> functions should be thread local safe I believe.
>>>>>>
>>>>>> spin_lock_irq(&kcs_bmc->lock);
>>>>>> ...
>>>>>> spin_unlock_irq(&kcs_bmc->lock);
>>>>>>
>>>>>
>>>>> But remember, you can't call copy_to_user() when IRQs are off or 
>>>>> when you are holding
>>>>> a spinlock.  That is an absolute no.  It can crash the kernel.
>>>>>
>>>>> So you need a design that takes this into account, but will not 
>>>>> result in the possibility
>>>>> of bad data being read.
>>>>>
>>>> Yes, sure, as I said before: access_ok(VERIFY_WRITE, to, n), then 
>>>> memcpy in spin_lock.
>>>
>>> Where did you get the idea that this was ok?  It's not. access_ok() 
>>> is not actually very
>>> useful, since the permissions on memory can change at any time 
>>> unless you are holding
>>> the mm lock, which is also not an ok thing to do.  It is entirely 
>>> possible for access_ok()
>>> to pass and copy_to_user() to fail.
>>>
>> I thought memcpy will not fail. :(
>
> Oh, memcpy won't fail as long as the source and destination is kernel 
> memory.
> I was a little confused by the access_ok() thing, it's common for 
> people to
> assume that if they do access_ok(), that copy_to_user() won't fail.
>
Yes, commonly misunderstand,  didn't well understand the hidden things 
that kernel do for memory
management.
>>> I'm not exactly sure what you are saying, though.  In any event, a 
>>> well-designed read()/write()
>>> operation should leave the system unchanged if it gets an error.
>>>
>> I saw BT use a local buffer, If I change the '#define 
>> KCS_MSG_BUFSIZ    1024' to ".. 512", should it be OK
>> as BT ?
>>
>> static ssize_t bt_bmc_read(struct file *file, char __user *buf,
>>                size_t count, loff_t *ppos)
>> {
>>     struct bt_bmc *bt_bmc = file_bt_bmc(file);
>>     u8 len;
>>     int len_byte = 1;
>>     u8 kbuffer[BT_BMC_BUFFER_SIZE];  --> #define BT_BMC_BUFFER_SIZE 256
>
> It's good practice to keep larger things off the stack, which is why I 
> dynamically
> allocated it.  But if you have a mutex, you can put that buffer in 
> struct bt_bmc
> since it would only be accessed when holding the mutex.
>
Got it, looks like this is the best idea. I will rewrite the driver 
again, hope I can catch all of your code review
comments. :-)
>>
>>> -corey
>>>
>>>>>>>>>    static ssize_t kcs_bmc_read(struct file *filp, char *buf,
>>>>>>>>>                     size_t count, loff_t *offset)
>>>>>>>>>    {
>>>>>>>>>         struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>         ssize_t ret;
>>>>>>>>>         bool avail;
>>>>>>>>>         size_t data_size;
>>>>>>>>>         u8 *data;
>>>>>>>>>
>>>>>>>>>         data = kmalloc(KCS_MSG_BUFSIZ, GFP_KERNEL);
>>>>>>>>>         if (!data)
>>>>>>>>>             return -ENOMEM;
>>>>>>>>>
>>>>>>>>>    retry:
>>>>>>>>>         ret = -EAGAIN;
>>>>>>>>>         if (!(filp->f_flags & O_NONBLOCK))
>>>>>>>>> wait_event_interruptible(kcs_bmc->queue,
>>>>>>>>> kcs_bmc->data_in_avail);
>>>>>>>>>
>>>>>>>>>         mutex_lock(&kcs_bmc->read_mutex);
>>>>>>>>>
>>>>>>>>>         spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>         avail = kcs_bmc->data_in_avail;
>>>>>>>>>         if (avail) {
>>>>>>>>>             memcpy(data, kcs_bmc->data_in, kcs_bmc->data_in_idx);
>>>>>>>>>             data_size = kcs_bmc->data_in_idx;
>>>>>>>>>         }
>>>>>>>>>         spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>
>>>>>>>>>         if (!avail) {
>>>>>>>>>             if (filp->f_flags & O_NONBLOCK)
>>>>>>>>>                 goto out_mutex_unlock;
>>>>>>>>> mutex_unlock(&kcs_bmc->read_mutex);
>>>>>>>>>             goto retry;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         if (count < data_size) {
>>>>>>>>>             ret = -EOVERFLOW;
>>>>>>>>>              ? I'm not sure about the error, but userspace 
>>>>>>>>> needs to know.
>>>>>>>>>             goto out_mutex_unlock;
>>>>>>>
>>>>>>> Maybe a length error to the host side here?
>>>>>
>>>>> You didn't comment on this or the other length error. That needs 
>>>>> to be
>>>>> handled.
>>>>>
>>>> Yes, will send a length error by following KCS spec.
>>>>>>>
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         if (!copy_to_user(buf, data, data_size)) {
>>>>>>>>>             ret = -EFAULT;
>>>>>>>>>             goto out_mutex_unlock;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         ret = data_size;
>>>>>>>>>
>>>>>>>>>         spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>
>>>>>>>>>         if (kcs_bmc->phase != KCS_PHASE_WRITE_END_DONE)
>>>>>>>>>             /* Something aborted or restarted the state 
>>>>>>>>> machine. */
>>>>>>>>>             ? Maybe restart if O_NONBLOCK is not set and 
>>>>>>>>> -EAGAIN if it is?
>>>>>>>>>             ret = -EIO;
>>>>>>>>>         } else {
>>>>>>>>>             kcs_bmc->phase = KCS_PHASE_WAIT_READ;
>>>>>>>>>             kcs_bmc->data_in_avail = false;
>>>>>>>>>             kcs_bmc->data_in_idx = 0;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>
>>>>>>>>>    out_mutex_unlock:
>>>>>>>>>         mutex_unlock(&kcs_bmc->read_mutex);
>>>>>>>>>
>>>>>>>>>         kfree(data);
>>>>>>>>>
>>>>>>>>>         return ret;
>>>>>>>>>    }
>>>>>>>>> Note that I added a state, KCS_PHASE_WRITE_END_DONE, which 
>>>>>>>>> would be
>>>>>>>>> set after the final byte from the host is received. You want 
>>>>>>>>> the read here
>>>>>>>>> done before you can do the write below to avoid the race I 
>>>>>>>>> talked about.
>>>>>>>>>
>>>>>>>>> There is a local copy made of the data.  What you *never* want 
>>>>>>>>> to happen
>>>>>>>>> here is for the state machine to start processing a new write 
>>>>>>>>> command
>>>>>>>>> while the data is being copied.  It could result in corrupt 
>>>>>>>>> data being read
>>>>>>>>> and some random operation being done by the BMC.
>>>>>>>>>
>>>>>>>>> If you want to avoid the local copy, it could be done, but 
>>>>>>>>> it's more complex.
>>>>>>>>>
>>>>>>>>>>>> +    if (!(filp->f_flags & O_NONBLOCK))
>>>>>>>>>>>> + wait_event_interruptible(kcs_bmc->queue,
>>>>>>>>>>>> + kcs_bmc->data_in_avail);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (kcs_bmc->data_in_avail) {
>>>>>>>>>>>> +        kcs_bmc->data_in_avail = false;
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (count > kcs_bmc->data_in_idx)
>>>>>>>>>>>> +            count = kcs_bmc->data_in_idx;
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (!copy_to_user(buf, kcs_bmc->data_in, count))
>>>>>>>>>>>> +            ret = count;
>>>>>>>>>>>> +        else
>>>>>>>>>>>> +            ret = -EFAULT;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static ssize_t kcs_bmc_write(struct file *filp, const char 
>>>>>>>>>>>> *buf,
>>>>>>>>>>>> +                 size_t count, loff_t *offset)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +    ssize_t ret = count;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (count < 1 || count > KCS_MSG_BUFSIZ)
>>>>>>>>>>>> +        return -EINVAL;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (kcs_bmc->phase == KCS_PHASE_WAIT_READ) {
>>>>>>>>>>>> +        if (copy_from_user(kcs_bmc->data_out, buf, count)) {
>>>>>>>>>>>> + spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +            return -EFAULT;
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_READ;
>>>>>>>>>>>> +        kcs_bmc->data_out_idx = 1;
>>>>>>>>>>>> +        kcs_bmc->data_out_len = count;
>>>>>>>>>>>> +        write_data(kcs_bmc, kcs_bmc->data_out[0]);
>>>>>>>>>>>> +    } else if (kcs_bmc->phase == KCS_PHASE_READ) {
>>>>>>>>>>>> +        ret = -EBUSY;
>>>>>>>>>>>> +    } else {
>>>>>>>>>>>> +        ret = -EINVAL;
>>>>>>>>>>>
>>>>>>>>>>> Is there a reason you return -EINVAL here? Why not just 
>>>>>>>>>>> -EBUSY in all
>>>>>>>>>>> cases?  Is there something that userland will need to do 
>>>>>>>>>>> differently?
>>>>>>>>>>>
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static long kcs_bmc_ioctl(struct file *filp, unsigned int 
>>>>>>>>>>>> cmd,
>>>>>>>>>>>> +              unsigned long arg)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +    long ret = 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    switch (cmd) {
>>>>>>>>>>>> +    case IPMI_BMC_IOCTL_SET_SMS_ATN:
>>>>>>>>>>>> +        update_status_bits(kcs_bmc, KCS_STATUS_SMS_ATN,
>>>>>>>>>>>> + KCS_STATUS_SMS_ATN);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case IPMI_BMC_IOCTL_CLEAR_SMS_ATN:
>>>>>>>>>>>> +        update_status_bits(kcs_bmc, KCS_STATUS_SMS_ATN,
>>>>>>>>>>>> +                        0);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case IPMI_BMC_IOCTL_FORCE_ABORT:
>>>>>>>>>>>> +        set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>>> +        read_data(kcs_bmc);
>>>>>>>>>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_ERROR;
>>>>>>>>>>>> +        kcs_bmc->data_in_avail = false;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    default:
>>>>>>>>>>>> +        ret = -EINVAL;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int kcs_bmc_release(struct inode *inode, struct 
>>>>>>>>>>>> file *filp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +
>>>>>>>>>>>
>>>>>>>>>>> What happens if the device gets closed in the middle of a 
>>>>>>>>>>> transaction?  That's
>>>>>>>>>>> an important case to handle.  If something is in process, 
>>>>>>>>>>> you need to abort it.
>>>>>>>>>>>
>>>>>>>>>> The device just provides the read & write data, the 
>>>>>>>>>> transaction is handled in the KCS
>>>>>>>>>> controller's IRQ handler.
>>>>>>>>>
>>>>>>>>> From the spec, section 9.14:
>>>>>>>>>
>>>>>>>>>    The BMC must change the status to ERROR_STATE on any 
>>>>>>>>> condition where it
>>>>>>>>>    aborts a command transfer in progress.
>>>>>>>>>
>>>>>>>>> So you need to do something here.
>>>>>>>>>
>>>>>>>> In practice, we do this as spec said in ipmid, NOT in driver, 
>>>>>>>> driver can't handle anything, let's
>>>>>>>> make it simple, thanks!
>>>>>>>
>>>>>>> If ipmid crashes or is killed, how does it accomplish this?
>>>>>>>
>>>>>> Every time ipmids (or kcsd) crashed or killed, it needs start to 
>>>>>> call FORCE_ARBORT firstly, to sync with
>>>>>> host side software.
>>>>>>>>
>>>>>>>> Whenever the BMC is reset (from power-on or a hard reset), the 
>>>>>>>> State Bits are initialized to “11 - Error State”. Doing so
>>>>>>>> allows SMS to detect that the BMC has been reset and that any 
>>>>>>>> message in process has been terminated by the BMC.
>>>>>>>
>>>>>>> Right, that's fine, like it should be.  But we are not talking 
>>>>>>> about a reset.
>>>>>>>
>>>>>> I think the final error handling solution is that kcsd (user 
>>>>>> land) runs, otherwise, the host software side still got stuck. We 
>>>>>> meet
>>>>>> this kind of issue, so in general, we just doesn't handle some 
>>>>>> mirror errors in driver, then in kcsd, when it can provide the real
>>>>>> IPMI service, it will reset the channel firstly to sync with host 
>>>>>> side software.
>>>>>
>>>>> "Userland will do the right thing" is not very convincing to a 
>>>>> kernel developer.
>>>>>
>>>>> Plus if the above is true, I would think that you would just want 
>>>>> to hold the device
>>>>> in an error state when it wasn't opened.
>>>>>
>>>> I understand your concern, of course, driver need handles things 
>>>> well. But in fact, if a user app is truly a bad boy, it still can hang
>>>> the host side: set SMS_ATN, but no message returned when software 
>>>> host side requests, then host open-ipmi driver will hang, we
>>>> meet this kind of error to hang the customer's host. :) In my 
>>>> understanding, kcs-bmc should do the right thing about read and write,
>>>> the real transaction should be handled correctly by the kcsd.
>>>>
>>>> And if no kcsd starts, then this kind of BMC can't be sold out. :)
>>>
>>> True.  I'm not as concerned about this sort of thing.  It's nicer to 
>>> the host side if
>>> it can detect problems quickly, but it will eventually time out.
>>>
>>> From what I can tell from the current design, if the BMC userland is 
>>> not running,
>>> the driver will step through the state machine until it hits read 
>>> state, then it
>>> will sit there until the host times out and aborts the operation.
>>>
>>> IMHO, it would be better for the host side if the driver just stayed 
>>> in error state
>>> if nothing had it open.  It would think the spec says that in the 
>>> quote I referenced
>>> above, but that quote, like many things in the IPMI spec, is fairly 
>>> vague and could
>>> be interpreted many ways.
>>>
>> Well, I will try to fix this errors as possible.
>>> -corey
>>>
>>>
>>>>> -corey
>>>>>
>>>>>>> -corey
>>>>>>>
>>>>>>>>>>>> + spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    kcs_bmc->running = 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return 0;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>