Thursday, September 10, 2009

Some things have happened.

I have to apologize for how rarely this blog gets updated. Hopefully someone will come back to it anyway!

Since my last post, some 3 months ago, the kernel has gained 5 system calls:
  • yield
  • allocate a buffer
  • transmit a buffer to another process
  • receive a buffer from some other process
  • dispose of a buffer
The kernel is now a very simple message-passing kernel. It also handles different priorities, so that at every moment, the highest-priority ready process will run. Processes can now be written in C, since library functions wrapping the system calls are provided.

A couple of months ago, someone asked for a description of the entire process from booting up to scheduling. I will attempt to describe a few important steps:

First of all, the kernel and applications currently run only from RAM. Some basic support for running from ROM is implemented in the kernel and in the BSP (startup routines). The two cases should not need to differ much, and full support for running from ROM will come in time.

The way I run the system right now is using OpenOCD and GDB to download an ELF file to the board. The ELF file contains a monolithic image consisting of the kernel, the BSP and the application, all linked to run from RAM. After downloading, I reset the board and let it run.

The first thing that happens is that the CPU is reset, which means that it will use the default exception vector location (which is at address 0x00000000, i.e. in flash) to find the reset routine. I have (once and for all) put a small assembly routine in flash, which the reset vector points to. This routine reads the initial main stack pointer (at 0x20000000) and the desired reset vector (at 0x20000004) from RAM, and simply jumps to the boot routine that the latter points to, which is the BSP startup. This bootstrapping assembly routine is part of neither the OS nor the BSP.
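
The real stub is written in assembly and is not shown here. As a rough illustration only, the lookup it performs could be sketched in C like this, written so that it also runs on a host. The names `boot_image_t` and `read_boot_image` are made up for this example, and the RAM base is passed in as a pointer instead of being hard-coded to 0x20000000.

```c
#include <assert.h>
#include <stdint.h>

/* First two words of a Cortex-M3 vector table.
 * On the board described above they sit at 0x20000000 in RAM. */
typedef struct {
    uint32_t initial_msp;   /* word 0: initial main stack pointer */
    uint32_t reset_handler; /* word 1: address of the BSP startup routine */
} boot_image_t;

/* Hypothetical helper: fetch the two values the flash stub needs.
 * The real stub would then branch to reset_handler. */
boot_image_t read_boot_image(const uint32_t *ram_base)
{
    boot_image_t img;

    assert(ram_base != 0);
    img.initial_msp   = ram_base[0]; /* 0x20000000 on the target */
    img.reset_handler = ram_base[1]; /* 0x20000004 on the target */
    return img;
}
```

On the target, `ram_base` would simply be the start of RAM, and the jump itself has to be done in assembly.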

The BSP startup routine first of all sets up the STM32 clocks, so that the CPU core runs at 72 MHz (SYSCLK), generated from an 8 MHz external crystal multiplied by 9 by the PLL. AHB and APB2 clocks are also set up properly.

After setting up all clocks, the BSS segment is cleared. The area to be cleared is obtained by getting the addresses of the _bss_start and _bss_end symbols. These symbols are defined in the linker command file which is supplied by the BSP.
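
A minimal sketch of such a clearing loop, assuming the linker command file aligns both symbols to 4 bytes. The function name `clear_words` is hypothetical, and the bounds are passed as parameters so the sketch can be tried on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Zero every 32-bit word in [start, end).
 * On the target, start and end would be the addresses of the
 * linker-provided _bss_start and _bss_end symbols. */
void clear_words(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    while (start < end) {
        *start++ = 0u;
    }
}
```

On the target the call would look something like `clear_words(&_bss_start, &_bss_end)`, with both symbols declared as `extern uint32_t`.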

Next, by writing to the NVIC_VTOR register, the exception table location is moved to RAM (where it is already loaded). The GPIO module is set up to allow flashing the status LED.

After that, the BSP is almost finished setting up the board. There is one more thing to consider before handing over to the kernel.

All kernel setup steps (kernel initialization, process creation) must run from handler mode. I'll get back to the reason for that later. The default CPU mode after reset is thread mode, and the only way to switch to handler mode is through an exception. Therefore, the BSP writes the address of the kernel startup routine to the SVC (system call) vector in RAM and triggers it by issuing an svc instruction. This results in a switch to handler mode and a jump to the kernel.
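
To illustrate the vector patching step, here is a small host-runnable sketch. It relies on the fact that SVCall is exception number 11 on the Cortex-M3, so its vector is word 11 of the table. The function name `install_svc_handler` is an assumption for this example, and the table is passed in as a plain array.

```c
#include <assert.h>
#include <stdint.h>

/* SVCall is exception number 11, i.e. byte offset 0x2C in the table. */
#define SVCALL_VECTOR_INDEX 11u

/* Hypothetical helper: point the SVCall vector of a RAM-resident
 * vector table at handler_addr. Bit 0 is forced to 1 because
 * Cortex-M3 vectors must have the Thumb bit set. */
void install_svc_handler(uint32_t *vector_table, uint32_t handler_addr)
{
    assert(vector_table != 0);
    vector_table[SVCALL_VECTOR_INDEX] = handler_addr | 1u;
}
```

On the target, the BSP would follow this with an `svc` instruction (from inline assembly), which cannot be demonstrated on a host.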

The kernel startup routine clears the kernel pool area, then calls a hook which the application is supposed to implement. This hook function should make a number of rtos_create_process calls. Each call lets the application specify a process entry point, a priority and a stack size. For each call, a PCB (process control block) is created in the kernel pool area, as well as a stack. The PCB is sorted into a readylist. After all these calls, the kernel is ready to start the highest-priority process.
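
The sorted insertion into the readylist could look roughly like the sketch below. The PCB layout, the name `readylist_insert` and the convention that a lower number means a higher priority are all assumptions made for the example; the real kernel structures are not shown in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PCB layout, reduced to what the list needs. */
typedef struct pcb {
    struct pcb *next;
    uint8_t     priority;      /* assumption: lower value = higher priority */
    void      (*entry)(void);  /* process entry point */
} pcb_t;

/* Insert p so the list stays ordered by priority. A new process is
 * placed behind existing processes of the same priority, which gives
 * first-come-first-served order within one priority level. */
void readylist_insert(pcb_t **head, pcb_t *p)
{
    pcb_t **link = head;

    assert(head != NULL && p != NULL);
    while (*link != NULL && (*link)->priority <= p->priority) {
        link = &(*link)->next;
    }
    p->next = *link;
    *link = p;
}
```

With this ordering, the process to start first is simply the head of the list.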

I mentioned earlier that the kernel startup must run from handler mode. The reason for this is that when the first process is scheduled, it will be started in the same way as if it were being resumed after preemption, that is, through an exception return, and an exception return can only be made from handler mode. Therefore, for each created process, the architecture-specific code is called to prepare the process stack in such a way that returning to the process and resuming it actually starts the process from the first instruction and with an empty stack.
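
As a sketch of what such stack preparation can look like on a Cortex-M3: the hardware stacks eight words on exception entry (r0-r3, r12, lr, pc, xPSR), so a fresh stack needs a matching fake frame. The sketch additionally assumes that the context switch code saves r4-r11 just below that frame, which is a common arrangement but not confirmed by this post. The name `prepare_initial_stack` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define XPSR_THUMB_BIT 0x01000000u /* T bit; must be set on Cortex-M3 */
#define HW_FRAME_WORDS 8           /* r0-r3, r12, lr, pc, xPSR (stacked by hardware) */
#define SW_FRAME_WORDS 8           /* r4-r11 (assumed to be stacked by the switch code) */

/* Hypothetical helper: lay out a fake "preempted" context at the top
 * of a fresh stack and return the stack pointer to store in the PCB.
 * stack_top is assumed to be 8-byte aligned. */
uint32_t *prepare_initial_stack(uint32_t *stack_top, uint32_t entry_addr)
{
    uint32_t *sp;
    int i;

    assert(stack_top != 0);
    sp = stack_top - (HW_FRAME_WORDS + SW_FRAME_WORDS);
    for (i = 0; i < HW_FRAME_WORDS + SW_FRAME_WORDS; i++) {
        sp[i] = 0u; /* r4-r11, r0-r3, r12 and lr all start as zero */
    }
    sp[SW_FRAME_WORDS + 6] = entry_addr & ~(uint32_t)1; /* stacked pc */
    sp[SW_FRAME_WORDS + 7] = XPSR_THUMB_BIT;            /* stacked xPSR */
    return sp;
}
```

In a real kernel the stacked lr would normally point at some process-exit routine instead of zero, so that returning from the entry function is handled in a defined way.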

An application issues a system call by putting the syscall id (an integer, ranging from 0 to 4) in register r12 (done by an inline assembly routine) and executing an svc instruction. The scratch registers r0-r3 are passed through unchanged from the function call to the svc exception handler. The exception handler uses r12 as an offset into a jump table and then jumps to the system call implementation.
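
The jump table idea can be mimicked in portable C with an array of function pointers. This is only a host-side sketch: the numbering of the calls, the stub bodies and the range check are assumptions added for the example, and the real handler does the lookup in assembly with the id taken from r12.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed numbering; the post only says the ids range from 0 to 4. */
enum {
    SYS_YIELD = 0,
    SYS_ALLOC,
    SYS_SEND,
    SYS_RECEIVE,
    SYS_FREE,
    SYS_COUNT
};

#define SYS_BAD_ID 0xFFFFFFFFu

/* args[0..3] stand in for the values of r0-r3. */
typedef uint32_t (*syscall_fn)(const uint32_t args[4]);

/* Placeholder stubs: each just returns its own id so that the
 * dispatch can be observed. Real implementations live in the kernel. */
static uint32_t sys_yield(const uint32_t args[4])   { (void)args; return (uint32_t)SYS_YIELD; }
static uint32_t sys_alloc(const uint32_t args[4])   { (void)args; return (uint32_t)SYS_ALLOC; }
static uint32_t sys_send(const uint32_t args[4])    { (void)args; return (uint32_t)SYS_SEND; }
static uint32_t sys_receive(const uint32_t args[4]) { (void)args; return (uint32_t)SYS_RECEIVE; }
static uint32_t sys_free(const uint32_t args[4])    { (void)args; return (uint32_t)SYS_FREE; }

static const syscall_fn syscall_table[SYS_COUNT] = {
    [SYS_YIELD]   = sys_yield,
    [SYS_ALLOC]   = sys_alloc,
    [SYS_SEND]    = sys_send,
    [SYS_RECEIVE] = sys_receive,
    [SYS_FREE]    = sys_free
};

/* Look up the implementation by id and call it with the arguments. */
uint32_t syscall_dispatch(uint32_t id, const uint32_t args[4])
{
    if (id >= (uint32_t)SYS_COUNT) {
        return SYS_BAD_ID; /* reject ids outside the table */
    }
    return syscall_table[id](args);
}
```

The range check is cheap insurance against a corrupted id indexing past the end of the table.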

No interrupt handling is implemented yet, so I haven't had to deal with critical regions in the kernel. It is still to be decided, for instance, which exception priority level the SVC call will use in relation to other exceptions (interrupts from peripherals).

There is more to write about, for instance context switching using the PendSV exception. I'll return to that later!

4 comments:

  1. Great to hear that you're making progress.

    I've sorted out the hardware init for my board, and set up a "tick" interrupt to call a scheduler. Do I understand correctly if I say that your kernel has cooperative multitasking rather than pre-emptive? You mentioned that you've implemented a yield system call, but you're not dealing with interrupts yet.

    My blog is getting updated even less often than yours... but you're welcome to see it http://danielromaniuk.com/

    Keep up the good work.

  2. You're right; right now the kernel has only cooperative multitasking. But it will be preemptive quite soon. I am working on handling a SysTick interrupt which can call the scheduler (processes will be able to make delay system calls, and interrupt handlers may send messages to processes, which can then wake up and preempt lower-priority processes). I am currently trying to understand the priority handling system of the Cortex-M3, including sub-priorities and pre-emption priorities. I think I understand the basics, but not the details. Do you also code for the Cortex-M3?

    I'll check out your blog!

  3. I'm playing with an LPC2378-STK, based on the ARM7TDMI. I want to use assembler as much as possible, to learn it.

    I found an interesting set of lecture and lab notes for a course where they develop an RTOS over one semester, here:
    http://www.et.byu.edu/groups/ece425web/current/sched.html

    I will use that as a structure, to not get too lost. Still just in the early stages though!

  4. Here is some HC08 code to translate into Cortex M3.

    Time is a system resource and should therefore be
    managed by the OS. Even under multitasking, an
    application must not loop on itself to generate a wait
    time. Most examples get this wrong. To compensate,
    they need a multitasking OS that runs further tasks in
    the background while the waiting task wastes CPU time
    looping. If more than one delay is needed at the same
    time, performance gets poor. The more elegant real-time
    solution is to increment only one time variable for
    all timers in the system.

    This can be done with the CM3 SysTick timer, which
    should be connected to a system clock interrupt
    service routine, tictac:

    tictac: ldhx stime ;16 bit system time
    aix #1
    sthx stime
    rts

    To start a timer there is a setflop call (sflop).
    A contains the desired time, X points to the
    timer variable (which does not need to be incremented).

    sflop: add stime+1 ;actual time plus system time lowbyte
    clrh ;8 bit version with 16 bit system time
    sta 0,x ;store new reference in timer var
    rts

    To test if the time is over, there is a testflop call (tflop).
    On entry, X points to the timer variable to test. The return
    value is the sign flag, which indicates whether the time is over.

    tflop: clrh
    lda stime+1
    sub 0,x ;Lowbyte systemtime minus timervalue
    rts

    To use a time inside a task, simply start it

    lda #timelength
    ldx #timervariable
    jsr sflop

    Now do anything else, or go to a context switch to do more
    meaningful things than looping in a timer loop.

    To test whether the time is over, simply call tflop.

    ldx #timervariable
    jsr tflop
    branchifplus timeover

    If the time is not over, do anything else or go to a
    context switch to do more meaningful things than
    looping in a timer loop.

    This way, not a single CPU cycle is wasted with counting
    several times in parallel and the amount of concurrent
    timers is only limited by the available system RAM
    instead of CPU time.

    Note that the system timer here is 16 bit, but the timers
    are only 8 bit. For the 32 bit Cortex it would be fine to
    implement a 32 bit system time and 32 bit set/testflop.
    Try to pass the negative flag to C calls with a GCC inline
    macro as described in GCC chapter 5.34.
    Keep in mind that the maximum time value for an 8 bit
    timer is only 7 bits long; for a 16 bit timer it is
    15 bits (32768) and for a 32 bit timer it is 31 bits. The
    other half of the range can be considered a time buffer
    in case the task does not poll the timer immediately
    after the time is over.

    In other words: many concurrent timers can elapse without
    being polled. The context switch to a polling task can come
    much later (up to the maximum timer length). Of course, it
    should be kept as small as possible to avoid the additional
    time jitter and to keep the times as accurate as CPU time
    allows. This can be done easily if all timers are implemented
    in this way. Good luck with your embedded spare time....
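
For reference, a 32 bit version of the setflop/testflop scheme from this comment could be sketched in C roughly as follows. The names `tick`, `timer_start` and `timer_expired` are made up for the example. It uses an unsigned comparison instead of the sign flag, which gives the same result and stays correct across counter wraparound as long as the interval is shorter than 2^31 ticks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The one and only time counter in the system. */
static volatile uint32_t system_time;

/* Body of the tick interrupt ("tictac"): the only place where time is counted. */
void tick(void)
{
    system_time++;
}

/* "setflop": returns the deadline that lies `ticks` ticks in the future.
 * The caller stores it in its own timer variable. */
uint32_t timer_start(uint32_t ticks)
{
    assert(ticks < 0x80000000u); /* only 31 bits of range are usable */
    return system_time + ticks;
}

/* "testflop": true once the deadline has been reached. The unsigned
 * difference is below 2^31 exactly when system_time is at or past the
 * deadline, even if the counter has wrapped in between. */
bool timer_expired(uint32_t deadline)
{
    return (uint32_t)(system_time - deadline) < 0x80000000u;
}
```

A task would call `timer_start` once, keep the returned value, and then poll `timer_expired` between other work or after each context switch.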
