Embedded Spare Time: Some things have happened.

Thursday, September 10, 2009


I have to apologize for how rarely this blog gets updated. Hopefully someone will return to it anyway!

Since my last post, some 3 months ago, the kernel now implements 5 different system calls. These are:

yield
allocate a buffer
transmit a buffer to another process
receive a buffer from some other process
dispose of a buffer

The kernel is now a very simple message-passing kernel. It also handles different priorities, so that at every moment the highest-priority ready process runs. Processes can now be written in C, as system call library functions are provided.

A couple of months ago, someone asked for a description of the entire process from booting up to scheduling. I will attempt to describe a few important steps:

First of all, the kernel and applications currently run only from RAM. Some basic support for running from ROM is implemented in the kernel and in the BSP (startup routines). There should not need to be many differences between the two cases, and support for running from ROM will be added in time.

The way I run the system right now is to use OpenOCD and GDB to download an ELF file to the board. The ELF file contains a monolithic image consisting of the kernel, the BSP and the application, all linked to run from RAM. After downloading, I reset the board and let it run.

The first thing that happens is that the CPU is reset, which means that it uses the default exception vector location (address 0x00000000, i.e. in flash) to find the reset routine. I have (once and for all) put a small assembly routine in flash, which the reset vector points to. This routine reads the initial main stack pointer (at 0x20000000) and the desired reset vector (at 0x20000004) from RAM, and simply jumps to the boot routine whose address is stored at 0x20000004, which is the BSP startup. This bootstrapping assembly routine is part of neither the OS nor the BSP.
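As a rough illustration of what that bootstrap step reads (the real routine is a few lines of assembly and is not shown in this post), here is a C sketch. The names boot_info and read_boot_info are made up for the example, and the table pointer is a parameter so the logic can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_VECTOR_TABLE 0x20000000u  /* RAM table location, from the post */

/* The two words the bootstrap routine needs from the RAM vector table. */
struct boot_info {
    uint32_t initial_msp;   /* word 0: initial main stack pointer */
    uint32_t reset_handler; /* word 1: address of the BSP startup routine */
};

/* Read the first two vector table entries. On the target, 'table' would be
 * (const volatile uint32_t *)RAM_VECTOR_TABLE. */
struct boot_info read_boot_info(const volatile uint32_t *table)
{
    struct boot_info info;
    info.initial_msp = table[0];
    info.reset_handler = table[1];
    /* Cortex-M cores execute Thumb code only, so a valid vector has bit 0 set. */
    assert((info.reset_handler & 1u) != 0u);
    return info;
}

/* The actual hand-over must be done in assembly, roughly:
 *   msr msp, r0   ; r0 = initial_msp
 *   bx  r1        ; r1 = reset_handler
 */
```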

The BSP startup routine first sets up the STM32 clocks so that the CPU core runs at 72 MHz (SYSCLK), generated from an 8 MHz external crystal multiplied by 9 using a PLL. The AHB and APB2 clocks are also set up properly.

After setting up all clocks, the BSS segment is cleared. The area to be cleared is obtained by getting the addresses of the _bss_start and _bss_end symbols. These symbols are defined in the linker command file which is supplied by the BSP.
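The BSS clearing step can be sketched in C as below. This is not the BSP's actual code: the function name clear_words is made up, and it assumes the BSS region is word aligned. The start and end pointers are parameters so the loop can be tested off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Zero every 32-bit word in the half-open range [start, end). */
void clear_words(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    for (uint32_t *p = start; p < end; ++p) {
        *p = 0u;
    }
}

/* Hypothetical use on the target, with the symbols from the linker
 * command file:
 *   extern uint32_t _bss_start, _bss_end;
 *   clear_words(&_bss_start, &_bss_end);
 */
```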

Next, the exception table location is moved to RAM (where the table is already loaded) by writing to the NVIC_VTOR register. The GPIO module is then set up so that the status LED can be flashed.
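Relocating the table comes down to a single register write. The sketch below is an illustration, not the BSP code: set_vector_table is a made-up name, and the register is passed in as a pointer so the function can be exercised on a PC. The register address is the Cortex-M3 Vector Table Offset Register.

```c
#include <assert.h>
#include <stdint.h>

#define VTOR_ADDRESS   0xE000ED08u  /* Cortex-M3 Vector Table Offset Register */
#define RAM_TABLE_BASE 0x20000000u  /* table location described in the post */

/* Point the core at a vector table located at table_base. */
void set_vector_table(volatile uint32_t *vtor, uint32_t table_base)
{
    /* The low 7 bits of VTOR are reserved, so the table must be at least
     * 128-byte aligned. Parts with many interrupts need a larger alignment. */
    assert((table_base & 0x7Fu) == 0u);
    *vtor = table_base;
}

/* Hypothetical use on the target:
 *   set_vector_table((volatile uint32_t *)VTOR_ADDRESS, RAM_TABLE_BASE);
 */
```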

After that, the BSP is almost finished setting up the board. There is one more thing to consider before handing over to the kernel.

All kernel setup (kernel initialization, process creation) must run in handler mode. I'll get back to the reason for that later. The default CPU mode after reset is thread mode, and the only way to switch to handler mode is through an exception. Therefore, the BSP writes the address of the kernel startup routine to the SVC (system call) vector in RAM and triggers it by issuing an svc instruction. This results in a switch to handler mode and a jump to the kernel.
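Installing the kernel entry point in the SVC vector could look roughly like this. It is a sketch under assumptions: install_svc_handler is a made-up name, and the table is passed as a pointer so the code runs on a PC. The exception number 11 for SVCall is fixed by the architecture.

```c
#include <assert.h>
#include <stdint.h>

#define SVCALL_EXCEPTION_NUMBER 11u  /* SVCall slot in the Cortex-M vector table */

/* Store the handler address in the SVCall entry of a RAM vector table. */
void install_svc_handler(volatile uint32_t *vector_table, uint32_t handler_addr)
{
    /* Vector entries must have bit 0 set (Thumb state). */
    assert((handler_addr & 1u) != 0u);
    vector_table[SVCALL_EXCEPTION_NUMBER] = handler_addr;
}

/* On the target this would be followed by something like
 *   __asm volatile ("svc 0");
 * which takes the exception and enters the handler in handler mode. */
```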

The kernel startup routine clears the kernel pool area, then calls a hook which the application is supposed to implement. This hook function should make a number of rtos_create_process calls. Each call lets the application specify a process entry point, a priority and a stack size. For each call, a PCB and a stack are created in the kernel pool area, and the PCB is sorted into a ready list. After all these calls, the kernel is ready to start the highest-priority process.
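Sorting a PCB into the ready list can be sketched as a priority-ordered linked list insert. The struct layout, the name ready_insert and the convention that a larger number means higher priority are all assumptions made for this example; the real PCB holds more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a process control block. */
struct pcb {
    int priority;       /* assumption: larger value = higher priority */
    struct pcb *next;   /* next PCB in the ready list */
};

/* Insert p so the list stays sorted with the highest priority first.
 * Processes of equal priority keep first-in, first-out order. */
void ready_insert(struct pcb **head, struct pcb *p)
{
    assert(head != NULL && p != NULL);
    while (*head != NULL && (*head)->priority >= p->priority) {
        head = &(*head)->next;
    }
    p->next = *head;
    *head = p;
}
```

With this layout, the process to run next is always the head of the list, so picking it costs nothing at scheduling time.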

I mentioned earlier that the kernel startup must run in handler mode. The reason is that when the first process is scheduled, it is started in the same way as if it were resumed after preemption, that is, by an exception return, which can only be performed in handler mode. Therefore, for each created process, the architecture-specific code is called to prepare the process stack in such a way that returning to the process and resuming it actually starts the process from its first instruction and with an empty stack.
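Preparing such a stack means building a fake exception frame. The sketch below follows the Cortex-M3 hardware frame layout (r0-r3, r12, lr, pc, xPSR); everything else is an assumption for illustration: the function name, the extra r4-r11 area for the context switch code, and the exit_trap return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XPSR_THUMB_BIT 0x01000000u  /* T bit, must be set on Cortex-M */
#define HW_FRAME_WORDS 8u           /* r0-r3, r12, lr, pc, xPSR (stacked by hardware) */
#define SW_FRAME_WORDS 8u           /* r4-r11 (assumed saved by context switch code) */

/* Build an initial frame below stack_top and return the stack pointer
 * value to store in the PCB. Addresses are plain 32-bit values so the
 * sketch also runs on a PC. */
uint32_t *prepare_initial_stack(uint32_t *stack_top, uint32_t entry, uint32_t exit_trap)
{
    assert(stack_top != NULL);
    uint32_t *sp = stack_top - (HW_FRAME_WORDS + SW_FRAME_WORDS);
    for (uint32_t i = 0; i < HW_FRAME_WORDS + SW_FRAME_WORDS; ++i) {
        sp[i] = 0u;                  /* r4-r11, r0-r3 and r12 start as zero */
    }
    uint32_t *hw = sp + SW_FRAME_WORDS;
    hw[5] = exit_trap;               /* lr: where the entry function would return to */
    hw[6] = entry & ~1u;             /* pc: first instruction of the process */
    hw[7] = XPSR_THUMB_BIT;          /* xPSR */
    return sp;
}
```

After the registers are restored from this frame and the exception return is done, the process stack pointer ends up at stack_top, i.e. the process starts with an empty stack.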

When an application issues a system call, the syscall id (an integer ranging from 0 to 4) is put in register r12 by an inline assembly routine. The scratch registers r0-r3 are passed unchanged from the function call to the svc exception handler. The exception handler uses r12 as an offset into a jump table and then jumps to the system call implementation.
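The jump table lookup can be illustrated in plain C. This is a simulation, not the handler itself (which works on the registers directly): the id order, the stub functions and the bounds check with SYSCALL_ERROR are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed numbering; the post only says the ids range from 0 to 4. */
enum syscall_id { SYS_YIELD, SYS_ALLOC, SYS_SEND, SYS_RECEIVE, SYS_FREE, SYS_COUNT };

#define SYSCALL_ERROR 0xFFFFFFFFu

/* args[] stands in for the values passed in r0-r3. */
typedef uint32_t (*syscall_fn)(const uint32_t args[4]);

/* Stub implementations that only return a recognizable value. */
static uint32_t sys_yield(const uint32_t args[4])   { (void)args; return 10u; }
static uint32_t sys_alloc(const uint32_t args[4])   { (void)args; return 11u; }
static uint32_t sys_send(const uint32_t args[4])    { (void)args; return 12u; }
static uint32_t sys_receive(const uint32_t args[4]) { (void)args; return 13u; }
static uint32_t sys_free(const uint32_t args[4])    { (void)args; return 14u; }

static const syscall_fn syscall_table[SYS_COUNT] = {
    [SYS_YIELD]   = sys_yield,
    [SYS_ALLOC]   = sys_alloc,
    [SYS_SEND]    = sys_send,
    [SYS_RECEIVE] = sys_receive,
    [SYS_FREE]    = sys_free,
};

/* id plays the role of r12; reject ids outside the table. */
uint32_t syscall_dispatch(uint32_t id, const uint32_t args[4])
{
    assert(args != NULL);
    if (id >= SYS_COUNT) {
        return SYSCALL_ERROR;
    }
    return syscall_table[id](args);
}
```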

No interrupt handling is implemented yet, so I haven't had to deal with critical regions in the kernel. It is still to be decided, for instance, what exception priority level the SVC call will have in relation to other exceptions (interrupts from peripherals).

There is more to write about, for instance context switching using the PendSV exception. I'll return to that later!

4 comments:

Daniel Romaniuk, October 3, 2009 at 7:10 PM
Great to hear that you're making progress.

I've sorted out the hardware init for my board, and set up a "tick" interrupt to call a scheduler. Do I understand correctly if I say that your kernel has cooperative multitasking rather than pre-emptive? You mentioned that you've implemented a yield system call, but you're not dealing with interrupts yet.

My blog is getting updated even less often than yours... but you're welcome to see it http://danielromaniuk.com/

Keep up the good work.
Marcus, October 3, 2009 at 10:06 PM
You're right; right now the kernel has only cooperative multitasking. But it will be preemptive quite soon. I am working on handling a SysTick interrupt which can call the scheduler (processes will be able to make delay system calls, and interrupt handlers may send messages to processes, which can then wake up and preempt lower-priority processes). I am currently trying to understand the priority handling system of the Cortex-M3, including sub-priorities and pre-emption priorities. I think I understand the basics, but not the details. Do you also code for the Cortex-M3?

I'll check out your blog!
Daniel Romaniuk, October 4, 2009 at 12:45 AM
I'm playing with an LPC2378-STK, based on the ARM7TDMI. I want to use assembler as much as possible, to learn it.

I found an interesting set of lecture and lab notes for a course where they develop an RTOS over one semester, here:
http://www.et.byu.edu/groups/ece425web/current/sched.html

I will use that as a structure, to not get too lost. Still just in the early stages though!
janvi, October 27, 2009 at 10:20 AM
Here is some HC08 code to translate into Cortex M3.

Time is considered a system resource and is therefore
required to be managed by the OS. Even with multitasking,
an application must not loop on itself to generate a wait
time. Most examples do this wrong. To compensate,
they need a multitasking OS that runs further tasks in the
background while the waiting task wastes CPU time
looping. If more than one time is required concurrently,
performance gets poor. The more elegant real-time
solution is to increment only one time variable for
all times inside the system.

This can be done with the CM3 SysTick timer, which
should be connected to a system clock tictac:
interrupt service

tictac: ldhx stime      ;16 bit system time
        aix  #1         ;increment it by one tick
        sthx stime
        rts

To start a timer there is a setflop: call.
A contains the desired time, X points to the
timer variable (which does not need to be incremented)

sflop:  add  stime+1    ;actual time plus system time lowbyte
        clrh            ;8 bit version with 16 bit system time
        sta  0,x        ;store new reference in timer var
        rts

To test if the time is over, there is a testflop: call.
At entry, X points to the timer variable to test. The return
value is the sign flag, which indicates if time is over or not.

tflop:  clrh
        lda  stime+1
        sub  0,x        ;Lowbyte systemtime minus timervalue
        rts

To use a time inside a task, simply start it

        lda  #timelength
        ldx  #timervariable
        jsr  sflop

Now do anything else or go to a context switch to do more
meaningful things than looping in a timer loop.

To test, if the time is over, simply call tflop.

        ldx  #timervariable
        jsr  tflop
        branchifplus timeover

If time is not over, do anything else or go to a
context switch to do more meaningful things than
looping in a timer loop.

This way, not a single CPU cycle is wasted with counting
several times in parallel and the amount of concurrent
timers is only limited by the available system RAM
instead of CPU time.

Note, that the system timer here is 16 bit, but timers
are only 8 bit. For 32 bit Cortex it would be ok to
implement 32 bit system time and 32 bit set/testflop.
Try to pass the negative flag to C calls with a GCC inline
macro as described in GCC chapter 5.34.
Keep in mind that the maximum time value for an 8 bit
timer is only 7 bit in length, for 16 bit timer it will be
15 bit (32768) and for 32 bit it will be 31 bit. The other
half of the time can be considered as a time buffer
resource if the task does not poll the timer immediately
after the time is over.
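
For a 32 bit Cortex, the set/testflop idea could look roughly like this in C. It is only a sketch: the names sflop32 and tflop32 are made up here, tictac is assumed to be called from the SysTick interrupt, and the sign flag is replaced by a test of the top bit of the difference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

volatile uint32_t stime;  /* 32 bit system time, one count per tick */

/* Tick service: the only place where time is counted. */
void tictac(void)
{
    stime++;
}

/* Start a timer: store the system time at which it will be over.
 * Only 31 bits are usable as length, as explained above. */
void sflop32(uint32_t *timer, uint32_t length)
{
    assert(length <= 0x7FFFFFFFu);
    *timer = stime + length;   /* unsigned addition wraps around */
}

/* Test a timer: true if the time is over. The top bit of
 * (system time - timer value) plays the role of the sign flag. */
bool tflop32(const uint32_t *timer)
{
    return ((stime - *timer) & 0x80000000u) == 0u;
}
```

Because the subtraction wraps around, the test also works when the system time overflows between starting and polling the timer.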

In other words: many concurrent times can elapse without
being polled. The context switch to a polling task can come
much later (up to the maximum timer length). Of course, it should
be kept as small as possible to avoid additional
time jitter and to keep the times as accurate as CPU time
allows. This can be done easily if all times are implemented
in this way. Good luck with your embedded spare time....