Thursday, September 10, 2009

Some things have happened.

I have to apologize for the very low update rate on this blog. Hopefully someone will return to it anyway!

Since my last post, some 3 months ago, the kernel now implements 5 different system calls. These are:
  • yield
  • allocate a buffer
  • transmit a buffer to another process
  • receive a buffer from some other process
  • dispose of a buffer
The kernel is now a very simple one based on message passing. It also handles priorities, so that at every moment the ready process with the highest priority runs. Processes can now be written in C, since library functions for the system calls are provided.

A couple of months ago, someone asked about a description of the entire process from booting up to scheduling. I will attempt to describe a few important steps:

First of all, the kernel and applications currently only run from RAM. Some basic support for running from ROM is implemented in the kernel and in the BSP (startup routines). There should not need to be many differences between the two cases, and support for running from ROM will be added in time.

The way I run the system right now is to use OpenOCD and GDB to download an ELF file to the board. The ELF file contains a monolithic image consisting of the kernel, the BSP and the application, everything linked to run from RAM. After downloading, I reset the board and let it run.

The first thing that happens is that the CPU is reset, which means that it uses the default exception vector location (address 0x00000000, i.e. in flash) to find the reset routine. I have (once and for all) put a small assembly routine in flash, which the reset vector points to. This routine reads the initial main stack pointer from RAM (at 0x20000000) and the desired reset vector (at 0x20000004), and then simply jumps to the boot routine that the latter points to, which is the BSP startup. This bootstrapping routine is part of neither the OS nor the BSP.

The BSP startup routine first sets up the STM32 clocks so that the CPU core runs at 72 MHz (SYSCLK). This clock is generated from an 8 MHz external crystal, multiplied by 9 using a PLL. The AHB and APB2 clocks are also set up properly.

After setting up all clocks, the BSS segment is cleared. The area to be cleared is obtained by getting the addresses of the _bss_start and _bss_end symbols. These symbols are defined in the linker command file which is supplied by the BSP.
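
The BSS clearing step can be sketched in C roughly as follows. This is only a minimal sketch and not the BSP's actual code: the name clear_words is made up, and on the target the two pointers would be the addresses of the _bss_start and _bss_end linker symbols, which are assumed here to be word aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: zero every 32-bit word in [start, end).
   On the target, start and end would be the addresses of _bss_start
   and _bss_end (assumed to be word aligned by the linker command file). */
void clear_words(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    while (start < end) {
        *start++ = 0;
    }
}
```

Passing the range in as pointers keeps the loop testable on a host, where the linker symbols do not exist.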

Next, by writing to the NVIC_VTOR register, the exception table location is moved to RAM (where the table has already been loaded). The GPIO module is then set up to allow flashing the status LED.

After that, the BSP is almost finished setting up the board. There is one more thing to consider before handing over to the kernel.

All kernel setup steps (kernel initialization, process creation) must run in handler mode. I'll get back to the reason for that later. The default CPU mode after reset is thread mode, and the only way to switch to handler mode is through an exception. Therefore, the BSP writes the address of the kernel startup routine to the SVC (system call) vector in RAM and triggers it by issuing an svc instruction. This results in a switch to handler mode and a jump to the kernel.
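
The vector patching can be sketched like this. It is an illustration under assumptions: install_svc_handler and SVC_VECTOR_INDEX are invented names, the table is passed in as a plain array so that the sketch also runs on a host, and the final svc instruction is left as a comment because it only makes sense on the target. SVCall is exception number 11 on the Cortex-M3, which is why entry 11 is written.

```c
#include <assert.h>
#include <stdint.h>

#define SVC_VECTOR_INDEX 11u  /* SVCall is exception number 11. */

/* Hypothetical helper: point the SVC entry of a RAM-based vector table
   at the kernel startup routine. Bit 0 is forced to 1 because vector
   entries must hold Thumb addresses on the Cortex-M3. */
void install_svc_handler(uint32_t *vectors, uint32_t handler_addr)
{
    assert(vectors != 0);
    vectors[SVC_VECTOR_INDEX] = handler_addr | 1u;
    /* On the target, the BSP would now execute: asm volatile("svc 0"); */
}
```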

The kernel startup routine clears the kernel pool area, then calls a hook which the application is supposed to implement. This hook function should do a number of rtos_create_process calls. This call allows the application to specify a process entry point, a priority and a stack size. For each call, a PCB is created in the kernel pool area, as well as a stack. The PCB is sorted into a readylist. After all these calls, the kernel is ready to start the highest prioritized process.
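
Sorting a PCB into the ready list could look roughly like the sketch below. The struct layout and the name readylist_insert are hypothetical, and the convention that a lower number means a higher priority is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified PCB; a real one holds more state. */
struct pcb {
    struct pcb *next;
    unsigned priority;   /* Assumption: lower value = higher priority. */
};

/* Insert p so the list stays sorted with the highest priority first.
   Processes of equal priority keep their creation order. */
void readylist_insert(struct pcb **head, struct pcb *p)
{
    assert(head != NULL && p != NULL);
    while (*head != NULL && (*head)->priority <= p->priority) {
        head = &(*head)->next;
    }
    p->next = *head;
    *head = p;
}
```

With the list kept sorted at insertion time, picking the process to start is just a matter of taking the head of the list.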

I mentioned earlier that the kernel startup must run in handler mode. The reason for this is that when the first process is scheduled, it is started in the same way as if it were resumed after preemption. A process is resumed by returning from an exception, and an exception return can only be performed from handler mode. Therefore, for each created process, the architecture-specific code is called to prepare the process stack in such a way that returning to the process and resuming it actually starts the process from its first instruction and with an empty stack.
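
A minimal sketch of such a stack preparation is shown below. The name prepare_stack is hypothetical and the frame layout is an assumption: r4-r11 are placed below the eight words that the Cortex-M3 stacks by hardware on exception entry (r0-r3, r12, lr, pc, xPSR), matching a handler that saves r4-r11 on the process stack.

```c
#include <assert.h>
#include <stdint.h>

#define INITIAL_XPSR 0x01000000u  /* Only the T (Thumb) bit set. */

/* Hypothetical helper: build the frame a preempted process would have
   left behind. From low to high addresses it holds r4-r11 (saved by
   software), then r0-r3, r12, lr, pc and xPSR (stacked by hardware).
   Returns the stack pointer value to store in the PCB. */
uint32_t *prepare_stack(uint32_t *stack_top, uint32_t entry)
{
    assert(stack_top != 0);
    uint32_t *sp = stack_top - 16;   /* 8 software + 8 hardware words. */
    for (int i = 0; i < 16; i++) {
        sp[i] = 0;                   /* r4-r11, r0-r3, r12 and lr. */
    }
    sp[14] = entry & ~1u;            /* Stacked pc, bit 0 cleared. */
    sp[15] = INITIAL_XPSR;           /* Thumb state comes from here. */
    return sp;
}
```

After the 16 words have been popped again on the first switch to the process, its stack pointer is back at stack_top, i.e. the process starts with an empty stack.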

When an application issues a system call, this is done by putting the syscall id (an integer, ranging from 0 to 4) in register r12 (done by an inline assembly routine). The scratch registers r0-r3 are transparently sent from the function call to the svc exception handler. The exception handler uses r12 as an offset into a jump table. Then a jump is taken to the system call implementation.
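
In C terms, the dispatch amounts to something like the following sketch. All names are made up, the order of the ids is an assumption (the list of calls above does not say which id is which), and the range check is an addition of the sketch rather than something the handler is known to do.

```c
#include <assert.h>
#include <stdint.h>

#define SYSCALL_COUNT  5u
#define SYSCALL_BAD_ID 0xFFFFFFFFu   /* Hypothetical error value. */

typedef uint32_t (*syscall_fn)(uint32_t a0, uint32_t a1,
                               uint32_t a2, uint32_t a3);

/* Placeholder implementations: each returns a marker plus its first
   argument so that the dispatch can be observed. */
#define STUB(name, marker)                                         \
    static uint32_t name(uint32_t a0, uint32_t a1,                 \
                         uint32_t a2, uint32_t a3)                 \
    { (void)a1; (void)a2; (void)a3; return (marker) + a0; }

STUB(sys_yield,    0u)
STUB(sys_alloc,    100u)
STUB(sys_transmit, 200u)
STUB(sys_receive,  300u)
STUB(sys_dispose,  400u)

/* The order of the ids is an assumption made for this sketch. */
static const syscall_fn syscall_table[SYSCALL_COUNT] = {
    sys_yield, sys_alloc, sys_transmit, sys_receive, sys_dispose
};

/* id plays the role of r12, a0-a3 the role of r0-r3. */
uint32_t syscall_dispatch(uint32_t id, uint32_t a0, uint32_t a1,
                          uint32_t a2, uint32_t a3)
{
    if (id >= SYSCALL_COUNT) {
        return SYSCALL_BAD_ID;       /* Range check added in this sketch. */
    }
    assert(syscall_table[id] != 0);
    return syscall_table[id](a0, a1, a2, a3);
}
```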

No interrupt handling is implemented yet, i.e. I haven't had to deal with critical regions in the kernel yet. It is still to be decided for instance what exception priority levels will be used for the SVC call in relation to other exceptions (interrupts from peripherals).

There is still more to write about, for instance context switching using the PendSV exception. I'll return to that later!

Wednesday, June 10, 2009

Flash an LED!

Yesterday, I configured GPIO port C to be able to flash the status LED. I now have two processes yielding to each other. One of them turns on the LED, the other turns it off. Here is one of them. It writes to the GPIO C BSRR (bit set/reset register) to set port C bit 12 high, which turns the LED off.


.global CM3_proc0
.thumb_func

CM3_proc0:
    ldr r1, =0x40011010   @ PORT C BSRR register.
    mov r2, #0x00001000   @ Set bit 12.
    str r2, [r1]
    mov r0, #0x1          @ Yield system call.
    svc 0                 @ Perform the system call (id is passed in r0).
    mov r0, #0x800000     @ Delay loop.
loop0:
    sub r0, #0x1
    cmp r0, #0x0
    bne loop0
    b CM3_proc0


Tuesday, May 26, 2009

A Fablab opens up in Malmö!

A couple of days ago I read that a Fablab will soon open in Malmö in the southern part of Sweden. I am really happy about this. At least a couple of my spare-time electronics projects have ended up as a PCB without a box or any mechanics at all. For instance, my MP3 player looks like this:


When I designed the PCB, I was planning to design and build a box afterwards, but I realize I'll never get around to actually doing that. Drilling and sawing acrylic plastic is just too cumbersome. With a fablab, I will be able to design and mill my own boxes and mechanics without an enormous budget.

Be sure to check out http://www.fablab.se/.

Wednesday, May 13, 2009

First successful context switch!

A couple of days ago, I ran my first fully working context switches. I have two small test processes that actually just yield to each other:

/* Test processes. */
__attribute__ ((naked)) static void proc0()
{
    for (;;)
    {
        asm("mov r0, 0x1\n\t"    /* r0 = 1: the yield system call. */
            "svc 0\n\t");
    }
}

__attribute__ ((naked)) static void proc1()
{
    for (;;)
    {
        asm("mov r0, 0x1\n\t"    /* r0 = 1: the yield system call. */
            "svc 0\n\t");
    }
}



The code is really an ugly mix of C and assembly, and it seems like the __attribute__ ((naked)) thing completely fools gdb as well. If I put a breakpoint on one of these two functions, gdb does not put it on the first instruction, as you would probably want it to. Instead, it puts it one instruction after the first instruction. My guess is that gdb believes the first instruction to be a prologue that sets up the stack frame. But we don't have any stack frames here, due to the special attribute used. The workaround is to put the breakpoint on the address of the first instruction.

The processes put 1 in r0, which selects the yield system call, which in turn causes the simple scheduler to switch to the other process.

Thursday, May 7, 2009

A few shots of the development board.

Yesterday, I took a few pictures of the development board and the OCD module I use.

This is the STM32-P103 from Olimex. It features an STM32F103RBT6 µC from STMicroelectronics, which is based on the Cortex-M3 architecture from ARM. The 32-bit CPU runs at up to 72 MHz. Equipped with 128 KiB of flash and 20 KiB of SRAM, it should be enough for many fun projects:



A view from the bottom side, showing the SD/MMC card connector:



The ARM-USB-OCD adapter that I use to hook the board up to my Linux laptop:

Monday, May 4, 2009

stmdb, ldmdb, stmfd etc...

It seems that I am getting quite close to my first working context switch right now. As recommended for RTOSes on the Cortex-M3, my context switches actually take place in the PendSV handler, which looks like this:

.global CM3_handler_pendsv
.thumb_func
.extern current_pcb
.extern new_pcb
CM3_handler_pendsv:
mrs r12, PSP @ Get PSP for current process.
stmfd r12!, {r4-r11} @ Save remaining registers.
ldr r0, =current_pcb @ r0 = &current_pcb.
ldr r0, [r0] @ r0 = current_pcb
str r12, [r0, 8] @ Update SP in PCB.
ldr r0, =new_pcb @ r0 = &new_pcb
ldr r0, [r0] @ r0 = new_pcb
ldr r12, [r0, 8] @ r12 = SP for new process.
ldmfd r12!, {r4-r11} @ Restore r4-r11 for new process.
msr PSP, r12 @ Update SP for new process.
ldr lr, =0xfffffffd @ Use process stack when returning.
bx lr @ Return to new process.



Make sure to use the proper variants of the stm and ldm instructions. The fd suffix stands for full descending, which means that the stack pointer is decremented before data is written to the stack. Therefore the stack is always full, i.e. the stack pointer always points at valid data. When reading from the stack, the stack pointer is incremented after the read.
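
The full descending convention can be modelled in a few lines of C. This is just an illustration with made-up function names: fd_push behaves like a one-register stmfd (the same operation as stmdb) and fd_pop like a one-register ldmfd (the same operation as ldmia).

```c
#include <assert.h>
#include <stdint.h>

/* Push: decrement the stack pointer first, then store. */
uint32_t *fd_push(uint32_t *sp, uint32_t value)
{
    assert(sp != 0);
    --sp;
    *sp = value;
    return sp;
}

/* Pop: load first, then increment the stack pointer. */
uint32_t *fd_pop(uint32_t *sp, uint32_t *value)
{
    assert(sp != 0 && value != 0);
    *value = *sp;
    return sp + 1;
}
```

Between the calls, the returned pointer always refers to the most recently pushed word, which is what "full" means here.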

Tuesday, April 28, 2009

Fighting the GNU as assembler (contd.)

Okay, yesterday I tested the RTOS prototype with the exception vectors put in an assembly language file instead. It did not work quite as expected...

To begin with, I should tell you that my code is linked to run from the embedded SRAM in the STM32. This is because I think it is nicer not to have to erase and reprogram the flash all the time. On the other hand, the Cortex-M3 always has its exception vectors placed at 0x0 out of reset, and that is within the flash area. That means that following a reset, the CPU always loads the initial stack pointer from 0x0 and starts executing at the address stored in the reset vector at 0x4. It is possible to relocate the exception vectors to 0x20000000 (SRAM) by modifying the vector table offset register once you're running, thereby pointing out your own vectors placed in SRAM.

To be able to practically run my code from SRAM I implemented a minimal bootstrap routine that I put into the flash area:

static void reset();

unsigned int *vectors[2] __attribute__ ((section(".vectors"))) = {
    (unsigned int *) 0, /* No stack used. */
    (unsigned int *) reset
};

__attribute__ ((naked)) static void reset()
{
    /* We enter here, running as privileged in thread mode. TRM 2.2.
       We use SP_main. TRM 2.2.1 and 5.4.
       NVIC interrupts disabled. NMI and Hard Fault disabled. TRM 5.9. */

    /* Remap vectors to 0x20000000.
       Read stack top and entry point from
       user's RAM-based vectors. Jump to
       entry point. */
    asm("mov r0, #0x20000000\n\t"
        "ldr r1, =0xE000ED08\n\t"
        "str r0, [r1]\n\t"
        "ldr sp, [r0]\n\t"
        "ldr pc, [r0, #4]");
}

TRM refers to the Cortex-M3 Technical Reference Manual. The reset routine sets the vector table offset register, reads the initial stack pointer and the address of the SRAM-based boot routine from the user's vector table at 0x20000000, and jumps to the user's reset routine. This way, it appears as if the board boots from SRAM.

So first, load your code including your exception vectors to SRAM, then issue a system reset to the board.

The problem I ran into yesterday was when performing an svc (system call). The CPU properly jumped to my exception handler:

.global CM3_handler_svc
.extern portable_syscall
CM3_handler_svc:
bl portable_syscall
bx lr

The problem was that an exception occurred when executing the first instruction (the bl). I noticed that the T-bit in the xPSR register was cleared when taking the svc exception. The Cortex-M3 only executes Thumb instructions, so running with the T-bit (Thumb mode) cleared is not allowed. The fix was to add the .thumb_func directive before the exception handler code. That way, the address stored in the svc entry of the exception table has its LSB set, and the T-bit gets set when the svc handler runs. I haven't tested this yet, but it should work.
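
A host-side sanity check for this kind of mistake could be sketched as below. The function name is hypothetical, and the check assumes that entry 0 holds the initial stack pointer and that unused entries are 0, as in the vector tables shown on this blog.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: return 1 if every used handler entry has the
   Thumb bit (bit 0) set, otherwise 0. Entry 0 is skipped because it
   holds the initial stack pointer, not a code address. */
int vectors_are_thumb(const uint32_t *vectors, unsigned count)
{
    assert(vectors != 0);
    for (unsigned i = 1; i < count; i++) {
        if (vectors[i] != 0 && (vectors[i] & 1u) == 0) {
            return 0;
        }
    }
    return 1;
}
```

An entry that fails this check corresponds to a handler label that was assembled without .thumb_func.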

Monday, April 27, 2009

Fighting the GNU as assembler.

For my RTOS experiments, I need the vectors to be nicely linked into the binary that I download to the Cortex-M3 target, an STM32 development board from Olimex.

To start with, I used to put exception vectors in the C source file, in the following way:


/* Vector Table */
unsigned int *vectors[12]
__attribute__ ((section(".vectors"))) = {
    (unsigned int *) START_STACK_TOP,
    (unsigned int *) boot_rtos,
    (unsigned int *) 0x00000000,
    (unsigned int *) 0x00000000,
    (unsigned int *) 0x00000000,
    (unsigned int *) 0x00000000,
    (unsigned int *) 0x00000000,
    (unsigned int *) 0x00000000,
    (unsigned int *) 0x00000000,
    (unsigned int *) 0x00000000,
    (unsigned int *) 0x00000000,
    (unsigned int *) CM3_SVC
};


That way, I tell gcc to put the array of vectors in the .vectors section. As I use OpenOCD to communicate with the board, using the ARM-USB-OCD, I can download the executable directly from gdb, using the "load" command.

However, I would like to have the vectors defined in an assembly language source file, together with a few of my Cortex M3 specific routines (such as system call handling etc). Therefore I tried to put the vectors in the .s file:

.section .vectors,"aw"
.global CM3_vectors
.extern boot_rtos
CM3_vectors:
.word START_STACK_TOP
.word boot_rtos
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word CM3_handler_svc
.word 0
.word 0
.word CM3_handler_pendsv


However, I initially did not pass the "aw" flags to the .section directive. This resulted in nm showing the vectors area as a debug symbol. Therefore, gdb did not download it to the board.

Using the -S flag to gcc, I could examine the assembly output from gcc and that way find out that I should use the "aw" attribute. Apparently GNU as does not apply any default attributes on a section called .vectors.

First entry...

Hello!

I intend to write a few lines now and then about my electronics projects. Hopefully, it will be of interest to someone! :-)