Microcontroller Architecture

A microcontroller (MCU) is a self-contained computer on a single chip: CPU, memory, and peripherals integrated together. Unlike a microprocessor (which needs external RAM, ROM, and IO chips), an MCU runs standalone.

Why It Matters

Most embedded systems are built around an MCU. Understanding its internal architecture (bus structure, memory map, clock tree, boot sequence) determines whether you can debug a HardFault, optimize power consumption, or choose the right chip for a project.

How It Works

Harvard vs Von Neumann

Harvard (most MCUs):              Von Neumann:
  CPU                               CPU
  / \                                |
 /   \                          Single Bus
Instr  Data                      /       \
Bus    Bus                    Instr     Data
 |      |                    (shared memory)
Flash  SRAM

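ARM Cortex-M3 and M4 cores use a modified Harvard architecture: instruction fetches and data accesses travel over separate buses (ICode and DCode for the code region, plus a System bus for SRAM and peripherals), but there is a single unified address space, so code and data share one memory map. This gives the fetch/load parallelism of Harvard with the flexibility of Von Neumann, such as running code from SRAM. The smaller Cortex-M0/M0+ cores use a single bus and are Von Neumann machines.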

Memory Map (STM32F4 example)

0xFFFF_FFFF ┌──────────────────┐
            │  Cortex-M4 core  │  SysTick, NVIC, SCB, MPU
0xE000_0000 ├──────────────────┤
            │  FSMC, reserved  │  External memory, unused space
0x6000_0000 ├──────────────────┤
            │  AHB2 peripherals│  USB OTG, RNG
0x5000_0000 ├──────────────────┤
            │  AHB1 peripherals│  GPIO, DMA, RCC
0x4002_0000 ├──────────────────┤
            │  APB2 peripherals│  SPI1, USART1, TIM1, ADC
0x4001_0000 ├──────────────────┤
            │  APB1 peripherals│  TIM2-7, SPI2/3, USART2/3, I2C
0x4000_0000 ├──────────────────┤
            │  (reserved)      │
0x2002_0000 ├──────────────────┤
            │  SRAM (112-192KB)│  Variables, stack, heap
0x2000_0000 ├──────────────────┤
            │  (reserved)      │  Includes system memory (ROM bootloader) at 0x1FFF_0000
0x0810_0000 ├──────────────────┤
            │  Flash (512KB-1M)│  Program code, constants, vector table
0x0800_0000 ├──────────────────┤
            │  Aliased to Flash│  Boot region (remappable)
0x0000_0000 └──────────────────┘

Each region sits directly above its base address; most regions fill only part of the range shown.

All peripherals are accessed through Memory-Mapped IO — reading/writing specific addresses controls hardware.
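
To make memory-mapped IO concrete, here is a minimal sketch of a register overlay. The field order follows the STM32F4 GPIO register layout (MODER at offset 0x00, ODR at 0x14), but the names `gpio_regs_t`, `gpio_set_output`, and `gpio_write_pin` are illustrative and not taken from any vendor header. The functions take a pointer, so the same code can be exercised against an ordinary struct on a host machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Register overlay: each field sits at a fixed offset from the peripheral base.
 * The type and function names here are hypothetical, for illustration only. */
typedef struct {
    volatile uint32_t MODER;    /* 0x00: pin mode, 2 bits per pin */
    volatile uint32_t OTYPER;   /* 0x04: output type */
    volatile uint32_t OSPEEDR;  /* 0x08: output speed */
    volatile uint32_t PUPDR;    /* 0x0C: pull-up / pull-down */
    volatile uint32_t IDR;      /* 0x10: input data */
    volatile uint32_t ODR;      /* 0x14: output data */
} gpio_regs_t;

/* Configure one pin as a general-purpose output (mode field = 0b01). */
static void gpio_set_output(gpio_regs_t *gpio, unsigned pin)
{
    uint32_t moder = gpio->MODER;
    moder &= ~(0x3u << (pin * 2u));   /* clear the 2-bit mode field */
    moder |=  (0x1u << (pin * 2u));   /* 01 = output */
    gpio->MODER = moder;
}

/* Drive a pin high or low with a read-modify-write of ODR. */
static void gpio_write_pin(gpio_regs_t *gpio, unsigned pin, int level)
{
    if (level) {
        gpio->ODR |= (1u << pin);
    } else {
        gpio->ODR &= ~(1u << pin);
    }
}
```

On target, the pointer would be a fixed address from the memory map, for example `(gpio_regs_t *)0x40020000u` for GPIOA in the AHB1 range, and the peripheral clock would have to be enabled in RCC first. The `volatile` qualifier keeps the compiler from caching or removing the register accesses.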

Bus Architecture

              ┌──────┐
              │ CPU  │
              └──┬───┘
       ICode    DCode   System
       Bus      Bus     Bus
        │        │       │
   ┌────┴────────┴───────┴────┐
   │       Bus Matrix (AHB)   │  <- arbitrates CPU + DMA access
   └──┬──────┬──────┬─────┬───┘
      │      │      │     │
    Flash  SRAM   AHB1  AHB2
                   │
              ┌────┴────┐
              │APB bridge│  <- AHB-to-APB (slower peripherals)
              └──┬───┬──┘
               APB1 APB2

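AHB (Advanced High-performance Bus) runs at the system clock speed (HCLK). APB (Advanced Peripheral Bus) runs at a divided clock: on an STM32F4 at 168 MHz, APB1 is limited to 42 MHz (HCLK/4) and APB2 to 84 MHz (HCLK/2). This matters when calculating timer prescalers and baud rates, because a USART's baud divider is derived from its APB clock, and STM32 timers run at twice the APB clock whenever the APB prescaler is greater than 1.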
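
The prescaler arithmetic can be sketched in a few host-runnable helpers. This is a sketch under stated assumptions: the function names are hypothetical, the timer rule (timer clock equals PCLK when the APB prescaler is 1, otherwise 2 x PCLK) is the STM32 default behaviour, and the baud divider assumes 16x oversampling. Check the reference manual of the specific part before relying on either.

```c
#include <stdint.h>

/* APB bus clock: HCLK divided by the APB prescaler (1, 2, 4, 8 or 16). */
static uint32_t pclk_hz(uint32_t hclk_hz, uint32_t apb_div)
{
    return hclk_hz / apb_div;
}

/* Timer input clock (assumed STM32 default rule):
 * PCLK if the APB prescaler is 1, otherwise 2 x PCLK. */
static uint32_t timer_clk_hz(uint32_t hclk_hz, uint32_t apb_div)
{
    uint32_t pclk = pclk_hz(hclk_hz, apb_div);
    return (apb_div == 1u) ? pclk : 2u * pclk;
}

/* USART baud divider for 16x oversampling: PCLK / baud, rounded to nearest. */
static uint32_t usart_brr(uint32_t pclk, uint32_t baud)
{
    return (pclk + baud / 2u) / baud;
}
```

With HCLK at 168 MHz and APB1 divided by 4, `timer_clk_hz(168000000u, 4u)` gives 84 MHz for the APB1 timers, while `usart_brr(42000000u, 115200u)` gives 365 for a USART on the same bus.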

Clock Tree and PLL

The clock tree generates all system clocks from a source oscillator:

HSI (16 MHz internal RC) --+
                            +--> PLL --> SYSCLK (up to 168 MHz)
HSE (8 MHz crystal) -------+      |
                                   +--> AHB prescaler --> HCLK --> CPU, DMA
                                                           |
                                                           +--> APB1 prescaler --> PCLK1 (max 42 MHz)
                                                           +--> APB2 prescaler --> PCLK2 (max 84 MHz)

PLL multiplies the input: SYSCLK = (HSE / M) * N / P. For STM32F4 with 8 MHz crystal: (8/8) * 336 / 2 = 168 MHz.
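
The PLL formula is easy to check in code before committing divider values to a register. The helper below is a minimal sketch: `pll_sysclk_hz` is a hypothetical name, and the validity checks (P restricted to 2, 4, 6 or 8, N limited to its 9-bit field, VCO input between 1 and 2 MHz) are assumptions based on the STM32F4 PLL that should be confirmed against the reference manual.

```c
#include <stdint.h>

/* Returns the PLL output in Hz, or 0 if the divider settings look invalid.
 * Limits below are assumptions for an STM32F4-style PLL. */
static uint32_t pll_sysclk_hz(uint32_t src_hz, uint32_t m, uint32_t n, uint32_t p)
{
    if (m == 0u || n == 0u || n > 511u) {
        return 0u;                          /* N must fit its 9-bit field */
    }
    if (p != 2u && p != 4u && p != 6u && p != 8u) {
        return 0u;                          /* P is one of 2, 4, 6, 8 */
    }

    uint32_t vco_in = src_hz / m;           /* e.g. 8 MHz / 8 = 1 MHz */
    if (vco_in < 1000000u || vco_in > 2000000u) {
        return 0u;                          /* assumed VCO input range */
    }

    uint32_t vco_out = vco_in * n;          /* e.g. 1 MHz * 336 = 336 MHz */
    return vco_out / p;                     /* e.g. 336 MHz / 2 = 168 MHz */
}
```

The same 168 MHz can be reached from the 16 MHz HSI with M = 16, or from the 8 MHz crystal with M = 4 and N = 168, which runs the VCO input at 2 MHz.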

A lower clock means lower power: dynamic power consumption scales roughly linearly with clock frequency. Many MCUs can switch clock sources or prescalers at run time for power optimization.

Boot Sequence

  1. Power-on reset releases the CPU
  2. CPU reads initial stack pointer from address 0x0000_0000 (vector table entry 0)
  3. CPU reads reset handler address from 0x0000_0004 (vector table entry 1)
  4. Reset handler runs: sets up clock (PLL), initializes .data section from Flash to SRAM, zeroes .bss section, calls main()
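  5. If using a bootloader (DFU, custom), it runs first and may jump to application code at a different Flash offset; the application's vector table is then relocated through the SCB vector table offset register (VTOR)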
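
Steps 2 and 3 can be illustrated by reading the first two vector table entries out of a firmware image, which is also what a bootloader typically inspects before jumping to an application. This is a host-runnable sketch: the names `vector_head_t`, `parse_vector_head`, and `vector_head_plausible` are hypothetical, and the address ranges are assumptions taken from the STM32F4 memory map (SRAM ending at 0x2002_0000, Flash at 0x0800_0000 to 0x0810_0000).

```c
#include <stdbool.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t initial_sp;     /* vector table entry 0 */
    uint32_t reset_handler;  /* vector table entry 1 */
} vector_head_t;

/* Extract the two words the CPU loads at reset. */
static vector_head_t parse_vector_head(const uint8_t *image)
{
    vector_head_t v = { read_le32(image), read_le32(image + 4) };
    return v;
}

/* Plausibility check; the ranges are assumptions for an STM32F4-style map. */
static bool vector_head_plausible(vector_head_t v)
{
    /* The stack usually starts at the top of SRAM, one past the last byte. */
    bool sp_in_sram = v.initial_sp > 0x20000000u && v.initial_sp <= 0x20020000u;
    bool sp_aligned = (v.initial_sp & 0x3u) == 0u;
    /* Cortex-M executes Thumb code only, so bit 0 of the handler must be set. */
    bool thumb_bit = (v.reset_handler & 1u) != 0u;
    uint32_t target = v.reset_handler & ~1u;
    bool pc_in_flash = target >= 0x08000000u && target < 0x08100000u;
    return sp_in_sram && sp_aligned && thumb_bit && pc_in_flash;
}
```

An erased Flash region reads as 0xFF bytes, so it fails the stack pointer check. A real bootloader would go on to load the stack pointer, set the vector table offset, and branch to the reset handler; those steps need target hardware and are not shown here.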

ARM Cortex-M Specifics

Most modern MCUs use ARM Cortex-M cores:

Core     Pipeline                 FPU                  DSP   Use case
M0/M0+   3-stage (M0+: 2-stage)   No                   No    Ultra-low-power, simple
M3       3-stage                  No                   No    General purpose
M4       3-stage                  SP float (optional)  Yes   Signal processing, motor control
M7       6-stage                  DP float (optional)  Yes   High-performance, cache
M33      3-stage                  SP float (optional)  Yes   M4 + TrustZone security

Key hardware blocks in the core:

  • NVIC: Nested Vectored Interrupt Controller — handles all interrupt priorities and nesting
  • SysTick: 24-bit down-counter, available on virtually every Cortex-M part (optional on M0/M0+), used for RTOS tick or simple delays
  • SCB: System Control Block — fault status, vector table offset, sleep modes
  • MPU: Memory Protection Unit (optional) — enforces access rules, useful with RTOS to isolate tasks
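
Because SysTick is only 24 bits wide, the reload value for a given tick rate has to be range-checked. The sketch below shows that calculation; `systick_reload` and `SYSTICK_MAX_RELOAD` are hypothetical names, and the function only computes the value without touching any hardware register.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* largest value a 24-bit counter holds */

/* Compute the reload value for a desired tick rate.
 * The counter runs from reload down to 0, so N cycles need reload = N - 1.
 * Returns false if the rate is invalid or the period does not fit in 24 bits. */
static bool systick_reload(uint32_t clk_hz, uint32_t tick_hz, uint32_t *reload)
{
    if (tick_hz == 0u) {
        return false;
    }
    uint32_t cycles = clk_hz / tick_hz;   /* clock cycles per tick */
    if (cycles < 2u || cycles - 1u > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = cycles - 1u;
    return true;
}
```

At 168 MHz a 1 kHz tick needs a reload of 167999, while a 10 Hz tick does not fit: the longest period the counter can cover at that clock is just under 100 ms.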

Common MCU Families

Family   Architecture     Notes
STM32    ARM Cortex-M     Largest ecosystem, hundreds of variants
ESP32    Xtensa/RISC-V    WiFi + BT built-in, popular for IoT
nRF52    ARM Cortex-M4    BLE focused, Nordic Semiconductor
RP2040   ARM Cortex-M0+   Raspberry Pi Pico, programmable IO (PIO)
AVR      8-bit AVR        Arduino legacy, still used for simple projects

Code Example

// Clock setup: configure PLL for 168 MHz on STM32F4
// HSE = 8 MHz crystal, target SYSCLK = 168 MHz
// Assumes the CMSIS device header and the reset clock state (running on HSI, PLL off)
#include "stm32f4xx.h"

void clock_init_168mhz(void)
{
    RCC->CR |= RCC_CR_HSEON;                      // enable external crystal
    while (!(RCC->CR & RCC_CR_HSERDY)) { }        // wait for HSE ready (no timeout in this example)

    // PLLCFGR can only be written while the PLL is disabled
    RCC->PLLCFGR = (8   << RCC_PLLCFGR_PLLM_Pos)  // M = 8   -> PLL input = 1 MHz
                 | (336 << RCC_PLLCFGR_PLLN_Pos)  // N = 336 -> VCO = 336 MHz
                 | (0   << RCC_PLLCFGR_PLLP_Pos)  // P = 2 (field value 0) -> SYSCLK = 168 MHz
                 | (7   << RCC_PLLCFGR_PLLQ_Pos)  // Q = 7   -> 48 MHz for USB/SDIO/RNG; Q below 2 is invalid
                 | RCC_PLLCFGR_PLLSRC_HSE;        // PLL source = HSE

    RCC->CR |= RCC_CR_PLLON;                      // enable PLL
    while (!(RCC->CR & RCC_CR_PLLRDY)) { }        // wait for PLL lock

    // Set flash wait states for 168 MHz (5 WS) before raising the clock
    FLASH->ACR = FLASH_ACR_LATENCY_5WS | FLASH_ACR_PRFTEN | FLASH_ACR_ICEN;

    // Set bus prescalers: AHB/1, APB1/4 (42 MHz), APB2/2 (84 MHz)
    RCC->CFGR = RCC_CFGR_HPRE_DIV1 | RCC_CFGR_PPRE1_DIV4 | RCC_CFGR_PPRE2_DIV2;

    // Switch system clock to PLL and wait until the switch is reported
    RCC->CFGR |= RCC_CFGR_SW_PLL;
    while ((RCC->CFGR & RCC_CFGR_SWS) != RCC_CFGR_SWS_PLL) { }
}