Microcontroller Architecture
A microcontroller (MCU) is a self-contained computer on a single chip: CPU, memory, and peripherals integrated together. Unlike a microprocessor (which needs external RAM, ROM, and IO chips), an MCU runs standalone.
Why It Matters
Most embedded systems are built around an MCU. Understanding its internal architecture (bus structure, memory map, clock tree, boot sequence) determines whether you can debug a HardFault, optimize power consumption, or choose the right chip for a project.
How It Works
Harvard vs Von Neumann
Harvard (most MCUs):          Von Neumann:
        CPU                        CPU
       /   \                        |
      /     \                  Single Bus
   Instr   Data                  /    \
    Bus     Bus               Instr  Data
     |       |               (shared memory)
   Flash    SRAM
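ARM Cortex-M3 and M4 cores use a modified Harvard architecture: separate instruction and data buses internally (ICode, DCode), but a unified address space, so code and data share one memory map. This gives the fetch bandwidth of Harvard with the flexibility of Von Neumann: the CPU can fetch an instruction from Flash while accessing data in SRAM, yet code can still run from SRAM and constants can be read from Flash. The smaller Cortex-M0/M0+ cores use a single shared bus (Von Neumann).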
Memory Map (STM32F4 example)
0xFFFF_FFFF ┌──────────────────┐
            │ Cortex-M4 core   │ SysTick, NVIC, SCB, MPU
0xE000_0000 ├──────────────────┤
            │ (reserved)       │
0x5006_0C00 ├──────────────────┤
            │ AHB2 peripherals │ USB OTG, RNG
0x5000_0000 ├──────────────────┤
            │ AHB1 peripherals │ GPIO, DMA, RCC
0x4002_0000 ├──────────────────┤
            │ APB2 peripherals │ SPI1, USART1, TIM1, ADC
0x4001_0000 ├──────────────────┤
            │ APB1 peripherals │ TIM2-7, SPI2/3, USART2/3, I2C
0x4000_0000 ├──────────────────┤
            │ (reserved)       │
0x2002_0000 ├──────────────────┤
            │ SRAM (112-192KB) │ Variables, stack, heap
0x2000_0000 ├──────────────────┤
            │ (reserved)       │
0x0810_0000 ├──────────────────┤
            │ Flash (512KB-1M) │ Program code, constants, vector table
0x0800_0000 ├──────────────────┤
            │ Aliased to Flash │ Boot region (remappable)
0x0000_0000 └──────────────────┘
All peripherals are accessed through Memory-Mapped IO — reading/writing specific addresses controls hardware.
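As a rough illustration of how an address maps onto a bus region, the sketch below classifies an address and composes a GPIO register address. It assumes the STM32F4 bus base addresses (APB1 at 0x4000_0000, APB2 at 0x4001_0000, AHB1 at 0x4002_0000, AHB2 at 0x5000_0000) and simplifies each region to run up to the next base. The names `region_of`, `gpio_odr_addr`, and `REG32` are hypothetical helpers, not part of any vendor header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    REGION_BOOT_ALIAS, REGION_FLASH, REGION_SRAM,
    REGION_APB1, REGION_APB2, REGION_AHB1, REGION_AHB2,
    REGION_CORE, REGION_UNMAPPED
} region_t;

typedef struct {
    uint32_t base;   /* first address of the region */
    uint32_t limit;  /* last address of the region, inclusive */
    region_t region;
} region_entry_t;

/* Simplified (assumption): each bus region is extended up to the next base. */
static const region_entry_t k_map[] = {
    { 0x00000000u, 0x07FFFFFFu, REGION_BOOT_ALIAS },
    { 0x08000000u, 0x080FFFFFu, REGION_FLASH },
    { 0x20000000u, 0x2001FFFFu, REGION_SRAM },
    { 0x40000000u, 0x4000FFFFu, REGION_APB1 },
    { 0x40010000u, 0x4001FFFFu, REGION_APB2 },
    { 0x40020000u, 0x4FFFFFFFu, REGION_AHB1 },
    { 0x50000000u, 0x5FFFFFFFu, REGION_AHB2 },
    { 0xE0000000u, 0xFFFFFFFFu, REGION_CORE },
};

/* Linear search over the table; returns REGION_UNMAPPED for gaps. */
region_t region_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof k_map / sizeof k_map[0]; i++) {
        if (addr >= k_map[i].base && addr <= k_map[i].limit) {
            return k_map[i].region;
        }
    }
    return REGION_UNMAPPED;
}

/* GPIO ports sit 0x400 apart starting at GPIOA; ODR is at offset 0x14. */
uint32_t gpio_odr_addr(unsigned port_index)
{
    assert(port_index < 9u);  /* ports A..I */
    return 0x40020000u + 0x400u * port_index + 0x14u;
}

/* On the target, a register access is a volatile load or store at that
 * address. Dereferencing this on a host PC would fault. */
#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))
```

On real hardware, something like `REG32(gpio_odr_addr(0)) |= 1u << 5;` would drive pin PA5 high, provided the port clock is enabled and the pin is configured as an output.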
Bus Architecture
               ┌──────┐
               │ CPU  │
               └──┬───┘
       ICode    DCode    System
        Bus      Bus      Bus
         │        │        │
    ┌────┴────────┴────────┴────┐
    │     Bus Matrix (AHB)      │ <- arbitrates CPU + DMA access
    └───┬──────┬──────┬──────┬──┘
        │      │      │      │
      Flash   SRAM   AHB1   AHB2
                      │
                 ┌────┴─────┐
                 │APB bridge│ <- AHB-to-APB (slower peripherals)
                 └──┬────┬──┘
                  APB1  APB2
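AHB (Advanced High-performance Bus) runs at HCLK, normally the full system clock. APB (Advanced Peripheral Bus) runs at a divided clock: on an STM32F4 at 168 MHz, APB1 is limited to 42 MHz (HCLK/4) and APB2 to 84 MHz (HCLK/2). This matters when calculating timer prescalers and baud rates, because each peripheral is clocked from its own bus; on STM32F4, timers additionally run at twice PCLK whenever the APB prescaler is greater than 1.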
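To make the effect on timer prescalers concrete, here is a minimal sketch that derives a timer's input clock from HCLK and the APB divider, then computes a prescaler value. It assumes the default STM32F4 behaviour where timers run at PCLK when the APB divider is 1 and at twice PCLK otherwise. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Timer input clock: equal to PCLK when the APB divider is 1,
 * otherwise twice PCLK (assumed default STM32F4 behaviour). */
uint32_t timer_clock_hz(uint32_t hclk_hz, uint32_t apb_div)
{
    assert(apb_div == 1 || apb_div == 2 || apb_div == 4 ||
           apb_div == 8 || apb_div == 16);
    uint32_t pclk_hz = hclk_hz / apb_div;
    return (apb_div == 1) ? pclk_hz : 2u * pclk_hz;
}

/* Value for the 16-bit PSC register so the counter increments at count_hz.
 * The hardware divides by PSC + 1, hence the "- 1". */
uint32_t timer_psc_for(uint32_t timer_clk_hz, uint32_t count_hz)
{
    assert(count_hz > 0 && timer_clk_hz >= count_hz);
    uint32_t psc = timer_clk_hz / count_hz - 1u;
    assert(psc <= 0xFFFFu);  /* must fit the 16-bit prescaler */
    return psc;
}
```

With HCLK at 168 MHz and APB1 divided by 4, a timer on APB1 sees 84 MHz rather than 42 MHz, so a 1 MHz count rate needs a prescaler value of 83.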
Clock Tree and PLL
The clock tree generates all system clocks from a source oscillator:
HSI (16 MHz internal RC) --+
                           +--> PLL --> SYSCLK (up to 168 MHz)
HSE (8 MHz crystal) -------+              |
                                          +--> AHB prescaler --> HCLK --> CPU, DMA
                                          +--> APB1 prescaler --> PCLK1 (max 42 MHz)
                                          +--> APB2 prescaler --> PCLK2 (max 84 MHz)
PLL multiplies the input: SYSCLK = (HSE / M) * N / P. For STM32F4 with 8 MHz crystal: (8/8) * 336 / 2 = 168 MHz.
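The formula can be checked with a small helper like the sketch below. The range checks (M from 2 to 63, P in {2, 4, 6, 8}, VCO input of 1 to 2 MHz, VCO output up to 432 MHz) are assumptions based on typical STM32F4 limits and should be verified against the reference manual for the exact part. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns SYSCLK in Hz for SYSCLK = (src / M) * N / P,
 * or 0 if a parameter violates one of the assumed limits. */
uint32_t pll_sysclk_hz(uint32_t src_hz, uint32_t m, uint32_t n, uint32_t p)
{
    if (m < 2 || m > 63) return 0;
    if (p != 2 && p != 4 && p != 6 && p != 8) return 0;

    uint32_t vco_in = src_hz / m;                 /* PLL input after /M */
    if (vco_in < 1000000u || vco_in > 2000000u) return 0;

    uint64_t vco_out = (uint64_t)vco_in * n;      /* VCO frequency */
    if (vco_out > 432000000u) return 0;

    return (uint32_t)(vco_out / p);
}

/* The PLLP register field stores P encoded: 0 -> /2, 1 -> /4, 2 -> /6, 3 -> /8. */
uint32_t pllp_field(uint32_t p)
{
    assert(p == 2 || p == 4 || p == 6 || p == 8);
    return p / 2u - 1u;
}
```

The encoding helper explains why register-level code writes 0 into the PLLP field to select a division by 2.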
Dynamic power scales roughly linearly with clock frequency, so a lower clock draws less power. Many MCUs support switching clock sources and prescalers at run time for power optimization.
Boot Sequence
- Power-on reset releases the CPU
- CPU reads initial stack pointer from address 0x0000_0000 (vector table entry 0)
- CPU reads reset handler address from 0x0000_0004 (vector table entry 1)
- Reset handler runs: sets up clock (PLL), initializes .data section from Flash to SRAM, zeroes .bss section, calls main()
- If using a bootloader (DFU, custom), it runs first and may jump to application code at a different Flash offset
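The steps above can be sketched in host-runnable C. This is a simplified model, not a real startup file: an actual reset handler gets its section boundaries from linker-script symbols (often named along the lines of `_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`, though the names vary by toolchain), whereas the hypothetical helpers here take plain pointers and word counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First two words of the vector table: initial SP, then reset handler. */
typedef struct {
    uint32_t initial_sp;
    uint32_t reset_handler;
} vector_head_t;

/* Mirrors what the core does in hardware right after reset. */
vector_head_t read_vector_head(const uint32_t *table)
{
    assert(table != NULL);
    vector_head_t v = { table[0], table[1] };
    return v;
}

/* Copy initialized variables from their load image (Flash) to SRAM. */
void copy_data(uint32_t *dst, const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];
    }
}

/* Clear zero-initialized variables, as C requires before main(). */
void zero_bss(uint32_t *start, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        start[i] = 0u;
    }
}
```

On Cortex-M, the reset handler address stored in the table has bit 0 set to indicate Thumb state, which is why handler addresses in a vector table dump look odd.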
ARM Cortex-M Specifics
Many modern 32-bit MCUs use ARM Cortex-M cores:
| Core | Pipeline | FPU | DSP | Use case |
|---|---|---|---|---|
| M0/M0+ | 3-stage (M0), 2-stage (M0+) | No | No | Ultra-low-power, simple |
| M3 | 3-stage | No | No | General purpose |
| M4 | 3-stage | SP float | Yes | Signal processing, motor control |
| M7 | 6-stage | DP float | Yes | High-performance, cache |
| M33 | 3-stage | SP float | Yes | M4 + TrustZone security |
Key hardware blocks in the core:
- NVIC: Nested Vectored Interrupt Controller — handles all interrupt priorities and nesting
- SysTick: 24-bit down-counter, present on nearly all Cortex-M parts (architecturally optional on M0/M0+), used for RTOS tick or simple delays
- SCB: System Control Block — fault status, vector table offset, sleep modes
- MPU: Memory Protection Unit (optional) — enforces access rules, useful with RTOS to isolate tasks
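Because SysTick is only 24 bits wide, the reload value limits how slow a tick can be at a given clock. The sketch below computes the reload value and rejects tick rates that do not fit. The function name and return convention are hypothetical; the "ticks minus one" rule follows from the counter counting down to zero inclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* 24-bit counter */

/* Returns 1 and stores the reload value if it fits in 24 bits, else 0
 * (leaving *reload_out untouched). */
int systick_reload(uint32_t clk_hz, uint32_t tick_hz, uint32_t *reload_out)
{
    assert(reload_out != NULL);
    if (tick_hz == 0 || clk_hz < tick_hz) {
        return 0;
    }
    uint32_t ticks = clk_hz / tick_hz;   /* clock cycles per tick period */
    if (ticks - 1u > SYSTICK_MAX_RELOAD) {
        return 0;                        /* period too long for 24 bits */
    }
    *reload_out = ticks - 1u;
    return 1;
}
```

At 168 MHz, a 1 kHz tick needs a reload of 167999, which fits easily, while a 10 Hz tick would need about 16.8 million counts and overflows the 24-bit range.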
Common MCU Families
| Family | Architecture | Notes |
|---|---|---|
| STM32 | ARM Cortex-M | Largest ecosystem, hundreds of variants |
| ESP32 | Xtensa/RISC-V | WiFi + BT built-in, popular for IoT |
| nRF52 | ARM Cortex-M4 | BLE focused, Nordic Semiconductor |
| RP2040 | ARM Cortex-M0+ | Raspberry Pi Pico, programmable IO (PIO) |
| AVR | 8-bit AVR | Arduino legacy, still used for simple projects |
Code Example
// Clock setup: configure PLL for 168 MHz on STM32F4
// HSE = 8 MHz crystal, target SYSCLK = 168 MHz
#include "stm32f4xx.h"                           // CMSIS device header (RCC, FLASH)

void clock_init(void)
{
    RCC->CR |= RCC_CR_HSEON;                     // enable external crystal
    while (!(RCC->CR & RCC_CR_HSERDY));          // wait for HSE ready

    // Configure the PLL while it is still disabled
    RCC->PLLCFGR = (8 << RCC_PLLCFGR_PLLM_Pos)   // M = 8 -> PLL input = 1 MHz
                 | (336 << RCC_PLLCFGR_PLLN_Pos) // N = 336 -> VCO = 336 MHz
                 | (0 << RCC_PLLCFGR_PLLP_Pos)   // field 0 encodes P = 2 -> SYSCLK = 168 MHz
                 | (7 << RCC_PLLCFGR_PLLQ_Pos)   // Q = 7 -> 48 MHz for USB/RNG (Q must be >= 2)
                 | RCC_PLLCFGR_PLLSRC_HSE;       // PLL source = HSE
    RCC->CR |= RCC_CR_PLLON;                     // enable PLL
    while (!(RCC->CR & RCC_CR_PLLRDY));          // wait for PLL lock

    // Set flash wait states for 168 MHz (5 WS) before raising the clock
    FLASH->ACR = FLASH_ACR_LATENCY_5WS | FLASH_ACR_PRFTEN | FLASH_ACR_ICEN;

    // Set bus prescalers: AHB/1, APB1/4 (42 MHz), APB2/2 (84 MHz)
    RCC->CFGR = RCC_CFGR_HPRE_DIV1 | RCC_CFGR_PPRE1_DIV4 | RCC_CFGR_PPRE2_DIV2;

    // Switch system clock to PLL and wait until the switch takes effect
    RCC->CFGR |= RCC_CFGR_SW_PLL;
    while ((RCC->CFGR & RCC_CFGR_SWS) != RCC_CFGR_SWS_PLL);
}
Related
- Memory-Mapped IO — how peripheral registers are accessed
- GPIO and Digital IO — first peripheral to learn
- Interrupts and Timers — NVIC and timer peripherals
- Pointers and Memory — C-level view of memory access
- How Computers Execute Code — general CPU execution model
- Digital Logic — gates and flip-flops that build up to an MCU