Systems
A mindmap of OS-related notes.
- Interrupts
- Synchronous
- Called Exceptions
- Processor detected
- Faults (before execution of instruction)
- Traps (after execution of instruction)
- Aborts
- Programmed
- int n (software interrupt instruction)
- Asynchronous
- Interrupts from external events, like I/O devices
- Can be controlled at:
- Device level
- APIC level (Advanced Programmable Interrupt Controller)
- CPU level
- cli (Clear interrupt flag)
- sti (set interrupt flag)
- Some can be masked, some not (NMI)
- Interrupt Descriptor Table
- When an interrupt happens
- CPU checks privilege level
- If needed, switch to the required privilege level
- Save stack info
- Execute handler
- Return with IRET
- Top half, until IRET
- IRQ disabled for this duration
- Called the interrupt context
- Can be preempted by another interrupt, but not by an exception
- Not allowed to context switch, sleep, schedule or access user-space memory
- Bottom half, deferred execution
- Two categories
- Interrupt context deferrables
- Process context deferrables
- Reason: avoid running too long in the interrupt handler with IRQs disabled
- Three types
- SoftIRQs
- Interrupt context
- Statically allocated
- Same handler can run in parallel on many cores
- Tasklet
- Interrupt context
- Dynamically allocated
- Same handlers run serialized
- Workqueues
- Run in process context
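The top half / bottom half split can be sketched in user space. This is only an analogy, assuming a thread stands in for a workqueue worker and a queue stands in for the deferred-work list; all names (`top_half`, `bottom_half_worker`, `run_demo`) are hypothetical and none of this is kernel code.

```python
import queue
import threading

# Hypothetical names throughout: this models the split, it is not kernel code.
work_queue = queue.Queue()
processed = []

def top_half(irq, data):
    """Do the minimum and defer the rest, so the 'handler' returns quickly."""
    work_queue.put((irq, data))   # schedule the bottom half
    return "IRQ_HANDLED"

def bottom_half_worker():
    """Runs later, like a workqueue in process context where blocking is allowed."""
    while True:
        item = work_queue.get()
        if item is None:          # sentinel: stop the worker
            break
        irq, data = item
        processed.append((irq, data.upper()))  # stand-in for slow processing

def run_demo():
    processed.clear()
    worker = threading.Thread(target=bottom_half_worker)
    worker.start()
    for payload in ("rx packet a", "rx packet b"):
        top_half(11, payload)
    work_queue.put(None)
    worker.join()
    return list(processed)
```

A single worker processes items in order, which loosely mirrors the serialized behaviour noted for tasklets; SoftIRQs would correspond to several workers running the same handler in parallel.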
- System calls
- Flow
- Set call number and parameters in registers
- Issue a trap
- Enter kernel context
- Kernel saves registers in stack
- Identify syscall from syscall table and run it
- Restore registers
- Max 6 parameters (set in registers)
- Retrieve result
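The system call flow above can be modelled as a toy dispatcher. This is a sketch under stated assumptions: the syscall numbers, the handlers `sys_add` and `sys_echo`, and the dictionary used as a register file are all invented for illustration; only the six-parameter limit and the Linux value of ENOSYS (38) come from real systems.

```python
# Toy model of the flow above; numbers and handlers are made up for illustration.
MAX_ARGS = 6

def sys_add(a, b):
    return a + b

def sys_echo(x):
    return x

SYSCALL_TABLE = {0: sys_add, 1: sys_echo}   # hypothetical syscall numbers

def trap(registers):
    """Kernel side: save registers, dispatch via the table, restore."""
    saved = dict(registers)                  # "save registers on the stack"
    handler = SYSCALL_TABLE.get(saved["nr"])
    if handler is None:
        result = -38                         # -ENOSYS on Linux
    else:
        result = handler(*saved["args"])
    registers.clear()
    registers.update(saved)                  # "restore registers"
    registers["ret"] = result                # the result comes back in a register
    return registers

def syscall(nr, *args):
    """User side: load number and arguments into 'registers', then trap."""
    if len(args) > MAX_ARGS:
        raise ValueError("at most 6 register-passed parameters")
    regs = {"nr": nr, "args": args}
    return trap(regs)["ret"]
```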
- VDSO
- Optimizes some system calls by running them in user space without a trap (successor to vsyscall)
- Reads a vDSO page that is either static or updated by the kernel
- E.g. gettimeofday or clock_gettime
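On Linux the vDSO shows up as a `[vdso]` mapping in `/proc/<pid>/maps`. A minimal sketch for locating it follows; the helper names are hypothetical, and `vdso_of_current_process` simply returns `None` on systems without `/proc`.

```python
def find_vdso(maps_text):
    """Return (start, end) of the [vdso] mapping from /proc/<pid>/maps text, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[vdso]":
            start, end = (int(x, 16) for x in fields[0].split("-"))
            return start, end
    return None

def vdso_of_current_process():
    """Linux only; returns None where /proc is unavailable."""
    try:
        with open("/proc/self/maps") as f:
            return find_vdso(f.read())
    except OSError:
        return None
```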
- Booting
- BIOS / POST
- verify hardware integrity
- cpu registers, bios code, dma, interrupt controllers
- memory, bios init
- identify and select devices for booting
- Written in EEPROM
- BIOS reads the MBR
- MBR is 512 bytes, in the first sector of the selected boot disk
- Contains the boot program and the partition table
- Last 2 bytes: 0x55 0xAA (the word 0xAA55 when read little-endian), magic number
- Contains first stage loader, loads second stage
- Stage 1 boot loader
- Stage 2 boot loader
- Calls the kernel loader
- Kernel stage
- Kernel loaded in memory
- An image file containing a basic root FS with kernel modules is loaded into memory. This is the initrd.
- After the kernel detects the hardware, the root file system on disk replaces the one in memory
- INIT (systemd)
- Read /etc/fstab
- UEFI is the successor to BIOS
- Supports disks larger than 2 TB; uses the GUID Partition Table (GPT) instead of MBR
- Comes with its own FS, the EFI System Partition (/EFI)
- Separate folder per system
- Systems provide their own bootloader
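The MBR layout from the boot notes above (512 bytes, partition table, trailing magic number) can be checked with a short parser. This is a sketch assuming the classic layout: four 16-byte partition entries starting at offset 446 and the signature at offset 510. The function name and the returned dictionary keys are illustrative choices.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # 4 entries of 16 bytes each follow the boot code
SIGNATURE_OFFSET = 510

def parse_mbr(sector):
    """Validate the boot signature and return the non-empty partition entries."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    # On disk the bytes are 0x55 0xAA; read as a little-endian word that is 0xAA55.
    (signature,) = struct.unpack_from("<H", sector, SIGNATURE_OFFSET)
    if signature != 0xAA55:
        raise ValueError("missing 0xAA55 boot signature")
    partitions = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + 16 * i
        entry = bytes(sector[start:start + 16])
        # status, CHS start (3 bytes), type, CHS end (3 bytes), first LBA, sector count
        status, ptype, lba_start, num_sectors = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:                       # type 0 marks an unused slot
            partitions.append({"bootable": status == 0x80, "type": ptype,
                               "lba_start": lba_start, "sectors": num_sectors})
    return partitions
```

Reading a real disk's first sector usually requires root, so the function takes the 512 bytes as an argument rather than opening a device itself.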
- VFS
- read(), write(), open(), close(), stat(), lseek()
- inodes
- dentry
- file
- mounting, superblock operations
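The VFS calls listed above map directly onto Python's `os` module, which wraps the same system calls. A minimal sketch that exercises them on a temporary file; the function name `vfs_roundtrip` and the returned keys are illustrative.

```python
import os
import tempfile

def vfs_roundtrip(payload=b"hello vfs"):
    """Exercise open/write/lseek/read/fstat/close on a temporary file."""
    fd, path = tempfile.mkstemp()           # open(): returns a file descriptor
    try:
        written = os.write(fd, payload)     # write()
        os.lseek(fd, 0, os.SEEK_SET)        # lseek(): rewind the file offset
        data = os.read(fd, len(payload))    # read()
        info = os.fstat(fd)                 # stat(): metadata comes from the inode
        return {"written": written, "data": data,
                "size": info.st_size, "inode": info.st_ino}
    finally:
        os.close(fd)                        # close()
        os.unlink(path)                     # remove the directory entry
```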
- Limits
- CGroups
- /sys/fs/cgroup API
- Applied to tasks
- Has controllers
- The core, pseudo-fs at /sys/fs/cgroup
- Other controllers
- Memory
- IO
- CPU
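To see which cgroup a task belongs to, Linux exposes `/proc/<pid>/cgroup`, where each line has the form `hierarchy-ID:controller-list:cgroup-path`. A small parser sketch follows; the helper names are hypothetical, and `cgroup_dir` assumes a cgroup v2 (unified) hierarchy mounted at `/sys/fs/cgroup`.

```python
def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup text into a list of hierarchy entries."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy_id),
            # cgroup v2 lines have an empty controller list ("0::/path")
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries

def cgroup_dir(entry, root="/sys/fs/cgroup"):
    """Directory holding this cgroup's control files (assumes a v2 mount at root)."""
    return root + entry["path"]
```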
- Namespaces
- Used with unshare, nsenter
- Mount
- PID
- Network
- UTS (Hostname)
- IPC
- Cgroup
- Implemented as flags in clone() syscall
- Also unshare()
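Since namespaces are selected by flag bits passed to `clone()` or `unshare()`, a request for several namespaces is just a bitmask. The sketch below uses the `CLONE_NEW*` values from `<linux/sched.h>`; the dictionary keys and the two helper functions are illustrative names, and nothing here actually creates a namespace.

```python
# Values from <linux/sched.h>; keys follow the namespace names in the notes above.
CLONE_FLAGS = {
    "mount":   0x00020000,  # CLONE_NEWNS
    "cgroup":  0x02000000,  # CLONE_NEWCGROUP
    "uts":     0x04000000,  # CLONE_NEWUTS
    "ipc":     0x08000000,  # CLONE_NEWIPC
    "pid":     0x20000000,  # CLONE_NEWPID
    "network": 0x40000000,  # CLONE_NEWNET
}

def namespace_mask(*names):
    """OR together the flags for the requested namespaces."""
    mask = 0
    for name in names:
        mask |= CLONE_FLAGS[name]
    return mask

def namespaces_in(mask):
    """Decode a flags value back into namespace names."""
    return [name for name, bit in CLONE_FLAGS.items() if mask & bit]
```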
- Capabilities
- Set per thread; can also be attached to files
- Finer-grained control than setuid-as-root
- CAP_IPC_LOCK
- CAP_MKNOD
- CAP_NET_RAW
- CAP_SYS_ADMIN
- shown in /proc/[pid]/status
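The capability sets in `/proc/[pid]/status` are hexadecimal bitmasks (`CapInh`, `CapPrm`, `CapEff`, and so on). A minimal decoding sketch for the effective set follows, covering only the four capabilities named above; the bit numbers come from `<linux/capability.h>`, and the function name is hypothetical.

```python
# Bit numbers from <linux/capability.h>; only the capabilities named above are listed.
CAP_BITS = {13: "CAP_NET_RAW", 14: "CAP_IPC_LOCK", 21: "CAP_SYS_ADMIN", 27: "CAP_MKNOD"}

def effective_caps(status_text):
    """Decode the CapEff line of /proc/<pid>/status into known capability names."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return [name for bit, name in sorted(CAP_BITS.items()) if mask & (1 << bit)]
    return None   # no CapEff line found
```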
- setrlimit()
- tc (qdisc - queueing discipline)
- Overlap with Cgroups
- nice
- seccomp - limit system calls
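Of the mechanisms above, `setrlimit()` is the easiest to try from user space, because Python's `resource` module wraps it on Unix systems. The sketch below changes only the soft limit and clamps it to the hard limit; the helper name `set_soft_limit` is an illustrative choice.

```python
import resource

def set_soft_limit(which, new_soft):
    """Change only the soft limit, clamped to the current hard limit."""
    soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)      # the soft limit cannot exceed the hard limit
    resource.setrlimit(which, (new_soft, hard))
    return resource.getrlimit(which)
```

For example, `set_soft_limit(resource.RLIMIT_CORE, 0)` disables core dumps for the current process and its children without touching the hard limit.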
- Network
- XDP and BPF
- Process RX before any sk_buffs are allocated or queues entered
- Not a Kernel bypass, integrated in the path
- actions
- Forward, drop, receive
- PF_RING or AF_PACKET, in contrast, bypass the kernel network stack
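Real XDP programs are written in restricted C and compiled to eBPF, so the following is only a model of the decision an XDP program makes on a raw frame before any sk_buff exists. The action codes match `<linux/bpf.h>`; the policy (drop IPv4/UDP, pass everything else) and the function name are assumptions made for illustration.

```python
import struct

# Action codes as defined in <linux/bpf.h>.
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = range(5)

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17

def xdp_verdict(frame):
    """Toy policy: drop IPv4/UDP, pass everything else up the stack."""
    if len(frame) < ETH_HEADER_LEN + 20:    # too short for Ethernet + minimal IPv4
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return XDP_PASS
    protocol = frame[ETH_HEADER_LEN + 9]    # IPv4 protocol field is byte 9 of the header
    return XDP_DROP if protocol == IPPROTO_UDP else XDP_PASS
```

In the terms used above, "drop" is `XDP_DROP`, "receive" is `XDP_PASS`, and "forward" corresponds to `XDP_TX` or `XDP_REDIRECT`.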
- Multiple receive queues in NICs
- Receive side scaling
- Remember to spread the queues' IRQ affinity across all CPUs
- RPS, software implementation of RSS
- RPS doesn’t take into account application locality
- RFS, use flow lookup table to steer packets to CPU where the flow is processed
- XPS, Transmit packet steering
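The difference between hash-based steering (RSS/RPS) and flow-aware steering (RFS) can be sketched in a few lines. This is a simplified model: `zlib.crc32` stands in for the hash (real NICs typically use a Toeplitz hash), and a plain dictionary stands in for the kernel's flow lookup table; all function names are hypothetical.

```python
import zlib

def flow_hash(src_ip, src_port, dst_ip, dst_port):
    """Stand-in hash over the 4-tuple; real NICs typically use a Toeplitz hash."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key)

def rps_cpu(flow, num_cpus):
    """RSS/RPS style: the hash alone picks the CPU."""
    return flow_hash(*flow) % num_cpus

def rfs_cpu(flow, num_cpus, flow_table):
    """RFS style: prefer the CPU recorded for this flow, else fall back to the hash."""
    return flow_table.get(flow_hash(*flow), rps_cpu(flow, num_cpus))
```

With an empty `flow_table`, `rfs_cpu` behaves like `rps_cpu`; once the table records the CPU where the consuming application runs, packets of that flow follow the application.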