A mindmap of OS-related notes.

  • Interrupts
    • Synchronous
      • Called Exceptions
        • Processor detected
          • Faults (before execution of instruction)
          • Traps (after execution of instruction)
          • Aborts (severe errors, usually unrecoverable)
        • Programmed
          • int instruction (e.g. int 0x80, int3)
    • Asynchronous
      • Interrupts from external events, like I/O devices
    • Can be controlled at:
      • Device level
      • APIC level (Advanced Programmable Interrupt Controller)
      • CPU level
        • cli (Clear interrupt flag)
        • sti (Set interrupt flag)
    • Some can be masked, some not (NMI)
    • Interrupt Descriptor Table
    • When an interrupt happens
      • CPU checks privilege level
      • If needed, switch to the required privilege level
      • Save stack info
      • Execute handler
    • Return with IRET
    • Top half, until IRET
      • IRQ disabled for this duration
      • Called the interrupt context
      • Can be preempted by another interrupt, but not by an exception
      • Not allowed to context switch, sleep, schedule or access user-space memory
    • Bottom half, deferred execution
      • Two categories
        • Interrupt context deferrables
        • Process context deferrables
      • Reason: avoid running too long in the interrupt handler with IRQs disabled
      • Three types
        • SoftIRQs
          • Interrupt context
          • Statically allocated
          • Same handler can run in parallel on many cores
        • Tasklets
          • Interrupt context
          • Dynamically allocated
          • The same tasklet runs serialized, never in parallel on several cores
        • Workqueues
          • Run in process context
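The top-half / bottom-half split above can be illustrated with a small user-space model. This is only a sketch of the idea, not kernel code: `ToyInterruptController`, its methods, and the IRQ numbers are hypothetical names chosen for the example.

```python
from collections import deque


class ToyInterruptController:
    """User-space model of top-half / bottom-half splitting (not kernel code)."""

    def __init__(self):
        self.pending = deque()   # deferred work queued by top halves
        self.log = []            # order in which the halves ran

    def top_half(self, irq, payload):
        # Do the minimum: note the event and queue the heavy work for later.
        self.log.append(f"top:{irq}")
        self.pending.append((irq, payload))

    def run_bottom_halves(self):
        # Runs later, outside the (simulated) IRQ-disabled window.
        results = []
        while self.pending:
            irq, payload = self.pending.popleft()
            self.log.append(f"bottom:{irq}")
            results.append((irq, sum(payload)))
        return results


ctrl = ToyInterruptController()
ctrl.top_half(5, [1, 2, 3])
ctrl.top_half(7, [10, 20])
print(ctrl.run_bottom_halves())  # [(5, 6), (7, 30)]
print(ctrl.log)                  # both top halves are logged before any bottom half
```

The point of the design is that `top_half` stays short and constant-time, while the expensive part (`sum` here) only runs once the deferred queue is drained.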
  • System calls
    • Flow
      • Set call number and parameters in registers
      • Issue a trap
      • Enter kernel context
      • Kernel saves registers in stack
      • Identify syscall from syscall table and run it
      • Restore registers
      • Retrieve result
    • Max 6 parameters (set in registers)
    • vDSO
      • Speeds up some system calls by running them in user space, without a trap (successor to vsyscalls)
      • Calls read a vDSO page whose data is either static or updated by the kernel
      • E.g. gettimeofday, clock_gettime, or getcpu
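The syscall flow above (number and arguments in registers, table lookup, result returned) can be modelled in a few lines. This is a toy dispatcher under stated assumptions: the call numbers 0 and 1, `SYSCALL_TABLE`, and `do_syscall` are hypothetical; real call numbers are architecture-specific and the real table lives in the kernel.

```python
import errno
import os

MAX_SYSCALL_ARGS = 6  # arguments are passed in registers, so the count is bounded

# Hypothetical call numbers for this sketch; real numbers differ per architecture.
SYSCALL_TABLE = {
    0: lambda: os.getpid(),
    1: lambda a, b: a + b,
}


def do_syscall(number, *args):
    """Look up the handler by number and run it, mimicking the kernel entry path."""
    if len(args) > MAX_SYSCALL_ARGS:
        raise ValueError("at most 6 arguments fit in registers")
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        return -errno.ENOSYS  # unknown call: negative errno, as the kernel ABI does
    return handler(*args)


print(do_syscall(1, 2, 3))   # 5
print(do_syscall(999) == -errno.ENOSYS)  # True
```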
  • Booting
    • BIOS / POST
      • verify hardware integrity
        • cpu registers, bios code, dma, interrupt controllers
        • memory, bios init
        • identify and select devices for booting
      • Written in EEPROM
    • BIOS reads MBR
      • MBR is 512 bytes, in the first sector of the selected boot disk
      • Contains program and partition table details
        • Last 2 bytes: 0x55 0xAA on disk (0xAA55 as a little-endian word), the boot signature
        • Contains first stage loader, loads second stage
    • Stage 1 boot loader
    • Stage 2 boot loader
      • Calls the kernel loader
    • Kernel stage
      • Kernel loaded in memory
      • An image file containing a basic root FS with kernel modules is loaded in memory. This is the initrd.
      • After the kernel detects the hardware, the root file system on disk replaces the one in memory
    • INIT (systemd)
      • Reads /etc/fstab
    • UEFI is the successor to BIOS
      • Supports disks and partitions larger than 2 TB; uses the GUID Partition Table (GPT) instead of MBR
      • Comes with its own partition, the EFI System Partition (FAT, with an /EFI directory)
        • Separate folder per system
        • Systems provide their own bootloader
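The MBR layout noted above (512 bytes, partition table, 0xAA55 signature) can be checked with a short parser. This is a minimal sketch that assumes the classic layout of four 16-byte partition entries starting at offset 446; `parse_mbr` is a hypothetical helper, and the sector is synthetic rather than read from a real disk.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # four 16-byte entries follow the boot code
ENTRY_FORMAT = "<B3sB3sII"     # status, CHS start, type, CHS end, LBA start, sector count


def parse_mbr(sector):
    """Validate the boot signature and return the used partition entries."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    (signature,) = struct.unpack_from("<H", sector, 510)
    if signature != 0xAA55:
        raise ValueError("missing 0xAA55 boot signature")
    partitions = []
    for i in range(4):
        status, _, ptype, _, lba_start, sectors = struct.unpack_from(
            ENTRY_FORMAT, sector, PARTITION_TABLE_OFFSET + 16 * i)
        if ptype != 0:  # type 0 marks an unused entry
            partitions.append({"bootable": status == 0x80, "type": ptype,
                               "lba_start": lba_start, "sectors": sectors})
    return partitions


# Build a synthetic sector for the demo (no real disk is read).
sector = bytearray(SECTOR_SIZE)
struct.pack_into(ENTRY_FORMAT, sector, PARTITION_TABLE_OFFSET,
                 0x80, b"\x00" * 3, 0x83, b"\x00" * 3, 2048, 1_000_000)
sector[510:512] = b"\x55\xaa"
print(parse_mbr(bytes(sector)))

# 32-bit sector counts with 512-byte sectors cap a partition at 2 TiB.
print(2**32 * SECTOR_SIZE == 2 * 1024**4)  # True
```

The last line shows where the 2 TB limit comes from: the sector count field is 32 bits wide, which is the constraint GPT removes.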
  • VFS
    • read(), write(), open(), close(), stat(), lseek()
    • inodes
    • dentry
    • file
    • mounting, superblock operations
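The VFS calls listed above map directly onto Python's thin `os` wrappers, which makes a quick round trip easy to try. This sketch assumes a writable temp directory; `vfs_roundtrip` is a hypothetical helper name.

```python
import os
import tempfile


def vfs_roundtrip(payload=b"hello vfs"):
    """Exercise open/write/lseek/read/stat/close on a temporary file."""
    fd, path = tempfile.mkstemp()          # open(): returns a file descriptor
    try:
        os.write(fd, payload)              # write()
        os.lseek(fd, 0, os.SEEK_SET)       # lseek(): rewind to the start
        data = os.read(fd, len(payload))   # read()
        info = os.fstat(fd)                # stat() on the open file (inode metadata)
        return data, info.st_size, info.st_ino
    finally:
        os.close(fd)                       # close()
        os.remove(path)


data, size, inode = vfs_roundtrip()
print(data, size)  # b'hello vfs' 9
```

The same calls work regardless of which file system backs the temp directory, which is the abstraction VFS provides.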
  • Limits
    • CGroups
      • /sys/fs/cgroup API
      • Applied to tasks
      • Has controllers
        • The core, pseudo-fs at /sys/fs/cgroup
        • Other controllers
          • Memory
          • IO
          • CPU
    • Namespaces
      • Used with unshare, nsenter
      • Mount
      • PID
      • Network
      • UTS (Hostname)
      • IPC
      • Cgroup
      • Implemented as flags in clone() syscall
      • Also unshare()
    • Capabilities
      • Set per thread; can also be attached to files
      • Finer-grained control than setuid-as-root
      • CAP_IPC_LOCK
      • CAP_MKNOD
      • CAP_NET_RAW
      • CAP_SYS_ADMIN
      • Shown in /proc/[pid]/status
    • setrlimit()
    • tc (qdisc - queueing discipline)
      • Overlaps with cgroups
    • nice
    • seccomp - limit system calls
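The capability masks shown in /proc/[pid]/status are hex bitmasks, and decoding them by hand is error-prone. Below is a minimal sketch that decodes only the four capabilities named above, using their bit positions from linux/capability.h; `decode_caps`, `effective_caps`, and the sample status text are hypothetical and made up for the example.

```python
# Bit positions from linux/capability.h for the four capabilities named above.
CAP_BITS = {
    13: "CAP_NET_RAW",
    14: "CAP_IPC_LOCK",
    21: "CAP_SYS_ADMIN",
    27: "CAP_MKNOD",
}


def decode_caps(hex_mask):
    """Return the known capability names set in a hex mask string."""
    mask = int(hex_mask, 16)
    return sorted(name for bit, name in CAP_BITS.items() if mask & (1 << bit))


def effective_caps(status_text):
    """Find the CapEff line in /proc/[pid]/status-style text and decode it."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split()[1])
    return []


# Made-up sample; on Linux the real text comes from /proc/self/status.
sample = "Name:\tdemo\nCapEff:\t0000000000202000\n"
print(effective_caps(sample))  # ['CAP_NET_RAW', 'CAP_SYS_ADMIN']
```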
  • Network
    • XDP and BPF
      • Process RX before any sk_buffs are allocated or queues entered
      • Not a Kernel bypass, integrated in the path
      • actions
        • Forward (XDP_TX, XDP_REDIRECT), drop (XDP_DROP), receive (XDP_PASS)
      • PF_RING or AF_PACKET, in contrast, bypass the kernel network stack
    • Multiple receive queues in NICs
      • Receive side scaling
        • Remember to spread the queues' IRQ affinity across all CPUs
    • RPS, software implementation of RSS
    • RPS doesn’t take into account application locality
      • RFS, use flow lookup table to steer packets to CPU where the flow is processed
    • XPS, Transmit packet steering
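IRQ affinity (/proc/irq/N/smp_affinity) and RPS (/sys/class/net/DEV/queues/rx-N/rps_cpus) are both configured with hex CPU bitmasks. The sketch below only computes the mask strings and writes nothing; `cpu_mask` and `round_robin_affinity` are hypothetical helpers, the IRQ numbers are invented, and it assumes fewer than 32 CPUs so the mask fits in a single hex group.

```python
def cpu_mask(cpus):
    """Return the hex bitmask with one bit set per CPU index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU indexes are non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def round_robin_affinity(irqs, n_cpus):
    """Assign each IRQ (one per RX queue) to a single CPU, cycling over CPUs."""
    return {irq: cpu_mask([i % n_cpus]) for i, irq in enumerate(irqs)}


print(cpu_mask(range(4)))                      # f  (CPUs 0-3, e.g. for rps_cpus)
print(round_robin_affinity([40, 41, 42], 2))   # {40: '1', 41: '2', 42: '1'}
```

Pinning one queue IRQ per CPU is one common way to spread RSS load; a mask covering several CPUs per IRQ is also valid.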