Scaling
Rough notes on high-scalability systems
- Scalability
- Scalability is the capability of a system, process, or network to grow and manage increased demand.
- Horizontal
- Scaling out by adding more servers
- Easier to adjust dynamically
- Vertical
- Scaling up by beefing up server resources/power
- Reliability
- The probability a system will perform without failure over a given period
- Achieved through redundancy of both the software components and data.
- Availability
- The proportion of time a system remains operational
- If a system is reliable, it is available.
- However, if it is available, it is not necessarily reliable.
- Efficiency
- Response Time
- Throughput
- Serviceability or Manageability
- Also Observability
- Load Balancing
- Prevents single point of failure
- Improves overall application availability and response time
- Distributes load, so there is no single struggling server
- Provides analytics
- Handles failures
- Can add LBs in three places
- Between the user and the web server
- Between web servers and an internal platform layer, like application servers or cache servers
- Between internal platform layer and database.
- How do they work
- Health checks to backend servers
- Methods
- Round Robin
- Weighted round robin (sketched below)
- Least connections
- Least response time
- IP hash
- LBs must also be redundant
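- A minimal sketch of weighted round robin (the server names and weights below are made up for illustration):

```python
import itertools

# Hypothetical backend pool: (server, weight). A weight of 3 means the server
# should receive 3x the traffic of a weight-1 server.
BACKENDS = [("app-1", 3), ("app-2", 1), ("app-3", 1)]

def weighted_round_robin(backends):
    """Yield backends in proportion to their weights, round-robin style."""
    expanded = [name for name, weight in backends for _ in range(weight)]
    return itertools.cycle(expanded)

balancer = weighted_round_robin(BACKENDS)
for _ in range(10):
    print(next(balancer))  # app-1, app-1, app-1, app-2, app-3, app-1, ...
```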
- Caching
- LB helps scaling, but caching enables:
- Vastly better use of resources
- Implementing requirements that would otherwise be unattainable
- Pros: Better for static data
- Cons: Not good for user-generated data, where freshness and consistency are critical
- Assume 1M requests for 100 items
- If we split 50 items to each of 2 servers, both may still have to deal with all 1M requests (a request can need items from both halves)
- If we replicate 100 items to 2 servers, each will deal with 500K requests instead
- GET requests can be UDP, SET/DELETE can be TCP
- Application server cache
- Could improve request/response time, but if we have scaled with many nodes that are behind a load balancer, we’ll have many cache misses, because requests will be routed to different nodes
- CDNs
- Tip: use a different subdomain (cdn.facebook.com) for static content, will help later when passing DNS to your CDN
- Cache invalidation
- Cache is amazing, but needs management to keep cache coherent with DB
- If data is updated on DB, needs to be updated in cache.
- Known as cache invalidation
- Types of Cache
- Write-through
- Write to cache and DB at the same time
- More reliable, less risk of data loss
- But, with increased latency for write operations
- Write-around
- Bypass cache and write directly to DB
- If the data is then re-read, a cache miss will happen
- Write-back
- Data is written to cache and confirmed immediately to client
- Data will be written to DB at a later time
- Low latency and high throughput for write-intensive apps
- Comes with risk of data loss
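- A rough sketch contrasting the three write policies above (plain dicts stand in for the cache and the DB; single-threaded and illustrative only):

```python
cache, db = {}, {}      # stand-ins for a cache server and a database
dirty = set()           # keys written to the cache but not yet flushed to the DB

def write_through(key, value):
    # Write to cache and DB together: slower writes, no loss if the cache dies.
    cache[key] = value
    db[key] = value

def write_around(key, value):
    # Bypass the cache: the next read of `key` will be a cache miss.
    db[key] = value
    cache.pop(key, None)        # drop any stale cached copy

def write_back(key, value):
    # Acknowledge after caching only; flush to the DB later (risk of data loss).
    cache[key] = value
    dirty.add(key)

def flush():
    # Periodic background flush used with the write-back policy.
    for key in dirty:
        db[key] = cache[key]
    dirty.clear()
```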
- Cache Eviction policies
- FIFO
- LIFO
- LRU
- LFU
- RR
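- A minimal LRU eviction sketch using Python's OrderedDict (the capacity of 2 is picked arbitrarily):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                     # cache miss
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key

cache = LRUCache(2)
cache.set("a", 1); cache.set("b", 2)
cache.get("a")
cache.set("c", 3)                           # evicts "b", the LRU entry
```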
- Data Partitioning
- Breaking up big DB in smaller parts
- Easier to scale horizontally than vertically after a point
- Types
- Horizontal Partitioning / Sharding / Range based partitioning
- Put ranges of rows in different tables
- If we split across multiple servers, then we have sharding
- Main problem is that if the range isn't chosen properly it can lead to unbalanced servers
- e.g. using zip code can result in unbalances (bigger vs smaller cities)
- Vertical Partitioning
- Different tables in different servers
- e.g. for instagram, separate servers for
- Profile info
- Friend list
- Photos
- Main problem is that you might need to distribute to more servers if specific server grows big (e.g. billions of photos)
- Federation
- Split into multiple DBs based on function (product, user, forums DBs)
- Directory-based partitioning
- Abstracts away partitioning details
- Partitioning Criteria
- Key or Hash-based
- 100 servers, ID % 100
- Must ensure uniform allocation
- Con: changing the hash function (e.g. when adding servers) requires redistributing the data (see sketch after this list)
- List partitioning
- Each partition is assigned a key
- In order to insert a record, we see which partition has the respective key
- e.g. store Sweden, Norway etc. in the partition for Nordic countries
- Round Robin partitioning
- Ensures uniform distribution
- Composite partitioning
- Combine above methods
- E.g. list partitioning then hash-based
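- A quick illustration of the hash-based con above: going from 4 to 5 servers with `ID % n` remaps most keys (numbers are made up, but ~80% is what uniform hashing gives):

```python
def server_for(key, n_servers):
    # Naive hash partitioning: key -> server index
    return hash(key) % n_servers

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(1 for k in keys if server_for(k, 4) != server_for(k, 5))
print(f"{moved / len(keys):.0%} of keys change servers")  # roughly 80%
```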
- Problems with partitioning
- Joins and Denormalization
- Join needs to combine data from many servers
- Can be mitigated with denormalization, but we need to deal with data inconsistency if we go down that path
- Referential integrity
- Data integrity constraints, like foreign keys, can be hard
- Enforcing them across partitions is frequently not supported by the RDBMS, so they need to be enforced in application code
- Rebalancing
- Needed when data distribution is not uniform, or if there’s a lot of load in one partition
- Performing rebalancing is difficult to do without downtime
- DB Indexing
- One of the first methods to consider when performance is no longer satisfactory
- Makes searches of records faster
- Slows down data insertion & update
- Can be created using one or more columns
- Makes sense for read-intensive, not for write-intensive apps
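- A small sqlite3 sketch of adding an index and checking the query plan (the table and column names are made up; this ties in with the EXPLAIN tip at the end of these notes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)

query = "SELECT * FROM users WHERE email = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("user42@example.com",))
print(plan.fetchall())   # full table SCAN of users

conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("user42@example.com",))
print(plan.fetchall())   # SEARCH using idx_users_email instead of a scan
```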
- Proxy Servers
- Used to
- Filter requests
- Log
- Transform requests
- Add/remove headers
- Encrypt/Decrypt
- Compressing
- Caching
- Redundancy & Replication
- Redundancy is the duplication of critical components or functions of a system
- Goals
- Increasing the reliability of the system, usually in the form of a backup
- Improve actual system performance
- Without redundancy, losing the single server that holds a file means losing the file
- Replication means sharing info to ensure consistency between redundant resources
- Widely used in RDBMS systems, usually with a master-slave relationship
- Master gets all updates which then ripple to the slaves
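- A toy sketch of the master-slave flow described above (in-memory dicts, and the "ripple" is synchronous here purely for brevity; real systems replicate asynchronously over the network):

```python
class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):              # replicas serve reads
        return self.data.get(key)

class Master:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):      # all writes go to the master...
        self.data[key] = value
        for replica in self.replicas: # ...and ripple out to the slaves
            replica.data[key] = value

replicas = [Replica(), Replica()]
master = Master(replicas)
master.write("user:1", {"name": "Ada"})
print(replicas[0].read("user:1"))
```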
- SQL vs NoSQL
- In SQL you design your schema; in NoSQL you design your queries
- Relational DBs
- Have predefined schema
- Use rows and columns
- Each row contains all info for entity
- ACID compliant
- Atomicity, Consistency, Isolation, Durability
- Atomicity, each transaction is all or nothing
- Consistency, any transaction will bring the database from one valid state to another
- Isolation, executing transactions concurrently has the same results as if the transactions were executed serially
- Durability, once a transaction has been committed, it will remain so
- Non-Relational DBs
- Sacrifice ACID for scalability and performance
- Unstructured, Distributed, with dynamic schema
- A single record typically holds all the data for an entity (related data is stored together rather than joined)
- Types
- Key-Value storage
- Redis
- Document DBs
- Data stored in documents that are grouped in collections
- Each can have entirely different structure
- MongoDB, CouchDB
- Graph DBs
- Store data whose relationship is better represented as a graph
- Data saved in nodes (entities), properties (info about the entities), and lines/edges (connections between them)
- InfiniteGraph, Neo4j
- CAP Theorem
- Consistency, Availability, Partition tolerance
- A distributed system can provide at most two of the three guarantees above
- Consistency
- Every read receives the most recent write or error
- Achieved by updating all nodes before allowing further reads
- Sacrifices Availability
- Different from Consistency as defined for ACID
- Availability
- Every request receives a non-error response, w/o guarantee that it contains the most recent write
- Every request gets a response, even if some nodes fail
- Partition tolerance
- The system continues to function even if nodes can't communicate with each other
- When a network failure happens, we have to decide whether to:
- Cancel the operation, i.e. decrease availability but ensure consistency
- Proceed with the operation, i.e. provide availability w/o consistency
- Consistent Hashing
- Use a Binary Search Tree (supporting successor lookups) to represent the cache nodes on the hash ring
- On average only k/n keys need to be moved when a server is added or removed (k: total keys, n: servers)
- Contrast with normal hashing, where most keys need to be moved
- DHT (Distributed Hash Table) is one of the fundamental components in distributed systems
- Problems with normal hashing (e.g. key % n)
- Not horizontally scalable. It breaks when we add one more node
- Might not be load balanced
- To reduce variance, we can use virtual copies of caching servers
- k copies of each server will make Binary Search Tree k times bigger
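- A minimal consistent-hash ring sketch; a sorted list plus bisect stands in for the BST (the bisect call is the successor lookup), and the virtual-node count is arbitrary:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, virtual_copies=100):
        self.ring = []                         # sorted list of (hash, server)
        for server in servers:
            for i in range(virtual_copies):    # virtual copies reduce variance
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key):
        # Successor lookup: first virtual node clockwise from the key's hash.
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.server_for("user:42"))   # only nearby keys move if a server is added
```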
- Polling strategies
- AJAX Polling
- Repeatedly poll server
- HTTP Long Polling
- Like regular polling, but the server holds the request open until it has data to send (sketch after this list)
- “Hanging GET”
- Websockets
- Low overhead, real-time TCP connection with server
- Server-Sent Events (SSEs)
- Persistent and long-term connection with server, which server uses to send data to client
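- A bare-bones long-polling client loop using only the standard library (the URL and response format are hypothetical; a real endpoint would hold the request open until it has new data):

```python
import json
import urllib.request

POLL_URL = "https://example.com/updates?cursor={cursor}"   # hypothetical endpoint

def long_poll(cursor=0):
    while True:
        # The "hanging GET": the server keeps this request open until it has
        # something to send (or a timeout passes), then we immediately re-poll.
        url = POLL_URL.format(cursor=cursor)
        with urllib.request.urlopen(url, timeout=60) as resp:
            payload = json.load(resp)
        for event in payload.get("events", []):
            print("got event:", event)
        cursor = payload.get("cursor", cursor)
```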
- Eventual Consistency
- Provide BASE, instead of ACID
- Basically Available, Soft state, Eventual consistency
- Strategies
- Android and iPhone push services
- For 10k-100k users, consider using CDN for static content
- Pull vs Push CDN
- For 500k users, consider breaking up service to microservices
- Each microservice can be scaled independently
- For 1M users, consider LBs between tiers (web, app, DB)
- Use SQL EXPLAIN to see how queries are being executed