Exercise 6.8 — Resilience Probe: Web + API + Redis

Exercises are guided practice for the lecture material. They are non-graded, but later graded assignments assume this work has been completed.

System Diagram

Web API Redis system diagram

Starter Code Download

Download Exercise 6.8 Starter Code

Why you are doing this (and why it matters later)

Lecture 08 introduces a core reliability distinction: a process that has started is not necessarily a service that is ready. In real systems, dependencies restart and transient failures happen. In this exercise you practice designing /healthz and /readyz, observing Compose health states, and injecting failures so that you can show the system degrades gracefully and recovers without manual intervention.

In this exercise you will practice how to:

use health and readiness endpoints for machine-readable service state
interpret Docker Compose health transitions during boot and failures
simulate dependency disruption and validate retry/backoff behavior
collect evidence from logs and probes to locate the failing service

Starting Instructions

Download the Exercise 6.8 starter code archive from the course site and extract it.
Open a terminal in the extracted folder that contains compose.yml.
Run docker compose up -d --build to build and start web, api, and redis.
Run docker compose ps and confirm all services are up before starting probes.
Keep a second terminal ready for logs with docker compose logs -f web api redis.

Exercise Instructions

Part A (10-15 min): Baseline Health and Readiness

Probe api with curl http://localhost:3000/healthz and curl http://localhost:3000/readyz.
Probe web with curl http://localhost:3001 and record the response when the stack is healthy.
Capture one docker compose ps snapshot and identify health states (starting, healthy, or unhealthy).
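The health states in a snapshot can also be read programmatically. The sketch below assumes the snapshot was captured with docker compose ps --format json and that each entry carries Service, State, and Health fields. The output shape differs between Compose versions (one JSON array versus one JSON object per line), so the parser accepts both. The helper name parseComposePs and the sample input are illustrative and do not come from the starter code.

```javascript
// Hypothetical helper: summarize health states from `docker compose ps --format json`.
// Assumption: each entry has Service, State, and Health fields; check your Compose version.
function parseComposePs(raw) {
  const text = raw.trim();
  if (text === "") return [];
  // Some Compose versions print one JSON array, others one JSON object per line.
  const entries = text.startsWith("[")
    ? JSON.parse(text)
    : text
        .split("\n")
        .filter((line) => line.trim() !== "")
        .map((line) => JSON.parse(line));
  return entries.map((entry) => ({
    service: entry.Service,
    state: entry.State,
    // An empty Health value is treated here as "no healthcheck defined".
    health: entry.Health || "none",
  }));
}

// Illustrative input only, not output captured from the starter code.
const sampleSnapshot = [
  '{"Service":"api","State":"running","Health":"healthy"}',
  '{"Service":"redis","State":"running","Health":"starting"}',
].join("\n");

for (const row of parseComposePs(sampleSnapshot)) {
  console.log(`${row.service}: state=${row.state} health=${row.health}`);
}
```

Piping a real snapshot into a script like this makes it easier to log the starting, healthy, and unhealthy transitions alongside your notes.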

Part B (15-20 min): Inject and Observe Failure

Inject disruption using docker compose restart redis.
While Redis restarts, repeatedly run curl http://localhost:3001 and curl http://localhost:3000/readyz.
Follow logs with docker compose logs -f web api redis and note exactly when symptoms begin and recovery occurs.
Describe whether failures are brief degradation (503) or persistent crash behavior.
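Running curl by hand makes it hard to pin down exactly when symptoms begin and end. As one possible alternative, the sketch below polls the two URLs from this part at a fixed interval and records a timestamped status for each request. It assumes Node 18 or later for the global fetch. The names httpProbe and recordTimeline, the sample count, and the interval are all hypothetical choices, not part of the starter code.

```javascript
// Hypothetical timeline recorder for Part B. Requires Node 18+ for the global fetch.
async function httpProbe(url) {
  try {
    const res = await fetch(url);
    // fetch resolves for any HTTP status, so a 503 is reported as 503 here.
    return res.status;
  } catch {
    // Network-level failure (for example, connection refused): report 0.
    return 0;
  }
}

// `probe` is injectable so the loop can be exercised without a running stack.
async function recordTimeline(urls, { samples = 20, intervalMs = 500, probe = httpProbe } = {}) {
  const timeline = [];
  for (let i = 0; i < samples; i++) {
    for (const url of urls) {
      const status = await probe(url);
      timeline.push({ at: new Date().toISOString(), url, status });
    }
    // Wait between rounds, but not after the last one.
    if (i < samples - 1) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return timeline;
}

// Example call (start it, then run `docker compose restart redis` in another terminal):
// recordTimeline(["http://localhost:3001", "http://localhost:3000/readyz"])
//   .then((rows) => rows.forEach((r) => console.log(`${r.at} ${r.status} ${r.url}`)));
```

A status of 0 in the timeline means the request never reached the service, which is different evidence from a 503 returned by a live process.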

Part C (15-20 min): Tune Recovery Behavior

In web/server.js, change the retry parameters passed to withRetry to attempts = 6 and baseMs = 200.
Rebuild only web using docker compose up -d --build web.
Repeat the Redis restart experiment and compare user-visible failures before vs after tuning.
Write a short evidence-based conclusion about whether increased retry/backoff improved resilience.
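To reason about what the new parameters might mean, it helps to write the backoff schedule down. The starter code's withRetry is not reproduced here, so the sketch below is an assumption about its shape: the delay doubles after each failed attempt, starts at baseMs, has no jitter, and there is no wait after the final attempt. The helper backoffDelays and the injectable sleep option are hypothetical. Check web/server.js before relying on these numbers.

```javascript
// Sketch only: the starter code's withRetry may use a different schedule.
// Assumption: delay = baseMs * 2^i after the i-th failed attempt, no jitter,
// and no wait after the final attempt.
function backoffDelays(attempts, baseMs) {
  const delays = [];
  for (let i = 0; i < attempts - 1; i++) {
    delays.push(baseMs * 2 ** i);
  }
  return delays;
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Calls fn until it succeeds or the attempts are exhausted, then rethrows the last error.
async function withRetry(fn, { attempts = 6, baseMs = 200, sleep = defaultSleep } = {}) {
  const delays = backoffDelays(attempts, baseMs);
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < delays.length) await sleep(delays[i]);
    }
  }
  throw lastError;
}

const schedule = backoffDelays(6, 200);
console.log(schedule); // [ 200, 400, 800, 1600, 3200 ]
console.log(schedule.reduce((sum, ms) => sum + ms, 0)); // 6200
```

Under these assumptions, six attempts with a 200 ms base spend at most 6200 ms waiting, plus however long each attempt itself takes. If a Redis restart in your timeline lasts longer than that, retries alone would not hide it from the user.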

AI Assist Ideas (Optional)

AI use is allowed for this exercise, but you must log every prompt and response in prompt-log.md with the tool name, the exact prompt, and a short note describing how you used the result.

Suggested prompts (adapt to your exact code context):

Explain the withRetry function and predict total worst-case wait time for 6 attempts with 200ms base delay.
Given the readyz handler in the API service, what false-positive readiness risks still exist?
I observed intermittent 503 responses after restarting Redis with Docker Compose. Suggest a debugging checklist focused on logs and probe sequencing.

Target behavior:

/healthz remains 200 while service processes are alive
/readyz reflects dependency state and can return 503 during Redis outages
web may briefly degrade during Redis disruption but should recover without manual restarts
you can identify the offending dependency from probe outputs plus logs
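The first two behaviors can be sketched as a pair of handlers. This is a minimal illustration using only Node's built-in http module, not the code in api/server.js. The names probeResponse, createServer, and checkRedis are hypothetical, and checkRedis stands in for whatever Redis check the real service performs (for example, a PING).

```javascript
const http = require("node:http");

// Assumption: checkRedis resolves to true when Redis answers, and resolves
// to false or rejects otherwise. It is a stand-in for the real dependency check.
async function probeResponse(path, checkRedis) {
  if (path === "/healthz") {
    // Liveness: 200 whenever the process is able to answer at all.
    return { status: 200, body: { status: "ok" } };
  }
  if (path === "/readyz") {
    let redisOk = false;
    try {
      redisOk = await checkRedis();
    } catch {
      redisOk = false;
    }
    // Readiness: 503 while the dependency is unavailable.
    return redisOk
      ? { status: 200, body: { status: "ready" } }
      : { status: 503, body: { status: "not ready", dependency: "redis" } };
  }
  return { status: 404, body: { error: "not found" } };
}

// Wraps the routing logic in an HTTP server; call .listen(port) on the result to start it.
function createServer(checkRedis) {
  return http.createServer(async (req, res) => {
    const { status, body } = await probeResponse(req.url, checkRedis);
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(JSON.stringify(body));
  });
}
```

Keeping /healthz free of dependency checks is the design choice that matters here: a Redis outage should flip readiness to 503 without making the process look dead.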

Saving Your Work

Save only the files you changed for this exercise in your extracted exercise folder.
Keep notes and timeline evidence in exercise-notes.md (or an equivalent text file).
Keep at least one screenshot of docker compose ps and your before/after retry comparison for class discussion.

Verifying Your Work

Run docker compose ps and confirm healthy state after experiments complete.
Re-run curl checks for http://localhost:3000/healthz, http://localhost:3000/readyz, and http://localhost:3001.
Demonstrate that the stack recovers after docker compose restart redis.
Ensure your before/after comparison of retry settings is based on observed outputs, not assumptions.
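One way to keep the before/after comparison tied to observations is to reduce each recorded run to a few numbers. The sketch below assumes you have samples of the form { atMs, status }, where atMs is a millisecond timestamp and status is the HTTP status code. The function name summarize and the sample values are illustrative only and are not measurements from the starter code.

```javascript
// Hypothetical summary of one probe run. Any status other than 200 counts as a failure.
function summarize(samples) {
  let failures = 0;
  let longestOutageMs = 0;
  let outageStart = null;
  for (const sample of samples) {
    if (sample.status !== 200) {
      failures++;
      if (outageStart === null) outageStart = sample.atMs;
      // Span from the first to the latest consecutive failing sample.
      // This is a lower bound on the real outage; a single failure counts as 0 ms.
      longestOutageMs = Math.max(longestOutageMs, sample.atMs - outageStart);
    } else {
      outageStart = null;
    }
  }
  return { total: samples.length, failures, longestOutageMs };
}

// Illustrative values only; replace them with your own recorded runs.
const beforeTuning = [
  { atMs: 0, status: 200 },
  { atMs: 500, status: 503 },
  { atMs: 1000, status: 503 },
  { atMs: 1500, status: 503 },
  { atMs: 2000, status: 200 },
];
const afterTuning = [
  { atMs: 0, status: 200 },
  { atMs: 500, status: 503 },
  { atMs: 1000, status: 200 },
];

console.log(summarize(beforeTuning)); // { total: 5, failures: 3, longestOutageMs: 1000 }
console.log(summarize(afterTuning)); // { total: 3, failures: 1, longestOutageMs: 0 }
```

Comparing failure counts and outage spans from real runs gives the conclusion in Part C something concrete to cite.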

Solution Walkthrough

Compare your probe order against the lecture's observability sequence: compose ps -> logs -> endpoint probes -> in-container checks.
Review web/server.js and api/server.js from your downloaded exercise code to explain why behavior changed.
Use your timeline to justify which service was the initial source of failure and why.
