# Module 07: Synthetic Monitoring Basics (Labs 10-13)

Continuous Uptime Monitoring with Grafana SM
Navigate: [All Slides](../index.html) | [Prev: Cloud Integration](../06_Cloud_Integration/index.html) | [Next: Browser Testing](../08_Browser_Testing/index.html)
## What is Synthetic Monitoring?

Continuous, scheduled checks from global probe locations that answer one question: **"Is my service reachable and working correctly right now?"** They answer it from the perspective of users around the world.
## SM vs Load Testing

| | Grafana SM | k6 Load Testing |
|---|---|---|
| Question | Is it up? | How does it perform under load? |
| Trigger | Scheduled (every N minutes) | On-demand (manual or CI) |
| Virtual users | 1 | Many |
| Duration | Seconds | Minutes to hours |
| Goal | Detect outages | Find capacity limits |

You use both: SM for always-on visibility, k6 for periodic performance analysis.
## DataDog Comparison

Grafana Synthetic Monitoring = DataDog Synthetic API Tests. Both run scheduled HTTP checks from global locations, both alert on failures, both display uptime percentages.

**Key difference:** SM results live natively in Grafana alongside metrics, logs, and traces. No separate portal.
## Lab 10: SM Introduction

In your Grafana Cloud stack, expand **Testing & synthetics → Synthetics** in the sidebar.

SM sub-nav:

- **Checks** — main list of all your monitors
- **Probes** — global probe locations (AMER / APAC / EMEA)
- **Alerts (Legacy)** — pre-built alerting rules; new work lives under **Alerts & IRM**
- **Config** — plugin settings

The fleet-level view lives on the **Synthetics** landing page itself.
## Creating Your First HTTP Check

**Checks → + Create new check → API Endpoint**

5-step wizard: **Request → Uptime → Labels → Execution → Alerting**

```yaml
# Request
Job name: Workshop Demo
Request type: HTTP (radio; row also has Ping/DNS/TCP/Traceroute)
Request target: https://grafana.com

# Uptime
Valid status codes: 2xx (default)

# Execution
Probe locations: North Virginia US, Frankfurt DE, Singapore SG
Frequency: 1m pill
```

Use the right-hand **Test** panel for immediate execution without waiting.
## Understanding Check Results

Check detail page shows:

- **Uptime percentage** (100% after successful runs)
- **Response time graph** by probe location
- **Probe breakdown table** with per-location stats
- **Error log** for failed attempts

Geographic latency differences are visible: US East will be faster to grafana.com than Asia Pacific.
## Lab 11: HTTP Validation

Beyond status code 200, add:

- **Regexp validation** — assert body or header content with a regex (replaces the old "Body Contains" / "Body Matches Regex" assertions)
- **Valid status codes** range — 2xx by default
- **TLS Validation** — detect certificate expiry (on by default)

```yaml
# Uptime step
Valid status codes: 2xx
Regexp validation:
  Source: Body
  Match condition: slideshow
  Invert: unchecked (pass only when the body matches)

# Request → Request options → TLS
Disable target certificate validation: unchecked (TLS validation on)
```

> Response-time pass/fail is gone — latency is now a global threshold, not a per-check assertion.
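The pass/fail semantics of Regexp validation boil down to a regex test whose outcome the Invert option flips. A minimal sketch in plain JavaScript (illustrative only — not Grafana SM's actual implementation):

```javascript
// Sketch of Regexp-validation semantics: the assertion passes when the
// regex matches the response body, unless Invert flips that outcome.
// (Illustrative only -- not Grafana SM's actual code.)
function regexpValidationPasses(body, pattern, invert = false) {
  const matched = new RegExp(pattern).test(body);
  return invert ? !matched : matched;
}

const body = '{"slideshow": {"title": "Sample Slide Show"}}';
console.log(regexpValidationPasses(body, 'slideshow'));       // true: body matches
console.log(regexpValidationPasses(body, 'slideshow', true)); // false: inverted
```

Inverting is how you assert a string is *absent* — for example, failing the check when an error banner appears in the body.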
## Custom Request Headers

```yaml
Header name: X-Workshop
Header value: true
```

Use cases:

- Bypass WAF rules with secret headers
- Activate feature flags
- Mark traffic in logs (filter synthetic requests from analytics)
## Check Status Logic

A check is **Up** when ALL of the following pass:

- HTTP connection established within timeout
- TLS validation passes (if enabled)
- HTTP response received
- All assertions pass

A check is **Down** when ANY of the following occurs:

- Connection timeout or refused
- TLS certificate invalid/expired
- Any assertion fails
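The Up/Down decision above is effectively a single boolean AND over the stages. A simplified model in plain JavaScript (the field names are hypothetical, not SM's actual data model):

```javascript
// Simplified model of check status: Up only when every stage passes.
// Field names are hypothetical; this is not Grafana SM's actual code.
function checkStatus(result) {
  const up =
    result.connected &&               // connection established within timeout
    (result.tlsValid ?? true) &&      // TLS passes (treated as pass if TLS is disabled)
    result.responseReceived &&        // HTTP response arrived
    result.assertions.every(Boolean); // every assertion passed
  return up ? 'Up' : 'Down';
}

console.log(checkStatus({
  connected: true, tlsValid: true,
  responseReceived: true, assertions: [true, true],
})); // "Up"

console.log(checkStatus({
  connected: true, tlsValid: false,
  responseReceived: true, assertions: [true],
})); // "Down" -- one failing stage is enough
```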
## Lab 12: DNS and TCP Checks

**DNS Check** — verify hostname resolves to expected records

```yaml
Target: grafana.com
Record type: A
Assertion: Answer contains grafana.com
```

**TCP Check** — verify port is open and accepting connections

```yaml
Target: grafana.com:443
TLS: Enabled
```
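The "Answer contains" assertion amounts to a substring test over the returned answer records. A minimal sketch in plain JavaScript, with made-up record values (not SM's actual implementation):

```javascript
// Sketch of an "Answer contains" DNS assertion: pass when any
// answer record contains the expected substring.
// (Illustrative only; record values below are examples.)
function answerContains(answers, expected) {
  return answers.some((record) => record.includes(expected));
}

const answers = ['grafana.com. 300 IN A 151.101.1.217']; // example A record
console.log(answerContains(answers, 'grafana.com')); // true
```

This is what makes the DNS-hijacking detection in the next table work: if a hijacked resolver returns records pointing somewhere unexpected, the assertion fails and the check goes Down.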
## When to Use Each Check Type

| Scenario | Check Type |
|---|---|
| Web pages and REST APIs | HTTP |
| Multi-step user flow | Scripted (Lab 13) |
| Hostname resolution | DNS |
| DNS hijacking detection | DNS with assertions |
| Non-HTTP services (databases, Redis, Kafka) | TCP |
| TLS cert validity on non-HTTP port | TCP with TLS |
## DNS Check Results

Results show:

- **Reachability** — percentage of queries that resolved
- **Resolution time** — pure DNS lookup latency (5-50ms typical)
- **Response by probe** — table per location

Resolution time varies by location: a Singapore probe querying a US nameserver will be slower than one in Virginia.
## TCP Check Results

For TCP checks with TLS:

- **Connection time** — TCP three-way handshake (SYN, SYN-ACK, ACK)
- **TLS handshake time** — time to negotiate the TLS session
- **Total duration** — TCP + TLS combined

Typical for grafana.com:443:

- TCP: 20-150ms
- TLS: 30-200ms
- Total: < 500ms
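Total duration is just the sum of the two phases. A quick sketch of checking one probe's timings against that budget, using hypothetical sample numbers:

```javascript
// Sum TCP connect + TLS handshake and compare to a latency budget.
// The timings here are hypothetical sample values in milliseconds.
const probe = { tcpConnectMs: 85, tlsHandshakeMs: 140 };
const totalMs = probe.tcpConnectMs + probe.tlsHandshakeMs;

console.log(totalMs);       // 225
console.log(totalMs < 500); // true: within the typical < 500ms budget
```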
## Lab 13: Scripted Checks

Upload a k6 script to run as a scheduled synthetic monitor.

**Key insight:** The scripting language is exactly the same k6 JavaScript you've been writing. No separate DSL.

Constraint: SM scripted checks run with 1 VU on a schedule — not load tests.
## Multi-Step Workflow Example

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export default function () {
  // Step A: Verify headers endpoint
  const headersRes = http.get('https://httpbin.org/headers');
  check(headersRes, {
    'headers: status is 200': (r) => r.status === 200,
  });

  sleep(1);

  // Step B: POST and verify echo
  const payload = JSON.stringify({ workflow: 'lab-13' });
  const postRes = http.post('https://httpbin.org/post', payload);
  check(postRes, {
    'post: body echoes payload': (r) => {
      return JSON.parse(r.body).json.workflow === 'lab-13';
    },
  });
}
```
## Testing Scripts Locally First

Always validate before uploading:

```bash
k6 run scripts/starters/lab-13-starter.js
```

Watch the check pass/fail counts. Fix failures locally — diagnosing through the SM UI is slower.

SM ignores the `vus` and `duration` options. Keep them for local testing.
## Uploading to SM

1. Testing & synthetics → Synthetics → Checks → **+ Create new check** → **Scripted**
2. Paste full script content
3. Configure:
   - Job name
   - Frequency (e.g., 5 minutes)
   - Timeout (60s for multi-step scripts)
   - Probe locations (2-3 locations sufficient)

Use the **Test** button to verify the script runs in the SM environment.
## Scripted Check Best Practices

Production-quality patterns:

- **Groups** — `group('step-name', () => {...})` organizes output
- **Custom Trend metrics** — track per-step latency independently
- **try/catch/finally** — graceful error handling
- **Descriptive check names** — `'step-name: what we expected'`

Keep scripts under 30 seconds of execution time.
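Put together, those patterns look roughly like this. The endpoint, metric name, and step name are illustrative, and the script runs under the k6 runtime (locally via `k6 run` or as an SM scripted check), not plain Node:

```javascript
import http from 'k6/http';
import { check, group } from 'k6';
import { Trend } from 'k6/metrics';

// Custom Trend metric: per-step latency tracked independently of the
// built-in http_req_duration. The metric name is illustrative.
const loginDuration = new Trend('login_duration');

export default function () {
  // group() nests this step's checks and requests under one label in the output.
  group('login', () => {
    try {
      const res = http.post('https://httpbin.org/post', '{"user":"demo"}');
      loginDuration.add(res.timings.duration);
      check(res, {
        'login: status is 200': (r) => r.status === 200, // descriptive check name
      });
    } catch (err) {
      // Graceful handling: log and let any remaining groups still run.
      console.error(`login step failed: ${err}`);
    }
  });
}
```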
## DataDog Comparison: Scripted Checks

| DataDog | Grafana SM |
|---|---|
| Multistep API Test | Scripted Check |
| JSON test definition or UI builder | k6 JavaScript |
| Proprietary assertion syntax | Standard k6 `check()` calls |
| Separate from load test tooling | Same language as k6 load tests |

Significant advantage: shared k6 skills across load testing and synthetic monitoring.
## Key Takeaways

- SM runs 1-VU scheduled checks from global probes; load testing runs many VUs to find limits
- HTTP checks validate endpoints; DNS checks verify resolution; TCP checks test connectivity
- Each probe location runs independently — the response time graph shows geographic latency variation
- Scripted checks use the exact same k6 JavaScript as local load tests
- The **Test** button provides instant feedback during check configuration
- SM results are native Grafana data: dashboards, alerting, SLOs work without export
# Lab Complete!

Ready to explore browser testing
Navigate: [All Slides](../index.html) | [Prev: Cloud Integration](../06_Cloud_Integration/index.html) | [Next: Browser Testing](../08_Browser_Testing/index.html)