dynatrace-expert
Dynatrace Platform operations expertise — DQL queries, entity inventory, metrics analysis, problem triage, dashboard management, and Settings API for Grail-based tenants.
Dynatrace Expert Skill
Core Competencies
- DQL async query lifecycle — submit via POST, poll for results, parse JSON response
- Entity inventory — hosts, services, process groups via fetch dt.entity.*
- Infrastructure metrics — CPU, memory, disk timeseries with the timeseries command
- Davis problem triage — query active/recent problems, correlate with entities
- Log analysis — search and filter log records by level, content, entity
- Settings API — read/write metric events, alerting profiles, maintenance windows
- Dashboard management — list, read, create, update via Documents API
- SLO monitoring — query SLO status and burn rates
Code Style & Conventions
- Always use python3 for JSON encoding/parsing (not jq)
- Use strict=False for all JSON parsing of DQL responses (json.loads(data, strict=False))
- URL-encode request tokens before polling
- Use Bearer auth (not Api-Token)
- Present results as markdown tables
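The strict=False and markdown-table conventions can be sketched together as follows. This is a minimal illustration, not a definitive implementation: the sample response body, its field names, and the helper names parse_dql_response and to_markdown_table are assumptions made for the example.

```python
import json


def parse_dql_response(raw: str) -> dict:
    """Parse a DQL response body.

    strict=False lets json.loads accept raw control characters (such as a
    literal newline) inside string values, which log content may contain.
    """
    return json.loads(raw, strict=False)


def to_markdown_table(records: list) -> str:
    """Render a list of flat dict records as a markdown table."""
    if not records:
        return "_no records_"
    # Collect column names in first-seen order across all records.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    def cell(value) -> str:
        # Escape pipes and flatten newlines so one record stays on one row.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for record in records:
        lines.append("| " + " | ".join(cell(record.get(c)) for c in columns) + " |")
    return "\n".join(lines)


# Hypothetical response body; the "\n" below becomes a raw newline inside the
# JSON string, which the default strict parser rejects.
sample = (
    '{"state": "SUCCEEDED", "result": {"records": '
    '[{"loglevel": "ERROR", "content": "line one\nline two"}]}}'
)

parsed = parse_dql_response(sample)
print(to_markdown_table(parsed["result"]["records"]))
```

Calling json.loads(sample) without strict=False raises json.JSONDecodeError on the same input, which is the failure this convention guards against.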
Common Patterns
DQL Submit / Poll
# Submit (build the payload with python3 so quotes inside the DQL stay valid JSON)
PAYLOAD=$(python3 -c "import sys,json; print(json.dumps({'query': sys.argv[1], 'maxResultRecords': 1000}))" "$DQL")
RESPONSE=$(curl -s -X POST "${DT_PLATFORM_URL}/platform/storage/query/v1/query:execute" \
  -H "Authorization: Bearer ${DT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")
STATE=$(echo "$RESPONSE" | python3 -c "import sys,json; d=json.loads(sys.stdin.read(),strict=False); print(d.get('state',''))")
TOKEN=$(echo "$RESPONSE" | python3 -c "import sys,json; d=json.loads(sys.stdin.read(),strict=False); print(d.get('requestToken',''))")
# Poll (URL-encode the token, passed as an argument rather than interpolated into the Python source)
# Repeat the poll until the returned state is no longer RUNNING
ENCODED=$(python3 -c "import sys,urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=''))" "$TOKEN")
RESULT=$(curl -s "${DT_PLATFORM_URL}/platform/storage/query/v1/query:poll?request-token=${ENCODED}" \
  -H "Authorization: Bearer ${DT_API_TOKEN}")
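The poll step usually has to be repeated until the query finishes. The sketch below shows one way to structure that loop in Python. It assumes the in-progress states are named NOT_STARTED and RUNNING, and it takes the HTTP call as a caller-supplied fetch function (hypothetical) so the loop itself stays free of network code. The names poll_url and wait_for_result are illustrative.

```python
import time
import urllib.parse

# Assumption: these state names mean the query has not finished yet.
IN_PROGRESS_STATES = {"NOT_STARTED", "RUNNING"}


def poll_url(base_url: str, request_token: str) -> str:
    """Build the poll URL with the request token fully URL-encoded."""
    encoded = urllib.parse.quote(request_token, safe="")
    return f"{base_url}/platform/storage/query/v1/query:poll?request-token={encoded}"


def wait_for_result(fetch, url, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch(url) until the response leaves the in-progress states.

    fetch is any callable that performs the authenticated GET and returns
    the decoded JSON body as a dict.
    """
    for _ in range(max_attempts):
        response = fetch(url)
        if response.get("state") not in IN_PROGRESS_STATES:
            return response
        sleep(interval)
    raise TimeoutError(f"query still running after {max_attempts} polls")


if __name__ == "__main__":
    # Offline demonstration with canned responses instead of a real tenant.
    canned = iter([
        {"state": "RUNNING"},
        {"state": "SUCCEEDED", "result": {"records": []}},
    ])
    url = poll_url("https://example.invalid", "a+b/c=")
    final = wait_for_result(lambda _url: next(canned), url, sleep=lambda _s: None)
    print(url)
    print(final["state"])
```

Passing fetch and sleep as parameters keeps the loop easy to exercise without credentials; a real fetch would add the Bearer header and parse the body with strict=False.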
MZ-Filtered Timeseries Lookup
timeseries avg(dt.host.cpu.usage), by:{dt.entity.host}
| lookup [fetch dt.entity.host | filter in(managementZones, "MY_MZ")],
sourceField:dt.entity.host, lookupField:id
| filter isNotNull(lookup.id)
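Since management zone names should not be hardcoded, the query above can be generated from parameters. The following is a sketch under stated assumptions: it assumes DQL string literals accept backslash-escaped double quotes, and the helper names dql_string, mz_filtered_timeseries, and build_payload are hypothetical.

```python
import json
import re


def dql_string(value: str) -> str:
    """Quote a value as a DQL string literal.

    Assumption: DQL uses double-quoted strings with backslash escapes.
    """
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def mz_filtered_timeseries(metric: str, management_zone: str) -> str:
    """Build the MZ-filtered timeseries query for a metric key and MZ name."""
    # Reject anything that does not look like a plain metric key.
    if not re.fullmatch(r"[A-Za-z0-9_.:\-]+", metric):
        raise ValueError("unexpected characters in metric key")
    return (
        f"timeseries avg({metric}), by:{{dt.entity.host}}\n"
        f"| lookup [fetch dt.entity.host | filter in(managementZones, {dql_string(management_zone)})],\n"
        f"    sourceField:dt.entity.host, lookupField:id\n"
        f"| filter isNotNull(lookup.id)"
    )


def build_payload(query: str, max_records: int = 1000) -> str:
    """JSON-encode the request body so quotes and newlines are escaped correctly."""
    return json.dumps({"query": query, "maxResultRecords": max_records})


if __name__ == "__main__":
    query = mz_filtered_timeseries("dt.host.cpu.usage", "MY_MZ")
    print(query)
    print(build_payload(query))
```

Encoding the payload with json.dumps avoids the quoting problems that arise when a query containing double quotes is spliced into a JSON string by hand.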
Settings API Read
curl -s "${DT_PLATFORM_URL}/platform/classic/environment-api/v2/settings/objects?schemaIds=builtin:anomaly-detection.metric-events" \
-H "Authorization: Bearer ${DT_API_TOKEN}"
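Settings reads can span several pages. The sketch below follows a page cursor until it is exhausted. It assumes the response carries an items list and an optional nextPageKey field, and that a follow-up request sends only nextPageKey; both are assumptions to verify against the tenant. The fetch callable and the helper names are hypothetical.

```python
import urllib.parse

SETTINGS_PATH = "/platform/classic/environment-api/v2/settings/objects"


def settings_url(base_url: str, schema_id: str = None, next_page_key: str = None) -> str:
    """Build the settings objects URL for a first page or a follow-up page."""
    if next_page_key:
        # Assumption: follow-up pages are requested with the page key only.
        params = {"nextPageKey": next_page_key}
    else:
        params = {"schemaIds": schema_id}
    return f"{base_url}{SETTINGS_PATH}?{urllib.parse.urlencode(params)}"


def iter_settings_objects(fetch, base_url: str, schema_id: str):
    """Yield every settings object for a schema, following page keys.

    fetch is any callable that performs the authenticated GET and returns
    the decoded JSON body as a dict.
    """
    url = settings_url(base_url, schema_id=schema_id)
    while True:
        page = fetch(url)
        yield from page.get("items", [])
        key = page.get("nextPageKey")
        if not key:
            break
        url = settings_url(base_url, next_page_key=key)


if __name__ == "__main__":
    # Offline demonstration with two canned pages.
    base = "https://example.invalid"
    pages = {
        settings_url(base, schema_id="builtin:anomaly-detection.metric-events"): {
            "items": [{"objectId": "obj-1"}],
            "nextPageKey": "page-2",
        },
        settings_url(base, next_page_key="page-2"): {"items": [{"objectId": "obj-2"}]},
    }
    for item in iter_settings_objects(pages.__getitem__, base, "builtin:anomaly-detection.metric-events"):
        print(item["objectId"])
```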
Documents API (Dashboard) — Multipart Parsing
Dashboard responses are multipart; parse with python3 to extract the JSON body from the multipart envelope.
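One way to do this with only the standard library is to hand the body to the email parser, which understands multipart boundaries. This is a sketch, not the definitive handling for the Documents API: the part names metadata and content, the boundary, and the sample payloads are invented for illustration, and the helper name extract_json_parts is hypothetical.

```python
import json
from email import message_from_bytes


def extract_json_parts(content_type: str, body: bytes) -> list:
    """Return every part of a (possibly multipart) body that parses as JSON.

    content_type is the value of the response's Content-Type header,
    including the boundary parameter.
    """
    # Prepend the Content-Type header so the email parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # container node; its children are visited separately
        payload = part.get_payload(decode=True)
        if not payload:
            continue
        try:
            parts.append(json.loads(payload.decode("utf-8"), strict=False))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # skip parts that are not JSON
    return parts


# Synthetic multipart body used only to demonstrate the parsing.
sample_body = (
    "--sample-boundary\r\n"
    'Content-Disposition: form-data; name="metadata"\r\n'
    "Content-Type: application/json\r\n\r\n"
    '{"id": "doc-1", "name": "Demo"}\r\n'
    "--sample-boundary\r\n"
    'Content-Disposition: form-data; name="content"\r\n'
    "Content-Type: application/json\r\n\r\n"
    '{"tiles": {}}\r\n'
    "--sample-boundary--\r\n"
).encode("utf-8")

for document in extract_json_parts("multipart/form-data; boundary=sample-boundary", sample_body):
    print(sorted(document))
```

The same function also handles a plain application/json response, because a non-multipart message is treated as a single part.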
Security Best Practices
- Never expose DT_API_TOKEN in output or logs
- Validate token prefix (dt0s16. for Platform, dt0c01. for classic)
- Read-only by default; write operations require explicit approval
- No hardcoded entity IDs, MZ names, or AIDE IDs in scripts
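The practices above can be enforced with a few small guards. The sketch below is illustrative only: the function names, the placeholder text used for redaction, and the set of methods treated as read-only are assumptions, and the token in the demonstration is a made-up value.

```python
# Prefixes taken from the list above; the kind labels are illustrative.
TOKEN_PREFIXES = {"dt0s16.": "platform", "dt0c01.": "classic"}

# Assumption: only these HTTP methods count as read-only.
READ_ONLY_METHODS = {"GET", "HEAD"}


def token_kind(token: str) -> str:
    """Classify a token by prefix without ever echoing the token itself."""
    for prefix, kind in TOKEN_PREFIXES.items():
        if token.startswith(prefix):
            return kind
    raise ValueError("unrecognized Dynatrace token prefix")


def redact(text: str, token: str) -> str:
    """Replace any occurrence of the token before text is printed or logged."""
    return text.replace(token, "<redacted>") if token else text


def is_request_allowed(method: str, write_approved: bool = False) -> bool:
    """Allow reads always; allow writes only after explicit approval."""
    return method.upper() in READ_ONLY_METHODS or write_approved


if __name__ == "__main__":
    fake_token = "dt0s16.EXAMPLE.NOT-A-REAL-SECRET"  # made-up value
    print(token_kind(fake_token))
    print(redact(f"curl -H 'Authorization: Bearer {fake_token}'", fake_token))
    print(is_request_allowed("POST"))
```

The settings query endpoint for DQL uses POST even for reads, so a real guard would need an allowlist of read-only POST paths rather than a method check alone.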
When to Apply This Skill
- When working with Dynatrace monitoring data
- When querying infrastructure metrics or entity inventory
- When triaging Davis problems or reviewing logs
- When managing dashboards or alerting configuration
- When creating or modifying metric event rules or maintenance windows
Resources
Related Assets
Dynatrace Operations Agent
Autonomous Dynatrace Platform agent that executes DQL queries, reads settings, and runs diagnostic workflows against any Grail-based tenant. Discovers credentials automatically (env var, .dtenv file, or prompt), executes live API calls, and presents formatted results. Use for entity inventory, metrics analysis, problem triage, log review, and guided troubleshooting.
Owner: platform-infrastructure
Dynatrace Kubernetes Service Triage
Systematic triage of a Dynatrace-monitored Kubernetes service using DQL queries for entity discovery, JVM health, thread analysis, pod generation comparison, and Davis problem correlation. Produces structured root cause analysis with Splunk query handoffs for restricted log environments.
Owner: epic-platform-sre
dynatrace-k8s-triage
Systematic Kubernetes service triage using Dynatrace DQL — entity discovery, JVM health, thread analysis, pod generation comparison, Davis problem correlation, and Splunk SPL query generation for restricted log environments.
Owner: epic-platform-sre
Azure Resource Health Diagnosis
Analyze an Azure resource’s health, diagnose issues using logs and telemetry, and produce a remediation plan for identified problems.
Owner: epic-platform-sre
Spring Boot Container Crash Triage
Diagnose Spring Boot container crashes in Kubernetes by correlating Dynatrace JVM telemetry, pod lifecycle events, and deployment state. Covers rolling deployment failures, OOM kills, thread exhaustion, startup failures, and major framework upgrades.
Owner: epic-platform-sre
Azure Resource Troubleshooter
Goal-oriented Azure specialist that autonomously diagnoses and resolves Azure resource issues. Queries Azure APIs, analyzes logs, checks configurations, and provides actionable remediation steps. Use for infrastructure debugging and incident response.
Owner: platform-infrastructure

