
dynatrace-expert

Dynatrace Platform operations expertise — DQL queries, entity inventory, metrics analysis, problem triage, dashboard management, and Settings API for Grail-based tenants.

Status: active
IDE: codex
Version: 1.0.0
Owner: platform-infrastructure
Tags: dynatrace, monitoring, observability, dql, grail, infrastructure

Dynatrace Expert Skill

Core Competencies

  • DQL async query lifecycle — submit via POST, poll for results, parse JSON response
  • Entity inventory — hosts, services, process groups via fetch dt.entity.*
  • Infrastructure metrics — CPU, memory, disk timeseries with timeseries command
  • Davis problem triage — query active/recent problems, correlate with entities
  • Log analysis — search and filter log records by level, content, entity
  • Settings API — read/write metric events, alerting profiles, maintenance windows
  • Dashboard management — list, read, create, update via Documents API
  • SLO monitoring — query SLO status and burn rates

Code Style & Conventions

  • Always use python3 for JSON encoding/parsing (not jq)
  • Use strict=False for all JSON parsing of DQL responses (json.loads(data, strict=False))
  • URL-encode request tokens before polling
  • Use Bearer auth (not Api-Token)
  • Present results as markdown tables
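
The parsing and presentation conventions above can be sketched in a few lines of python3. This is a minimal illustration, not a definitive implementation: the helper names are hypothetical, and the assumption that records sit under result.records in the response body should be checked against a real tenant response.

```python
import json


def parse_dql_records(response_text: str) -> list:
    """Parse a DQL response body and return its records.

    Assumption: records live under result.records in the response JSON.
    """
    # strict=False lets raw control characters (tabs, newlines) inside
    # string values through; log content frequently contains them.
    body = json.loads(response_text, strict=False)
    return (body.get("result") or {}).get("records") or []


def to_markdown_table(records: list) -> str:
    """Render a list of flat dicts as a markdown table."""
    if not records:
        return "(no records)"
    # Collect column names in first-seen order across all records.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    def cell(value) -> str:
        text = "" if value is None else str(value)
        # Escape pipes and flatten newlines so the table stays intact.
        return text.replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for record in records:
        lines.append("| " + " | ".join(cell(record.get(c)) for c in columns) + " |")
    return "\n".join(lines)
```

With the default strict=True, a response whose string values contain a literal tab or newline raises json.JSONDecodeError, which is the failure the strict=False convention avoids.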

Common Patterns

DQL Submit / Poll

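# Submit (build the JSON body with python3 so quotes and newlines in the DQL are escaped correctly)
PAYLOAD=$(python3 -c "import sys,json; print(json.dumps({'query': sys.argv[1], 'maxResultRecords': 1000}))" "$DQL")
RESPONSE=$(curl -s -X POST "${DT_PLATFORM_URL}/platform/storage/query/v1/query:execute" \
  -H "Authorization: Bearer ${DT_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")

# Extract the query state and the request token from the submit response
STATE=$(echo "$RESPONSE" | python3 -c "import sys,json; d=json.loads(sys.stdin.read(),strict=False); print(d.get('state',''))")
TOKEN=$(echo "$RESPONSE" | python3 -c "import sys,json; d=json.loads(sys.stdin.read(),strict=False); print(d.get('requestToken',''))")

# Poll (URL-encode the token; pass it as an argument instead of interpolating it into Python source)
ENCODED=$(python3 -c "import sys,urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=''))" "$TOKEN")
RESULT=$(curl -s "${DT_PLATFORM_URL}/platform/storage/query/v1/query:poll?request-token=${ENCODED}" \
  -H "Authorization: Bearer ${DT_API_TOKEN}")
# Repeat the poll request until the state in RESULT shows the query has finished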
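
A single poll request may return before the query has finished, so the poll step is usually wrapped in a loop. The sketch below shows one way to do that in python3. It assumes the state values NOT_STARTED and RUNNING mean "still in progress" (verify against the tenant's API reference), and fetch is a hypothetical callable that performs the authenticated GET and returns the response text.

```python
import json
import time
import urllib.parse


def build_poll_url(base_url: str, request_token: str) -> str:
    """Build the query:poll URL with a fully URL-encoded request token."""
    encoded = urllib.parse.quote(request_token, safe="")
    return f"{base_url.rstrip('/')}/platform/storage/query/v1/query:poll?request-token={encoded}"


def poll_until_done(fetch, base_url, request_token, interval=1.0, max_attempts=30):
    """Poll until the query leaves the in-progress states.

    fetch: hypothetical callable taking a URL and returning the response
    body as text (for example a thin wrapper around curl or urllib that
    adds the Bearer header).
    """
    url = build_poll_url(base_url, request_token)
    for _ in range(max_attempts):
        body = json.loads(fetch(url), strict=False)
        # Assumed in-progress states; anything else is treated as final.
        if body.get("state") not in ("NOT_STARTED", "RUNNING"):
            return body
        time.sleep(interval)
    raise TimeoutError(f"query still running after {max_attempts} polls")
```

Injecting fetch keeps the token handling in one place and makes the loop testable without a live tenant.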

MZ-Filtered Timeseries Lookup

timeseries avg(dt.host.cpu.usage), by:{dt.entity.host}
| lookup [fetch dt.entity.host | filter in(managementZones, "MY_MZ")],
    sourceField:dt.entity.host, lookupField:id
| filter isNotNull(lookup.id)
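
Because management zone names should not be hardcoded in scripts, the query above can be generated from a parameter. The following is a minimal sketch with a hypothetical helper name. It rejects names containing quotes, backslashes, or newlines rather than guessing at DQL string-escaping rules.

```python
def mz_filtered_cpu_query(management_zone: str) -> str:
    """Return the MZ-filtered CPU timeseries query for one management zone."""
    # Refuse characters that could break out of the DQL string literal.
    if not management_zone or any(ch in management_zone for ch in '"\\\n'):
        raise ValueError("management zone name is empty or contains unsupported characters")
    return (
        "timeseries avg(dt.host.cpu.usage), by:{dt.entity.host}\n"
        f'| lookup [fetch dt.entity.host | filter in(managementZones, "{management_zone}")],\n'
        "    sourceField:dt.entity.host, lookupField:id\n"
        "| filter isNotNull(lookup.id)"
    )
```

The returned string can be passed as the DQL value in the submit step.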

Settings API Read

curl -s "${DT_PLATFORM_URL}/platform/classic/environment-api/v2/settings/objects?schemaIds=builtin:anomaly-detection.metric-events" \
  -H "Authorization: Bearer ${DT_API_TOKEN}"
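
Settings reads can span several pages. The sketch below collects all objects for one schema, assuming the response carries an items list and an optional nextPageKey, and that follow-up requests send only nextPageKey. Both points should be confirmed against the Settings API reference. fetch is again a hypothetical callable that performs the authenticated GET and returns the body text.

```python
import json
import urllib.parse

SETTINGS_PATH = "/platform/classic/environment-api/v2/settings/objects"


def list_settings_objects(fetch, base_url, schema_id, max_pages=50):
    """Collect settings objects for one schema across all pages."""
    params = {"schemaIds": schema_id}
    items = []
    for _ in range(max_pages):
        url = f"{base_url.rstrip('/')}{SETTINGS_PATH}?{urllib.parse.urlencode(params)}"
        page = json.loads(fetch(url), strict=False)
        items.extend(page.get("items", []))
        next_key = page.get("nextPageKey")
        if not next_key:
            break
        # Assumption: later pages are requested with nextPageKey only.
        params = {"nextPageKey": next_key}
    return items
```

The max_pages cap is a safety limit so a malformed response cannot cause an endless loop.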

Documents API (Dashboard) — Multipart Parsing

Dashboard responses are multipart; parse with python3 to extract the JSON body from the multipart envelope.
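
One way to unwrap the multipart envelope with only the python3 standard library is to hand the body to the email parser together with the response's Content-Type header. This is a sketch under assumptions: the function name is hypothetical, and which part holds the dashboard JSON depends on the actual response, so the example simply returns every part that parses as JSON.

```python
import json
from email.parser import BytesParser
from email.policy import default


def extract_json_parts(content_type: str, body: bytes) -> list:
    """Return the parsed JSON documents found in a multipart response body.

    content_type: the response's Content-Type header value, including
    the boundary parameter.
    """
    # Prepend the header so the email parser can locate the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=default).parsebytes(raw)
    documents = []
    for part in message.iter_parts():
        payload = part.get_payload(decode=True)
        if not payload:
            continue
        try:
            documents.append(json.loads(payload.decode("utf-8"), strict=False))
        except (ValueError, UnicodeDecodeError):
            # Skip parts that are not JSON (for example binary attachments).
            continue
    return documents
```

To capture the header with curl, the response headers can be dumped to a file (for example with -D) and the Content-Type line passed in alongside the body.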

Security Best Practices

  • Never expose DT_API_TOKEN in output or logs
  • Validate token prefix (dt0s16. for Platform, dt0c01. for classic)
  • Read-only by default; write operations require explicit approval
  • No hardcoded entity IDs, MZ names, or AIDE IDs in scripts
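
The first two practices can be enforced in code before any request is made. The sketch below uses the prefixes listed above; the function names are hypothetical, and the error message deliberately omits the token value.

```python
# Prefixes as listed in the practices above.
TOKEN_PREFIXES = {"dt0s16.": "platform", "dt0c01.": "classic"}


def classify_token(token: str) -> str:
    """Return 'platform' or 'classic' based on the token prefix."""
    for prefix, kind in TOKEN_PREFIXES.items():
        if token.startswith(prefix):
            return kind
    # Never echo the token itself in the error.
    raise ValueError("unrecognized token prefix")


def redact(text: str, token: str) -> str:
    """Replace any occurrence of the token in text before logging it."""
    return text.replace(token, "<redacted>") if token else text
```

Running command output and error messages through redact before printing keeps the token out of logs even when a failing tool echoes its arguments.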

When to Apply This Skill

  • When working with Dynatrace monitoring data
  • When querying infrastructure metrics or entity inventory
  • When triaging Davis problems or reviewing logs
  • When managing dashboards or alerting configuration
  • When creating or modifying metric event rules or maintenance windows
