Troubleshooting

For general K8s and Inspect sandbox debugging, see the Debugging K8s Sandboxes guide.

Capture Inspect SANDBOX-level logs

A good starting point for most issues is to capture the output of the Python logging module at SANDBOX level. See the SANDBOX log level section.

I'm seeing "Helm install: context deadline exceeded" errors

This means that the Helm chart installation timed out. When installing the Helm chart, the k8s_sandbox package uses the --wait flag to wait for all Pods to be ready.

Therefore, this error can be an indication of:

  • Cluster capacity issues. Consider increasing the timeout or scaling up your cluster.
  • A Pod failing to enter the ready state, for example because of a failing readiness probe, a failure to pull the image, or a crash loop backoff.

Try installing the chart again (this can also be done manually) and check the Pod statuses and logs using a tool like K9s. Use the Helm release name (shown in the error message and in the SANDBOX-level logs) to filter the Pods.
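If K9s is not available, the same check can be sketched with plain kubectl. The helper names below are hypothetical and not part of k8s_sandbox. The sketch assumes that the release name is embedded in each Pod name (as in agent-env-xxxxxxxx-default-0) and that the Pods live in the current namespace.

```shell
# Hypothetical helpers; not part of k8s_sandbox.
# Assumption: the Helm release name is embedded in each Pod's name.

# List the Pods whose name contains the given release name.
# The default kubectl columns include READY, STATUS and RESTARTS.
pods_for_release() {
  kubectl get pods --no-headers | grep -- "$1"
}

# Show the description (including recent events) and the last log lines of one Pod.
inspect_pod() {
  kubectl describe pod "$1"
  kubectl logs "$1" --all-containers --tail=50
}
```

For example, run pods_for_release xxxxxxxx to find Pods that are not ready, then inspect_pod agent-env-xxxxxxxx-default-0 to look for probe failures, image pull errors, or crash loops.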

I'm seeing "Helm uninstall failed" errors

These errors most likely occur because the Helm chart was never installed. This typically happens if you cancel an eval, or if an eval fails before a given sample's Helm chart was installed (including when the chart installation itself failed).

Check to see if any Helm releases were left behind:

helm list

And if you wish to uninstall them:

helm uninstall <release-name>
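When several releases were left behind, uninstalling them one at a time is tedious. The following is a minimal sketch of a loop over release names. The function name is hypothetical, and the sketch assumes that every release passed to it is safe to delete, so review the output of helm list first.

```shell
# Hypothetical helper; not part of k8s_sandbox.
# Uninstalls each release name given as an argument and keeps going if one fails.
uninstall_releases() {
  for release in "$@"; do
    helm uninstall "$release" || echo "could not uninstall $release" >&2
  done
}

# Example (commented out on purpose): uninstall every release in the current
# namespace. "helm list --short" prints one release name per line.
# uninstall_releases $(helm list --short)
```

Passing explicit names, for example uninstall_releases <release-a> <release-b>, is the safer choice on a shared cluster.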

I'm seeing "Handshake status 404 Not Found" errors from Pod operations

This typically indicates that the Pod has been killed. Possible causes are:

  • Cluster issues (see View cluster events).
  • The eval had already failed for an unrelated reason, and the Helm releases were uninstalled whilst some operations were queued or in flight. Check the .json or .eval log produced by Inspect to see the underlying error.

View cluster events

Certain cluster events, such as a node failure, may impact your eval.

The following commands are a primitive way to view cluster events. Your cluster may have observability tools which collect these events and provide a more user-friendly interface.

kubectl get events --sort-by='.metadata.creationTimestamp'

To also see timestamps:

kubectl get events --sort-by='.metadata.creationTimestamp' \
  -o custom-columns=LastSeen:.lastTimestamp,Type:.type,Object:.involvedObject.name,Reason:.reason,Message:.message

To filter to a particular release or Pod, either pipe into grep or use the --field-selector flag:

kubectl get events --sort-by='.metadata.creationTimestamp' \
  --field-selector involvedObject.name=agent-env-xxxxxxxx-default-0

Find the Pod name (including the random 8-character identifier) in the SANDBOX-level logs or the stack trace.

To specify a namespace other than the default, use the -n flag.
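The grep approach and the -n flag can be combined in a small wrapper. This is a sketch only: the function name events_for is hypothetical, and my-namespace in the usage note is a placeholder.

```shell
# Hypothetical helper; not part of k8s_sandbox.
# Usage: events_for <release-or-pod-name> [extra kubectl flags...]
events_for() {
  name="$1"
  shift
  # Any remaining arguments (for example: -n my-namespace) are passed to kubectl.
  # grep keeps only the lines mentioning the name, so the header row is dropped.
  kubectl get events --sort-by='.metadata.creationTimestamp' "$@" | grep -- "$name"
}
```

For example, events_for xxxxxxxx -n my-namespace lists the events that mention the release with identifier xxxxxxxx in the my-namespace namespace. Unlike --field-selector, grep matches on any column, so it also catches events whose message mentions the name.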