Troubleshooting
Common issues and resolution steps for the Oracle Cloud K3s cluster.
1. Oracle Instance Reclamation
Issue: The server is flagged for reclamation due to low CPU utilization.
Resolution: Verify the stress-ng or urandom loop is running. See Always Free Policy.
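A quick check from the node confirms the keep-busy workload is alive. Process names depend on which approach was deployed, and the `stress-ng` flags below are illustrative, not the cluster's actual configuration:

```shell
# Check whether the CPU keep-busy workload is still running
pgrep -a stress-ng || pgrep -af urandom

# If it has died, restart it in the background, e.g. (illustrative load target):
# nohup stress-ng --cpu 1 --cpu-load 25 >/dev/null 2>&1 &
```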
2. Storage Issues (Longhorn)
Issue: Pods stuck in ContainerCreating with volume errors.
Resolution:
- Check the Longhorn UI (port 30082).
- Verify whether the volume is "Attached" but "Unhealthy".
- Restart the `longhorn-manager` pods if synchronization is lost.
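The same checks can be made from the CLI; a sketch assuming Longhorn runs in its default `longhorn-system` namespace:

```shell
# Inspect events on the stuck PVC (substitute the real names)
kubectl describe pvc <pvc-name> -n <namespace>

# List Longhorn volume custom resources and their attach/health state
kubectl get volumes.longhorn.io -n longhorn-system

# Restart the longhorn-manager DaemonSet if synchronization is lost
kubectl rollout restart daemonset/longhorn-manager -n longhorn-system
```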
3. Database Connectivity
Issue: Apps cannot connect to PostgreSQL.
Resolution:
- Check whether the `pg` pod in the `db` namespace is running.
- Verify `fsGroup: 999` is set in the deployment.
- Check the internal DNS: `nslookup pg-svc.db`.
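The checklist above as kubectl commands. The deployment name `pg` is taken from the checklist; adjust if the actual resource names differ:

```shell
# 1. Pod status in the db namespace
kubectl get pods -n db

# 2. Confirm fsGroup: 999 is set on the pod security context
kubectl get deployment pg -n db \
  -o jsonpath='{.spec.template.spec.securityContext.fsGroup}'

# 3. Resolve the service DNS name from inside the cluster
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 \
  -- nslookup pg-svc.db
```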
4. Monitoring Stack Storage Migration (Longhorn to Local-Path)
Transitioning the monitoring stack from Longhorn to local-path storage to eliminate overhead.
1. Overview
Longhorn is designed for distributed, replicated block storage; for ephemeral time-series data on a single k3s node it introduces unnecessary I/O and CPU overhead. The monitoring stack is migrated to local-path-provisioner, which uses direct, host-bound directories (/var/lib/rancher/k3s/storage/). This delivers near-native NVMe/block speed with no replication overhead.
2. Implementation Procedure
A. Phase 1: Removing Immutable Resources
Kubernetes forbids mutating PersistentVolumeClaim (PVC) or volumeClaimTemplates in a StatefulSet after creation.
- Delete the existing StatefulSets and PVCs so they can be recreated with the new `storageClassName`.
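A sketch of the teardown, assuming the stack lives in a `monitoring` namespace with these release names:

```shell
# StatefulSets must be deleted before storageClassName can change,
# because volumeClaimTemplates are immutable after creation
kubectl delete statefulset -n monitoring prometheus alertmanager

# Delete the old Longhorn-backed claims (scope this to the monitoring
# namespace only; --all removes every PVC in it)
kubectl delete pvc -n monitoring --all
```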
B. Phase 2: Resolving Finalizer Deadlock
Deletion of PVCs may deadlock if the kubernetes.io/pvc-protection finalizer waits for a Longhorn detachment signal that never arrives.
- Remove finalizers via the API to bypass Longhorn:
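One way to clear the finalizer via the API (the PVC name here is a placeholder):

```shell
# Drop the pvc-protection finalizer so the pending Delete completes
# without waiting for a Longhorn detachment signal
kubectl patch pvc <pvc-name> -n monitoring \
  --type merge -p '{"metadata":{"finalizers":null}}'
```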
C. Phase 3: Restarting Local-Path Provisioner
Pods may remain in Pending state if the local-path-provisioner pod loses network connectivity (dial tcp ... no route to host) during rapid resource changes.
- Delete the provisioner pod to force a network reset:
- Verify the provisioner reconnects and binds the pending PVCs.
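The restart and verification, assuming the default k3s deployment of local-path-provisioner in `kube-system`:

```shell
# Delete the provisioner pod; its Deployment recreates it with fresh networking
kubectl delete pod -n kube-system -l app=local-path-provisioner

# Watch until the Pending PVCs flip to Bound
kubectl get pvc -n monitoring -w
```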
3. Configuration Standards
Because local-path does not enforce filesystem quotas, storage capacity is managed at the application level:
- Prometheus: Application-level retention is hard-limited to 1.5GiB or 24 hours. TSDB prunes data to maintain this limit.
- Grafana: Configuration uses a lightweight SQLite database (<50MB).
- Alertmanager: Stores only transient notification routing data.
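The Prometheus limits above map onto the server's retention flags; a sketch of the launch arguments (file paths are illustrative):

```shell
# Retention caps matching the 1.5GiB / 24h policy; the TSDB prunes
# whichever limit is hit first.
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus \
  --storage.tsdb.retention.size=1536MB \
  --storage.tsdb.retention.time=24h
```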
4. Verification
After migration, the monitoring stack operates with zero distributed-storage overhead, writing directly to the Oracle VM's local disk.
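The migration can be verified end to end (again assuming a `monitoring` namespace):

```shell
# All monitoring PVCs should report STORAGECLASS local-path and STATUS Bound
kubectl get pvc -n monitoring

# The backing directories should exist on the node's local disk
ls /var/lib/rancher/k3s/storage/
```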
Changelog
| Date | Change |
|---|---|
| Feb 13, 2026 | Added Monitoring Stack Migration report. |
| Feb 09, 2026 | Added PostgreSQL connection checks and Longhorn UI details. |
| Jan 20, 2026 | Initial placeholder created. |