Monitoring, Debugging & Recovery
Once an endpoint is live, the webhook portal becomes your operational control surface.
Use it to answer four questions quickly:
From the endpoint page you can review the messages sent to that endpoint and drill into individual deliveries.
This is where you inspect:
The portal is the first place to check before debugging your own application logs.
When you are tracking down a specific delivery, filter the message list by:
If you know roughly when the issue occurred, date filtering is usually the fastest way to narrow the list.
An event is considered successfully delivered when your endpoint returns a 2xx response within 15 seconds.
Everything else is treated as a failure, including:
- 3xx redirects
- 4xx responses
- 5xx responses

This means a redirecting URL is not acceptable for production webhook delivery. The configured endpoint must be the final destination.
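The success rule above can be sketched as a small check, which may be useful in tests for your receiving service. This is a minimal illustration, not YunoJuno's implementation: the function name `is_successful_delivery` and its parameters are hypothetical, and treating a response at exactly 15 seconds as in time is an assumption.

```python
DELIVERY_TIMEOUT_SECONDS = 15  # timeout stated in this section


def is_successful_delivery(status_code: int, elapsed_seconds: float) -> bool:
    """Return True only for a 2xx response received within the timeout.

    Hypothetical helper for illustration; not part of any YunoJuno API.
    """
    # Assumption: a response at exactly the timeout still counts as in time.
    responded_in_time = elapsed_seconds <= DELIVERY_TIMEOUT_SECONDS
    is_2xx = 200 <= status_code <= 299
    return responded_in_time and is_2xx
```

Under this sketch, a `301` that arrives instantly and a `200` that arrives after 20 seconds both count as failures, which is why the endpoint should acknowledge quickly and avoid redirects.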
If an attempt fails, YunoJuno retries automatically using this schedule:
In practice, the final automatic attempt happens roughly 27 hours and 35 minutes after the first attempt, assuming each prior attempt failed.
If all automatic attempts are exhausted, the message is marked as failed for that endpoint.
Operationally, this is the dead-letter state:
If an endpoint keeps failing over multiple days, the portal can automatically disable the endpoint. That is intended to stop endless failed traffic against a broken destination.
The portal gives you three useful recovery patterns:
- Resend: replay a single message to the endpoint
- Recover Failed Messages: replay failed messages from a chosen point in time
- Replay Missing: replay messages that were never attempted to that endpoint

These tools are what you use after downtime, misconfiguration, or a bug in your receiving service.
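A replay may deliver a message your service has already handled, for example when your handler finished but the response arrived too late to count as a success. One way to make replays safe is to process each message at most once. The sketch below assumes every delivery carries a stable identifier; how that identifier is exposed is not covered here, so `message_id` and the `ReplaySafeProcessor` class are hypothetical names.

```python
from typing import Callable, Set


class ReplaySafeProcessor:
    """Process each message at most once, keyed by a message identifier.

    Hypothetical helper: an in-memory set is used for brevity; a real
    service would need a durable store shared across workers.
    """

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen_ids: Set[str] = set()

    def process(self, message_id: str, payload: dict) -> bool:
        """Run the handler unless this message_id was already processed.

        Returns True if the handler ran, False if the message was a duplicate.
        """
        if message_id in self._seen_ids:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a failed
        # attempt can still be retried or replayed later.
        self._seen_ids.add(message_id)
        return True
```

With this in place, a duplicate delivery can still be acknowledged with a 2xx response without repeating its side effects.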
Use Resend for a single known failure, or Recover Failed Messages for a wider outage.
The portal retains payloads long enough to make debugging and recovery practical. The current default retention period is 90 days.
That retention window is another reason to use the portal early in an incident rather than trying to reconstruct failed deliveries later from incomplete logs.
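During an incident it can help to check whether a given message should still be available for replay. The sketch below is a simple date calculation using the 90-day default stated above. The function name `is_within_retention` is hypothetical, and exactly how the portal computes its cutoff is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_RETENTION = timedelta(days=90)  # default stated in this section


def is_within_retention(
    sent_at: datetime,
    now: Optional[datetime] = None,
    retention: timedelta = DEFAULT_RETENTION,
) -> bool:
    """Return True if a message sent at `sent_at` is still inside the window.

    Both datetimes must be timezone-aware. Hypothetical helper; the portal
    may calculate its cutoff differently.
    """
    current = now if now is not None else datetime.now(timezone.utc)
    return current - sent_at <= retention
```

For example, a message sent 30 days ago falls inside the default window, while one sent 91 days ago does not.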