Increased error rate in event ingestion
Incident Report for LaunchDarkly
Resolved
All available data has been recovered, and all systems are fully operational.

Data is permanently incomplete for experimentation and usage metrics for the period between 11/25 5:15AM PT and 11/25 9:23AM PT.
Posted Dec 08, 2020 - 10:32 PST
Update
Data recovery for experimentation has been completed. We have permanently incomplete data for the period between 11/25 5:15AM PT and 11/25 9:23AM PT due to the outage.

We are continuing to work on recovery for flag evaluation data.
Posted Dec 03, 2020 - 18:18 PST
Update
New events are being ingested as normal and all services are operational. The timeframes below have been updated from our previous update.

The following past data is known to be missing or incomplete:
* 11/25 05:00 - 11/25 08:30 PT: Data for all affected services is missing and not expected to be recovered
* 11/25 08:30 - 11/26 16:40 PT: Data for experimentation is incomplete. We expect to be able to recover this data, but have no ETA at this time
* 11/25 08:30 - 11/27 00:00 PT: Flag evaluation data for flag insights is incomplete. We expect to be able to recover this data, but have no ETA at this time
* 11/30 15:00 - 11/30 21:30 PT: Flag evaluation data for flag insights is incomplete. We expect to be able to recover this data, but have no ETA at this time
Posted Dec 01, 2020 - 12:41 PST
Monitoring
New events are being ingested as normal and all services are operational.

The following past data is known to be missing or incomplete:
* 11/25 05:00 - 11/25 08:30 PT: Data for all affected services is missing and not expected to be recovered
* 11/25 08:30 - 11/26 16:40 PT: Data for experimentation is incomplete. We expect to be able to recover this data, but have no ETA at this time
* 11/25 08:30 - 11/27 00:00 PT: Flag evaluation data for flag insights is incomplete. We expect to be able to recover this data, but have no ETA at this time
Posted Nov 27, 2020 - 20:58 PST
Update
The issues with the upstream provider have been fully mitigated, and we are working to recover data and restore event processing. Affected systems will gradually begin to receive recovered data until we are up to date.
Posted Nov 26, 2020 - 08:08 PST
Update
The execution of scheduled flag changes has returned to normal.

We are still seeing degraded performance for event ingestion and other services.
Posted Nov 25, 2020 - 14:35 PST
Update
We have mitigated the increased error rate for event ingestion and events are being stored again. Event processing has not recovered yet, so all affected services will continue to be delayed for the time being.

There is currently no ETA for full recovery, but we are continuing to work with the upstream provider to resolve the issue.
Posted Nov 25, 2020 - 08:36 PST
Update
The execution of scheduled flag changes is also impacted by the upstream provider issues. Scheduled changes will not be applied until this incident has been mitigated.
Posted Nov 25, 2020 - 08:15 PST
Identified
An upstream provider has alerted us to an issue that is causing the increased error rate in event ingestion. We are working with them to resolve the issue.
Posted Nov 25, 2020 - 06:55 PST
Investigating
We are currently investigating errors in event ingestion. This is causing delays in several services, including user indexing, experimentation, insights, and debugger events.
Posted Nov 25, 2020 - 05:50 PST
This incident affected: Non-core Services (Event Ingestion, Analytics Data Stream, Usage Metrics, Experimentation, Data Export).