LaunchDarkly engineers noted at 8:23am PDT that error rates increased across LaunchDarkly’s systems. We identified that our DNS registration had expired at 8:17am PDT which caused widespread impact across launchdarkly.com properties. At 9:07am PDT our team made changes to the DNS system to remediate the issue, and within a few minutes, as customers' DNS caches were cleared, all monitoring returned to normal levels.
During the roughly 50 minute incident timeframe, the following impact was encountered:
Some new connections from our SDKs failed. SDKs that were connected prior or after this window continued functioning normally. Impacted SDKs automatically recovered on their own, as the default behavior is to continue trying to connect with exponential backoff.
Some event data captured by SDKs was not recorded. Lost event data during the incident window impacts data export, experimentation results, as well as the Users page, flag statuses, flag insights charts, and the debugger.
We have identified several process gaps that led to the DNS issue, and are addressing to ensure that a similar incident does not occur in the future.