[US-East] Transient Disconnection of Network Disks Impacting Compute and GPU Instances

Incident Report for Crusoe Cloud

Resolved

This incident is now resolved. If you have any questions or experience any further issues, please reach out to our support team at support@crusoecloud.com.

Posted Mar 04, 2025 - 15:33 UTC

Monitoring

We have implemented preventive measures to mitigate recurrence of this issue and are actively monitoring for any further transient disconnects.

Posted Mar 04, 2025 - 00:53 UTC

Update

We resolved a network issue that caused temporary inaccessibility and availability problems for some VMs. We are currently identifying impacted servers and are proactively contacting affected customers to migrate them to remediated servers.

Posted Mar 03, 2025 - 23:31 UTC

Update

With the assistance of our storage vendor, we have identified a network trigger for a kernel bug that causes certain disk connection failures, which in turn cause VMs to go into an unresponsive state. We are actively investigating the conditions that trigger the issue, as well as potential remediation steps.

Posted Mar 03, 2025 - 21:15 UTC

Update

With the assistance of our storage vendor, we have identified a potential trigger for a kernel bug that causes certain disk connection failures, which in turn cause VMs to go into an unresponsive state. We are actively investigating the conditions that trigger the issue, as well as potential remediation steps.

Posted Mar 03, 2025 - 18:41 UTC

Update

With the assistance of our storage vendor, we have identified a potential trigger for a kernel bug. We are actively investigating the conditions under which the issue occurs, as well as potential remediation steps.

Posted Mar 03, 2025 - 18:36 UTC

Identified

We've identified a transient disconnection of network disks impacting compute and GPU instances in the US-East region, leading to some VMs becoming unresponsive (including over SSH). Our team is actively working to mitigate this issue. You may notice some interruptions to compute instances during this time.

Posted Mar 02, 2025 - 17:15 UTC

Update

We are continuing to investigate this issue.

Posted Mar 01, 2025 - 12:27 UTC

Investigating

We are currently investigating an issue in the US-East region that is causing a few instances to be unreachable.

Posted Mar 01, 2025 - 12:26 UTC

This incident affected: GPU Virtual Machines (us-east1).