First off, let's just get this out of the way: yes, this is a post about a Homelab issue I diagnosed, not a post about Cybersecurity. What's it doing on my blog, which is mainly about Cybersecurity? Well, first, if you're in this business, you should have some kind of Homelab or practice environment that you can break and experiment with in order to improve your skills.
Second, what is cybersecurity at its core, other than something you do to try and prevent things from taking down your business? Not running redundant firewalls/ISPs? Security issue. Not having tested "clean" backups? Security issue. Someone has a nicer shirt than you? Insecurity issue; but don't worry, you're amazing!
The start of my long weekend, but not what I was expecting.
It started the way these things always do — something that was working fine suddenly wasn't.
My Dell R730xd had been running reliably for months: a dense homelab workhorse packed with enterprise Samsung MZ-ILS3T8B 3.84TB SAS SSDs, dual 25GbE NICs (yes, with MPIO), a 4x NVMe PCIe adapter card, and a pair of 750W power supplies idling at a comfortable 240 watts. Then I finally decided to rack-mount it to make room for my new AI Server (I'll do a post on that one once it arrives).
Shortly after getting everything re-wired, I noticed my fans going from idle to the "scream of the banshee" setting while I was cleaning up my boxes. Puzzled, I checked iDRAC and found that the server had started rebooting itself every fifteen minutes like clockwork, and the iDRAC log had one message repeating over and over:

The system board BP1 5V PG voltage is outside of range.
This is the story of how I tracked it down.
What Does "BP1 5V PG" Even Mean?
Before diving into the troubleshooting, it helps to understand what this error is actually saying. "BP1" refers to Backplane 1 — one of two drive backplanes in the R730xd's 24-bay SFF chassis. "5V PG" stands for "5V Power Good," a hardware-level signal where the voltage regulator responsible for producing 5V reports whether its output is within acceptable tolerance. When the server detects that signal drop, it treats it as a critical fault and resets to protect components.
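As an aside: if you'd rather pull that event log from a terminal than click through the iDRAC web UI, ipmitool can read it over the network once IPMI-over-LAN is enabled on the iDRAC. A minimal sketch; the host, user, and password below are placeholders for your own setup:

```bash
# Read the iDRAC System Event Log remotely (extended/verbose listing):
ipmitool -I lanplus -H 192.168.1.120 -U root -P 'calvin' sel elist

# Narrow it down to the backplane power-good fault:
ipmitool -I lanplus -H 192.168.1.120 -U root -P 'calvin' sel elist \
  | grep -i 'BP1 5V PG'
```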

The R730xd's 24 front bays are split across two backplanes. Looking at the front of the server, the drives are numbered 0–23 left to right. But when you open the chassis from the top, the left bank is labeled A1 (Backplane 1) and the right bank is labeled A0 (Backplane 0). This labeling discrepancy *will* become important later.

[📸 Photo: Interior view showing A0 and A1 backplane labels Coming Soon]
Evidence Gathering
My configuration at the time of the failure was straightforward. Backplane 1 (the left bank, A1) was fully populated with 12 Samsung enterprise SAS SSDs, including the boot drives. Backplane 0 (the right bank, A0) had only a couple of SATA drives. The server also had multiple dual-port 25GbE NICs and a recently added NVMe PCIe adapter card installed in its PCIe slots.
The only recent change was adding one more Samsung MZ-ILS3T8B — the same model already filling the rest of the bank — into BP1's last open slot.
The Log Tells the Story
Here's what the iDRAC system event log looked like by the time I started investigating. The faults were relentless:
Sat Feb 14 2026 21:28:09 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 21:12:24 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 20:40:51 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 20:25:05 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 19:29:27 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 19:12:54 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 18:56:23 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 18:27:24 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 18:10:41 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 17:32:41 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 17:02:03 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 16:41:07 The system board BP1 5V PG voltage is outside of range.
Mon Feb 09 2026 00:05:45 The system board BP1 5V PG voltage is outside of range.
Every 15 to 30 minutes, the 5V PG signal on BP1 would trip, and the server would reset. But buried at the bottom of the log was something subtle — a single, lone fault from five days earlier on February 9th. Whatever was failing had been marginal for days before it started cascading.
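If you want to put numbers on that cadence instead of eyeballing timestamps, a few lines of shell will do it. A rough sketch, assuming the log excerpt above is saved to sel.txt and GNU date is available:

```bash
# Convert each fault's timestamp to epoch seconds, then print the gap
# between consecutive faults in minutes.
grep 'BP1 5V PG' sel.txt \
  | awk '{print $2" "$3" "$4" "$5}' \
  | while read -r ts; do date -d "$ts" +%s; done \
  | sort -n \
  | awk 'NR > 1 {printf "%.0f min\n", ($1 - prev) / 60} {prev = $1}'
```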

First Suspect: The PCIe Cards
My initial thought was where most good troubleshooting starts: asking myself, "What changed?" Could the recently added NVMe PCIe card, or the multiple dual-port 25GbE NICs, be destabilizing the 5V rail?
The answer turned out to be no. PCIe slots provide power on the 12V and 3.3V rails only — there are literally no 5V pins in the PCIe connector specification. More importantly, the "BP1 5V PG" error is specific to the backplane's own local power delivery circuit, not the system-wide 5V rail.
Second Suspect: The Power Supply
Once you've asked "what changed" and ruled those possibilities out, you start at OSI Layer 1 and work your way up to keep the troubleshooting simple. Was it the cable I used in the rack? Nope, but a good one to test rather than dismiss outright. I've seen guys with 10+ years of experience get humbled by that one, thinking they were too skilled to consider a cable issue.
What about the power supplies? With 240W idle draw on 750W-rated PSUs, I had plenty of headroom on total wattage. But total capacity isn't the only thing that can fail — individual voltage regulators inside a PSU can degrade. A PSU with a weakening 5V regulator stage could cause exactly this symptom.
I shut down, disconnected PSU 1, and brought the server up on PSU 2 alone. Then I waited.
The log continued:
Sat Feb 14 2026 21:58:53 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 21:43:36 The power input for power supply 1 is lost.
I let out the usual "verbal frustration" when the problem continued.

The PSU 1 power loss entry at 21:43 was me physically disconnecting it. The fault at 21:58 — fifteen minutes into the PSU 2 solo test — confirmed that swapping power supplies changed nothing. The problem was downstream of the PSUs.
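A side note for anyone reproducing this: you can read the PSU and voltage-rail sensors from the BMC before touching any hardware. Same placeholder credentials as the earlier sketch; ipmitool's sdr type filter narrows the sensor repository by category:

```bash
# Status of both power supplies as the iDRAC sees them
# (presence, failure, input lost, and so on):
ipmitool -I lanplus -H 192.168.1.120 -U root -P 'calvin' \
  sdr type 'Power Supply'

# Every voltage sensor, which on this platform should include the
# Power Good (PG) signals:
ipmitool -I lanplus -H 192.168.1.120 -U root -P 'calvin' \
  sdr type 'Voltage'
```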
Third Suspect: The New Drive
The timeline was suspicious. BP1 had been running fine with its existing drives; I added one more Samsung SAS SSD, and the faults started. I pulled the newly added drive and let the server continue running.
It faulted again within minutes.
Sat Feb 14 2026 22:13:45 The system board BP1 5V PG voltage is outside of range.
Sat Feb 14 2026 21:59:53 Drive 12 is removed from disk drive bay 1.
Sat Feb 14 2026 21:58:53 The system board BP1 5V PG voltage is outside of range.
The drive was already out by 21:59, and BP1 faulted again at 22:13. The new drive wasn't the sole culprit — BP1 was still tripping with eleven SSDs.
The Real Problem: Backplane Saturation
This is where the pieces finally came together. I opened the server chassis and noticed that, looking from the front, BP0 was labeled on the right side, while BP1 was labeled on the left! BP1 wasn't failing because of one bad drive. It was failing because of twelve.
Twelve enterprise SAS SSDs — even eleven — can draw more from the 5V rail at peak load than the spinning disks this platform was designed around. Once I got my backplane dyslexia sorted out, things started to click. The new drive hadn't broken anything; it had simply been the straw that broke the camel's back, pushing an already-marginal rail past its limit.
The lone fault on February 9th was the first warning sign.
A Clue in the Voltage Sensors
Looking at the iDRAC voltage sensor list revealed something telling: only BP1 has a dedicated 5V PG sensor. There is no corresponding "BP0 5V PG" entry:
System Board 5V SWITCH PG Good
System Board BP1 5V PG Good

This strongly suggests the two backplanes are not powered identically. BP0 likely draws from the main "System Board 5V SWITCH" rail — a potentially more robust power path designed for the primary drive load. This was also the source of my earlier confusion: the drives are numbered 0–23 from the front, but BP1 starts on the left and BP0 covers the right half of the bays. Which meant my assumption that the new drive alone was at fault couldn't be the case; it was my saturated backplane all along.
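That sensor list came from the iDRAC UI, but racadm will dump the same table if you'd rather script the check. A sketch using remote racadm with the same placeholder credentials:

```bash
# List every sensor the iDRAC knows about and filter for Power Good lines:
racadm -r 192.168.1.120 -u root -p 'calvin' getsensorinfo | grep -i 'PG'
```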
The R740xd Gets a Fix. The R730xd Doesn't.
Looking into the issue, I found that the R740xd (14th gen) has this exact same error as a documented known issue — and Dell fixed it with a CPLD firmware update (version 1.0.6+). The CPLD (Complex Programmable Logic Device) handles power sequencing and monitoring, and the updated firmware corrected the voltage sensing thresholds.
The R730xd has no equivalent fix. Dell's guidance for the 13th gen platform points toward hardware replacement when PSU swaps and power cycling don't resolve the issue. But in this case, my presumption is that it's not a hardware defect — it's a design limitation being exposed by a workload the platform wasn't originally spec'd for.
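If you're on a 14th-gen box and want to check whether you already have the fixed CPLD, the firmware inventory is queryable from racadm as well. A sketch, with the caveat that the exact component naming in the output varies by platform and iDRAC version:

```bash
# Dump the firmware inventory and pull out the CPLD entry with some
# surrounding context (the R740xd fix requires CPLD 1.0.6 or newer):
racadm -r 192.168.1.120 -u root -p 'calvin' swinventory | grep -i -B1 -A3 'CPLD'
```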
The Fix: Re-balance the Load
The solution is straightforward: redistribute drives across both backplanes so that BP1 isn't carrying the full load alone. Moving three SSDs from BP1 over to BP0 reduces the 5V current draw on BP1's regulator and shifts it onto BP0's.

Since the server uses an HBA 330i rather than a RAID controller, the drives are passed straight through to the operating system. I'm running a ZFS volume with TrueNAS, where drives are identified by WWN and serial number rather than slot position, so physically moving them between backplanes won't break any pools or volumes — as long as mount points use UUIDs rather than device paths.
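Before physically shuffling anything, it's worth confirming what the pool actually keys on. A quick sanity check from the TrueNAS shell; tank is a placeholder pool name:

```bash
# ZFS identifies vdevs by metadata written on the disks themselves,
# so pool membership survives a physical re-shuffle. List the members:
zpool status -v tank

# Map persistent WWN-based names to today's kernel device nodes:
ls -l /dev/disk/by-id/ | grep -i wwn

# Cross-check WWN and serial number per block device:
lsblk -o NAME,WWN,SERIAL,SIZE,MODEL
```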

After re-balancing, verification is simple: monitor iDRAC for BP1 5V PG faults. Given that the faults were occurring every 15–30 minutes, even a few hours of clean operation is a strong sign the fix is working, and 24 hours without a fault confirms it. It's also worth keeping an eye on the "System Board 5V SWITCH" sensor to make sure BP0's rail handles the increased load gracefully.
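Rather than refreshing the iDRAC page all weekend, a throwaway polling loop is easier to leave running. A sketch along these lines, with the same placeholder credentials, checking every five minutes:

```bash
#!/bin/sh
# Poll the SEL and log whenever the count of BP1 5V PG entries grows.
last=0
while true; do
  count=$(ipmitool -I lanplus -H 192.168.1.120 -U root -P 'calvin' \
            sel list | grep -c 'BP1 5V PG')
  if [ "$count" -gt "$last" ]; then
    echo "$(date): BP1 5V PG fault count is now $count" | tee -a bp1_watch.log
  fi
  last=$count
  sleep 300
done
```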
Lessons Learned
Old servers weren't designed for your new drives. The R730xd is a fantastic platform for a home-built SAN, and incredibly cheap for its drive density and versatility, but its power delivery was almost certainly specced for spinning disks. A full backplane of enterprise SAS SSDs can exceed the 5V budget the server was built around.
Read the log carefully. The lone fault on February 9th — five days before the cascading failures — was the canary in the coal mine. If I'd caught it then, I could have saved time by not chasing paths unrelated to the fault.
Establish consistency. Understanding how often the fault was occurring was instrumental in knowing how long to monitor once a fix was in place. Too often, people assume a fix worked without giving it ample time to prove itself through the system.
Understand your power topology. Not all backplanes are created equal. The absence of a BP0 5V PG sensor told me the two backplanes have different power architectures, which helped me understand the fix.
Newer generations may have firmware fixes for your problems. Always check if a related platform has a known solution — even if it doesn't directly apply, it can explain the root cause and point you in the right direction.

