Check RAID Status Detail on Linux Server

March 12, 2021 masmind

How to check RAID status detail on Linux server via SSH.

DISCLAIMER: Use this article only when Nagios reports a WARNING or CRITICAL RAID status.

Check RAID Error via Nagios

First, check the RAID status in Nagios. Some notifications are false alarms, so each one needs to be verified carefully. For example, some servers report a WARNING status for a RAID disk.

Check via SSH

Check Overall Status of Physical Disks

/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0|grep "Firmware state"

The output looks like this:

[root@ssdvps22 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0|grep "Firmware state"
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up

How to Read Status Result

In the output above, ssdvps22 is healthy: every disk in the RAID array reports Online, Spun Up. Every disk on every server should show this state.
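As a minimal sketch of that check, the helper below counts the drives whose firmware state is anything other than Online, Spun Up. The function name count_bad_drives is hypothetical, and the sketch assumes the output format shown above.

```shell
# count_bad_drives (hypothetical helper): reads MegaCli -PDList output on
# stdin and prints how many drives are NOT in the "Online, Spun Up" state.
count_bad_drives() {
    awk '/Firmware state/ && !/Online, Spun Up/ { n++ } END { print n + 0 }'
}

# Usage on a server with MegaCli installed:
#   /opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | count_bad_drives
```

A result of 0 means every disk is healthy. Anything higher means at least one disk needs a closer look.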

What to Do When a Disk Shows Bad or Degraded?

When a disk shows a Bad or Degraded result, it is time to replace it. First, though, we need to find out which disk is causing the problem.

Find the Problematic Disk

Before proceeding, make sure smartctl is installed. It is part of the smartmontools package.

yum install smartmontools -y

Use this command for a detailed per-slot view that shows which disk is bad or degraded.

/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | grep -E 'Slot Number|Device Id|Inquiry Data|Raw|Firmware state' | sed 's/Slot/\nSlot/g'

The output shows exactly which slot holds the problem disk.
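On a server with many disks, the same output can be filtered down to the problem drives only. The sketch below is one way to do it. The function name list_bad_slots is hypothetical, and it assumes the PDList output contains "Slot Number:", "Device Id:" and "Firmware state:" lines in that order for each drive.

```shell
# list_bad_slots (hypothetical helper): reads MegaCli -PDList output on stdin
# and prints one line per drive whose firmware state does not start with
# "Online", together with its slot number and device id.
list_bad_slots() {
    awk -F': *' '
        /^Slot Number/    { slot = $2 }
        /^Device Id/      { dev = $2 }
        /^Firmware state/ && $2 !~ /^Online/ {
            print "slot " slot " (device id " dev "): " $2
        }
    '
}

# Usage on a server with MegaCli installed:
#   /opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | list_bad_slots
```

No output means every drive reported an Online state.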

Find Disk Serial Number (SN)

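Then run smartctl to find the disk's serial number.

NOTE: N is the disk's Device Id as reported by MegaCli in the previous step. It often matches the slot number, but not always. Replace N with that number.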

smartctl -a -d megaraid,N /dev/sdb

The result looks something like this:

[root@ssdvps22 ~]# smartctl -a -d megaraid,6 /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.2.2.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: HFS960G32FEH-7A10A
Serial Number: NJ04N6393I1204G1J
LU WWN Device Id: 5 ace42e 0251411f7
Add. Product Id: DELL(tm)
Firmware Version: DE03
User Capacity: 960,197,124,096 bytes [960 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Mar 12 10:02:43 2021 WIB
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
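To hand only the serial number to the datacenter team, the relevant line can be pulled out of the smartctl output. This is a small sketch, not part of the original procedure. The function name get_serial is hypothetical, and the match is case-insensitive on the assumption that some drives print "Serial number:" instead of "Serial Number:".

```shell
# get_serial (hypothetical helper): reads smartctl output on stdin and
# prints the value of the first "Serial Number:" line.
get_serial() {
    awk -F': *' 'tolower($0) ~ /^serial number:/ { print $2; exit }'
}

# Usage (6 is the example device id used above):
#   smartctl -i -d megaraid,6 /dev/sdb | get_serial
```

The -i option limits smartctl to the information section, which is all this lookup needs.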

Get DC Team to Replace Disk ASAP

As soon as you have the serial number of the degraded or bad disk, forward it to the datacenter team for replacement. DO NOT wait until tomorrow! Replace the disk before it fails completely. Note that the array tolerates the failure of only 1 disk. You can also escalate to huda via Slack.

Hot Swap Disk

Once the disk has been identified, ask the DC team to replace it. This can be done while the server is running. After the new disk is plugged in and detected by the system, the RAID array starts rebuilding automatically.
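To confirm that the rebuild has actually started, the drive list can be checked for disks in the Rebuild state. The sketch below assumes the PDList output includes "Enclosure Device ID:" and "Slot Number:" lines and that a rebuilding drive reports a firmware state beginning with "Rebuild". The function name rebuilding_drives is hypothetical.

```shell
# rebuilding_drives (hypothetical helper): reads MegaCli -PDList output on
# stdin and prints "[enclosure:slot]" for each drive that is rebuilding.
rebuilding_drives() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/ && $2 ~ /^Rebuild/ { print "[" enc ":" slot "]" }
    '
}

# Usage on a server with MegaCli installed: show progress per rebuilding drive.
#   for pd in $(/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | rebuilding_drives); do
#       /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv "$pd" -a0
#   done
```

If the helper prints nothing shortly after the swap, the new disk may not have been picked up, and the firmware states should be checked again.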

How Long Will the RAID Rebuild Take?

It depends on the configured rebuild rate, which is normally set to 30%-50%. A running rebuild causes heavy disk I/O, so it is best to keep the rebuild rate at around 30%-50%. Check the current setting with:

/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllinfo -aALL | grep -i rebuild

The rebuild starts automatically and runs at whatever rate is configured there.
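For scripted checks, the rate can be reduced to a bare number. This sketch assumes the adapter info contains a line of the form "Rebuild Rate : 30%". The function name get_rebuild_rate is hypothetical.

```shell
# get_rebuild_rate (hypothetical helper): reads MegaCli -AdpAllinfo output on
# stdin and prints the rebuild rate as a plain number (no percent sign).
get_rebuild_rate() {
    awk -F': *' '/^Rebuild Rate/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Usage on a server with MegaCli installed:
#   /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllinfo -a0 | get_rebuild_rate
```

If the value falls outside the 30%-50% range, MegaCli can change it with its -AdpSetProp RebuildRate option. Check the exact syntax against the MegaCli help for your version before running it.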

After a disk replacement, expect all disks to be back in sync in approximately 120 to 180 minutes.
