Yes, I'm really enjoying everything!

smilemark blog

Home server diary

WD RED 3TB failure


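The WD RED 3TB that I added to the server four years ago has failed.

This HDD was combined with a second drive, a WD RED 4TB, through LVM and run as a single 7TB volume. It holds the backups of the server's data and the Mac's Time Machine data.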


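The symptoms were as follows.

  • Time Machine backups stopped working.
    It reported "Oldest backup: None" and behaved as if no data existed.
    fsck_hfs could not repair it either.
  • Messages indicating an HDD fault showed up repeatedly in the server log (/var/log/messages).
    Running fsck turned up a large number of errors, but they were repaired and the messages stopped appearing in the log.
  • I deleted the Time Machine data and started a fresh Time Machine backup.
    For some reason it took far longer than usual.
    Nothing abnormal in the server log.
  • After that, Time Machine backups stopped working again.
    Again, fsck_hfs could not repair it.
  • Even simply copying a large file onto the affected 7TB partition took an unusually long time.
    Nothing abnormal in the server log.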
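
The "copying takes forever" symptom in the last item is easier to judge with a number attached. Below is a minimal sketch, not what I actually ran, that writes a test file and reports the sequential write rate in MiB/s. The function name, the file sizes and the path in the usage comment are all made up for illustration.

```python
import os
import time


def measure_write_throughput(path, total_bytes=256 * 1024 * 1024,
                             chunk_bytes=1024 * 1024):
    """Write total_bytes of zeros to path and return the rate in MiB/s.

    The file is fsync'ed before the clock stops, so the figure includes
    the time the data takes to actually reach the disk.
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return written / elapsed / (1024 * 1024)


# Hypothetical usage (the mount point is an assumption, not from this post):
# print(measure_write_throughput("/mnt/backup/throughput_probe.bin"))
```

Running the same probe on a healthy volume gives a baseline to compare against. A rate far below that baseline on the suspect volume points the same way as the slow copy, even when the system log stays quiet.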

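At this point, for some reason, nothing in the server log pointed to an HDD fault, and a forced fsck found nothing wrong either.
Still, something was clearly wrong with an HDD.
So I checked the SMART data of both drives (the 3TB and the 4TB), and the 3TB drive turned out to have logged a huge number of errors.
It looks like replacing it is the only option.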
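
For anyone who wants to automate this check, here is a rough sketch that pulls the two telling facts out of smartctl's text output: the ATA error count and any self-tests that ended in a read failure. It assumes smartmontools is installed and that the output has the same layout as the log in this post. The function names and the device path in the usage comment are hypothetical.

```python
import re
import subprocess


def read_smart_logs(device):
    """Return smartctl's error log and self-test log for one drive as text.

    smartctl reports drive problems through a non-zero exit status, so the
    return code is deliberately not treated as a failure here.
    """
    result = subprocess.run(
        ["smartctl", "-l", "error", "-l", "selftest", device],
        capture_output=True, text=True,
    )
    return result.stdout


ERROR_COUNT = re.compile(r"^ATA Error Count:\s*(\d+)", re.MULTILINE)
# Matches rows such as:
# "# 1 Short offline Completed: read failure 80% 26958 1563613616"
FAILED_TEST = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s+Completed: read failure\s+(\d+)%\s+(\d+)\s+(\d+)\s*$",
    re.MULTILINE,
)


def summarize_smart_logs(text):
    """Extract the ATA error count and the failed self-tests from the text."""
    match = ERROR_COUNT.search(text)
    failed = [
        {
            "num": int(num),
            "test": desc,
            "lifetime_hours": int(hours),
            "lba_of_first_error": int(lba),
        }
        for num, desc, _remaining, hours, lba in FAILED_TEST.findall(text)
    ]
    return {
        "ata_error_count": int(match.group(1)) if match else 0,
        "failed_self_tests": failed,
    }


# Hypothetical usage (needs root; the device name is an assumption):
# print(summarize_smart_logs(read_smart_logs("/dev/sdb")))
```

A healthy drive normally has no "ATA Error Count" line at all, which is why the sketch falls back to 0 when the line is missing.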

SMART Error Log Version: 1
ATA Error Count: 17235 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 17235 occurred at disk power-on lifetime: 26819 hours (1117 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 28 11 00 e8 Error: UNC 8 sectors at LBA = 0x08001128 = 134222120

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 28 11 00 e8 08 48d+19:47:57.614 READ DMA
ec 00 00 00 00 00 a0 08 48d+19:47:57.611 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 48d+19:47:57.589 SET FEATURES [Set transfer mode]

Error 17234 occurred at disk power-on lifetime: 26819 hours (1117 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 28 11 00 e8 Error: UNC 8 sectors at LBA = 0x08001128 = 134222120

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 28 11 00 e8 08 48d+19:47:54.563 READ DMA
ec 00 00 00 00 00 a0 08 48d+19:47:54.558 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 48d+19:47:54.537 SET FEATURES [Set transfer mode]

Error 17233 occurred at disk power-on lifetime: 26819 hours (1117 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 28 11 00 e8 Error: UNC 8 sectors at LBA = 0x08001128 = 134222120

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 28 11 00 e8 08 48d+19:47:51.498 READ DMA
ec 00 00 00 00 00 a0 08 48d+19:47:51.494 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 48d+19:47:51.473 SET FEATURES [Set transfer mode]

Error 17232 occurred at disk power-on lifetime: 26819 hours (1117 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 28 11 00 e8 Error: UNC 8 sectors at LBA = 0x08001128 = 134222120

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 28 11 00 e8 08 48d+19:47:48.446 READ DMA
ec 00 00 00 00 00 a0 08 48d+19:47:48.442 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 48d+19:47:48.421 SET FEATURES [Set transfer mode]

Error 17231 occurred at disk power-on lifetime: 26819 hours (1117 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 28 11 00 e8 Error: UNC 8 sectors at LBA = 0x08001128 = 134222120

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 28 11 00 e8 08 48d+19:47:45.415 READ DMA
ec 00 00 00 00 00 a0 08 48d+19:47:45.411 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 48d+19:47:45.390 SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed: read failure  80%        26958            1563613616
# 2  Short offline     Completed: read failure  80%        26958            1563613616
# 3  Short offline     Completed: read failure  80%        26954            1563613624
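
The numbers in the log above can be cross-checked by hand. The sketch below rebuilds the failing LBA from the register dump using the standard 28-bit LBA layout (low nibble of DH, then CH, CL, SN), tests the UNC bit in the error register, and converts the power-on hours into days. The function names are my own. The 32-bit millisecond counter behind the 49.710-day wrap is my reading of that figure, not something the log itself states.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine the ATA task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def hours_to_days(hours):
    """Split power-on hours into (days, remaining hours)."""
    return divmod(hours, 24)


# Register line from the error log: ER ST SC SN CL CH DH
er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in "40 51 08 28 11 00 e8".split())

lba = lba28_from_registers(sn, cl, ch, dh)
print(hex(lba), lba)             # 0x8001128 134222120
print(bool(er & 0x40))           # True: bit 6 of ER is UNC (uncorrectable data)
print(sc)                        # 8 sectors in the failed read
print(hours_to_days(26819))      # (1117, 11)

# Assumption: Powered_Up_Time is a 32-bit millisecond counter,
# which would wrap after 2**32 ms.
print(round(2**32 / 86_400_000, 3))  # 49.71
```

All five logged errors decode to the same LBA, so the drive kept retrying one unreadable spot about every three seconds. The self-test failures at LBA 1563613616 and 1563613624 are in a different area of the disk, so the damage is not confined to a single sector.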

