UPS Testing
Power Failure Results
On Tuesday, May 4, a large portion of the campus lost power for about an hour. Below are the details of what happened to the cluster.
- Power failed at around 15:27
- Nodes Shut Down
Between 15:29:24 and 15:29:48 the nodes started their shutdown process.
Between 15:29:55 and 15:30:02 upsd on the UPS masters "disconnected"
- UWMLSC Powered Down
15:27:21 UWMLSC reported being on battery
15:47:21 User requested FSD!
15:47:37 System is being shutdown by UPS
15:47:52 127.0.0.1 disconnected
15:48:03 UWMLSC exited on signal 15
- Powerware UPS failed on switch at about 16:06:32
- The following machines were then shut down by hand:
15:34:42 contra told to shut down - 15:35:22 contra went down
15:34:55 hydra told to shut down - 15:35:37 hydra went down
15:37:02 condor told to shut down - 15:37:14 condor went down
15:39:40 hades
15:42:54 tigger
15:43:42 watchtower
15:45:26 nest
15:46:26 kanga
15:47:24 dataserver started shutdown (nuts) - 15:47:52 down
15:51:56 gravity FSD -> 15:53:52 shutdown (nuts) - 15:56:12 gravity (powered back on for email) - 16:03:41 down again
15:55:40 medusa rebooted - 16:03:51 medusa shutdown
- The following machines shut down hard after power was lost:
15:43:11 storage2 last log message
16:04:13 storage1 started to go down - 16:09:10 last log entry, still not down
- Power was restored at around 16:25
- 17:10:56 slave nodes started coming up - a few had problems (3 batteries and 1 cabling)
- 16:30:56 uwmlsc partly up - 16:39:44 uwmlsc partly up - 16:47:48 uwmlsc partly up - 16:54:07 uwmlsc up, yay! ** medusa set as primary DNS; "Starting NFS Services" hung but worked after medusa was turned on.
When other machines came up:
17:07:09 hydra
17:08:41 contra
17:05:54 condor
17:38:46 kanga
17:30:45 hades
16:26:56 nest
17:11:42 watchtower
17:38:49 tigger
16:55:51 gravity
16:51:16 medusa
16:35:51 storage1 partly up - 16:35:51 partly up - 16:52:42 up
16:35:23 storage2 started coming up - 16:51:36 storage2 finished coming up (after medusa and thus DNS came up)
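
The timestamps above can be turned into per-machine downtime with a little arithmetic. The following is a minimal Python sketch, not part of the original notes: the helper name `elapsed` and the `events` table are made up for illustration, and only three of the hand-shutdown machines are filled in, using the "went down" and "came up" times copied from the lists above. It assumes all times fall on the same day.

```python
from datetime import datetime, timedelta

def elapsed(start: str, end: str) -> timedelta:
    """Time between two HH:MM:SS stamps on the same day (hypothetical helper)."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)

# (went down, came back up), copied from the lists above.
events = {
    "contra": ("15:35:22", "17:08:41"),
    "hydra": ("15:35:37", "17:07:09"),
    "condor": ("15:37:14", "17:05:54"),
}

downtime = {host: elapsed(down, up) for host, (down, up) in events.items()}
for host, delta in downtime.items():
    print(f"{host:8s} down for {delta}")

# Time UWMLSC spent on battery before the forced shutdown (FSD) was requested.
on_battery = elapsed("15:27:21", "15:47:21")
print("UWMLSC on battery before FSD:", on_battery)
```

The same table could be extended with the remaining machines, though several of them only have a single shutdown timestamp recorded, so their downtime would be approximate.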