Does anyone know of any health check steps for SCOM other than the steps below?
Override all script errors to be informational. Create a view for script errors. Then check the repeat count once per day. Anything with a repeat count below 5 can (in general) be resolved and ignored.
Copy the disk state view and check it each morning (especially for warnings, which don't generate alerts).
Copy the health service state view. Again, you should get an alert for this, but it sometimes helps to have it in one place with the other "quick check" views.
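If the script error view is exported to a file or pulled by a script, the daily repeat count check could be sketched roughly like this. This is only an illustration: the `ScriptAlert` class and its field names are hypothetical and not part of SCOM, and the threshold of 5 is just the rule of thumb from the step above.

```python
from dataclasses import dataclass

# Rule of thumb from the post: fewer than 5 repeats is generally ignorable.
REPEAT_THRESHOLD = 5


@dataclass
class ScriptAlert:
    """One row from the script error view (hypothetical field names)."""
    name: str
    repeat_count: int


def triage(alerts, threshold=REPEAT_THRESHOLD):
    """Split alerts into (to_resolve, to_investigate) by repeat count."""
    to_resolve = [a for a in alerts if a.repeat_count < threshold]
    to_investigate = [a for a in alerts if a.repeat_count >= threshold]
    return to_resolve, to_investigate


if __name__ == "__main__":
    # Made-up sample rows, only to show the call.
    sample = [
        ScriptAlert("Disk script timed out", 2),
        ScriptAlert("WMI probe failed", 17),
    ]
    resolve, investigate = triage(sample)
    for alert in investigate:
        print(f"investigate: {alert.name} (repeat count {alert.repeat_count})")
```

Anything that lands in the first list can be resolved in bulk; the second list is what deserves a closer look.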
I always tend to check the Operations Manager event logs on the management servers for warning and critical events. They are an early indication of possible problems.
Check proper database sizing based on monitored server / device count
Check SQL configuration against supported configurations / best practices
Look for database latency (event 2115) in the RMS / MS OpsMgr Event Log
Collect agent count reporting to each MS & RMS
Retrieve recent warning and critical events from RMS OpsMgr Event Log
Locate "grey" hosts
Verify fixes for OpsMgr / Agents
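Two of the checks in this list (event 2115 and the agent count per management server) are easy to script once the data has been exported. The sketch below assumes the events and agent assignments are already available as plain Python records; the dictionary keys and the record shapes are hypothetical, and getting the data out of SCOM is left out.

```python
from collections import Counter

# Event ID named in the checklist above for database latency.
LATENCY_EVENT_ID = 2115


def count_latency_events(events):
    """Count event-2115 records per server.

    `events` is an iterable of dicts with the hypothetical keys
    'server' and 'event_id'.
    """
    return Counter(
        e["server"] for e in events if e["event_id"] == LATENCY_EVENT_ID
    )


def agents_per_server(agents):
    """Count agents reporting to each MS / RMS.

    `agents` is an iterable of (agent_name, management_server) pairs.
    """
    return Counter(ms for _agent, ms in agents)


if __name__ == "__main__":
    # Made-up sample data, only to show the calls.
    events = [
        {"server": "MS01", "event_id": 2115},
        {"server": "MS01", "event_id": 2115},
        {"server": "RMS01", "event_id": 1000},
    ]
    agents = [("web01", "MS01"), ("web02", "MS01"), ("sql01", "RMS01")]
    print(count_latency_events(events))
    print(agents_per_server(agents))
```

A server that shows up with a high 2115 count and also carries most of the agents is a reasonable first place to look.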
We can check for a failed backup as well as a successful backup, but we never monitor whether a backup is even being attempted.
We also check whether the transaction logs are getting close to full.
We could also check that SQL Server is set up correctly, for example that Auto Grow is set to False.
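The "is a backup even being attempted" check and the transaction log check both come down to simple comparisons once the numbers are in hand. Here is a minimal sketch, assuming the time of the last backup attempt and the log size figures have already been read from SQL Server by some other means. The function names, the 24-hour window and the 80% threshold are my own assumptions, not values from SCOM or SQL Server.

```python
from datetime import datetime, timedelta


def backup_attempt_missing(last_attempt, now, max_age=timedelta(hours=24)):
    """Return True if no backup attempt (successful or failed) is recent.

    `last_attempt` is the time of the most recent attempt, or None if
    no attempt has ever been recorded. `max_age` is an assumed window.
    """
    return last_attempt is None or (now - last_attempt) > max_age


def log_nearly_full(used_mb, size_mb, threshold_pct=80.0):
    """Return True if the transaction log is at or above threshold_pct."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return 100.0 * used_mb / size_mb >= threshold_pct


if __name__ == "__main__":
    # Made-up values, only to show the calls.
    now = datetime(2024, 1, 2, 8, 0)
    print(backup_attempt_missing(datetime(2024, 1, 1, 7, 0), now))
    print(log_nearly_full(used_mb=850, size_mb=1000))
```

The point of the first function is that it flags silence: it fires when nothing has been attempted at all, which neither a "backup failed" nor a "backup succeeded" check would catch.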