Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds

scrutiny

WebUI for smartd S.M.A.R.T monitoring

NOTE: Scrutiny is a Work-in-Progress and still has some rough edges.

Introduction

If you run a server with more than a couple of hard drives, you're probably already familiar with S.M.A.R.T and the smartd daemon. If not, it's an incredible open source project, described as follows:

smartd is a daemon that monitors the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests.

These S.M.A.R.T hard drive self-tests can help you detect and replace failing hard drives before they cause permanent data loss. However, there are a couple of issues with smartd:

  • There are more than a hundred S.M.A.R.T attributes, but smartd does not differentiate between critical and informational metrics.
  • smartd does not record S.M.A.R.T attribute history, so it can be hard to determine if an attribute is degrading slowly over time.
  • S.M.A.R.T attribute thresholds are set by the manufacturer. In some cases these thresholds are unset, or are so high that they can only be used to confirm a failed drive, rather than detecting a drive about to fail.
  • smartd is a command-line-only tool. For headless servers a web UI would be more valuable.

Scrutiny is a Hard Drive Health Dashboard & Monitoring solution, merging manufacturer-provided S.M.A.R.T metrics with real-world failure rates.

Features

Scrutiny is a simple but focused application, with a couple of core features:

  • Web UI Dashboard - focused on Critical metrics
  • smartd integration (no re-inventing the wheel)
  • Auto-detection of all connected hard drives
  • S.M.A.R.T metric tracking for historical trends
  • Customized thresholds using real-world failure rates
  • Temperature tracking
  • Provided as an all-in-one Docker image (but can be installed manually)
  • (Future) Configurable Alerting/Notifications via Webhooks
  • (Future) Hard Drive performance testing & tracking

Getting Started

Docker

If you're using Docker, getting started is as simple as running the following command:

docker run -it --rm -p 8080:8080 \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
--name scrutiny \
analogj/scrutiny

  • /run/udev is necessary to provide the Scrutiny collector with access to your device metadata
  • --cap-add SYS_RAWIO is necessary to allow smartctl permission to query your device SMART data
    • NOTE: If you have NVMe drives, you must use --cap-add SYS_ADMIN instead. See issue #26
  • --device entries are required to ensure that your hard disk devices are accessible within the container
  • analogj/scrutiny is an omnibus image, containing both the webapp server (frontend & API) as well as the S.M.A.R.T metric collector. (see below)
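
If you have more than a couple of drives, typing each --device flag by hand gets tedious. The following is a minimal sketch, not part of Scrutiny itself: build_scrutiny_cmd is a hypothetical helper that prints (but does not run) the docker run command from above for whatever device paths you pass in. It assumes that any device path starting with /dev/nvme is an NVMe drive and switches the capability to SYS_ADMIN in that case, as described in the note above.

```shell
#!/bin/sh
# Sketch only: build_scrutiny_cmd is a hypothetical helper, not shipped with Scrutiny.
# It prints the docker run command so you can review it before running it.
build_scrutiny_cmd() {
    cap="SYS_RAWIO"   # default capability for non-NVMe drives
    devices=""
    for dev in "$@"; do
        case "$dev" in
            # Assumption: /dev/nvme* paths are NVMe drives, which need
            # SYS_ADMIN instead of SYS_RAWIO (see issue #26).
            /dev/nvme*) cap="SYS_ADMIN" ;;
        esac
        devices="$devices --device=$dev"
    done
    echo "docker run -it --rm -p 8080:8080 -v /run/udev:/run/udev:ro --cap-add $cap$devices --name scrutiny analogj/scrutiny"
}

# Example: the same two drives used in the command above.
build_scrutiny_cmd /dev/sda /dev/sdb
```

Review the printed command, then paste it into your shell to start the container.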

Hub/Spoke Deployment

In addition to the Omnibus image (available under the latest tag), there are 2 other Docker images available:

  • analogj/scrutiny:collector - Contains the Scrutiny data collector, smartctl binary and cron-like scheduler. You can run one collector on each server.
  • analogj/scrutiny:web - Contains the Web UI, API and Database. Only one container is necessary.

docker run -it --rm -p 8080:8080 \
--name scrutiny-web \
analogj/scrutiny:web

docker run -it --rm \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
-e SCRUTINY_API_ENDPOINT=http://SCRUTINY_WEB_IPADDRESS:8080 \
--name scrutiny-collector \
analogj/scrutiny:collector

Usage

Once scrutiny is running, you can open your browser to http://localhost:8080 and take a look at the dashboard.

Initially it will be empty, but after the first collector run you'll be greeted with a list of all your hard drives and their current S.M.A.R.T status.

The collector is configured to run once a day, but you can trigger it manually by running the following command:

docker exec scrutiny /scrutiny/bin/scrutiny-collector-metrics run
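
If once a day is not frequent enough, one option is to let the host's cron trigger the same command. This is a rough sketch under that assumption, not a built-in Scrutiny feature: collector_cron_entry is a hypothetical helper that prints a crontab line for a schedule you choose, using the container name scrutiny from the docker run example above.

```shell
#!/bin/sh
# Sketch only: collector_cron_entry is a hypothetical helper, not part of Scrutiny.
# It prints a crontab line that runs the manual collector command on a schedule.
collector_cron_entry() {
    schedule="$1"               # five-field cron schedule, e.g. "0 */6 * * *"
    container="${2:-scrutiny}"  # container name; defaults to "scrutiny"
    printf '%s docker exec %s /scrutiny/bin/scrutiny-collector-metrics run\n' \
        "$schedule" "$container"
}

# Example: trigger the collector every six hours.
collector_cron_entry "0 */6 * * *"
```

You can add the printed line to the host's crontab (for example with crontab -e). Keep the schedule quoted so the shell does not expand the asterisks.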

Configuration

We support a global YAML configuration file that must be located at /scrutiny/config/scrutiny.yaml.

Check the example.scrutiny.yaml file for a fully commented version.

Contributing

Please see CONTRIBUTING.md for instructions on how to develop and contribute to the scrutiny codebase.

Work your magic and then submit a pull request. We love pull requests!

If you find the documentation lacking, help us out and update this README.md. If you don't have the time to work on Scrutiny, but found something we should know about, please submit an issue.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

Jason Kulatunga - Initial Development - @AnalogJ

Licenses

Sponsors

Scrutiny is only possible with the help of my GitHub Sponsors.

They read a simple Reddit announcement post and decided to trust & finance a developer they've never met. It's an exciting and incredibly humbling experience.

If you found Scrutiny valuable, please consider supporting my work.