
<p align="center">
  <a href="https://github.com/AnalogJ/scrutiny">
  <img width="300" alt="scrutiny_view" src="webapp/frontend/src/assets/images/logo/scrutiny-logo-dark.png">
  </a>
</p>


# scrutiny

[![CI](https://github.com/AnalogJ/scrutiny/workflows/CI/badge.svg?branch=master)](https://github.com/AnalogJ/scrutiny/actions?query=workflow%3ACI)
[![codecov](https://codecov.io/gh/AnalogJ/scrutiny/branch/master/graph/badge.svg)](https://codecov.io/gh/AnalogJ/scrutiny)
[![GitHub license](https://img.shields.io/github/license/AnalogJ/scrutiny.svg?style=flat-square)](https://github.com/AnalogJ/scrutiny/blob/master/LICENSE)
[![Godoc](https://img.shields.io/badge/godoc-reference-blue.svg?style=flat-square)](https://godoc.org/github.com/analogj/scrutiny)
[![Go Report Card](https://goreportcard.com/badge/github.com/AnalogJ/scrutiny?style=flat-square)](https://goreportcard.com/report/github.com/AnalogJ/scrutiny)
[![GitHub release](http://img.shields.io/github/release/AnalogJ/scrutiny.svg?style=flat-square)](https://github.com/AnalogJ/scrutiny/releases)


WebUI for smartd S.M.A.R.T monitoring

> NOTE: Scrutiny is a Work-in-Progress and still has some rough edges.

[![](docs/dashboard.png)](https://imgur.com/a/5k8qMzS)

# Introduction

If you run a server with more than a couple of hard drives, you're probably already familiar with S.M.A.R.T and the `smartd` daemon. If not, it's an incredible open source project described as the following:

> smartd is a daemon that monitors the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests.

These S.M.A.R.T hard drive self-tests can help you detect and replace failing hard drives before they cause permanent data loss. However, there are a couple of issues with `smartd`:

- There are more than a hundred S.M.A.R.T attributes, but `smartd` does not differentiate between critical and informational metrics.
- `smartd` does not record S.M.A.R.T attribute history, so it can be hard to determine if an attribute is degrading slowly over time.
- S.M.A.R.T attribute thresholds are set by the manufacturer. In some cases these thresholds are unset, or are so high that they can only be used to confirm a failed drive, rather than detecting a drive about to fail.
- `smartd` is a command-line-only tool. For headless servers, a web UI would be more valuable.

**Scrutiny is a Hard Drive Health Dashboard & Monitoring solution, merging manufacturer provided S.M.A.R.T metrics with real-world failure rates.**

# Features

Scrutiny is a simple but focused application, with a couple of core features:

- Web UI Dashboard - focused on Critical metrics
- `smartd` integration (no re-inventing the wheel)
- Auto-detection of all connected hard-drives
- S.M.A.R.T metric tracking for historical trends
- Customized thresholds using real world failure rates
- Temperature tracking
- Provided as an all-in-one Docker image (but can be installed manually)
- (Future) Configurable Alerting/Notifications via Webhooks
- (Future) Hard Drive performance testing & tracking

# Getting Started

## RAID/Virtual Drives

Scrutiny uses `smartctl --scan` to detect devices/drives.

- All RAID controllers supported by `smartctl` are automatically supported by Scrutiny.
    - While some RAID controllers support passing through the underlying SMART data to `smartctl`, others do not.
    - In some cases `--scan` does not correctly detect the device type, returning [incomplete SMART data](https://github.com/AnalogJ/scrutiny/issues/45).
    Scrutiny will eventually support overriding detected device type via the config file.
- If you use Docker, you **must** pass through the RAID virtual disk to the container using `--device` (see below)
    - This device may be in `/dev/*` or `/dev/bus/*`.
    - If you're unsure, run `smartctl --scan` on your host, and pass all listed devices to the container.
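
As a rough sketch of that last step, the device paths reported by `smartctl --scan` can be turned into `--device` flags with a small shell helper. The function name `scan_to_device_flags` is hypothetical and not part of Scrutiny; it assumes each scan line starts with the device path, which is the usual `smartctl --scan` layout.

```bash
#!/usr/bin/env bash
# Convert `smartctl --scan` output (read from stdin) into docker --device flags.
# Hypothetical helper for illustration; not shipped with Scrutiny.
scan_to_device_flags() {
  # The first field of each line is the device path.
  # Keep only /dev/ paths and print each path once, in the order seen.
  awk '$1 ~ /^\/dev\// && !seen[$1]++ { printf "--device=%s\n", $1 }'
}

# Usage on the host (requires smartmontools):
#   smartctl --scan | scan_to_device_flags
```

Drives behind one RAID controller may share a path such as `/dev/bus/0`, so the helper emits each path only once.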


## Docker

If you're using Docker, getting started is as simple as running the following command:

```bash
docker run -it --rm -p 8080:8080 \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
--name scrutiny \
analogj/scrutiny
```

- `/run/udev` is necessary to provide the Scrutiny collector with access to your device metadata
- `--cap-add SYS_RAWIO` is necessary to allow `smartctl` permission to query your device SMART data
    - NOTE: If you have **NVMe** drives, you must add `--cap-add SYS_ADMIN` as well. See issue [#26](https://github.com/AnalogJ/scrutiny/issues/26#issuecomment-696817130)
- `--device` entries are required to ensure that your hard disk devices are accessible within the container.
- `analogj/scrutiny` is an omnibus image, containing both the webapp server (frontend & API) as well as the S.M.A.R.T metric collector. (see below)
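
To illustrate how these flags fit together, here is a small sketch that prints (rather than runs) a `docker run` command for a given set of devices, adding `SYS_ADMIN` only when an NVMe device is listed. The helper name `print_scrutiny_run` and the `/dev/nvme0` path in the usage line are assumptions for illustration, not part of Scrutiny.

```bash
#!/usr/bin/env bash
# Print a `docker run` command for the omnibus image, given device paths.
# Hypothetical helper: it only echoes the command so you can review it first.
print_scrutiny_run() {
  local caps="--cap-add SYS_RAWIO" devs="" d
  for d in "$@"; do
    case "$d" in
      # NVMe drives additionally need SYS_ADMIN (see the note above).
      /dev/nvme*) caps="--cap-add SYS_RAWIO --cap-add SYS_ADMIN" ;;
    esac
    devs="$devs --device=$d"
  done
  echo "docker run -it --rm -p 8080:8080 -v /run/udev:/run/udev:ro $caps$devs --name scrutiny analogj/scrutiny"
}

# Example: one SATA drive and one NVMe drive (paths are assumptions).
print_scrutiny_run /dev/sda /dev/nvme0
```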

### Hub/Spoke Deployment

In addition to the Omnibus image (available under the `latest` tag) there are 2 other Docker images available:

- `analogj/scrutiny:collector` - Contains the Scrutiny data collector, `smartctl` binary and cron-like scheduler. You can run one collector on each server.
- `analogj/scrutiny:web` - Contains the Web UI, API and Database. Only one container is necessary.

```bash
docker run -it --rm -p 8080:8080 \
--name scrutiny-web \
analogj/scrutiny:web

docker run -it --rm \
-v /run/udev:/run/udev:ro \
--cap-add SYS_RAWIO \
--device=/dev/sda \
--device=/dev/sdb \
-e SCRUTINY_API_ENDPOINT=http://SCRUTINY_WEB_IPADDRESS:8080 \
--name scrutiny-collector \
analogj/scrutiny:collector
```

## Manual Installation (without-Docker)

While the easiest way to get started with [Scrutiny is using Docker](https://github.com/AnalogJ/scrutiny#docker),
it is possible to run it manually without much work. You can even mix and match, using Docker for one component and
a manual installation for the other.

See [docs/INSTALL_MANUAL.md](docs/INSTALL_MANUAL.md) for instructions.

## Usage

Once Scrutiny is running, you can open your browser to `http://localhost:8080` and take a look at the dashboard.

Initially it will be empty; however, after the first collector run, you'll be greeted with a list of all your hard drives and their current SMART status.

The collector is configured to run once a day, but you can trigger it manually by running the following command:

```bash
docker exec scrutiny /scrutiny/bin/scrutiny-collector-metrics run
```

# Configuration
We support a global YAML configuration file that must be located at `/scrutiny/config/scrutiny.yaml`.

Check the [example.scrutiny.yaml](example.scrutiny.yaml) file for a fully commented version.
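
When running under Docker, one way to place the file at that path is a bind mount. The sketch below writes a minimal config (only the `log` keys documented in the Debug mode section of this README) to a temporary directory and prints the matching `-v` flag. The temporary location and the read-only `:ro` suffix are assumptions, so adapt them to your setup.

```bash
#!/usr/bin/env bash
# Write a minimal config file on the host (the location is an arbitrary choice).
cfg_dir="$(mktemp -d)"
cat > "$cfg_dir/scrutiny.yaml" <<'EOF'
log:
  file: '/tmp/web.log'
  level: DEBUG
EOF

# Bind-mount flag that maps the host file to the path Scrutiny reads.
mount_flag="-v $cfg_dir/scrutiny.yaml:/scrutiny/config/scrutiny.yaml:ro"
echo "$mount_flag"
```

Add the printed flag to the `docker run` command from the Docker section.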

## Notifications

Scrutiny supports sending SMART device failure notifications via the following services:
- Custom Script (data provided via environment variables)
- Email
- Webhooks
- Discord
- Gotify
- Hangouts
- IFTTT
- Join
- Mattermost
- Pushbullet
- Pushover
- Slack
- Teams
- Telegram
- Tulip

Check the `notify.urls` section of [example.scrutiny.yaml](example.scrutiny.yaml) for more information and documentation on service-specific setup.

### Testing Notifications

You can test that your notifications are configured correctly by posting an empty payload to the notifications health check API.

```bash
curl -X POST http://localhost:8080/api/health/notify
```

# Debug mode & Log Files
Scrutiny provides various methods to change the log level to debug and generate log files.

## Web Server/API

You can use environment variables to enable debug logging and/or log files for the web server:

```
DEBUG=true
SCRUTINY_LOG_FILE=/tmp/web.log
```

You can configure the log level and log file in the config file:

```yaml
log:
  file: '/tmp/web.log'
  level: DEBUG
```

Or, if you're not using Docker, you can pass CLI arguments to the web server during startup:

```bash
scrutiny start --debug --log-file /tmp/web.log
```

## Collector

You can use environment variables to enable debug logging and/or log files for the collector:

```
DEBUG=true
COLLECTOR_LOG_FILE=/tmp/collector.log
```

Or, if you're not using Docker, you can pass CLI arguments to the collector during startup:

```bash
scrutiny-collector-metrics run --debug --log-file /tmp/collector.log
```
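
For the Docker images, the same environment variables can be passed with `-e` (as `SCRUTINY_API_ENDPOINT` is in the Hub/Spoke example) or collected in a file for `docker run --env-file`. A minimal sketch, assuming a temporary file on the host whose location is arbitrary:

```bash
#!/usr/bin/env bash
# Collect the collector's debug settings in an env file (one KEY=VALUE per line).
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
DEBUG=true
COLLECTOR_LOG_FILE=/tmp/collector.log
EOF

# Print the flag to add to the collector's `docker run` command.
echo "--env-file $env_file"
```

The log file is written inside the container, so mount a volume at that path if it should persist on the host.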

# Contributing

Please see [CONTRIBUTING.md](CONTRIBUTING.md) for instructions on how to develop and contribute to the Scrutiny codebase.

Work your magic and then submit a pull request. We love pull requests!

If you find the documentation lacking, help us out and update this README.md. If you don't have the time to work on Scrutiny, but found something we should know about, please submit an issue.

# Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

# Authors

Jason Kulatunga - Initial Development - @AnalogJ

# Licenses

- MIT
- Logo: [Glasses by matias porta lezcano](https://thenounproject.com/term/glasses/775232)

# Sponsors

Scrutiny is only possible with the help of my [GitHub Sponsors](https://github.com/sponsors/AnalogJ/).

[![](docs/sponsors.png)](https://github.com/sponsors/AnalogJ/)

They read a simple [reddit announcement post](https://github.com/sponsors/AnalogJ/) and decided to trust & finance
 a developer they've never met. It's an exciting and incredibly humbling experience.

If you found Scrutiny valuable, please consider [supporting my work](https://github.com/sponsors/AnalogJ/).