Risk or Redundancy?
by Bob Rodgers

Most security managers would say that their digital video recording system is mission critical, yet most digital video recording systems on sale today would come to a long and embarrassing halt if a hard disk failed. Is such an event to be regarded merely as a fact of life – a very low risk that does not merit concern? Or is it a significant hazard which, with a little extra knowledge and forethought, can be avoided? Bob Rodgers of Geutebrück UK provides some background information to enable you to judge for yourself.

Why do disks fail?

Disk failure can result from inherent disk errors, from handling damage, or from a combination of both, usually compounded by general wear and tear. Although present from the time of manufacture, disk errors may not manifest themselves for quite a while, perhaps not until the disk is almost full. The same is true of damage caused by dropping or mishandling. It might be immediately evident, or it might only appear after many re-writes.

The fact that flaws or damage are not immediately evident is due to the disk’s clever fault-tolerant design, which includes a sizeable surplus capacity for coping with bad sectors. As part of normal disk management, the disk’s on-board software ensures that if a bad sector is detected while data is being written, the data is redirected to a spare sector and the bad sector is avoided in future. This maximises the useful life of the disk and means that the disk itself does not fail until all the spare capacity is exhausted.
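The remapping behaviour described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class `ToyDisk`, its fields and the sector numbers are all hypothetical, and real drive firmware is far more elaborate.

```python
class ToyDisk:
    """Toy model of spare-sector remapping; not real drive firmware."""

    def __init__(self, bad_sectors, spare_count):
        self.bad_sectors = set(bad_sectors)  # hidden flaws, unknown until written to
        self.spares_left = spare_count       # surplus capacity reserved for remapping
        self.remapped = set()                # bad sectors already redirected to a spare
        self.failed = False

    def write(self, sector):
        """Return True if the write succeeds, False once the disk has failed."""
        if self.failed:
            return False
        if sector in self.bad_sectors and sector not in self.remapped:
            if self.spares_left == 0:
                self.failed = True           # spare capacity exhausted: the disk fails
                return False
            self.spares_left -= 1
            self.remapped.add(sector)        # this sector is served by a spare from now on
        return True


# Hypothetical disk: three flawed sectors but only two spares.
disk = ToyDisk(bad_sectors=[3, 7, 9], spare_count=2)
results = [disk.write(sector) for sector in range(10)]
```

In this sketch the first nine writes succeed and the disk appears healthy. Only the write to the third flawed sector, when no spares remain, exposes the failure.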

The key point to appreciate here is that disk failure is unpredictable and can occur with little or no warning.

The consequences

In the vast majority of digital CCTV systems, if a disk fails the whole system stops. And to add insult to injury, not only is disk replacement a skilled job, but it has to be followed by the re-formatting of the whole database. This is itself a lengthy process and involves the inevitable loss of all recorded data.

But how likely is this to happen? If the manufacturer’s stated MTBF (mean time between failures) for a hard disk is 500,000 hours, or roughly 57 years, why worry?

Whether you need to take account of this depends largely on the size of your system, and as systems get bigger it becomes more and more of an issue. Since system development is being driven by demand for ever greater data storage capacity, developers have been rising to the challenge with a twin-pronged approach: by increasing individual disk capacity and, crucially in this context, by increasing the number of disks in each system.

A hard disk is a very clever piece of engineering. It is composed of a stack of flat disks or platters, which store data as magnetic patterns in concentric tracks. Heads above and below the platters read or write the data as the disk spins at a set rotation speed of 4500 to 7200 rpm.

Even though the average disk may be very reliable, the statistical risk of experiencing an individual disk failure increases with the number of disks. So despite a high MTBF, if your system has a large number of disks, you should be prepared to suffer disk failure at some stage.
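To put rough numbers on this, the sketch below estimates the chance of at least one disk failing within a year for systems of different sizes. It rests on assumptions the article does not state: that disks fail independently of one another and at a constant rate of 1/MTBF (an exponential failure model). The function name `prob_any_disk_fails` and the system sizes are purely illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def prob_any_disk_fails(num_disks, mtbf_hours, period_hours=HOURS_PER_YEAR):
    """Chance that at least one of num_disks fails within period_hours.

    Assumes independent disks, each with a constant failure rate of 1/MTBF.
    """
    p_one_survives = math.exp(-period_hours / mtbf_hours)
    return 1.0 - p_one_survives ** num_disks


# Illustrative system sizes, using the 500,000-hour MTBF quoted above.
for n in (1, 8, 32, 100):
    risk = prob_any_disk_fails(n, mtbf_hours=500_000)
    print(f"{n:>3} disks: {risk:.1%} chance of at least one failure in a year")
```

Under these assumptions a single disk has a yearly failure risk of under 2%, while a system with 100 disks has a better than 80% chance of losing at least one disk in the same period.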

Whereas some systems disregard this risk, others are designed with built-in redundancy to enable them to keep running even when a disk fails. Redundancy in this context can mean duplication of hardware, data, or a back-up facility based on the ability to reconstruct lost data. Geutebrück is one of the manufacturers who have borrowed a successful approach from the IT world: the redundant disk combination or RAID.

What’s a RAID?

A RAID is a “redundant array of independent disks”, a set of standard commercial hard disks which the operating system sees and treats as if it were an enormous single logical hard disk. Developed for different kinds of IT application, RAID systems come in different configurations and employ a selection of different techniques in storing and securing data.

In their simplest form RAIDs use either a striping (RAID-0) or mirroring (RAID-1) format in storing the data. For striping, each drive's storage space is partitioned into units, and successive units of data are written to each drive in turn, so that the data is spread or ‘striped’ across all the disks. This system allows data to be accessed quickly by reading several disks simultaneously, but on its own offers no redundancy or other form of data security.
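The striping idea can be illustrated with a minimal sketch that maps each logical block to a disk and a position on that disk. The function name `locate_block` and the simple round-robin layout are assumptions made for illustration; a real controller works with configurable stripe sizes.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes a plain round-robin layout with one block per stripe unit.
    """
    return logical_block % num_disks, logical_block // num_disks


# Where the first eight logical blocks land on a hypothetical four-disk array.
layout = [locate_block(block, num_disks=4) for block in range(8)]
for block, (disk, offset) in enumerate(layout):
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Consecutive blocks fall on different disks, which is why several of them can be read at the same time. Equally, each disk holds only a fraction of any large recording, so losing one disk damages all of it.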

[Photo caption: A 4.5 terabyte picture database using RAIDs supplied by IBM.]

A RAID-1 system, on the other hand, uses disk mirroring or duplicating to provide full redundancy. It has at least two drives which store identical data. Consequently it is expensive in terms of disk space, but it is useful where system control disks need redundancy and in multi-user applications where many different records can be accessed at once.

Other RAID levels employ one or more of these functions and combine them with various checking or correcting processes. See Fig. 1 for a summary of characteristics.

RAID-5 is the configuration we are most interested in for CCTV applications. It is the most versatile since it provides the best balance of cost / performance / data protection and it is one we at Geutebrück offer for storing large video databases in our MultiScope II-based CCTV and digital recording systems.

Secure and efficient

RAID-5 employs the striping principle to distribute data and parity in blocks, in rotation, across three or more disks. ‘Parity’ here means extra bits generated by a parity algorithm: in effect a mathematically generated summary, which is stored on a different disk from the original data it describes. This strategy offers the great advantage that, if a disk fails, the lost data can be recalculated from the data and parity on the other drives. It provides redundancy against the failure of any one disk, but uses disk space more efficiently than mirroring does.
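The recalculation of lost data can be sketched in a few lines. The example assumes the parity algorithm is a byte-wise XOR, which is the usual choice for RAID-5; the article does not say which algorithm any particular product uses. The names `xor_blocks` and `parity_disk`, the sample data and the rotation scheme are all hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe, num_disks):
    """One possible rotation: parity moves back one disk on each new stripe."""
    return num_disks - 1 - (stripe % num_disks)


# One stripe on a hypothetical four-disk array: three data blocks plus parity.
data = [b"CAM1", b"CAM2", b"CAM3"]
parity = xor_blocks(data)

# Simulate losing the disk that held the second data block,
# then rebuild it from the surviving data blocks and the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Here `rebuilt` equals the lost block `b"CAM2"`, because XOR-ing the parity with the surviving blocks cancels them out and leaves only the missing one. The space saving comes from storing one parity block per stripe: an array of N disks keeps N-1 disks' worth of usable capacity, whereas a two-disk mirror keeps only half.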

If the level 5 RAID has been configured with a hot spare, the array’s controller automatically starts the reconstruction process as soon as a failed disk is reported. If not, the system detects the replacement disk once it has been fitted and triggers the reconstruction process automatically. Thanks to a hot swap caddy, changing the disk does not require any special skill and can be done with the system still powered up and running.

RAID-5 commended

There is a strong argument, which I would commend, that in fact EVERY system needs the standard of redundancy offered by RAID level 5. No matter what the system is or where it is, as soon as it is installed it becomes mission critical and the user cannot live without it. Some recent Geutebrück customers who have come to a similar conclusion and opted for the peace of mind offered by a RAID-5 video database include airport operators, banks, casinos, logistics companies, a road toll operator, power stations and town centres.

Figure 1