Raid Again Storage using Commodity Hardware And Linux
RASCHAL
Using Linux and inexpensive IDE drives for building large storage
systems is becoming more and more common. RASCHAL is such a storage
system, with 40TB of storage, built at JPL for use by the PAT
group's data-intensive science investigations, and also as a testbed for
cluster storage systems. The whole system was designed and built in
about six weeks, and it became operational in April 2003.
The system was assembled by Jimi Patel, who also handled the final installation details.
This storage cluster has been under heavy use ever since it became operational. It is used by the OnEarth website, for both the public access data and for work on the future world landmass image.
Please scroll to the bottom of this page for updates on this system.
Reference
herein to any specific commercial product, process, or service by trade
name, trademark, manufacturer, or otherwise, does not constitute or
imply its endorsement by the United States Government, Jet Propulsion
Laboratory, or California Institute of Technology.
 |
RASCHAL has 160 IDE drives,
organized in ten separate rack mountable PC cases, with each case
containing a Linux system and sixteen IDE 250GB drives.
Each Linux system has 1GB of RAM, a 2.4GHz XEON CPU, an ATX
motherboard with integrated dual copper Gigabit Ethernet and video card,
a CD-ROM drive and two 8 port IDE RAID cards. Each set of eight drives
controlled by an IDE RAID card is configured as RAID5, with one of the
eight drives being redundant. The IDE drives are mounted in externally
accessible removable trays. Since both the trays and the RAID card
provide support for hot-swap, a single disk failure within each set of
eight drives can be detected, and a replacement can be installed without
affecting the system operational state.
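The capacity figures implied by this layout can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the drive counts and sizes quoted above, assumes decimal units (1TB = 1000GB), and ignores filesystem overhead. The function name `raid5_usable_gb` is made up for this example.

```python
# Capacity arithmetic for the layout described above (decimal units assumed).
DRIVE_GB = 250        # per-drive capacity, from the text
DRIVES_PER_SET = 8    # drives behind each IDE RAID card
SETS_PER_UNIT = 2     # two RAID cards per case
UNITS = 10            # ten cases in total


def raid5_usable_gb(drives: int, drive_gb: int) -> int:
    """RAID5 gives up one drive's worth of space per set to parity."""
    if drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (drives - 1) * drive_gb


raw_tb = UNITS * SETS_PER_UNIT * DRIVES_PER_SET * DRIVE_GB / 1000
usable_tb = UNITS * SETS_PER_UNIT * raid5_usable_gb(DRIVES_PER_SET, DRIVE_GB) / 1000

print(f"raw: {raw_tb:.1f} TB, usable after parity: {usable_tb:.1f} TB")
```

Under these assumptions the 40TB figure is the raw capacity, and roughly 35TB remains once one drive per RAID5 set is set aside for parity.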
The PC cases are 4U tall but due to mechanical considerations they are
mounted in the rack with 1U of spacing between them. Each case is
powered by a 450W power supply. Operational power requirement seems to
be closer to 250W per case, with no overheating observed. Each case
contains a separator wall with five fans, producing a forced air flow
from the front of the case to the back.
The system at the top left is the host system, an 8 CPU SGI O300 server
with an external 1TB Ciprico Fibre Channel disk system. In the back of the
left rack, a 24 port copper Gigabit Ethernet switch is also mounted. It
provides two connections to each storage unit and four additional ports
for client systems.
No configuration data or operating system is stored on any of the
storage units. Upon power-up, each one of them starts by loading a Linux
kernel from a CD-ROM, followed by loading a complete Linux OS from the
host system via NFS. Since all units share the same identical copy of
the software, maintenance of the whole system is simplified. Each set of
16 drives is configured as one logical drive, and is accessed by the
host systems via NFS. Point to point sustained data rates in excess of
20MB per second have been observed. |
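To put the observed data rate in perspective, the sketch below estimates how long it would take to stream the contents of one storage unit over a single point to point connection. It is a rough illustration only: it assumes decimal units, uses the raw 16 x 250GB capacity of a unit, and treats 20MB per second as a constant rate. The helper name `transfer_hours` is made up for this example.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate (decimal units)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


# One unit holds 16 drives of 250GB each (raw capacity).
unit_raw_gb = 16 * 250
hours = transfer_hours(unit_raw_gb, 20)

print(f"{hours:.1f} hours to stream one unit at 20MB/s")
```

At a single-stream rate, reading a full unit takes more than two days, which is why running several connections in parallel through the switch matters for this kind of system.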
 |
A 250GB disk mounted in a tray,
ready to be used. The hot-plug connector in the back provides both the
parallel IDE connection and the power required by the disk. The tray has
a very simple mechanical lock system to prevent accidental removal of
the disk, and also provides a power and an activity LED on the front
panel.
These drives were the largest available at the time the RASCHAL components were chosen, and provided the best ratio of disk price to overhead price. Since they are low power, 5200 RPM drives, they are also a very good match for this type of application. If the Maxtor 320GB drives have the same profile, using them would increase the capacity of each unit to 5.1TB. They might also require a higher rated power supply.
|
 |
From this angle, the five fans
providing forced air flow are visible. The front of the case is the
far end in this picture. The metal brace in the foreground of this
picture provides extra stiffness to the case, and also provides
mechanical support to the PCI cards. |
 |
The fans hosted in the middle
wall can be individually removed, as in this picture. All the power and
IDE cables have to be routed under this separating wall; removing the
fans provides a bit more space. The drives are arranged in three stacks of
five, with the last one mounted vertically, on the left side of the
others. Each case holds 16 IDE drives, for a total capacity of 4TB.
|
 |
In this view, the stiffener
brace is removed, providing a better view of the two IDE RAID cards. The
power supply, the CPU fan and the memory banks are also visible. A mix
of flat and round IDE cables was used. The middle connector of the IDE
cables is not used, since each drive needs a unique connection. Installing these cables and the power cables is not easy. Space under the airflow dam is limited, and due to the location of the PCI slots, long cables are required. The new serial ATA interface will greatly improve this situation, but will not improve the performance.
While the motherboard is dual CPU capable, only one XEON CPU is used. This reduces the power consumption by a considerable amount, and should not affect performance much.
|
 |
This is a view of the whole PC,
without the top cover. The front of the case is to the right. The CD-ROM is
mounted on top of the IDE drive cages. The mounting bracket on the top
right of the drive cage can host a slim floppy disk, not used in
RASCHAL. Above the CD, there is a flat speaker (blue label) and the
wires to the power and reset buttons.
The fan separator wall with the five fans is clearly visible in the
middle of the case. The system does not seem to overheat, and five fans of 5W each seem to be overkill. A few of these could be removed and replaced with baffles, to preserve the air dam integrity. Most of the motherboard is covered by the IDE cables. The CPU fan is visible; a second CPU socket, not in use, is right below the
first one. The only PCI cards are the two IDE RAID cards. Video
and dual Gigabit Ethernet are integrated on the motherboard.
The power supply is in the top left of the picture. When some of the power supplies had to be replaced after a brownout, we discovered that this configuration is very demanding. The original 460W power supply works well, but 500W and 600W versions fail to power up the system because they do not deliver enough power on the 12V and 5V lines, the only ones used by the disks and fans. The original supply also seems to be more tolerant of the initial power surge caused by the disk spin-up.
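The observation that a supply with a higher total rating can still fail comes down to per-rail limits: what matters is the current available on each line, not the sum. The sketch below illustrates that kind of check. Every number in it is a hypothetical placeholder chosen for illustration, not a measurement from RASCHAL or a value from any drive or power supply datasheet, and the function name `rail_shortfalls` is made up for this example.

```python
def rail_shortfalls(rail_max_amps: dict, load_amps: dict) -> dict:
    """Return {rail: amps missing} for each rail whose load exceeds its rating."""
    short = {}
    for rail, amps in load_amps.items():
        spare = rail_max_amps[rail] - amps
        if spare < 0:
            short[rail] = -spare
    return short


# HYPOTHETICAL per-drive currents, for illustration only.
N_DRIVES = 16
SPINUP_12V_A = 2.0   # assumed 12V current per drive while spinning up
LOGIC_5V_A = 0.7     # assumed 5V current per drive

load = {"12V": N_DRIVES * SPINUP_12V_A, "5V": N_DRIVES * LOGIC_5V_A}

# HYPOTHETICAL supply: generous 5V rating, modest 12V rating.
supply = {"12V": 18.0, "5V": 40.0}

print(rail_shortfalls(supply, load))
```

With all sixteen drives spinning up at once, the 12V line is the one most likely to run short in a model like this, which is consistent with the behaviour described above but does not prove it.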
The back panel of the case is a standard ATX breakout panel plus the two GigE connectors. |
News:
After a short and unfortunate incident caused by a brownout, RASCHAL is fully functional again. Moving the drives in use to a spare unit and rebooting was all that was needed to recover function, in about 4-5 hours. Three failed power supplies were identified and replaced within a week, and all ten units are again available.
Storage performance proved to be reasonable, with sustained data rates in excess of 20MB/sec being observed for a single host. The Gigabit Ethernet switch acts as a matrix switch; since it includes connections for multiple hosts, many disk operations can be active at the same time, providing much higher system cross-section bandwidth.
RASCHAL seems to have a hard time dealing with heavy IO loads, especially when they are generated by multi-megabyte read and write requests. This seems to trigger a serious failure in the IRIX NFS v3 client.
July 18th 2003:
A few RASCHAL units have been updated to Linux kernel version 2.6.0-test1.
Dec 28th 2005:
RASCHAL is now running Linux kernel 2.6.3 from an additional system disk installed in each box. These changes have improved the NFS performance to 40MB per second and have much improved system stability. The 3Ware RAID card BIOS has also been updated, which makes it possible to use the command line interface to manage the RAID units without having to cycle the power, even though the software is not supposed to work under Linux 2.6.
A very large number of the drives, close to 50%, have failed in the last two years and have been replaced. The exact cause of the very high failure rate is not known, but a number of factors might have contributed. Construction work in and around the server room has taken place almost constantly, with numerous power failures, brownouts and power glitches. The drives themselves were from the first batch of 250GB drives available on the market. The original power supplies for the RASCHAL computers are marginal for this configuration, especially at power-up. Larger, newer or faster disks cannot be used in this configuration due to higher power requirements and software limitations.
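For comparison with the yearly failure rates usually quoted for disks, the cumulative figure above can be converted to an approximate annualized rate. The sketch below is a back-of-the-envelope estimate only: it takes "close to 50% in two years" as exactly 0.5, assumes a constant failure rate over the period, and ignores failures among the replacement drives. The function name `annualized_failure_rate` is made up for this example.

```python
def annualized_failure_rate(fraction_failed: float, years: float) -> float:
    """Yearly failure probability implied by a cumulative failure fraction,
    assuming the same survival probability applies every year."""
    if not 0.0 <= fraction_failed < 1.0:
        raise ValueError("fraction_failed must be in [0, 1)")
    if years <= 0:
        raise ValueError("years must be positive")
    survival = 1.0 - fraction_failed
    return 1.0 - survival ** (1.0 / years)


afr = annualized_failure_rate(0.5, 2.0)
print(f"approximate annualized failure rate: {afr:.1%}")
```

Under these assumptions the estimate comes out at a little under 30% per year.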
This page,
http://pat.jpl.nasa.gov/public/lucian/RASCHAL.html,
is maintained by
and was last modified Friday, 06-Jan-2006 10:17:43 PST