Hardware

Pros & Cons of Different RAID Solutions 261

sp1n writes "Our mail server has hit a major disk bottleneck, and we're considering a RAID 5 solution. We're a local ISP with 13,000 users that's been around since 1993. The mail server is a Sun UltraSparc 2 w/ 768M ram, a 10000rpm 9 gig for mail storage, a 7200rpm 7 gig for spool, and another for system mountpoints, all on a u2w bus, and it's running exim. It reaches a load of 10 on a daily basis, and hits around 20 once a week. The spool and mail storage drives are cranking away constantly. Requirements are: external RAID controller, scalability, speed, and a rackmount case for the controller & drives." Sound intriguing? If you've ever wanted to learn a bit about RAID, then hit the link below.

sp1n continues: "We are currently considering 3 options:

(1) SCSI - EIDE controller with six 9G/7200 ATA drives (hadn't heard of this one until recently). This supposedly accesses the drives directly through DMA and bypasses all IDE, just using them as physical media. All are accessed in parallel. I'm a bit wary of the reliability of IDE drives under constant use.

(2) SCSI - SCSI controller with six 9G/7200 u2w drives. The controller currently at the top of my list is the Mylex DAC960SXi w/ 32MB cache. However, something that fits in a half-height bay, instead of hogging a full-height would be nice.

(3) SCSI - SCSI controller as above, running with 2 disk channels and 2 separate RAID 5 arrays for each mountpoint (spool/mail storage).

I'm looking for any experience with IDE/DMA raid setups (1), as well as the pros/cons of making 2 partitions, both of which are very active, on one array of 6 drives (2), as well as 2 separate level 5 arrays of 3 for each mountpoint (3). In addition, any suggestions for external controllers and rackmount enclosures would be greatly appreciated. I would like the controller to have an i960 or better processor."

--
"The glass is not half full, nor half empty. The glass is just too big."

This discussion has been archived. No new comments can be posted.

  • AFAIK, The only difference between a scsi hard drive and an ide hard drive is one little controller chip on the drive. So the reliability of the ide drives, mechanically, should be identical to that of the scsi drives.

    Jeremy
  • by sPaKr ( 116314 )
    I had a similar problem. I went with the Sun StorEdge A1000. It's just a great piece of hardware. I got 12 18GB drives, 10,000 rpm Seagate Cheetahs. It's in 2 RAID 5 clusters, with one hot spare. I needed to get a differential SCSI adapter, as they don't come standard on Ultra 2s. Wow, is it fast. I can move GBs in what seems like seconds. It's a night and day improvement over a JBOD box. A bit pricey: about 17K after our 50% edu discount. It's all SCSI-SCSI, hot-swap disks, hot-swap power supplies. When you're running Solaris nothing beats Sun hardware; it just works.
  • You might also consider just adding multiple scsi controllers and have as many drives as possible.

    With each additional drive, you can access another unique piece of data simultaneously. While RAID is nice and helps solve reliability and performance problems, it isn't the only solution.

    It is a technique that newsgroup server admins used to use, and probably still do.
  • HAH!

    I'm going to laugh at you! SCSI drives have more reportability for failures and have some "dead space" set aside for failure recovery in hardware. Also, the difference between the SCSI and IDE models is huge. In IDE the CPU does more, in SCSI the hardware does more.
    Oh well...
  • by vectro ( 54263 ) <vectro@pipeline.com> on Thursday November 18, 1999 @09:36PM (#1520185)
    Before you go out and purchase an expensive RAID solution (of any kind), make sure this is really the problem. The vmstat command will make it quickly apparent what kind of I/O is happening, and further analysis might tell you more about what kind of hard drive accesses are happening.

    In many cases, adding more memory or CPU can make a bigger difference than more/faster hard drives, if the problem is that the cache is too small, or paging activity too much. Also check your CPU load and make sure it is nowhere near 100% - if so, time to get a 2nd CPU.

    Also, avoid software RAID implementations like the plague. They will slow down your system and provide questionable reliability. You should also try to find cards that have redundant SCSI controllers onboard, and support redundant cabling. This way if the cable, plug, or SCSI bus fails for some reason you will not be SOL.

    Finally, be sure that the majority of your disk accesses are reads. RAID will slow down writes, sometimes drastically so. If the majority of your disk accesses are writes, then tuning your kernel to flush dirty buffers less often may make a good difference.
  • You may want to look at a Dell Powervault as a possible solution. Check out dell's website [dell.com] for details. They are VERY reliable and VERY fast, not to mention Dell has the best support in the industry.
  • Mechanical reliability shouldn't be any different between a SCSI drive and an IDE drive if they operate at the same speed (RPM).
  • You mean the Network Appliance Filer that Dell resells and calls a 'PowerVault'? Yes, they're very nice. But Dell doesn't make them, and 3rd party support is rarely as good as getting support directly from the manufacturer (at least for those manufacturers that also sell their products directly to the public.)

    That, and they start at around $50k for 100GB, which isn't even local storage - it's network storage. (Choose CIFS, NFS, HTTP, or whatever else they support.)

    Not that these aren't great boxes - we have one and are about to get a second one. But they're pricey and not as fast as local storage - which I believe is what this guy is looking for.
  • The Network Appliance Filers [netapp.com] are really sexy.

    The beautiful thing is they use the WAFL filesystem so you can expand your array when you need to without adding big sets of drives.

    Granted, I don't have one but I've submitted the proposals and am waiting on financing. The F720 scales to 464GB, is network attached, has journaling (rad), and can benefit your WHOLE network.

    Of course, you have to use NFS or SMB though. I've heard they start as low as $17k but usually $30-40k with a bunch of drives but it's difficult to find general prices without hearing the sales pitch.

    This paper [netapp.com] discusses testing the Stanford Linear Accelerator Center performed while evaluating the NetApp filers. It's geared toward Usenet news but if it can handle that, it can surely handle your mail situation.

    Does anyone here have first hand experience good or bad with NetApp Filers? And some word on the pricing?

  • It doesn't sound like you need a lot of space if you're currently doing well with 9GB and 7GB. Get a pair of 18GB drives for the spools and a pair of 18GB drives for storage, and you should be set.

    RAID 0+1 is a lot faster than RAID 5. Its disadvantage is that it's more expensive because you have to buy 100% more disk than storage, as opposed to 20-33% more for RAID 5.

    As far as which controller to use... Sun now rebrands DPT controllers, but they're pci and you're stuck on sbus, so I don't know.

    Good luck
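The "100% more" versus "20-33% more" figures above can be checked with a little arithmetic. This is a minimal sketch, assuming RAID 5 gives up one drive's worth of capacity to parity and RAID 0+1 stores every block twice; the function name `raid_overhead` and the drive counts are illustrative, not taken from any vendor tool.

```python
def raid_overhead(level: str, drives: int) -> float:
    """Return the extra raw disk needed, as a fraction of usable storage."""
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        usable = drives - 1      # one drive's worth of capacity holds parity
    elif level == "raid0+1":
        if drives < 2 or drives % 2:
            raise ValueError("RAID 0+1 needs an even number of drives")
        usable = drives // 2     # every block is mirrored once
    else:
        raise ValueError(f"unknown RAID level: {level}")
    return (drives - usable) / usable


if __name__ == "__main__":
    for level, drives in [("raid5", 4), ("raid5", 6), ("raid0+1", 4)]:
        pct = 100 * raid_overhead(level, drives)
        print(f"{level} with {drives} drives: {pct:.0f}% more disk than storage")
```

Under these assumptions the 33% figure corresponds to a 4-drive RAID 5 array and the 20% figure to a 6-drive one; a 3-drive RAID 5 array would need 50% more.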
  • by Falsch Freiheit ( 7780 ) <freiheit@@@gmail...com> on Thursday November 18, 1999 @09:56PM (#1520193) Homepage
    First off, it's not clear from your post how heavily loaded the drives really are.

    In particular: load is a measure of how many processes are using or waiting for a resource (such as disk I/O, CPU or network I/O). On a busy mail server that's completely adequate for the job, I'd expect to often see a high load average due to the number of processes that are waiting on the network. That is, due to the number of processes waiting for slow network connections to places halfway around the world.

    All you mention is the load averages and a fairly non-specific measure of drives that are "cranking away constantly". If the drives were being used at a current constant 10% of available I/O, they'd tend to "crank constantly" even if they could be hit much harder. (still, given that losing email is considered bad by customers, a RAID 5 solution seems like a good idea anyways and leaves you room to grow and handle sudden increases in email from the holidays or spammers or gradual expansion of business)

    As to IDE vs. SCSI -- never go with straight IDE on a server. SCSI has the ability to lie to the OS and silently move data from sectors that have gone bad into sectors reserved for that purpose. Sure, it slows down access to that particular block of data, but it's a lot easier than the OS having to deal with failures directly. However, I'm completely unfamiliar with the strange SCSI - EIDE setup that you're describing -- if it treats them as just physical media and provides the SCSI interface itself, it may be able to do that particular SCSI trick, as well. Physically, SCSI drives and EIDE drives are identical -- as in, you can find the *exact* same drive from certain manufacturers, only one has SCSI and the other EIDE. Reliability of the physical media is the same, IOW. In a normal configuration, *apparent* physical reliability is higher for SCSI due to wonderfully useful trickery.

    I don't recall the exact model numbers, but I've seen pretty good results with Mylex RAID controllers before. (more along the lines of database stuff than what you're talking about -- somewhat different needs, but not all *that* different, I suppose.)

    I can't see putting two partitions on one RAID device as making a lot of sense -- since things are striped you'd end up running into contention issues.

    IOW: I'd guess that option #3 would be the fastest -- it's also probably the most expensive.

    If I were you, I'd check more carefully to determine how much of the currently available disk I/O is actually being used... If the budget allows it, the dual-channel RAID solution sounds pretty good. You might want to go with two single-channel RAID cards instead -- makes it easier to stock a backup card in case a card decides to die. Try and get something with hot-swappable drives, too. It makes the RAID stuff so much more useful.


    Also, I don't know the details of your setup (of course), but seriously consider breaking the mail serving task into separate pieces and running them on separate machines.

    You have:
    1) incoming email
    2) outgoing email
    3) email from customers
    4) email customers pick up (POP)

    It sounds like you have one machine handling all of these. Breaking these tasks onto separate boxes may help. (If you've made the mistake of telling customers the same thing for #3 and #4, i.e. mail.isp.net instead of mail.isp.net and pop.isp.net, it might be impossible to split those two tasks away from each other.)

    You can have a setup such as:
    outgoing1 through outgoingN all behind the single name of "outgoing" that internal machines are told to send email to that they don't know how to deal with
    mail1 through mailN all behind "mail" that customers are told to have as their outgoing mail server. In particular, it should blindly send off email it doesn't know how to deal with to outgoing.
    pop (harder to break into separate machines, but possible)
    incoming1 through incomingN with MX records pointing at them for your domain.

    Now, breaking into that many machines is probably silly. Moving outgoing to one machine and everything else to a second machine (and possibly mailing lists off to a third machine) may make a *lot* of sense though. Don't get tied into the idea of a monolithic machine to accomplish everything related to a particular task -- eventually it's much more expensive than many cheaper boxes to handle the same task.
  • by thesteveco ( 20012 ) on Thursday November 18, 1999 @09:58PM (#1520194) Homepage
    We've just spent 2 weeks at my office researching the different solutions available to us for implementing the most reliable and scalable solution available today. Our needs differ a bit from yours as we're looking to put many machines on a network for load-distribution yet they all need to speak to the same data on a single repository. This holy grail is known as a SAN, or Storage Area Network.

    Our solution is going to be a single cabinet RAID (level 5 for accessing smaller files) with a "hot spare" that will rebuild a crashed disk on the fly. This being a standard cabinet we'll have 8 disks, of which the capacity of 6 will be data (one parity (term used loosely as parity is striped on RAID-5), and one spare).

    The disks are Seagate's 10,000 RPM Cheetahs, the most commonly recommended units among all the vendors we've talked to, and the controller is a multi-channel u2w with fibre interface to a Q-Logic PCI adapter.

    The total system is going to run just over $15,000. This sounds like a lot, but pricing lower end systems isn't too much cheaper and you'll never get 24-hour turnaround on failed parts (if they're even available). This seems like overkill for a single system, but by adding a fibre hub later we can use the single system for many many machines once a file controller (dedicated machine) is put into place.

    The beauty of SAN is that it operates much like FTP, with a control and a data connection. The control connection occurs over your existing LAN, and the data is transmitted directly over the fibre channel (max rate of 100 MB/s).

    Other NAS (Network Attached Storage) models are somewhat cheaper to implement, but performance can never match the fibre as the "control" and "data" connections (NFS or SMB) both transmit across your network.

    I apologize for digressing from the straight RAID topic, but I felt obligated to give the /. community something to chew on in return for all that I've learned here.

    -Steve
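For anyone wondering how a hot spare can "rebuild a crashed disk on the fly": RAID 5 parity is the XOR of the data blocks in a stripe, so any one missing block can be recomputed from the survivors. The following is a minimal sketch of that idea for a single stripe, not how any particular controller implements it; the helper names `xor_blocks` and `rebuild` are made up for illustration, and a real array also rotates parity across the disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recompute the single missing data block from the survivors and parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    stripe = [b"mail", b"spoo", b"data"]      # one block per data disk
    parity = xor_blocks(stripe)               # written to the parity location
    lost = stripe[1]                          # pretend disk 1 has crashed
    recovered = rebuild([stripe[0], stripe[2]], parity)
    print(recovered == lost)
```

The same XOR that builds the parity block also recovers the lost one, which is why the rebuild only needs to read the surviving disks.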
  • True.

    However, SCSI drives reserve dead space and move the contents of bad sectors to a reserved sector and remap the bad sector to point at the previously reserved sector.

    IOW: SCSI drives hide physical defects on the media from you, where IDE drives require the OS to deal with the problem.
  • I'm thinking of getting one myself. It's supported in Linux, does hardware raid 0, 1, 0/1, 3, 5, 30, and 50. Does anyone have one? Is it decent? Can I trust my data to it? It's $150 on pricewatch, which sounds like a damn good deal for something with its own CPU on board.

    - A.P.
    --


    "One World, one Web, one Program" - Microsoft promotional ad

  • on the IDE v SCSI be careful. with some drives the difference really is just a chip, but often drive manufacturers will use different actuators and such for SCSI drives (due to the fact that they're more likely to be dropped into a high-stress environment). The MTBF for a drive that's expecting to run grandma's recipe book is not relevant when used as a high-stress server.

    I'd suggest a SCSI or Fibre Channel raid array, with some 10,000RPM drives, and lots of cache on the drives and the controller. If you are currently IO-bound, you want to make sure that you remove that bottleneck for at least a couple years. Some sort of external enclosure might be nice if only due to the fact that 10,000RPM hard drives make a LOT of heat, so it keeps things a little less critical. Oh, and of course I'd recommend using RAID-5 for obvious reasons. RAID-0 is faster, but clinically insane.
  • by aran ( 79680 )
    Another solution is to look at Communigate mailserver from http://www.stalker.com
    It allows you to cluster your mail server to multiple servers with very little fuss.
  • Really Expensive (tm). But they work, and they work well.

    Yes, they're network attached. Good for stuff that is going to be used over the network, naturally. Not good if you need -really- fast access to the data from -one- server. They have CIFS, HTTP, NFS, and something else. We use this for all of our UNIX and Windows home dirs - the same data is accessible via either NFS or CIFS, which can be quite convenient at times.

    The feature I like the best from their WAFL file system is the snapshot. It's configurable, and can be set to take hourly and nightly snapshots of the entire file system. A user deleted a file? They can go back into their .snapshot directory and retrieve the copy themselves. Sure is a lot easier than having to pull files from a tape, and I don't know anyone who does hourly tape backups. :)

  • don't listen to that crap, the scsi drives are built for industrial use. You get what you pay for. Go for the scsi setup, you will be glad in the end. as for the raid five config, if you don't have the fault tolerance on (parity stripe) and are just doing it to crank every bit of speed you can out of that box, well, go for it; but what the other guy said about checking the amount of writes your raid is doing sounds like a well thought out solution. i know people who have professed their love for ide but when they get a taste of scsi, they rarely go back.
  • by troutman ( 26963 ) on Thursday November 18, 1999 @10:11PM (#1520201) Homepage
    This is only mildly applicable to your question since it isn't for Solaris, but it is all I have to offer.

    I spent a fair amount of time looking at RAID 5 solutions this past summer for a client. Both external and internal, for Linux. Tried several different controller card brands and drive configurations, did a lot of reading, and bugged a lot of vendors.

    You really should try to test your options and all of the configuration combinations using something like Bonnie [textuality.com], on a machine with a similar configuration to your target server. Make sure that your Bonnie test file size is at least twice physical RAM, to eliminate the effects of RAM and controller caching on the results.

    I found that using 6 drives in a RAID 5 config was a LOT faster than 5 drives, most of the time. In fact, 3 drives in an array was faster than 5 in some cases. I think it has to do with the way the controller cards were calculating the distributed parity, and perhaps also due to things the driver was doing. 4 drives usually wasn't much better than 3, either.

    Stripe sizes for the array can also make a big difference. 32k vs 128k, etc. Larger stripe sizes are usually better for I/O speed, but you may find for email that having a higher number of random seek transactions per second is better than raw speed.

    I did not get a chance to do any hard testing of multiple channel configurations with these cards. I suspect that splitting the I/O onto multiple channels would be a win.

    IMHO, you definitely want an i960 based board or system, with the fastest CPU you can find on them. I noticed a significant difference between boards with the 33MHz part vs. the 66MHz part.

    FYI for others: for controllers, the AMI MegaRAID (alias Dell's PERC2/SC) just blows chunks. Older non-LVD, non-raid SCSI systems can run rings around it, at least on write speed.

    It has been my experience that the write speed on a RAID 5 system is generally only a fraction of the reading speed, like 1/4th to 1/2. For a quick and stupid test, do something like 'time cat /proc/kcore > /tmp/kcore' and do the math for MB/second.

    oh, and my current favorite card is the DPT Millennium V controller, using it in several systems in various places for the last 3 or 4 months. Here are some Bonnie results for a system with a DPT with 6x 7200 RPM drives, all on the same channel (internal), Linux kernel 2.2.10, dual P3 500MHz:

         -------Sequential Output-------- ---Sequential Input-- --Random--
         -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
      MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
    1024  7637 97.5 16743 15.2  9561 19.4  8384 98.3 52923 36.2 583.2  9.0
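The "quick and stupid test" idea of timing a big write and doing the math for MB/second can also be sketched portably. This is a rough illustration only, not a replacement for Bonnie: the function name, the 64 MB default, and the 64 KB block size are arbitrary assumptions, and as noted above a meaningful run needs a file at least twice the size of physical RAM.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=64, block_kb=64):
    """Write total_mb of zeros into a temp file in directory; return MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())   # make sure the data actually reached the disk
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)            # clean up the scratch file
    return total_mb / elapsed


if __name__ == "__main__":
    rate = write_throughput_mb_s(tempfile.gettempdir())
    print(f"sequential write: {rate:.1f} MB/s")
```

Point `directory` at a filesystem on the array under test. With a small file the number mostly measures RAM and controller cache, which is exactly the effect the twice-RAM rule is meant to eliminate.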
  • by ZxCv ( 6138 ) on Thursday November 18, 1999 @10:11PM (#1520202) Homepage
    our setup has right about 31,000 users constantly checking and sending email and is running RH 6.1 on a dual PII/333 with 128mb ram and 9g UW SCSI. I haven't seen a load higher than 0.75 since that machine has been the mail server... maybe something about how your mail server is setup is creating a tremendous bog on it.
  • A big diff is that scsi drives have separate heads for reading and writing and thus can do both at once, while ide drives have to do either or, effectively making a very busy scsi-drive twice as fast as a very busy ide-drive.
  • Reread the suggestion. It's about using an IDE RAID controller that bypasses most of the IDE stuff, but that works with IDE drives.
  • You've got to look at the disk access characteristics of mail servers. In many cases, you'll find that you have lots of writes, and a comparable number of reads. RAID5 works best in situations where you need space, redundancy and reads, but if you want good write performance, you need to sacrifice space and go with a RAID 0+1 solution, also known as a mirrored stripe set.


    I would recommend a Sun MultiPack [sun.com] with Solstice DiskSuite [sun.com] for management.
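To put rough numbers on the read/write trade-off, a common back-of-the-envelope model charges each small random write 4 disk operations on RAID 5 (read old data, read old parity, write new data, write new parity) and 2 on a mirrored stripe set. The sketch below applies that model; the figure of 100 I/O operations per second per drive is purely an illustrative assumption, and the model ignores controller cache and full-stripe writes.

```python
# Disk operations consumed by one small random write at each RAID level.
WRITE_PENALTY = {"raid0": 1, "raid0+1": 2, "raid5": 4}


def effective_iops(level, drives, iops_per_drive, write_fraction):
    """Estimate host-visible I/O operations per second for a mixed workload."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = drives * iops_per_drive
    penalty = WRITE_PENALTY[level]
    # Each read costs 1 disk operation, each write costs `penalty` operations.
    return raw / ((1.0 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    for level in ("raid5", "raid0+1"):
        iops = effective_iops(level, drives=6, iops_per_drive=100,
                              write_fraction=0.5)
        print(f"{level}: about {iops:.0f} IOPS at 50% writes")
```

With a read-mostly workload the two levels come out close together; the gap opens up as the write fraction grows, which is the point made above about mail servers.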

  • Load average is defined as the number of processes sitting on the run queue. This need not indicate a disk IO bottleneck.

    I would be surprised if any exim system was having more of a bottleneck to disk than it was to network. Your disks are faster than your network and exim is pretty light on un-required disk access.

    The larger the bottleneck to the network (by network I mean end-to-end with your customer, not just your links), the longer processes are going to hang around.

    More processes, more paging, less caching. Less caching, more IO. More paging, more IO.

    Probably teaching granny to suck eggs - but you do have your swap space on a separate device, don't you ;)

    The more exim processes that hang around longer, the more processes for the CPU to switch around. The more switching, the more likely you are to see paging.

    If the processes hang around longer, they take up more memory which reduces the cache-size available.

    Exim has several files which it accesses frequently, mainly the retry databases and its configuration. These should permanently be in memory.

    Bottom Line:

    I do however suggest that you don't consider moving a single server to RAID. If you have a server that you want to move to RAID for efficiency purposes... your design is wrong and you should be building a scalable system.

    Red
  • Personally speaking for a load of this magnitude SCSI is the only solution.

    Don't even think of software RAID.

    For some background on SCSI itself try http://www.scsifaq.org

    There are many types of RAID. Levels 0-5 are the "standard" ones, but there are several new ones, e.g. level 10, which attempts to address throughput issues. Your actual space requirements don't seem outrageous, so level 5 would be reasonably cost effective.

    Another thing you will probably want is hot swapping. Once you've had a box tell you a drive is dead, you've removed it and popped a new one in without taking the box down, you will not want anything else.

    On the IDE vs SCSI debate, whilst IDE is fast it seems to me that under continuous load SCSI gives better throughput.

    As others have pointed out - a 'designed' server, rather than a "roll your own" box would make sense. Compaq Proliants make excellent Linux machines. The SMART arrays are very good and support RAID to level 5. You can fit a lot of disks in the drive cages as well. They are a little pricey but of a good quality and reliability. We have rather a lot of them running NetWare. I get to use the older kit to run my funny Open Source stuff ...

    A suggestion might be:
    Proliant 1600, 2 x 600MHz processors, SMART 3200 with 64MB cache, 5 drive slots - 81 GB available after RAID 5 on 18GB 1" drives (that's Ultra-2 SCSI), supports up to 1GB RAM (has 128 by default). There is also an on-board SCSI interface for CDROM etc. This comes in at about GBP 9,000
  • I'm not familiar with Exim, but aren't there more efficient solutions?

    Although my experiences have been with much smaller configurations, qmail [qmail.org] reportedly handles loads of this magnitude on lesser hardware.

  • My raid experience comes from nt software raid and using AMI MegaRaid controllers. For performance the following things are important

    PCI Bus-- The fastest controller/drives won't make a difference if the PCI bus can't get data to the drives fast enough. Look at what else you are running, consider upgrading memory/processor like another person said.

    Stripe Size-- In a hardware raid setup the controller will write to one hard drive for xxx kb before switching to the next hard drive. You want to figure out what size 'chunks' of data the OS will send to the controller. Netware uses a 64k block size, which means large file reads/writes will be sent from the OS to controller in 64k pieces. If your stripe size is set to 8k, and you have 6 hard drives in a raid 5 array, look at the following situation.
    drive1 - 8k total=8k
    drive2 - 8k total=16k
    drive3 - 8k total=24k
    drive4 - 8k total=32k
    drive5 - 8k total=40k
    now time to calculate parity. this requires the controller to read data from drive1,2,3,4,5, calculate the parity using an XOR algorithm then write the parity
    drive6 - 8k parity
    drive1 - 8k total=48k
    drive2 - 8k total=56k
    drive3 - 8k total=64k
    Now it has to calculate and write parity again.

    compare this to a stripe size of 64k
    drive1 - 64k total=64k
    calculate parity, write parity
    drive6 - 64k parity

    Having a poorly configured stripe size can cause a huge performance problem. NT and NetWare (current versions) both optimize their disk writes to 64k. YES! I know the block size in NT is 4k, but the OS still optimizes disk requests to 64k chunks for performance reasons. I'm not sure about various *nix, can someone else answer that? Some people have the notion that writing smaller amounts of data to multiple hard drives is somehow faster. Hard drive maximum transfer rates are based on controller->hdd cache. A 64k or 8k write isn't going to fill up the cache on the controller, and a single 64k write will take less time on the controller, fewer commands will need to be issued, and performance will be better overall.

    An anecdote about this.
    Copying a 1.5 gig file from a workstation to a server with the stripe size at 8k took about 40min, with the stripe size at 64k it took 6min

    Another consideration is how much cache the controller has and what its use is. The AMI MegaRaid controller has 3 types of cache: Write, Read and IO. Write cache allows for Lazy Writes, which can improve performance. Read cache will allow the controller to read ahead, hopefully improving performance. IO cache (and I2O cards) allow the controller to take some of the work off of the processor, improving overall system performance.

    Some controllers come with multiple channels. The AMI MegaRaid series 438 controller has 3 different SCSI channels on it. IIRC each channel can transfer up to 80MB/s. This is similar to the idea of putting hard drives on different SCSI controllers except that I've never seen an implementation that allows a raid array to span multiple controllers.

    The above info IS NOT ACCURATE for RAID 0, RAID 1, or RAID 3, those levels have different rules. You should consult the OS vendor, documentation, and Database vendor for specific settings to optimize the controller.
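The 8k versus 64k walk-through above can be turned into a small counting exercise. This sketch follows the simplified model used in that walk-through (one parity write per row of data chunks across the data drives), which is an assumption about the controller rather than a description of any specific card; the function name `ops_for_write` is made up for illustration.

```python
import math


def ops_for_write(write_kb, stripe_kb, drives):
    """Return (data chunk writes, parity writes) for one OS-level write.

    Assumes a RAID 5 array in which drives - 1 drives hold data in each row
    and one parity chunk is written per (possibly partial) row.
    """
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    data_drives = drives - 1
    data_chunks = math.ceil(write_kb / stripe_kb)
    parity_writes = math.ceil(data_chunks / data_drives)
    return data_chunks, parity_writes


if __name__ == "__main__":
    for stripe_kb in (8, 64):
        data, parity = ops_for_write(write_kb=64, stripe_kb=stripe_kb, drives=6)
        print(f"{stripe_kb}k stripe: {data} data writes, {parity} parity writes")
```

For a 64k write on 6 drives this counts 8 data writes and 2 parity writes with an 8k stripe, against 1 data write and 1 parity write with a 64k stripe, matching the listing above.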
  • by bgp4 ( 62558 )
    I used an A1000 for a while on the back end of a UE3500. It was a terrible piece of equipment as far as I was concerned. It broke... a lot. And there aren't any useful diagnostics that the box gives out, just blinky lights. There is no Out of Band notification to be had. If it breaks, you have to physically inspect the box, and even then you still may not know what the real problem is until you replace just about everything. BTW: the internals of the box are basically an Intel PC (it's got a 486 chip on the main board)
  • I used to run a large mail server at a fairly big ISP who will remain nameless, and I'd like to suggest you consider a RAID-10 solution. We were experiencing disk bottleneck problems, and this really helped. Basically, RAID-10 splits the disk I/O half and half over multiple drives with the standard mirroring/striping. This is a simplified explanation, but that's the basic idea.

  • First try iostat -D -l (number of disks + 2) 5 to get percentage utilisation in 5 second intervals.

    This is my favourite tool for disk analysis. Secondly go to http://www.sun.com/sun-on-net/performance read what you feel is important but download the se toolkit.

    Run zoom.se to get a professional analysis of your system. Run virtual_adrian.se to get a virtual professional to tune your box.

    I recommend you do this BEFORE spending any money. I have an E3000 with 2Gb RAM and 2% processor utilisation because nobody checked the system properly.

    If it is your disks I recommend sun kit even though it is expensive and RAID 5. Don't worry about people telling you about it being slower, compared to a thrashing single spindle it is extremely fast and as importantly reliable. Tinker and learn!
  • It sounds like you have one machine handling all of these. Breaking these tasks onto separate boxes may help. (If you've made the mistake of telling customers the same thing for #3 and #4, i.e. mail.isp.net instead of mail.isp.net and pop.isp.net, it might be impossible to split those two tasks away from each other.)
    I suppose you could spam everyone and tell them to change that, and then have your router redirect that port to the appropriate machine for the people who forget.
  • Good point. You are right about everything except for the support. Dell's tech support is number one in the world and has been for some time now. Nothing you say will EVER convince me otherwise.

    On a side note another good solution (except that it's not external) would be a Dell Poweredge server. I'm currently running a Dell Poweredge server with Linux and RAID 5 and it works quite well.

    ...and yes, I'm biased, I work at Dell.... in support... :-)
  • ...Oh, and did I mention that you can order a Poweredge with Linux factory installed?!?
  • The AMI MegaRaid cards are excellent IMO. Very clean setup of raid arrays, uses SIMMs for its cache (on some models) so you can upgrade the cache easily. Like you mentioned, it has an Intel i960 processor and the newer ones are I2O devices. I2O is a standard where the card has a processor to offload work from the main system processor. Not all OSes support it though. I2O will also allow for one driver per card, instead of one driver per card, per OS, per OS version.
  • by noy ( 12372 )
    if this is a server, don't go with IDE - you are a business looking for *safety* of the data as well as performance, and should be willing to fork over the extra 20 to 100 percent it takes for scsi...

    as for controllers, i say mylex, high-end adapter of your choice, i would beef it up to 128 megs of ram in any case...

    as for the drives, go 10,000 RPM, the difference in access times will help you out, and i think that is much more important in your case than transfer rate... for an ISP, i would only ever buy IBM or Seagate drives, reputable workhorses that they are...

    for great cases and setups, i honestly recommend macgurus.com - they specialize in mac stuff, but a scsi tower is a scsi tower, and they will build it with good components at a reasonable price to whatever specs you need... (no, i dont work for them)...

  • Basically, I Just want something to Play With (tm).

    Just looking for a way to play with raid on a home system. As you put it, if it were to go down, who cares =) I'd rather make mistakes now while I can afford them.

    I see you guys like the case on my page =)

    -S
    Scott Ruttencutter
  • DANGER! Conflict of interest. Sun is my employer

    I'm sorry about your experience. However, I support Sun's internal hardware and I have not seen abnormal failure rates on the beasts. Sure, disks go bad - they have moving parts. I support loads of A1000s and they work great. As to diagnostics, that is a sore point for me as well. There's nothing really at the OBP level to test the array. They do come with software that is minimally useful however.

    It may be overkill, but I much prefer the A5x00s. All around though the hardware from Sun is VERY good.


    _damnit_
  • by lucky luck ( 42609 ) on Thursday November 18, 1999 @10:57PM (#1520227)
    Hi,
    a couple of years ago we had the same problem till I discovered that all our mailboxes were in one mail spool directory. This was a huge bottleneck, and after adapting qpopper and configuring sendmail to use a split mailspool dir, load came down to 1. (A split mailspool is /mail/a /mail/b /mail/c, and all users whose names begin with an a are placed in /mail/a ... etc ... )

    check above first before you buy hardware
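
A minimal sketch of the split-spool idea described above, assuming the bucket is simply the lowercased first character of the user name. The function name `split_spool_path` and the default root `/mail` are illustrative only; this is not the actual qpopper or sendmail change the poster made.

```python
import posixpath


def split_spool_path(username, spool_root="/mail"):
    """Return the mailbox path under a per-first-letter subdirectory.

    Hypothetical helper: the layout /mail/a, /mail/b, ... follows the
    comment above; everything else is an assumption for illustration.
    """
    if not username:
        raise ValueError("username must not be empty")
    bucket = username[0].lower()
    return posixpath.join(spool_root, bucket, username)


if __name__ == "__main__":
    for user in ("alice", "bob", "Carol"):
        print(split_spool_path(user))
```

With a scheme like this each directory holds only the mailboxes for one initial, so a lookup scans a much shorter directory. Popular initials will still produce uneven buckets.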

  • by Anonymous Coward
    I agree. Although, I used to work for Sun. ;-)

    I am currently contracting to a major shop setting up ISPs and we're using E250s with A1000s in the rear for data. I've been to 4 different sites in the world with this setup and it's just not failed so far, as long as you put a terminator on it. :-)

    The RAID Manager software is good for setup, but nothing else. I agree there's nothing for diagnostics on it, but I've never had any failure on the device, except when I kicked one and 2 disks popped loose. But the disks were fine after that.

    I wouldn't go for an A5x00 on an Ultra 2, just because a diff scsi card is much easier on the system than putting fibre in there and having more possibilities(?) of crap to wade thru. It is overkill. :-) james
  • port forward to a dedicated POP server... its not so bad ;-)
    We are all in the gutter, but some of us are looking at the stars --Oscar Wilde
  • Well, after dealing with many different brands of RAID controllers, I have found that DPT's Millenium series tend to be the best. The card takes care of everything, and they're available in 64-bit flavors with 3 onboard U2 channels, or 2 Fibre channels.

    Mylex are good if you're looking for a cheaper solution, or Adaptec for dirt cheap. But, if you're looking for the absolute fastest possible solution, it would be Fibre Channel Quantum Atlas 10k's on a 64-bit DPT Millenium Fibre controller in a RAID 0+1 configuration. With a 10 drive setup (equal to the total capacity of 5 of the drives) you could easily reach 100MB/s. Of course, that's gonna cost you a pretty penny.

  • by abulafia ( 7826 ) on Thursday November 18, 1999 @11:12PM (#1520232)
    Our mail server is currently handling about 1M messages a day. IO became a serious issue. We're still using sendmail, and I'm not going to give it up (we know it, we have custom builds for strange applications, it works). As others have noted, load average doesn't mean much here - I have some machines with a load average at 4 that are actually idle and fine, and others at .2 that need tuning. Ignore it and concentrate on what matters.

    Assuming IO matters, I am putting my full faith (and job) on Mylex controllers. I love them. I only have one in production, but am about to deploy 5 more, and we'll come in at about 600G managed by them. They just work. The DAC960SXi I have in production (for 7 months now) has been flawless, delivering wire speed doing RAID 5 without any effort after initial config (which is a bit annoying, to be sure).

    My production system using it is doing far too many things - mail, staging server, enterprise backup. This is changing - lack of time and historical accident made it that way. The point is that the Mylex handles it with no grief.

    If you're building these, be aware that Mylex external controllers need to be mounted in a box with "internal" style connectors. For good RAID cases, check out http://www.storagepath.com/ - they are what I'm using. They look low rent, but the boxes are nice (if a bit expensive).

    Down to specifics. For a mail only machine doing the sort of volume you're talking about, I'd deploy a dual processor box with three SCSI busses (one for spool, two for mbox/system access - system access is pretty cheap in comparison) attached to two hardware RAID setups. Granted volume allows, I'd go RAID 5 for spool (with 18G disks, that's ~65G spool) and hot spares. For mboxes, I'd do 0+1, for as much space as needed. Stripe disks on independent controllers, mirrored to each other. Striped mirrors can grow, as you need them to (RAID 5 can't, easily). You don't want to lose anyone's mail. Hot spares for each.

    Assuming 100G of mboxes, that's a total of 17 18G disks. Add three Mylex DAC960SXis and (initially) 3 rack mount cases, and that's something around 24K.

    Availability beyond disk is a different question, that gets platform specific. I do mainly Solaris now, so I can't talk much about Linux for this. Mylex controllers can do dual active/dual host configurations, but things get more complex, and a summary here doesn't make sense.

    Other options like A1000s (Sun specific) and Netapps require different approaches - they're very different beasts. We have all of the above, and treat them very differently. We'll buy them all again - they're all decent - but are good at different things.

    If you can, buy raw Mylex controllers through a reseller like TechData or similar - you'll save a lot.

    Hope this helps some.

    -j
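
The disk count in the comment above can be checked with a little arithmetic. The sketch below is one possible reading, not necessarily how the poster counted: it assumes a single RAID 5 group for spool (one extra disk for parity), RAID 0+1 for mailboxes (every data disk doubled), raw 18G disks, and hot spares tallied separately. The function name `disks_needed` is made up for illustration.

```python
import math


def disks_needed(usable_gb, disk_gb, level, hot_spares=0):
    """Return how many disks a layout needs for a target usable capacity.

    Assumptions (not from the original post): one parity disk per RAID 5
    group, a full mirror copy for RAID 0+1, no formatting overhead.
    """
    data_disks = math.ceil(usable_gb / disk_gb)
    if level == "raid5":
        # RAID 5 needs at least two data disks plus one disk of parity.
        total = max(data_disks, 2) + 1
    elif level == "raid0+1":
        # Every striped data disk has a mirror partner.
        total = data_disks * 2
    else:
        raise ValueError("unsupported RAID level: %r" % level)
    return total + hot_spares


if __name__ == "__main__":
    spool = disks_needed(65, 18, "raid5")
    mboxes = disks_needed(100, 18, "raid0+1")
    print("spool disks:", spool)
    print("mbox disks:", mboxes)
    print("total before hot spares:", spool + mboxes)
```

Under these assumptions the spool array comes to 5 disks and the mailbox array to 12, which matches the quoted total of 17 only if the hot spares are counted on top.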
  • by Grimwiz ( 28623 ) on Thursday November 18, 1999 @11:13PM (#1520233) Homepage
    The first thing about a hardware raid controller is that it hides failures from the operating system. With software RAID you have to manually carry out all sorts of tasks, and I'm sure we've all heard of the engineer who mirrored the new blank disk on top of the one remaining data disk of a mirror.
    Units such as SUN A1000 and Baydel connect via SCSI and you just watch for an orange light; even the part-time cleaner could pull out the correct disk and replace it and have the system back and running without the OS noticing. Storageworks and Clariion (EMC) do the same but over Fiber Channel. SCSI units tend to top out at 40MB/s; Fiber Channel theoretically tops out at 200MB/s (there are two 100MB/s loops), but since I only had a max of 30x18GB disks to play with, the disks were the bottleneck. Monster multi-SCSI machines like EMC/IBM's can achieve whatever bandwidth you want by multiplexing SCSI connections.
    We've evaluated software RAID, Hardware RAID over SCSI, Hardware RAID over Fiber channel from EMC, IBM, SUN, Compaq(storageworks) and in our opinion a good smart raid controller with two data channels and load balancing software is impossible to beat.
    For Speed, stripe(0) mirrors together(1), in RAID 0+1, this allows reads at double speed because each mirrored disk can handle a request separately, and slightly sped-up writes because you can write to the RAID controller's NV cache and carry on doing your work whilst that takes care of putting the data to media.
    This of course has only a 50% data efficiency.
    Using RAID 3 or 5 you lose one disk in a rank for parity; RAID 6 (used by Network Appliances) uses two disks for parity but has wider ranks of disks. This often means that sequential reads are fast, because a request for data wakes up all the disks in the rank, but therefore the whole rank can only handle one request at a time. Writes are slower because you have to read a stripe of data, calculate parity and write the whole stripe back again.
    RAID5 is really good for data which doesn't have to be the absolute fastest.
    Whilst we were doing performance tests, we measured a linear increase in speed up to 20 disks (in transactions/second), and there is a definite art in making sure that you spread the load over all the disks available so that a single disk doesn't get thrashed to death.
    In conclusion? well, that depends on your OS.
    For me, for a PC-based system I would choose a hardware RAID system with SCSI connection which let me choose the LUN sizes. 5 disks in a RAID5 configuration will only waste 1 disk in capacity. If you're finding your mail spool is being thrashed then I would build a 10 disk 0+1 raid and stripe the mail area across them, using the rest of the area for home areas or web areas or something else which has large storage requirements but doesn't get hit hard.
    Oops, this assumes that this REALLY is your problem, a lot of disk problems go away by adding more memory to the machine... I assume you have measured this by tracking the outstanding I/O queue.
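
The parity step mentioned above for RAID 3/5 can be illustrated with XOR over equal-sized blocks. This is a toy sketch under stated assumptions (tiny in-memory blocks, one parity block per stripe, made-up names such as `xor_blocks`); it is not how any particular controller is implemented. The `update_parity` shortcut is the common read-modify-write variant, shown as an alternative to re-reading the whole stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity, old_block, new_block):
    """Recompute parity after one block changes (read-modify-write)."""
    return xor_blocks([old_parity, old_block, new_block])


if __name__ == "__main__":
    stripe = [b"mail", b"spol", b"data"]   # three data blocks in one stripe
    parity = xor_blocks(stripe)            # what the parity disk would hold

    # Simulate losing the second disk and rebuilding it from the survivors.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    print(rebuilt == stripe[1])

    # A small write touches the data block and the parity block.
    new_parity = update_parity(parity, stripe[1], b"SPOL")
    print(new_parity == xor_blocks([stripe[0], b"SPOL", stripe[2]]))
```

Either way a small write costs extra reads before the writes, which is the slowdown the comment describes.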
  • by Anonymous Coward
    Don't even think of software RAID.

    I'd like you to back up this claim (if you can).

    You see, serious people like Deja [deja.com] do in fact use Linux software RAID and get it to work. Rather well too. Does zero-point-two-five-percent disk related downtime sound OK to you? It does to them.

  • The original posting doesn't say if the server is running pop/imap, and thus if it is used as the final delivery point for those 13,000 users.

    If it is, then the hashing of the mailbox path that lucky luck mentioned is worth investigating. Also worth investigating is alternative mailbox formats. If you're using mbox format, then I'm not surprised there's a problem if you have a large number of users (and/or reasonably large mailboxes).

    There has been some discussion about these issues on the exim-users mailing list [exim.org]. I read it via egroups. [egroups.com]

  • by kijiki ( 16916 ) on Thursday November 18, 1999 @11:22PM (#1520237) Homepage
    In particular: load is a measure of how many processes are using or waiting for a resource (such as disk I/O, CPU or network I/O). On a busy mail server that's completely adequate for the job, I'd expect to often see a high load average due to the number of processes that are waiting on the network. That is, due to the number of processes waiting for slow network connections to places halfway around the world.

    Correct me if I'm wrong, but isn't the load the average number of processes in the run queue? This would mean that processes that are blocked on the network or disk would be in the sleep (wait) queue, and not counted in the load average.

    In this case, a load of 20 means 20 processes are ready to run, which is not so good.
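
Whichever definition applies (what exactly is counted differs between operating systems, so check the documentation for yours), a load figure is easier to judge relative to the number of CPUs. A small sketch, assuming a Unix-like host where Python's `os.getloadavg()` is available; the helper name `load_per_cpu` is hypothetical.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Return the load average divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


if __name__ == "__main__":
    try:
        one_min, five_min, fifteen_min = os.getloadavg()
    except OSError:
        print("load average not available on this system")
    else:
        cpus = os.cpu_count() or 1
        print("1-minute load per CPU: %.2f" % load_per_cpu(one_min, cpus))
```

This only says how much work is queued per processor; it does not say whether the wait is for CPU, disk or network, which is the point under discussion here.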
  • Yes, they are great cards. I bought a few AMI 428s from Onsale a while back. Not to turn /. into e-bay, but I have one that I want to get rid of. Pricewatch is down at the moment, so I don't know what they are selling for. I will let you have one for considerably cheaper.

    e-mail me at bm@datapace.com if you are still thinking about getting one.
  • It is important to really find out if the disks are the problem.
    I suggest you examine your system carefully to see what is actually happening. Besides using vmstat, iostat and friends, you can get a software package by Adrian Cockcroft which has a 'virtual adrian' which points out all the bad spots in the system.
    It can be found here: SE toolkit [sunworld.com]
  • LOL, i see you were sending that message from work! LOL! =]
  • Be sure to check out the www.clariion.com [clariion.com] web page for information on their fibre-channel external RAID units. These units can be managed separately through their own console connection and support redundant everything, including I/O controllers. They support Sun and Solaris. The box also supports hot spares, so if one disk fails, another is automatically bound into the RAID group and rebuilt. With just a single hot spare, you'd have to lose two disks before risking data loss.

    As for me, I'm considering their lower end SCSI boxes connected to a high-end Intel server running Linux, seeing as I have $52,000 to spend this year! (yippee). The idea is to put all the money where the valuables are (the data) and use commodity hardware and open source software to drive it. The OS would boot from internal HD and all data and local customizations (ie, /usr/local) would be on the external RAID box. If a CPU box fails, unplug it from the array, plug in a spare CPU box, reboot. Minimal downtime due to hardware problems. I can then repair or replace the busted CPU box at ease.

    For linux jockeys, there are efforts to bring fibre-channel drivers to Linux. Be sure to look at the work at Worcester Polytech [wpi.edu] for info.

  • When I worked at Demon, the netapps were one of the most reliable pieces of machinery that I administered. Whilst you might think that network attached storage can be a performance problem, in practice it worked very well indeed.

    You do, however, need to be aware of how to make your application play well over NFS. Exim is actually reasonable at this. Qmail is good at storing mailboxes on NFS thanks to its Maildir technology, but the mail queue *needs* to be on a local disk... I'm not sure about postfix or sendmail (bletch).

    Unfortunately, I can't remember the command to make the individual LEDs on the disks blink, which is one of the best remote diagnostic features ever. :-)

    -Dom
  • There is really not so much that differentiates ATA from SCSI anymore. ATA (formerly known as IDE) drives have been remapping bad blocks transparently for years, they have been doing DMA for nearly as long, and some drives even came in ATA and SCSI versions (IBM DCAA/DCAS for one), where only the interface board was different and absolutely everything else was equal.

    There is even a usable external ATA RAID subsystem out there, manufactured by Arena. They use the same i960 that is used on high end SCSI RAID controllers and deliver decent performance with cheap drives. (Remember: The I in RAID once meant inexpensive)

    Of course, in a server, you want reliable drives. But that has next to nothing to do with the interface. UDMA is very reliable as far as the interface data transfer is concerned, I would rate it even higher than SCSI in this regard (proper CRC vs. ordinary parity). The quality of the disk mechanism is another thing, but with IDE drives being so cheap, you could afford to upgrade the things so quickly that they never get a chance to fail at work. Or you could just buy two big ATA drives for less than one SCSI drive and do RAID1.

    For the record: Recent ATA drives really scream. Look at these bonnie results from my workstation (dual P2, 128M, Red Hat 6.1, 2.2.13, Test run on 2 GB / Partition 50% full):

                  -------Sequential Output-------- ---Sequential Input-- --Random--
                  -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
              512 18196 97.7 23648 22.5 10807 19.1 19702 84.2 23128  6.9 129.9  2.0

    The drive is a 20 GB Seagate ST320430A which sells for less than 400 DM around here. Remember: These are not artificial results on an empty filesystem. This is my real root partition which is used daily.
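
To read the bonnie row above in more familiar units, the figures can be split into named fields and converted to MB/s. A rough sketch: the field names are invented for illustration, the machine-name column is assumed to be empty as in the posted row, and one "K" is assumed to mean 1024 bytes.

```python
# Column order of the data row in the posted bonnie output (names are made up).
FIELDS = (
    "file_mb",
    "putc_kps", "putc_cpu",
    "write_kps", "write_cpu",
    "rewrite_kps", "rewrite_cpu",
    "getc_kps", "getc_cpu",
    "read_kps", "read_cpu",
    "seeks_per_sec", "seeks_cpu",
)


def parse_bonnie_row(row):
    """Split a whitespace-separated bonnie data row into a dict of floats."""
    values = [float(value) for value in row.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def kps_to_mbps(kps):
    """Convert K/sec to MB/s, assuming 1 MB = 1024 K."""
    return kps / 1024.0


if __name__ == "__main__":
    row = "512 18196 97.7 23648 22.5 10807 19.1 19702 84.2 23128 6.9 129.9 2.0"
    result = parse_bonnie_row(row)
    print("block write: %.1f MB/s" % kps_to_mbps(result["write_kps"]))
    print("block read:  %.1f MB/s" % kps_to_mbps(result["read_kps"]))
    print("random seeks per second: %.1f" % result["seeks_per_sec"])
```

On that reading the drive sustains roughly 23 MB/s for sequential block transfers, while the random seek rate stays at about 130 per second, which matters more for a mail spool.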
  • by jocks ( 56885 ) on Thursday November 18, 1999 @11:48PM (#1520248) Homepage
    I accept that you will need to test to make sure that the disks are not the problem but you will need to do it the right way.

    Firstly vmstat tells you very little about disk i/o. What it is good for is the processes. Look at the output from vmstat 5 for example. The first three columns are r b w, running, blocked and waiting. If there are blocked processes look at WHY processes are blocked. Use top to get the i/o wait information. If there is a lot of io wait then look at the disks. Use iostat -D to get percentage utilisation of the disks. If there is a lot of disk wait then you may need to either add more disks or spread the load.

    It is interesting to note the relative speeds of devices:
    If cpu takes 3 seconds to do a job then,
    Level 1 cache takes 10 seconds
    Level 2 cache takes 1 minute
    Memory takes 10 minutes
    Disk takes 7.7 months
    Network takes 6.5 years

    Get stuff off your disks better! Monitor your cache hit rate to get information on efficiency. Use vmstat or sar or stuff from the se toolkit. Get the se toolkit from http://www.sun.com/sun-on/net/performance. Run zoom.se to monitor your system. Run virtual_adrian.se to tune your system. Use the right tools and don't just add more memory, identify the bottleneck, fix the bottleneck, re-test and repeat until the performance is satisfactory.
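
The relative-speed analogy above can be turned into plain ratios. The sketch below only rescales the poster's own figures; it assumes a 30-day month and a 365-day year, and the names `ANALOGY` and `slowdown_factors` are illustrative.

```python
# Seconds per unit; month and year lengths are assumptions for the arithmetic.
SECONDS_PER_UNIT = {
    "second": 1,
    "minute": 60,
    "month": 30 * 24 * 3600,
    "year": 365 * 24 * 3600,
}

# The figures from the comment above: (device, amount, unit).
ANALOGY = [
    ("CPU", 3, "second"),
    ("Level 1 cache", 10, "second"),
    ("Level 2 cache", 1, "minute"),
    ("Memory", 10, "minute"),
    ("Disk", 7.7, "month"),
    ("Network", 6.5, "year"),
]


def slowdown_factors(analogy):
    """Return each device's time as a multiple of the first entry's time."""
    base = analogy[0][1] * SECONDS_PER_UNIT[analogy[0][2]]
    return {
        name: amount * SECONDS_PER_UNIT[unit] / base
        for name, amount, unit in analogy
    }


if __name__ == "__main__":
    for name, factor in slowdown_factors(ANALOGY).items():
        print("%-14s %14.1fx" % (name, factor))
```

On these figures a disk access is several million times slower than the CPU and tens of thousands of times slower than memory, which is why the cache hit rate is worth watching.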
  • Both SCSI and IDE drives these days hide defects. There is no technical reason why IDE drives are less reliable than SCSI drives. However, SCSI has lost the workstation market completely, so now only servers use SCSI. The data on server disks is generally more expensive to replace, and downtime there is more troublesome, so people are prepared to pay for quality. Some manufacturers have made two versions of mechanically identical drives, one IDE and one SCSI. Quantum Fireball is an example. That practice seems to be ending now that there is no such thing as a low end SCSI drive.

    Anyway, there are certainly not two heads on each platter as suggested by another poster. At one time Seagate made Barracuda drives that were able to read data off two platters in parallel. They dropped it in the later Barracudas when the increase in data density made it possible to make faster drives without this feature.

    Another issue is that IDE drives are usually optimized to withstand getting started and stopped again and again by powersaving, whereas SCSI drives are optimized to run continuously for years.


    Benny
  • by Anonymous Coward
    As others have already mentioned, you should really look into tracking down where the problems are before you go and spend $$$ on a new RAID system.

    A few things that may help;
    1) Our POP mail server (~1000 users) running on an old SUN Solaris machine (LX) was having problems because of the number of NIS lookups that were going on. System CPU was up near 75% constantly, I/O waits near 0, and load was also very high. Solution; make mail server a NIS slave as opposed to a NIS client. Reduced load by 20% immediately. Same goes for DNS lookups.

    2) Make sure you're not writing/reading to/from NFS mounted fs.

    3) Install the recommended Solaris patches - these can make a big difference. Try installing Virtual Adrian, and see what it recommends.

    5) Don't buy EIDE for all the reasons mentioned previously. For lots of simultaneous hits, SCSI outperforms EIDE every time.

    6) Consider fibre channel disk arrays from SUN - expensive but they are nice, especially the new A5200. It gives you 22 spindles as opposed to the 14 in the A5100.

    7) Ignore the guys talking about s/w RAID solutions being a BIG slowdown. Sure h/w RAID 5 is much faster than the s/w equivalent but when it comes to RAID 0+1 then there ain't a lot of difference. Not only that BUT s/w RAID systems tend to be much easier to configure and maintain w/o a doubt - check out Veritas Volume Manager (love it!) or even the free DiskSuite (with Sun Solaris server version) is better than any h/w RAID configuration I've seen.

    8) I would bet my next salary that adding a RAID system to your mail server will increase performance by less than 15%.

    Oh, and I've been managing enterprise level Sun systems now for 8 years, so I'm not just a Linux geek who has read too much ;)

    Hope this helps.

  • This is my HD:
    Filesystem Size Used Avail Use% Mounted on
    /dev/hda1 486M 358M 102M 78% /
    /dev/hda2 3.8G 2.7G 909M 75% /usr
    /dev/hda3 964M 501M 413M 55% /home
    /dev/hda5 99M 20k 94M 0% /tmp
    and that's AFTER cleaning out... before I had / at 100%, /usr at 100%, and /home at 100%. I have a 4.3 gig HD laying around which I had FreeBSD on for awhile (been thinking about putting BeOS on it) but I may use this idea and go for it.


    If you think you know what the hell is really going on you're probably full of shit.
  • > I'm not familiar with Exim, but aren't there more efficient solutions?

    Probably not ... a certain (v) large UK ISP I know quite a lot about uses exim on its email system because it's more secure than sendmail (but then, what isn't?) and more efficient than qmail (see below).

    qmail starts up a separate process for every email it delivers, whereas exim starts a separate process for each batch of email it delivers. On a lightly loaded system, the point is probably moot - however on systems like what we are discussing, it's quite probably not!
  • Actually, with SCSI, you can have inter-device transfers (without intervention of the CPU or the DMA controller) and can access several devices on the same SCSI bus at the same time, which you cannot do with EIDE (you have to end the dialog between the driver and the device before accessing another EIDE device). I don't know if I made myself clear, but in any case there are many webpages out there that explain the differences.
  • If planning to build raid5 arrays, the physical limits of IDE might become an issue.

    I don't know if it's reasonable to plug raid5 array disks in as IDE slaves. But i would go for SCSI if you do big raid5 arrays.

    With 5 ULTRA2 fast and wide scsi disks in a raid5 array (software raid5 in Linux) i have seen reports of 40MB/s read and write throughput.

    And if you have dough, buy 2 controllers and put a raid5 array on both. And stripe among them.

    --miku
  • When i use my ide disk i can't even move the mouse. When i use the scsi disk my puter doesn't seem to notice (mp3's don't stop, i can move my mouse again). So i wouldn't even consider ide for my ws anymore.. even if scsi costs 500-1000 NOK (US $80-140) more.
  • Laugh at yourself. IDE drives have space set aside since the days of the first Seagate 89M-130M drive series.

    So the man is right - no difference in reliability whatsoever.

    The question is I think that they are actually hitting not a drive bottleneck but the UFS filesystem bottleneck so they should either abandon Solaris or buy (forgot what's their name) the file server and reliable filesystem solutions for Solaris extensions.

    So even if they upgrade to RAID they are not going to get anywhere.

    Also on the topic of RAID: There are very good external boxen using proprietary solutions for IDE hotswap and presenting a single u2w or better SCSI interface to the box. And they are rackmountable. And they cost about 4000-5000 fully populated with 13-17GB EIDE.
  • TyFoN wrote: "When i use my ide disk i can't even move the mouse."

    You fail to mention which chipset and transfer mode you are using. And there _are_ SCSI hosts that do a lot worse than recent ATA interfaces (the cheap ISA-Adaptecs that come bundled with scanners and ZIPs for example).

    I once was a SCSI advocate, too. Then came the Intel PIIX3 and Mword DMA mode 2, nowadays I am using a PIIX4 and UDMA33. I have _never_ had my system go slow on me with DMA ATA drives, much less the mouse pointer stop moving.

    There is just one special case: Swapping to ATA drives can put more of a load on the system under certain circumstances (because I haven't seen ATA drivers use command queueing yet - it's already specified in the ATA spec, though), but you don't really want to be swapping in the first place. If your system does that constantly, you should have gone for more RAM instead of that pricey SCSI drive.
  • or "simultaneous". whatever. Fortunately, on RAID systems that I have experienced (AMI MegaRAID and some of the Mylex offerings), the 64k block is an "idiot" setting; whew! Personally, I've found both Mylex and AMI to offer good products with *reasonable* support (Mylex above and beyond the call, when they really had no obligation to weigh in with assistance) and I'll be going back for some more of that when my budget allows. As for these shitcart IDE arrays; I know they're cheap and offer bundles of storage -and the diff between one of these and a SCSI job may be a nice holiday somewhere- but I have had a very bad experience with one (well, actually, two and then three) units that couldn't do the job it/they was/were meant to. My bad luck, or this is a despicable case of a "RAID-style toy: Not meant for serious use"? Finally, don't RAID 0 unless you can RAID 10!
  • I would not use RAID for the problem you're describing. You're most probably better off splitting the box into several others.

    For example, try using a fallback mailhost for outgoing mail (fallback_mx in Sendmail). That way messages that cannot be delivered within a couple of seconds are relayed to the fallback server, keeping your outqueue clean and tidy.

    For incoming mail, use a different server, or if you can, use several. You could just put them all in the MX list of your domain, with the same priority. This does wonders.

    It might be smart to look at the mailbox format. Some mailbox formats (MBX) have much better performance than others. And you could put POP3 and IMAP on a third server.

    All this is much preferable to simply installing a RAID array, IMO, based on the information you presented.
  • Used to work for Data General, the parent company. Fantastic hardware. They've just been bought by EMC though.

    I would definitely try to tune the system before throwing hardware at it though. Find out exactly where the bottleneck is.
  • I ran into the same problem not long ago. Our local ISP needed a backup solution. The old tape drives were not doing their job anymore. So we built our own RAID cabinet. We bought an 8 disk RAID enclosure with dual redundant power supplies from Siliconrax. The controller is a Mylex External RAID controller. The card is nice, it allows expandability down the line. The card comes in a full height enclosure (keep it in mind, it's big). We used 18.6 gig Seagate drives in the system. Each drive was mounted in a CRU Data Port removable enclosure for hot swap. The RAID controller has an LCD front panel, making setup a snap. The array was configured with RAID 5. RAID 5 is redundant, and provides fast read access, but write access is slower. All in all, the array is about 100 gig online. The cabinet is connected to an SGI O2. The only thing to watch is the cable length!! We've been doing nightly backups over NFS since the array was turned up. The system is nice. Go SCSI, and do the research on the proper controller. If the money is there, go fiber.
  • For long term monitoring on Solaris, I would recommend Orca. This is a perl based tool which uses the SE toolkit to collect data. It then stores it very tidily and produces HTML with PNG graphs that let you see many performance statistics on daily weekly ... up to yearly cycles. The home page is here [caltech.edu].

  • having read through most of the the thread, my $0.02 is:

    definitely install virtual adrian to get a better idea of system tuning you can do and where your real problems might lie. have you tuned all the system parameters possible ? ncsize ? turned off all non essential daemons/apps on the machine ?

    mylex controllers seem reliable but were definitely a pain to configure - we're using them on a dec fileserver solution. one downside that appeared was they took 6-8 hours to initialize the array - compared to 1.5 hours for a non mylex controller :-/

    we're now switching from DEC+Mylex to Sun+Infortrend who make a very nice scsi-scsi controller. www.infortrend.com - we're using the 3201U2G - 4 Ultra2Wide scsi buses.

    don't go to raid unless you know what you're getting yourself into - it's far more complex and expensive in the long term apart from your initial investment in the hardware. you'll have larger spares provisioning, your documentation (you do have some right ;-) will be more complex and your backup system might need some work too.

    my rule of thumb at present is JBOD to 50G, RAID as a NAS for 50G-500G and SAN (RAID/fibre) for above 500G. you really don't need raid below 50G except for specific performance reasons

    it's been an interesting thread to read, since i'm right in the middle of working on a raid5 server implementation.

    -jason
  • First of all you might want to check out other MTAs, as well as other methods for storing the user's mails. If all mailboxes reside in the same directory, you're spending all your time in the kernel doing _linear_ searches thru the mailbox directory. You could spend millions on EMC hardware without seeing _any_ performance increase.

    I'd recommend using the Postfix MTA, as it has almost all features of Sendmail, and it's secure, and (hold on) it's even faster than QMail. Eventually you could use it with the Cyrus IMAP/POP services. You definitely want to make sure that you don't have all mailboxes in the same directory. Build a hierarchical structure where you never have more than say 30-50 subdirectories/files in one directory.

    Ok, if disks are still your problem, consider:
    1) Software RAID is usually a lot faster than hardware RAID. And for the money you save on the HW controller you could buy faster/more disks.
    2) An IDE disk is identical to an SCSI one, except of course for the interface and the warranty. The price difference is mainly due to the warranty.
    3) UDMA/ATA-{33,66} IDE interfaces are as fast as any SCSI solution if you keep _one_ disk per channel. The main problems with IDE solutions is the short cable length allowed (a problem for 10+ disks) and the number of controllers you must have (one controller for each two disks)

    You can spend $50K on a SCSI/HW-RAID solution easily. And you won't know if you'll even get the speed of one single UDMA drive from it (yes people actually get 15MB/s both from their single UDMA drives, and from their expensive DPT RAID solutions). At least consider a software-RAID and eventually IDE solution before rushing out to spend the next 10 years budget on the shiny HW-RAID solution.

    Your setup is fairly small, eg. you would probably do just fine with a four-disk RAID-5/10 for spool and mailboxes. This is where SW RAID is worth considering. Granted, for 20+ disk systems, HW RAID may well be a better way to go, eventually combined with SW RAID.

    My 0.02 Euro.
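
The hierarchical mailbox layout suggested above could look something like the sketch below. It assumes two directory levels chosen from a CRC32 of the user name, with 32 buckets per level; the function name `hashed_mailbox_path`, the root `/var/mail` and the bucket count are all illustrative choices, not anything Postfix or Cyrus does by itself.

```python
import posixpath
import zlib


def hashed_mailbox_path(username, spool_root="/var/mail", buckets=32):
    """Return a two-level hashed path such as /var/mail/NN/MM/username.

    Hypothetical layout: both directory levels are derived from a CRC32
    of the user name so mailboxes spread fairly evenly across buckets.
    """
    if not username:
        raise ValueError("username must not be empty")
    checksum = zlib.crc32(username.encode("utf-8")) & 0xFFFFFFFF
    first = checksum % buckets
    second = (checksum // buckets) % buckets
    return posixpath.join(spool_root, "%02d" % first, "%02d" % second, username)


def average_entries_per_leaf(user_count, buckets=32):
    """Return the mean number of mailboxes per leaf directory."""
    return user_count / float(buckets * buckets)


if __name__ == "__main__":
    print(hashed_mailbox_path("alice"))
    print("mailboxes per leaf for 13000 users: %.1f" % average_entries_per_leaf(13000))
```

With 32 x 32 leaf directories, 13,000 users average out to about 13 mailboxes per directory, comfortably under the 30-50 entries suggested, and no directory level has more than 32 subdirectories.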
  • ...and name a hard drive that you will want to be using in five years. My three-year-old hard drives that are just slipping out of warranty are 1.2s and 1.6s. Who cares!
  • my SCSI drives have a 5 year warranty.

    ...and name a hard drive that you will want to be using in five years. My three-year-old hard drives that are just slipping out of warranty are 1.2s and 1.6s. Who cares!
  • Not entirely true. RAID 0+1 is faster for writing, but RAID5 is usually, depending on configuration, faster to much faster for reading (you have more platters to simultaneously read from, and calculating checksums isn't necessary for reads).

    Some array types (notably HP that I know of) will dynamically rearrange data storage between RAID 0+1 and RAID5 to optimize speed and space.
  • Many have posted followups here mentioning that RAID 5 may not be your best avenue. To recap, this is because of the performance overhead associated with the calculation of parity data. Unless you have a reliability issue, RAID 5 is probably something to stay away from. An exception might be hardware RAID, but such solutions are expensive and will still involve a slight performance hit.

    The multi-controller solution is probably best; someone mentioned the Sun StorEDGE product with the Cheetah drives. This is a great piece of gear, and coupled with some really good storage management software (might I suggest Veritas Software's File System/Volume Manager) you'll get a very flexible solution providing the most bang for the least buck. With the Veritas product you can manage the data on the fly over several drives, and monitor & tweak the configuration on the fly while in a production capacity; additionally, the Veritas product provides a journalled filesystem which will allow rapid restarts in the event of a crash and if you have the drives, can be configured to fail over to available spares.

    Yes I am a Veritas Consultant =^) but that does not change the fact that this is an excellent product that would probably go a long way towards addressing your issues (which seem more performance oriented than reliability related) on your existing drives. Check out this link for more info: http://www.veritas.com/library/su/fsconceptwp.pdf

    Good Luck!

    -Videoranger
  • I do sys admin for a software company with a mixed Unix-NT environment. We had some terrible experience with Samba on Unix, and NFS on NT. About a year ago, we purchased an F720 with 100GB, for around $50,000. Now we have another F720 with a 300GB fibre-channel RAID. We talked with other NetApp customers, and they were ecstatic about the reliability of these machines. Although I can't say that the filer was 100% reliable, like we heard from smaller sites, we're VERY satisfied with its performance. In the last year, we've only had 2 occasions with significant (> 10 minutes) downtime. As far as speed is concerned: it's usually faster than our local disks...
    One of the best things about it is its simplicity. GUI people use the nice Java applet to control it (it gets better with every release of the OS), and us Unix people have a great command line interface.
    If you plan to use the NetApp with lots of clients (about 500 in our case) in a mixed environment, the Network Appliance is probably the most reliable and simple to maintain solution. If you want the fastest RAID array to connect to your mail server, it will simply amaze you :-)
    If your budget allows, go for it!
  • No - he is in fact right. I don't know about all manufacturers, but certainly IBM drives use the same hardware - the only real difference being the content of a firmware chip (and the cable connector, I guess).

    This doesn't mean that all the drive's features would be available for both SCSI and EIDE, and it doesn't stop them charging loads more either.

    -- Steve

  • 1.5 gig in 6 min is only a little over 4 MB/s. Something is seriously wrong with that number. A single 10,000 rpm drive can sustain 18 - 22 MB/s.
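A quick sanity check of that figure, as a minimal sketch. It assumes 1 gig = 1024 MB, and the function name is made up for illustration:

```python
def throughput_mb_per_s(gigabytes, minutes, mb_per_gb=1024):
    """Average transfer rate in MB/s for a copy of the given size and duration."""
    return gigabytes * mb_per_gb / (minutes * 60)


def seconds_at_rate(gigabytes, mb_per_s, mb_per_gb=1024):
    """How long the same copy would take at a given sustained rate."""
    return gigabytes * mb_per_gb / mb_per_s


if __name__ == "__main__":
    rate = throughput_mb_per_s(1.5, 6)
    print(f"observed: {rate:.2f} MB/s")
    # Compare against the low end of the sustained rate quoted above.
    print(f"at 18 MB/s the copy would take {seconds_at_rate(1.5, 18):.0f} s")
```

At the quoted sustained rate the same copy should finish in well under two minutes, which is why six minutes looks suspicious.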
  • Now from all of my research it seemed like NetApp was the way to go. So I pushed and pushed and pushed, and finally we got an F760. (Nothing like going from nothing to the top of the ladder.) And now it is 2.5 months into being a NetApp user. Both the 1 and 2 month anniversaries were marked with a MB dying. I must say it is fast, real fast, but right now the analogy is fast like a race car going towards a wall.
    Now ease of use, maintenance, etc. on the UNIX side has been pretty carefree for me. The NetApp has been very easy to use, easy to monitor, and easy to set up. But the NT department, which paid for half of it, is hating life. The NetApp's quota system is straight out of Unix, which is not good for NT, i.e. you are putting quotas on users, groups, or qtrees (think root-level directories which are made in a special way). According to the NT gurus, file ownership by individuals in NT is a bad idea, therefore all files are owned by an administrator equivalent. This means you lose user quotas. NT has a different group philosophy than Unix (multiple groups can have access to a single file) so I am guessing the group quotas are out as well. Leaving qtrees, which are sort of ugly. Right now our NT people are looking at taking the loss on the NetApp and giving it to UNIX (fine by me ;) and replacing it with a conventional NT file server.
    Another downside for the NT side of things is that the NetApp is configured much like a UNIX box. It uses init and rc files etc. etc. Well, from NT land there is a carriage return/line feed issue. All of those files have Unix-style line endings. I am not sure if they break if you start using DOS-style ones, but I am leery to find out. Which means the Unix side is responsible for all configuration of the NetApp. This is both good and bad. They aren't going to break my stuff, but I have to take on additional labour.
    Note: The hardware failures were quickly resolved by NetApp, but it still sucked hard. The NT quota issues are supposed to be resolved in the next major version of the NetAppOS codenamed Guiness or some such. The NT people IMO haven't fully explored the quota possibilities, instead taking the party line that it's too much work. And it is entirely possible that I have not uncovered all of the problems, and solutions for those problems, in the time we have had it.
  • Why is everyone so obsessed with RAID5? It is not the holy grail of disk storage, as one or two others have tried to point out but been flamed for. RAID5 offers great resilience, BUT is not good if performance is also required. Just because your data is striped across multiple volumes to aid recovery, it still only reads from the one volume, and the need to perform the striping on writing makes the system slower. If performance is an issue, and money is not, then RAID1 (mirroring) is the solution (unless your system will allow RAID0+1; IBM RS6000's, my domain, do not).
  • Writes shouldn't take significantly longer than reads. I work with Fibre Channel, and the throughput numbers I get for raw reads and writes (no file systems) aren't significantly different. If you have a good raid controller, it should be able to keep the drives busy on both reads and writes as long as the file system is writing data in large enough blocks.
  • Recently built a server with an AMI 428 card. It is an old hunk of a HP NetServer 5/166 LS2 (Dual Pentium 166's). The performance speed up over straight SCSI was quite nice. I am running 3 RAID 1's. But at about 3 week intervals I am crashing. There is a new driver for the controller which I haven't tried yet, but it doesn't list the mysterious SMP + MegaRaid crashes as resolved. The box is running Debian with a 2.2.10 kernel.
  • > 1) Software RAID is usually a lot faster than hardware RAID. And for the money you save on the HW controller you could buy faster/more disks.

    Since when? I've been working on servers with and without RAID for ten years now, and this is the first time I've EVER seen this claim. Was that a typo? Hardware RAID is much faster usually, as well as more reliable. Yes, it can be harder to set up, but in the end it is well worth it. Remember, you get what you pay for. Any time you use software to do a job that hardware can handle, you are devoting CPU cycles to it. Properly designed RAID controllers offset a ton of processing that would otherwise be done by the host CPU. They don't put RISC processors on RAID controllers just for show :-)

    As for SCSI controllers, I'll echo what others here have said. Mylex is one of the best. Not the easiest to config, but by far one of the fastest and most reliable controllers out there.

  • > It's unbelievable how many people are confused over this.

    Yes, it is. There are still people who recommend SCSI without further investigation.

    > For example, let's say your system is trying to read data and do a write at the same time.

    No decent OS would do that. It would concentrate on reads and save the writes for later, unless the write cache is full.

    > With IDE your OS has to issue one command to the controller which passes it to the device and then waits...

    With IDE maybe. With ATA not. ATA does have everything that SCSI has, and more. Read the specs at www.t13.org.

    > With SCSI, the OS tells the controller all the operations it wants to do and the controller looks at it and decides if there is an optimal way of doing the commands.

    Of course, only if you have a host adapter / driver which support command queueing, and an application that _does_ do multiple accesses at the same time. Most don't. And a decent OS reorders the commands anyway before they are sent to disk, partly eliminating the need for reordering by the drive.
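The reordering argument above can be illustrated with a toy model. This is a minimal sketch, assuming a one-way "elevator" sweep over block addresses; it is not the scheduler of any particular OS, drive, or host adapter, and the function names are made up:

```python
def elevator_order(pending, head):
    """Serve requests at or beyond the head position first, in ascending
    order, then wrap around to the lowest outstanding addresses."""
    ahead = sorted(b for b in pending if b >= head)
    behind = sorted(b for b in pending if b < head)
    return ahead + behind


def seek_distance(order, head):
    """Total head travel (in blocks) when serving requests in this order."""
    total = 0
    for block in order:
        total += abs(block - head)
        head = block
    return total


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]  # arrival order
    start = 53
    print("arrival order travel: ", seek_distance(queue, start))
    print("reordered travel:     ", seek_distance(elevator_order(queue, start), start))
```

Whether the sorting happens in the OS or in the drive, the win only shows up when several requests are outstanding at once, which is the point being argued.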
  • Well, I agree with pro-SUN posters (no I don't work for SUN, although I had a job offer from them 3 weeks ago :P).
    I've installed 3 A1000's over the last couple of weeks, ranging from the minimally specced ones (50Gb RAID5) to a fully loaded one (8x 36.4Gb)

    Although RAIDmanager is only marginally useful and you have to make sure your /etc/nodename doesn't contain your FQDN, it's still one impressive piece of kit.


    --
    Full Time Idiot and Miserable Sod
  • Separate read/write heads? No, not in the sense you are thinking of. The only reason for separate heads is when you can't use one head for both jobs because one is physically unable to do the other's job.

    For example, think of a read head able to read the ever-shrinking area of a single bit on the surface of the platter (bear in mind that tricks are used to work out the real state of a bit; you don't need heads able to read a bit on a stationary platter). Attempting to use that head to write to the disk may well destroy it, so you now need another head to write with.

    The two heads are on the same arm, and can't operate at the same time (no point, you know what you're writing ;), so you get no two-head advantage, just more cost for the head. The gain, of course, is increased storage density.

    Bryn
    --
  • SAN is an ill-defined acronym that every vendor defines differently. The idea behind selling SAN is that you have a large centralized storage center that offers its disks/volumes to all connected clients w/o the hassle of administrating a disk subsystem on each server.

    The problem is that each vendor implements this differently, and has a different definition of what a SAN should be. None have really addressed the complex issues, instead implementing the kind of hack you describe - NFS with a data channel over FCAL. You still have the problems of NFS to contend with (no reliable locking, no consistent transactional guarantees in client and server implementations, etc.). Heck, most vendors are selling FCAL HUBS instead of SWITCHES to accomplish this storage sharing because the switches aren't prepared to do TCP/IP over fiber!

    Ideally a SAN would be a well fleshed-out spec that allows massive amounts of storage to be conveniently accessed across a network with all of the guarantees of a local disk. That's how it's being sold. However, right now it's looking like little more than a way to get NFS to run faster.

    -Peter
  • Remember ``Hardware RAID'' is just a smaller processor running software as well. The PII+ in most modern systems is way faster than the i960 or m68X in a hardware RAID controller.

    I've seen quite a few people finding in disbelief that they surely didn't get what they thought they paid for when buying HW RAID solutions.

    Back in the old days I'm sure letting an i960 do parity calculations was a boost. Well, times change.

    The _only_ thing I've seen HW raid controllers being better at, is large setups (10+ disks) where a pure SW solution will load the memory and PCI busses of the system heavily. Especially RAID-1 where a SW solution will have to duplicate data to all disks, the HW solution will have an edge moving this duplication off the main memory / PCI bus.

    For smaller setups, like the one in question here, software RAID is absolutely both a viable solution, and probably offers by far the best price/performance.
  • Solaris' filesystem prior to the logging filesystem in 2.7 is a dog. I'd highly recommend that you benchmark your performance w/ Veritas' vxfs, or w/ solaris 7 before you buy a raid system.

    Also, if you do get a RAID, I'd highly recommend a box that does not get controlled in software, i.e. Solstice DiskSuite or Veritas Volume Manager (I love Veritas' VM, but as a RAID controller it lacks intelligence).

    A good external box with hot-swappable drives and a sizeable write-back cache (w/ a battery!) is my favorite way to do this stuff.
  • I work for a Systems Integrator - nice word for RESELLER! We are a Sun reseller first and foremost, but we are very strong in the NetApp arena. Since I am a geek trapped in the hell of being a sales(wo)man, please forgive me if I sound salesy at all....
    Anyway, NetApps are a great solution for multiprotocol storage. One of the drawbacks is that it is network attached and therefore only as fast as your network...which has been a problem for many of our customers. Another HUGE problem is backup. There is only one product that can do it well - a product called BudTool. BudTool is a little guy that some geeks in my company thought up and brought to market, then along came NetApp who asked us to figure out a way to b/u their filers. Out of that venture NDMP was born. BudTool is the only product that makes use of NDMP correctly. That division of my company was recently sold to Legato Systems, who plans to EOL that product. NetApp is now scrambling to find another solution, since they've been recommending BudTool from Jump Street....
    Pricing is also an issue. And you were right in saying that they start at around $17K, but that is WITHOUT storage. A good sized storage solution, let's say 1 TB, is going to run you upwards of $100K. Yikes.
    There is also a good resource for people who are thinking of deploying a NetApp solution, which is the toasters users group. You can send an e-mail to toasters@mathworks.com and ask to subscribe to the group. You'll get a lot of good feedback on what works, and what doesn't. You'll also get to see the downside to using it (and BudTool). I think there is info about the group at http://teaparty.mathworks.com but I haven't been able to get there in a few.....Check it out. It's definitely worth the trip.
    And if you need any quotes I'd love to help you out!!! Just Joking
  • Just because your data is striped across multiple volumes to aid recovery, it still only reads from the one volume

    That is simply not true. Reads in RAID 5 occur from all volumes where a stripe resides. A file never exists on a single volume in RAID 5 unless it is smaller than the stripe size.
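To make the striping point concrete, here is a minimal sketch of how logical stripe units might map onto disks. The layout (parity rotating from the last disk backwards, data filling the remaining disks in order) is an assumption chosen for illustration; real controllers use various layouts, and the function names are hypothetical:

```python
def raid5_locate(unit, n_disks):
    """Map a logical stripe unit to (disk, row) under an assumed
    rotating-parity layout with n_disks - 1 data units per row."""
    row, idx = divmod(unit, n_disks - 1)
    parity_disk = (n_disks - 1) - (row % n_disks)
    # Data units skip over whichever disk holds parity in this row.
    disk = idx if idx < parity_disk else idx + 1
    return disk, row


def disks_touched(offset, length, stripe_unit, n_disks):
    """Which disks a read of `length` bytes starting at `offset` hits."""
    first = offset // stripe_unit
    last = (offset + length - 1) // stripe_unit
    return sorted({raid5_locate(u, n_disks)[0] for u in range(first, last + 1)})


if __name__ == "__main__":
    KB = 1024
    print(disks_touched(0, 4 * KB, 64 * KB, 4))    # read smaller than a stripe unit
    print(disks_touched(0, 384 * KB, 64 * KB, 4))  # read spanning several units
```

A read smaller than one stripe unit stays on a single disk, while a larger read fans out across the array, which matches the exception noted above.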

  • I've been in the ISP business for years. Ran an ISP with 2000 customers and was the Systems Admin for an ISP with 150,000 customers.

    Reliability is the issue when it comes to email and RAID systems. Of course Sun has the edge, so why not stick with Sun software & hardware. The Sun StorEdge A1000 has a caching controller and usually 30-40 gigs per rack; it plugs into your SCSI bus, and you can simply add another dual-channel SCSI card to split the load or add redundancy.

    Network Appliances makes an Excellent Solution. NFS Toasters are the way to go in a distributed environment. Say you have customer on a shell account, well you can export the mail directory and mount it VIA NFS and access it from the shell servers without throwing more email load on them locally. NFS Toasters come in a great looking appliance rackmount case, and depending on how much storage you need, is how much rackspace you need.

    And of course there is StorageTek, which will run you a pretty penny, but offers Fibre Channel, or multiple SCSI channel connections, full redundancy, caching, hotswap and maintenance features.

    I'd never stick an IDE solution on a production box. You need something that you can get support and service on, so I'd suggest that you stick with the Sun StorEdge A1000 drive systems for complete compatibility and put it under the same support contract as your UltraSparc.

    AND

    As far as email is concerned, you should set up an MX server to cache and forward incoming email. These work real nice since you can run RBL or pre-process out spam without killing the actual server that holds and processes email for incoming clients. You have to look at a distributed environment, as email is precious to a lot of people, and a single server machine is not gonna cut it when you're upwards of 20,000 customers doing that much email.

    PS. Try out Qmail too :) smaller footprint!

  • My guess is that in this role, performance is not the paramount issue. You're not bopping the heads around like you would in a database application; and even 20MB/sec is going to be plenty of throughput unless you have banks of ADSL lines. The important issues are reliability and maintainability.

    I'm as much of a tinkerer as anybody; for my own use I don't mind spending two bucks of labor to save one buck of investment, because I'm really investing in myself. That said, if I had 13K users depending on me for e-mail, I wouldn't mess around; two days of down time could be fatal for your business.

    I'd invest $1.50-$2.00/user in a professional grade solution:

    Hardware SCSI raid controller.
    Drives on hot swap trays.
    Same/next day on-site service contract.
    External cabinet that can be swapped over to another computer.

    It's been over two years since I spec'd a solution like this one (I'm doing software exclusively these days), so I can't make a specific recommendation for today's hardware. I know that some devices used to come in a separate cabinet and looked like a humungous SCSI drive; they even had their own RJ-11 to hook up to a phone line for remote diagnostics from the vendor's tech support.

    If the money to swing this is impossible, then I'd recommend mirroring rather than RAID 5. All these kinds of things are compromises between reliability, cost, convenience and performance. RAID 5 is an excellent overall solution from a performance standpoint; but if you cannot afford this RAID 1 is a good choice. It offers fast reads at the cost of slow writes and survival from failure on either disk. In this application, users won't be affected by slightly slower write times. Since drives are so incredibly cheap these days, I'd say this is a pretty good choice if you are strapped for cash. You could even use IDE drives. If you could afford a second IDE controller, then you could use software mirroring across two different controllers for improved throughput.

    One thing I haven't looked into is RAID-2; RAID-2 is like RAID-1 with additional error correction codes. It is seldom used in SCSI because SCSI does this for you, but it might be worth looking into for IDE raids.

    Good luck.


    Really what would be great is failover clustering.
  • Much of this is probably repeated elsewhere, and much is common sense, but...

    1. When was the last time you defragged the drives? Chances are this will reduce thrashing immediately.
    2. Add more memory. More cache == less I/O. Double the RAM for a week and see how much better things are...
    3. Hardware RAID is the only RAID. In most cases, the overhead of s/w RAID exceeds the I/O performance increase. Plus, the OS (whatever OS) need never know the boot drive is spread across 5 drives in three racks...
    4. Hot Swap is a must for a production environment. Nothing beats the warm feeling of yanking a dead drive, slapping in a new one, and watching it get rebuilt on the fly - and the users never know...
    5. Any amount of RAID will still fail badly if the PSU dies - always get redundant, hot swap power supplies.
    6. The same goes for cabling.
  • Load average is defined as the number of processes sitting on the run queue. This need not indicate a disk IO bottleneck.

    Indeed, a high load average indicates that there is no I/O bottleneck, and a low load average may indicate an I/O bottleneck.

    The run queue holds only those processes that the kernel thinks can constructively use CPU cycles. Once a process asks the kernel to access an I/O device, the kernel decides whether the device is currently available. If not, the process gets kicked off the run queue until the device becomes available again.

    Thus, if you have a lot of processes hitting the same device, an I/O bottleneck would actually drop the load, as there are fewer processes able to use the processor.

  • by Salamander ( 33735 ) <jeff @ p l . atyp.us> on Friday November 19, 1999 @04:07AM (#1520332) Homepage Journal
    >the whole point of the NetApps is to be faster than local storage. and they are, as long as your network is fast enough.

    I think network-attached storage is a fine idea and the "right solution" for many things, but I just have to add a rebuttal here anyway.

    Network-attached storage is faster than local storage if your network (including the protocol stack) is fast enough and your local-storage subsystem (including its own separate protocol stack) is slow enough. That's a totally useless claim. It's like saying that a train is faster than a car, leaving out the part about the train being an unloaded bullet-train engine on an empty track and the car being a Yugo stuck in New York traffic.

    In actual fact, the raw bandwidth of modern storage interconnects (e.g. UW SCSI, FC) is higher than that of most network interconnects (e.g. 100baseT) for which the adapter cost is similar. In addition, the protocols used for storage (e.g. SCSI, the various layers of FC) are more suited toward that task - duh - than are the protocols used for networking (e.g. TCP/IP). There is no reason in hell that it should be faster to use network interconnects and protocols to access your storage than to use storage-specific interconnects and protocols.
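For a rough sense of scale, a minimal sketch comparing nominal peak rates. The figures below are textbook peak numbers supplied for illustration (they are not from the post), and real-world throughput on any of these links is lower once protocol overhead is counted:

```python
# Nominal peak rates in MB/s; assumed round figures, not measurements.
NOMINAL_PEAK_MB_PER_S = {
    "Ultra Wide SCSI": 40.0,
    "Fibre Channel (1 Gbit/s)": 100.0,
    "100baseT": 100 / 8,  # 100 Mbit/s expressed in megabytes per second
}


def transfer_seconds(megabytes, link):
    """Best-case time to move `megabytes` over the named link."""
    return megabytes / NOMINAL_PEAK_MB_PER_S[link]


if __name__ == "__main__":
    for name in NOMINAL_PEAK_MB_PER_S:
        print(f"{name:26s} 100 MB in {transfer_seconds(100, name):.1f} s")
```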

    Why might it appear that network-attached storage performs better? I can think of at least three reasons right off the top of my head:
    • Many computers are "unbalanced". They are misdesigned or misconfigured so that they have a lack of direct-to-storage capability coupled with an excess of network capability. This may actually make NAS the correct solution for that environment but is irrelevant when considering the overall merits of the two approaches.
    • Network-attached storage devices often benefit by having much more cache than direct-attached storage devices. If you took that same amount of cache and applied it to the direct-attach devices, the NAS boxes wouldn't look so good.
    • The caching strategies used for NAS - i.e. those in NFSv3 - sacrifice consistency for speed, while direct-attach systems are held to a higher consistency standard. Everyone who has tried to use NFS for something where data consistency or up-to-date modification times matter - even something like "make" - has probably cursed NFS already over this. Some NFS vendors make things even worse by failing even to meet the NFS requirements. Sun's own Solaris NFS client, for example, doesn't always flush data when it's supposed to. If you added all the appropriate sync() operations and fixed the NFS implementations so that your NAS solution was really doing the same thing as your direct-attach solution, you might see some different performance comparisons. Note, though, that for many applications the NFS tradeoff and hence the NAS solution is pretty reasonable.

    At this point I should disclose my own biases. First, I work for EMC. That's not by choice - the company I was working for got bought out - and I'm often not thrilled about it, but the pay is good. In particular, I don't buy in to all of EMC's arrogant "storage is the center of the universe and the Symmetrix is the ultimate storage device" attitude, and I heartily dislike our own Celerra NAS product even though it blows the doors off NetApp in terms of performance and scalability. Secondly, my professional areas of interest include distributed, cluster, and SAN filesystems, so I of course have some fairly strong opinions on such matters. That said...

    I think that once we start seeing true, mature, multi-platform shared-storage filesystems, NAS will start to seem much less appealing. Why pay for NAS when you can just add software to your existing hardware investment and get all the sharing with almost all the performance of local access? Now all we need is a decent implementation of such a filesystem.
  • There is really not so much that differentiates ATA from SCSI anymore.

    I wouldn't go that far.

    Yes, IDE has finally caught on to such things as DMA and busmastering, and throughput on IDE devices is in the same arena as SCSI now. But.

    IDE is limited to two devices per bus, and generally requires one IRQ per bus. IDE also has very strict and short cable length limits, and lacks an "external" connector -- you generally can't have an external IDE device (I know it is possible, but the cable restrictions make it very difficult).

    There are more kinds of devices (scanners, printers, etc.) available for SCSI than IDE. SCSI is generally more capable in terms of what you can do with it.

    IDE controllers tend to be very primitive compared to their SCSI counterparts. Things like bus disconnect, command queuing, scatter-gather, even busmastering are often not available or iffy on IDE controllers. This applies especially to the onboard controllers in many motherboards; the number of shortcuts taken there are incredible.

    Likewise, the drive electronics and HDA components in IDE drives are often cheaper than those in SCSI drives. These are all design and engineering issues, not issues with the specification itself, but they exist. The problems stem from the fact that IDE is marketed to be cheap, cheap, cheap, and thus gets a higher incidence of cheap components. It isn't limited to IDE, either -- you can also find cheap SCSI hardware, it is just that there is less of it.

    IDE often appears faster in benchmarks, because benchmarks typically try to do operations in bulk on a single device. IDE has a lower command overhead than SCSI, so for such things, IDE will be faster. But when you get into the real world, and have multiple processes trying to access multiple devices at once, that is when IDE stalls, while SCSI keeps on going.

    I realize this started off as a discussion about RAID, and that IDE RAID devices are not your typical RAID devices. They usually have one drive per bus, connected to a custom controller that multiplexes them all and presents them to the host as a SCSI interface. But the topic has drifted to more general applications.

    Just my 1/4 of a byte. ;-)
  • Writes are slower because you have to read a stripe of data, calculate parity and write the whole stripe back again.

    Kinda why you want gobs of battery-backed RAID controller cache memory... (and a UPS, and clean power... ;)

    Your Working Boy,
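The write penalty described above comes from the parity arithmetic. A minimal sketch, assuming plain XOR parity over equal-sized blocks; the function names are made up, and this is not how any particular controller is implemented. It shows the full-stripe recalculation alongside the common shortcut for small writes, where only the old data block and old parity are read back:

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_parity(data_blocks):
    """Parity for a whole stripe: XOR of every data block in it."""
    return xor_blocks(*data_blocks)


def small_write_parity(old_parity, old_data, new_data):
    """Read-modify-write: new parity = old parity ^ old data ^ new data."""
    return xor_blocks(old_parity, old_data, new_data)


def rebuild_block(surviving_blocks, parity):
    """Recover a lost data block from the survivors plus parity."""
    return xor_blocks(parity, *surviving_blocks)


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = full_stripe_parity(stripe)
    new_block = b"\xaa\xbb"
    updated = small_write_parity(parity, stripe[1], new_block)
    print(updated == full_stripe_parity([stripe[0], new_block, stripe[2]]))
```

Either way a small write costs extra reads before the writes can go out, which is where a battery-backed write cache earns its keep.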
  • If the money to swing this is impossible, then I'd recommend mirroring rather than RAID 5. All these kinds of things are compromises between reliability, cost, convenience and performance. RAID 5 is an excellent overall solution from a performance standpoint; but if you cannot afford this RAID 1 is a good choice. It offers fast reads at the cost of slow writes and survival from failure on either disk. In this application, users won't be affected by slightly slower write times. Since drives are so incredibly cheap these days, I'd say this is a pretty good choice if you are strapped for cash.
    Actually, RAID-1 is more expensive and faster for writes than RAID-5.

    The reason for this is that RAID-1 uses 1:1 mirroring of a 2-drive set while RAID-5 uses rotating parity in which parity information is distributed across all drives.

    With regard to space, using RAID-1, your usable yield (what shows up in df) is half of the total disk space put into it. With RAID-5, parity info is spread throughout all the drives. E.g., I have a RAID-5 using four 4GB drives, which gives me 12GB of usable space. With 0+1 on this configuration, it would be 8GB usable.
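The space arithmetic can be written out as a minimal sketch, assuming identical drives and ignoring hot spares and filesystem overhead; the function name and level labels are made up for illustration:

```python
def usable_gb(level, n_drives, drive_gb):
    """Usable capacity for an array of identical drives."""
    if level == "raid5":
        if n_drives < 3:
            raise ValueError("RAID-5 needs at least three drives")
        # One drive's worth of space goes to parity, spread over all drives.
        return (n_drives - 1) * drive_gb
    if level in ("raid1", "raid0+1"):
        if n_drives % 2:
            raise ValueError("mirroring needs an even number of drives")
        # Every block is stored twice, so half the raw space is usable.
        return n_drives * drive_gb / 2
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    print(usable_gb("raid5", 4, 4))    # four 4GB drives in RAID-5
    print(usable_gb("raid0+1", 4, 4))  # same drives mirrored and striped
```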

    As for speed, both RAID-1 and RAID-5 allow you to read from multiple disks at once (which, of course, is a win). For writes, a drive pair in a RAID-1 will take as long as a write to a single drive. On RAID-5, however, it takes longer because (afaik) the RAID controller has to determine which drives to write the parity info to, which takes CPU time.

    A decent little overview is at DPT's site (sadly, only in PDF) at http://www.dpt.com/pdf/understand_raid.pdf [dpt.com]

  • There is a great deal more information involved: part was the saturation of the PCI bus causing the slowdown, part was OS tuning, part was hardware configuration. We were using, IIRC, 7200 or 5400rpm Ultra SCSI drives (not Ultra 2). The point was to show it makes a big difference, though.
  • It took quite some time for my original question to be posted, and we were on a critical schedule. We ended up buying a whole new server and internal RAID controller. Details follow:

    After much shopping, questions, advice and temporary insanity, we decided to go for a new Linux box to handle the mail. Apparently, the load wasn't only coming from disk i/o wait; the kernel was using 70% cpu. We chose a Dual PIII/500 setup on an Asus P3B-DS, 512M ECC SDRAM (less than before, but prices are so high right now, and we figure processes should end sooner on this box), Intel Pro/100, Seagate Barracuda for system, six Seagate Cheetahs for spool and mail storage, and a Mylex eXtremeRAID 1100 (w/ the 233MHz i960).

    It was configured with 5 spindles in RAID 5, with 1 as a hot spare, and then partitioned in half. I'm confident this badarse controller can keep up on the writes, with minimal performance hit. Preliminary results with bonnie are inconclusive, since it's working with one huge file, rather than thousands of small files. If write performance lags once it goes online (this Sunday am), we'll split it into 0+1.

    Exim, QPOP, and IMAPD were hax0red to use a double-hashed directory structure. ie: "spin" would reside in /var/mail/s/.p/spin (the dot was required for those who have a single digit username). This should eliminate any overhead that ext2fs may have with large directories.
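The double-hashed layout might look roughly like this. A minimal sketch only: the function name is hypothetical, and the handling of one-character usernames is an assumption, since the post does not say where those mailboxes end up:

```python
import posixpath


def mailbox_path(username, root="/var/mail"):
    """Build a two-level hashed mailbox path such as /var/mail/s/.p/spin."""
    if not username:
        raise ValueError("empty username")
    first = username[0]
    if len(username) == 1:
        # ASSUMPTION: with no second character to hash on, keep the mailbox
        # directly under the first-level directory.
        return posixpath.join(root, first, username)
    # The leading dot keeps the second-level directory name from clashing
    # with a mailbox named after a one-character user.
    return posixpath.join(root, first, "." + username[1], username)


if __name__ == "__main__":
    print(mailbox_path("spin"))
```

Spreading 13,000 mailboxes over two directory levels keeps each directory small, which is the overhead the post is trying to avoid on ext2fs.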

    Thanks for all your advice, keep it coming. If you're a gamer, check out http://www.xmission.com/quake

    -Kevin Blackham Xmission Internet Salt Lake City, UT

  • My understanding was that some fs's will perform some actions to avoid some fragmentation.

    A colleague of mine recommends doing a complete backup/reformat/restore cycle every 2 months or so on partitions that see a great deal of edit/extension to files - on a partition in use since '93 I expect this would give a radical reduction in thrashing . . .

    It also gives you a chance to test your backup procedures :)

