Data Storage

Data Storage Capacity Mostly Wasted In Data Center

Lucas123 writes "Even after the introduction of technologies such as thin provisioning, capacity reclamation, and storage monitoring and reporting software, 60% to 70% of data capacity remains unused in data centers due to over-provisioning for applications and misconfigured data storage systems. While the price of storage resource management software can be high, the cost of wasted storage is even higher, with 100TB equalling $1 million once human resources, floor space, and electricity are figured in. 'It's a bit of a paradox. Users don't seem to be willing to spend the money to see what they have,' said Andrew Reichman, an analyst at Forrester Research."
This discussion has been archived. No new comments can be posted.

  • Intentional? (Score:5, Insightful)

    by Anonymous Coward on Wednesday July 28, 2010 @01:18PM (#33058424)

    I don't know about your data center, but ours keeps drives well below full capacity intentionally.

    The more disk arms you spread the operations over, the faster the operations get, and smaller drives are often more expensive than larger ones.

    Plus, drives that are running close to full can't manage fragmentation nearly as well.

    • Re:Intentional? (Score:5, Insightful)

      by TrisexualPuppy ( 976893 ) on Wednesday July 28, 2010 @01:33PM (#33058672)
      Yep, that's how we run things at my company. Drives and controllers have fewer files to deal with, and all else assumed equal, you get better performance this way.

      You also have to think of the obvious spare capacity. In 2005, my company invested in a huge (at the time) 10TB array. The boss rightly freaked out when we were hitting more than 30% usage in 2007. After a couple of years of slow, quasi-linear file growth, usage jumped to 50% in a matter of months. It turned out our CAD users had switched to a newer version of the software (the CAD group managed their own software) without telling us. The unexpected *DOES* happen, and it would have been incredibly stupid to have been running closer to capacity.

      Accounting would probably have had half of us fired if they hadn't been able to do their document imaging, which tends to take up a lot of space on the SAN.

      Yet another sad FUD or FUD-esque article based on Forrester's findings.
      • Re: (Score:3, Insightful)

        I think the above example is a great reason why you should always over-engineer your storage capacity somewhat. Demand for space can come up unexpectedly and stop the whole show if the space isn't there. And even if you don't use the storage today, you will almost certainly use it tomorrow; data usage always goes up, not down. So there's ROI for the next fiscal year, when you put the extra capacity to use.
      • What about the fact* that if something runs amok in a thin-provisioned client and pins a LUN at 100%, the underlying allocation doesn't scale back DOWN after you clean up such an event.. so you end up with the wasted space anyway?

        (or the rumour that our OS of choice doesn't really like the magic happening under the covers if you thin-provision, so we're better off avoiding it anyway)

        -r

        * .. our arrays being EMC and this is what the storage folk tell me.. what do i know, i'm the unix guy.

    • Re:Intentional? (Score:5, Insightful)

      by Nerdfest ( 867930 ) on Wednesday July 28, 2010 @01:46PM (#33058866)
      Simply put, over-provisioning is relatively harmless while under-provisioning is very bad.
      • I don't know about the workplace of the writer of TFA, but when I worked in a big factory, downtime or a failure caused by one or more applications running out of disk space could do a million dollars' worth of damage in lost productivity or ruined product (say, a batch stuck in a time-sensitive step) in less than half an hour.
        I heard claims that a full fab outage could cost a million in 10 minutes, though that could have been a slight exaggeration.

        A million worth of extra disk looks cheap by comparison.

        • I agree with HungryHobo, but I also doubt the other side of that equation. How in hell do you estimate 100TB of storage costing $1 million USD? And over what time period? Per quarter, per annum? That works out to $10,000 per terabyte. I can pick up a 2TB SATA drive at Fry's for $180 (maybe less now). I have a Netgear SAN with 4 such drives at my house; running the equivalent of RAID 5, I've got 6TB in that alone. During the summer, my power bill is $40/mo, and I guarantee the SAN is a drop in that bucket c

    • Re:Intentional? (Score:5, Insightful)

      by hardburn ( 141468 ) <hardburn@wumpus-ca[ ]net ['ve.' in gap]> on Wednesday July 28, 2010 @01:48PM (#33058880)

      FTA:

      Rick Clark, CEO of Aptare Inc., said most companies can reclaim large chunks of data center storage capacity because it was never used by applications in the first place. . . . Aptare's latest version of reporting software, StorageConsole 8, costs about $30,000 to $40,000 for small companies, $75,000 to $80,000 for midsize firms, and just over $250,000 for large enterprises.

      In other words, the whole thing is an attempt to get companies to spend tens of thousands of dollars for something that could be done by a well-written shell script.
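
      For illustration, here is a minimal sketch of the kind of utilization report the parent has in mind, written in Python rather than shell. The mount points are hypothetical placeholders; a real deployment would also want to query the SAN arrays themselves, not just local filesystems.

```python
#!/usr/bin/env python3
"""Minimal sketch of a capacity-utilization report.

The mount points below are hypothetical examples; replace them with the
filesystems (or SAN-backed volumes) you actually care about.
"""
import shutil

MOUNTS = ["/", "/var", "/home"]
GB = 1024 ** 3

def report(mounts):
    print(f"{'mount':<12}{'size GB':>10}{'used GB':>10}{'used %':>8}")
    for m in mounts:
        try:
            usage = shutil.disk_usage(m)
        except OSError:
            continue  # skip mount points that don't exist on this host
        pct = 100 * usage.used / usage.total
        print(f"{m:<12}{usage.total / GB:>10.1f}{usage.used / GB:>10.1f}{pct:>7.1f}%")

if __name__ == "__main__":
    report(MOUNTS)
```

      Run it from cron and mail the output and you arguably have most of the "reporting" part for free; the expensive products earn their keep on multi-array correlation and, as the reply below notes, on having a vendor to blame.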

      • Re: (Score:3, Insightful)

        When I see services advertised at those kinds of rates I can't help but remember P.T. Barnum's slogan: "There's a sucker born every minute."

      • In other words, the whole thing is an attempt to get companies to spend tens of thousands of dollars for something that could be done by a well-written shell script.

        To be fair, a "well-written shell script" is only an inexpensive solution for the author of that shell script. When you purchase expensive product A, you aren't just spending money on the solution; you are spending money on anchoring your liability, in case of catastrophe (a software bug misreports disk usage, slapstick ensues), to the provider of the solution.

        Put simply, shell scripts are the thongs of the IT world. They are too skimpy to Cover Your Ass with. 8I

    • Agreed. People keep forgetting that it's not just storage; IOPS matter too. When you are running a cluster with hundreds of VMs, you need to size storage based on how many IOPS you can get out of the disks, not just how much space you can give them. Even if you plan out just enough space for each and every application, if disk IOPS can't keep up at a useful speed, you will get applications that crash, stall, or generally perform horribly.

    • Re:Intentional? (Score:5, Insightful)

      by Score Whore ( 32328 ) on Wednesday July 28, 2010 @02:41PM (#33059770)

      Not to mention the fact that over the last few years drive capacities have skyrocketed while drive performance has remained the same. That is, your average drive / spindle has grown from 36 GB to 72 GB to 146 GB to 300 GB to 400 GB to 600 GB, etc. while delivering a non-growing 150 IOPS per spindle.

      If you have an application with particular data accessibility requirements, you end up buying IOPS and not capacity. A recent deployment was for a database that needed 5000 IOPS with service times remaining under 10 ms. The database is two terabytes. A simple capacity analysis would call for a handful of drives, perhaps sixteen 300 GB drives mirrored for a usable capacity of 2.4 TB. Unfortunately those sixteen drives will only deliver around 800 IOPS at 10 ms each. Instead we had to configure one hundred and thirty 300 GB drives, ending up with over 21 TB of storage capacity that is about ten percent utilized.

      These days anytime an analyst or storage vendor starts talking to me about thin provisioning, zero page reclaim, etc. I have to take a minute and explain to them my actual needs and that they have very little to do with gigabytes or terabytes. Usually I have to do this multiple times.

      In the near future we will be moving to SSD based storage once more enterprise vendors have worked through the quirks and gained some experience.
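
      As a rough, hedged illustration of the parent's numbers: the sustainable per-spindle rate at a 10 ms service-time target is an assumption here, but with a figure in that neighborhood the drive count and the roughly-ten-percent utilization fall out directly.

```python
import math

# Back-of-the-envelope sizing for an IOPS-bound database, using the numbers
# from the comment above. iops_per_spindle_at_10ms is an assumption: what one
# drive sustains while keeping service times under 10 ms, far below its
# spec-sheet peak.
required_iops = 5000
db_size_tb = 2.0
drive_size_tb = 0.3                # 300 GB drives
iops_per_spindle_at_10ms = 40      # assumed sustainable rate per spindle

spindles = math.ceil(required_iops / iops_per_spindle_at_10ms)
usable_tb = spindles * drive_size_tb / 2      # mirrored pairs
utilization = 100 * db_size_tb / usable_tb

print(f"{spindles} spindles, {usable_tb:.1f} TB usable, {utilization:.0f}% utilized")
# -> roughly 125 spindles, ~19 TB usable, ~11% utilized: the capacity is a
#    by-product of the IOPS requirement, not the other way around.
```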

      • by Dogers ( 446369 )

        You might like to speak to 3PAR - when we got them in, they didn't just ask how much storage we wanted; they also wanted to know how many IOPS we needed. Their stuff works on the basis that not all the data is needed all of the time: put hot data on SSD, recent data on SAS/fibre drives, and stuff that's not been touched for a while on SATA.

        • We did a POC with an array based on a caching architecture. It worked well as long as the cache happened to match the working set of active transactions; unfortunately, a large enough percentage of the workload led to cache misses, which killed per-transaction performance for an equivalent percentage of transactions, with cascading effects on DB threads, app server threads, connection pools, etc.

          (HSM/Tiered storage == sophisticated caching strategy. Same effects apply.)

          At the end of the day caching strategies will improve performance, but if you need guarantees you can't rely on cache.

          • At the end of the day caching strategies will improve performance, but if you need guarantees you can't rely on cache.

            Words of wisdom, friend. Average is only a shorthand for bulk volume; it's the peaks that challenge your bottleneck.

    • That's not even mentioning the fragmentation that inexorably follows a project provisioned with exactly the size it needs, and all the costs of managing it.

      If you are already using 60% of your storage capacity, you are in trouble, since usage can grow quite fast, leaving you no time to buy adequate hardware. What follows is a hell of problems: partitioning storage servers, reallocating disks, reconfiguring workstations and so on.

    • by mlts ( 1038732 ) *

      Don't forget filesystems. UNIX filesystem performance goes into the toilet as soon as drives get over 85-90% full, because the filesystem can't locate contiguous space for things, nor can it pre-allocate space around a file easily, so fragmentation results.

    • We do use thin provisioning, and virtualization in general, but I agree that there is benefit to keeping utilization low. We try to keep more space than we could possibly need both because it can sometimes surprise you by growing quickly and because the drives are faster if the data is spread across multiple drives. Also SSD drives sometimes live longer if not fully utilized, because they can distribute the wear and tear, so we usually leave 20% unformatted.

      Downtime and slow systems are much more expensive

  • The cost of too much storage isn't bad.

    Of course, you may say that it's necessary to delete old data, but in some cases you can't know which old data may be needed again.

  • I didn't know that I've got $25,000 worth of storage at home :-)

  • by fuzzyfuzzyfungus ( 1223518 ) on Wednesday July 28, 2010 @01:20PM (#33058466) Journal
    Likelihood that I get fired because something important runs out of storage and falls over (and, naturally, it'll be most likely to run out of storage under heavy use, which is when we most need it up...): relatively high...

    Likelihood that I get fired because I buy a few hundred gigs too much, which sit in a dusty corner somewhere, barely even noticed except in passing because there is nobody with a clear handle on the overall picture (and, if there is, he is looking at things from the sort of bird's-eye view where a few hundred gigs looks like a speck on the map): relatively insignificant...
    • by qbzzt ( 11136 ) on Wednesday July 28, 2010 @01:29PM (#33058608)

      Exactly, and that's the way it should be. Your CTO wants you to suggest spending a few extra hundred dollars on storage to avoid downtime.

      • Your CTO wants you to suggest spending a few extra hundred dollars on storage to avoid downtime.

        A few hundred dollars gets you a few terabytes (it's around 163 dollars for a 2 terabyte drive in the first online store I checked), not a few hundred gigabytes. Or are these "enterprise hard drives [wnd.com]" ?-)

        • by wagnerrp ( 1305589 ) on Wednesday July 28, 2010 @02:51PM (#33059934)
          They're not buying the $100 2TB bargain special, they're buying the $300 300GB 15K SAS drive. They don't care how much storage they have, they just want the IOPS.
          • Unfortunately, my company is sufficiently large and bureaucratic that equipment standards are often made by people who don't know the applications :-) The bureaucrats like SAN arrays because they're blazingly fast, and because they're easy to administer, back up, plan for storage growth, etc. And $8000/TB is really just fine if your idea of "huge" storage is a TB or two.

            I've got an application that needs to do a bit of fast-IOPS logging (so the overpriced SAS drives and SAN array are fine), but needs lo

        • Most CIOs would not risk their job on non-enterprise hard drives. The regular drives may be cheaper, but they may also fail sooner. Data centers and the like are most likely using enterprise-level drives.

          That being said, many of us have had enterprise drives fail in under a month and consumer-level drives that are still going strong after 10+ years.

          • I read years back that Google's data centers use largely commodity servers and drives, but their operations assume so much data redundancy that no one drive failure hurts them. They pull the whole server whenever it suits them, plug a spare in and send it to the coroner.

            It really just makes my ears bleed to hear that seven years after I've read this, most organizations are still futzing over the reliability or IOPS of single drives. Why cannot the reliability and access speed be spread over a larger number

      • Exactly, and that's the way it should be. Your CTO wants you to suggest spending a few extra hundred dollars on storage to avoid downtime.

        The way we build servers and do storage has changed *massively* over the last 10 years.
        Why is it so hard to imagine that storage is going to evolve again?
        FTFA

        Aptare's latest version of reporting software, StorageConsole 8, costs about $30,000 to $40,000 for small companies, $75,000 to $80,000 for midsize firms, and just over $250,000 for large enterprises.

        "Our customers can see a return on the price of the software typically in about six months through better utilization rates and preventing the unnecessary purchase of storage," Clark said.

        A minimum of $5,000 per month strikes me as a touch more than "spending a few extra hundred dollars on storage."

      • I've got "Precise" in quotes because I'm skeptical that you can ever get really good predictions about the future, or even the present, especially from users. But if you try, it's going to take you a while, and you'll be spending loaded-engineer-salary time and harassed-user time trying to get predictions that the users will tell you are guesses. Meanwhile, you do need to get some disks online, and it's going to take you a while to accumulate tracking data. I'm in that kind of situation now - until there

    • Re: (Score:3, Insightful)

      by _damnit_ ( 1143 )

      Of course this is the case. This study is as exciting as news that George Michael is gay. There have been plenty of studies to this effect. My company makes tons of money consulting on better storage utilization. [Some Fortune 500 companies I've visited run below 40% utilization.] EMC, IBM, HDS, NetApp and the rest have no real interest in selling you less drives. They all make vague, glossy statements about saving storage money but in reality you need to be wasteful if you want to protect your ass.

      • by Shotgun ( 30919 )

        NetApp and the rest have no real interest in selling you less drives.

        Then why is about half of their feature set aimed at helping their customers reduce storage usage (the WAFL filesystem, dedupe, etc.)?

        Why have they instituted a systems group to do nothing BUT coach customers on how to reduce disk usage?

        There is a LOT of competitive advantage in selling less drives.

      • Or at least they don't tell the whole truth: quite often you find that replicating your 8 TB volume really requires you to buy a SAN with 16 TB of capacity on one end and 16 TB on the other, with the "unused" space going to replication overhead or whatever fancy SAN tricks you want to play.

        So while you wanted 16 TB of capacity, you actually buy 32+, much of which appears to be uncommitted.

      • by dave562 ( 969951 )

        From a human perspective, fuzzyfungus is right. Over-engineering is less likely to cost your job than failure. Plus, over-engineering is easy to justify.

        Exactly. I'm working for a company that provides a software-based service to law firms. The bill to a single client can eclipse $150,000 PER MONTH. With that kind of money being thrown around, the expectation is that the application will be up and running, ALL THE TIME. As odd as it is, there are people connected into the system at 4am sometimes (with a 15

  • Slashvertisement (Score:5, Insightful)

    by hcdejong ( 561314 ) <hobbes@nOspam.xmsnet.nl> on Wednesday July 28, 2010 @01:21PM (#33058488)

    for a storage monitoring system.

  • Overprovisioning (Score:4, Interesting)

    by shoppa ( 464619 ) on Wednesday July 28, 2010 @01:23PM (#33058516)

    It's so easy to over-provision. Hardware is cheap and if you don't ask for more than you think you need, you may end up (especially after the app becomes popular, gasp!) needing more than you thought at first.

    It's like two kids fighting over a pie. Mom comes in, and kid #1 says "I think we should split it equally". Kid #2 says "I want it all". Mom listens to both sides and the kid who wanted his fair share only gets one quarter of the pie, while the kid who wanted it all gets three quarters. That's why you have to ask for more than you fairly need. It happens not just at the hardware purchase end but all the way up the pole. And you better spend the money you asked for or you're gonna lose it, too.

    • by Maarx ( 1794262 ) on Wednesday July 28, 2010 @01:28PM (#33058584)
      That mother is terrible.
      • by Zerth ( 26112 )

        And works in the budgeting dept of a company I'm glad I'm no longer at.

      • Oh my mom was much more devious.

        She would let one of us cut the pie, and the other pick the first piece....

        Now imagine a 14- and an 11-year-old using NASA-style tools to divide a piece of pie ;)

    • Re: (Score:3, Insightful)

      Dad here. Had that fight (or similar). I asked a simple question to the kid who wanted it all. I asked him "all or nothing?" and again he said "all", to which I said "nothing".

      Of course he rightly cried "Not fair!!!", and I said: you set the rules. You wanted it all and set up a rule that wasn't about being fair; I'm just playing by your rules.

      Never had that problem again. EVER.

      • by MagicM ( 85041 )

        I asked him "all or nothing?"

        At that point he was screwed. If he said "nothing", he could reasonably expect to get nothing. His only option was to say "all" if he wanted to get a chance at something.

        • Re: (Score:2, Insightful)

          Nope, he wasn't screwed, because that wasn't the only option; it was a false dichotomy. I gave him a chance to offer another choice, it was just veiled. Kobayashi Maru. He could have thought about it and said "half" even though that wasn't an obvious choice.

          I often give my kids tests to break them out of self-imposed boxes (false dichotomies). Pick a number between 1 and 10 .... 1 - no, 2 - no, 3 - no, 4 - no .... 9 - no, 10 - no ... THAT'S IMPOSSIBLE, DAD!!

          No it isn't. The number I had in mind was Pi.

          Raising kids

        • Re: (Score:3, Insightful)

          by Culture20 ( 968837 )

          At that point he was screwed. If he said "nothing", he could reasonably expect to get nothing. His only option was to say "all" if he wanted to get a chance at something.

          If my son (nobly or stubbornly) said "nothing", I'd offer him half or nothing. Parents are allowed to alter the deals. Pray that they alter them further.

      • by hoggoth ( 414195 )

        My dad used to try that fucking psychology on me. I wish he had just hit me and gotten it over with.

      • Re: (Score:2, Insightful)

        by sheph ( 955019 )
        Well done, man!! See, some folks just don't know what tough love is, and the positive impact it can have. You wanna run for office in 2012? We could use someone like you after the current round of buffoons!
    • by Jaime2 ( 824950 )
    Another factor is that it is way too expensive to re-provision. Where I work, you might as well ask for 5 times what you need, because if you go back and ask for an increase, the labor to do it costs more than the storage. It really shouldn't take 20 hours of someone's time to make my LUN bigger, but that's what the storage team will bill me for. If re-provisioning only required that you pay for the additional storage, I wouldn't worry about it.

      Good storage virtualization fixes most of these problems, b
  • Disk space is free (Score:5, Interesting)

    by amorsen ( 7485 ) <benny+slashdot@amorsen.dk> on Wednesday July 28, 2010 @01:27PM (#33058574)

    Who cares if you leave disks 10% full? To get rid of the minimum of 2 disks per server you need to boot from SAN, and disk space in the SAN is often 10x the cost of standard SAS disks. Especially if the server could make do with the two built-in disks and save the cost of an FC card + FC switch port.

    I/O's per second on the other hand cost real money, so it is a waste to leave 15k and SSD disks idle. A quarter full does not matter if they are I/O saturated; the rest of the capacity is just wasted, but again you often cannot buy a disk a quarter of the size with the same I/O's per second.

    • Re: (Score:3, Interesting)

      by eldavojohn ( 898314 ) *

      Who cares if you leave disks 10% full? To get rid of the minimum of 2 disks per server you need to boot from SAN, and disk space in the SAN is often 10x the cost of standard SAS disks. Especially if the server could make do with the two built-in disks and save the cost of an FC card + FC switch port.

      I/O's per second on the other hand cost real money, so it is a waste to leave 15k and SSD disks idle. A quarter full does not matter if they are I/O saturated; the rest of the capacity is just wasted, but again you often cannot buy a disk a quarter of the size with the same I/O's per second.

      I don't know too much about what you just said but I do know that the Linux images I get at work are virtual machines of a free distribution of Linux. I can request any size I want. But my databases often grow. And then the next thing is that a resizing of a partition is very expensive from our provisioner. So what do we do? We estimate how much space our web apps take up a month and then we request space for 10 years out. Because a resize of the partition is so damned expensive. And those sizes are

      • Those virtual machines are stored on a real SAN somewhere. The SAN administrator deals with all the things the GP said, that is why you don't need to understand it. Anyway, he'd better have some spare capacity and plan based on I/O, and not storage size (he probably did), otherwise, you'll have big unknown risks.

      • by dave562 ( 969951 )

        There are some downsides and a bit of overhead to pay for virtualization but I thought everyone had moved to this model ...

        And virtualization isn't always the way to go. It is great for a lot of environments, but sometimes you have an application that really does need all of the cores and all of the RAM a box might have.

    • Re: (Score:3, Interesting)

      by bobcat7677 ( 561727 )
      Parent has an excellent point. Utilization is not always about how full the disk is... especially in a data center where there are frequently large database operations requiring extreme numbers of IOPS. In the past, the answer was to throw "more spindles" at it. At which point you could theoretically end up with a 20GB database spread across 40 SAS disks, making ~1.5TB of space available using the typical 73GB disks, just to reach the IOPS capacity needed to handle heavy update/insert/read operations.
    • The $1 million / 100TB might be real, though it seems high, but the great majority of that is NOT hardware costs. In fact, having larger disks than you need may reduce the management costs - less chance a particular disk set will become full, extra space to move data from failing disks, etc.

  • by Todd Knarr ( 15451 ) on Wednesday July 28, 2010 @01:37PM (#33058736) Homepage

    Having too much storage is an easy problem. Sure, it costs a bit more, but not prohibitively so, or you'd never have gotten approval to spend the money. Not having enough storage, OTOH, is a hard problem. Running out of space in the middle of a job means a crashed job and downtime to add more storage. That probably just cost more than having too much would have, and then you pile the political problems on top of that. So common sense says you don't provision for the storage you're normally going to need; you provision for the maximum storage you expect to need at any time, plus a bit of padding just in case.

    AT&T discovered this back in the days when telephone operators actually got a lot of work. They found that phone calls tend to come in clumps rather than being evenly distributed, so when they staffed for the average call rate they ended up failing to meet their answer times on a very large fraction of their calls. They had to change to staffing for the peak number of simultaneous calls, and accept the idle operators as a cost of being able to meet those peaks.
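
    The math behind "staff for the peak, not the average" is the classic Erlang B blocking formula from telephone trunk engineering. A hedged sketch, with made-up traffic and blocking targets purely for illustration:

```python
# Erlang B: blocking probability when `traffic_erlangs` of offered load hits
# `servers` parallel servers (trunks, operators), computed iteratively.
def erlang_b(servers: int, traffic_erlangs: float) -> float:
    b = 1.0
    for m in range(1, servers + 1):
        b = (traffic_erlangs * b) / (m + traffic_erlangs * b)
    return b

def servers_needed(traffic_erlangs: float, max_blocking: float) -> int:
    """Smallest number of servers keeping blocking at or below the target."""
    n = 1
    while erlang_b(n, traffic_erlangs) > max_blocking:
        n += 1
    return n

if __name__ == "__main__":
    # 30 erlangs of peak traffic with at most 1% of calls blocked:
    print(servers_needed(30.0, 0.01))  # noticeably more than the "average" 30
```

    The same shape of argument applies to spindles and free gigabytes: sizing for the mean guarantees you miss the peaks.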

    • Actually, telcos ramp their operator staff up and down in response to expected call volume based on historical time of day and day of week trends.

      This was quite important when I worked on an automated 411 service that front-ended operators with computers doing speech recognition. Only if the system could not help the caller would the call get routed to a human.

      Regulatory requirements placed limits on the distribution of time to queue for an operator but not as stringent ones on returning a busy signal when staff

  • Two 146GB SAS drives from HP are less than $500. You can put the same storage on an EMC SAN and provision less for the system drive of a Windows server, but by the time you pay their crack-dealer prices for hard drives, plus the drives for the BCV volumes, the fibre switches, the GBICs, the HBAs, and everything else, it's cheaper to waste space on regular hard drives.

  • CYA Approach (Score:5, Informative)

    by MBGMorden ( 803437 ) on Wednesday July 28, 2010 @01:40PM (#33058770)

    This is the CYA approach, and I don't see it getting any better. When configuring a server, it's usually better to pay the marginally higher cost for 3-4x as much disk space as you think you'll need, rather than risk the possibility of returning to your boss asking to buy MORE space later.

    • And it may well make economic sense too, at least if you are talking about a low-end server with a pair of SATA drives (though it depends on how much your server vendor rips you off on hard drives).

      Upgrading drives later has a lot of costs on top of the raw cost of getting the extra drives.

      How much does it cost to get hold of those extra drives? (At uni, someone recently told me that the total cost of processing a purchase order worked out to about £40; admittedly some of that is fixed costs, but still i

    • Did you factor in how expensive it is to change storage size, and the cost of failing to change it? Also, there is the cost of adding storage that isn't compatible with the first chunk. The amount you pay for oversized storage normally isn't even on the same order of magnitude as all of those.

  • by alen ( 225700 ) on Wednesday July 28, 2010 @01:45PM (#33058852)

    time to go and buy up all kinds of expensive software to tell us something or other

    it's almost like the DR consultants who say we need to spend a fortune on a DR site in case a nuclear bomb goes off and we need to run the business from 100 miles away. i'll be 2000 miles away, living with mom again in the middle of nowhere and making sure my family is safe, not going to some DR site that is going to close because half of NYC is going to go bankrupt in the depression after a WMD attack

    • it's almost like the DR consultants who say we need to spend a fortune on a DR site in case a nuclear bomb goes off and we need to run the business from 100 miles away.

      Flood, earthquake, hurricane (yes, possible even in New York), sinkhole, etc.

      Are you really going to go primeval when any one of those things happens?

      First thing, of course, you're going to find out if your family is fine. Assuming so, then what? Not only has their home been destroyed, but your job is gone too, so you'll now be dependent on

    • by asc99c ( 938635 )

      > in case a nuclear bomb goes off

      Or, even more far-fetched: someone brings in a fan heater from home, forgets to switch it off one evening, some paper blows into the elements and catches fire, and it burns down the building.

      Keeping an off-site backup is not a ridiculous idea in itself. Could the business survive if the office burned down and all servers and data were lost? Maybe if employees are allowed to take data home, most stuff could be pieced back together, but even then it would be a substantial a

  • by shmlco ( 594907 ) on Wednesday July 28, 2010 @01:47PM (#33058874) Homepage

    This isn't like an ISP overbooking a line and hoping that everyone doesn't decide to download a movie at the same time. If a hosting service says your account can have 10GB of storage, contractually they need to make sure 10GB of storage exists.

    Even though most accounts don't need it.

    One client of mine dramatically over-provisioned his database server. But then again, he expects at some point to break past his current customer plateau and hit the big time. Will he do so? Who can say?

    It may be a bit wasteful to over-provision a server, but I can guarantee you that continually ripping out "just big enough" servers and installing larger ones is even more wasteful.

    Your pick.

  • This is one of the arguments that's made for using a SAN. Consolidate to make better use of the disk space. Smaller footprint, less power, etc.
    • However, SANs have issues of their own:

      1: They are EXPENSIVE; figure you will be paying many times the cost per gigabyte of ordinary drives, particularly if you buy the SAN vendor's drives so you get support. This tends to cancel out the more efficient use of space.
      2: Even a 1U server has space for a few drives inside, so if you use a SAN with 1U servers it will probably take up more space than just putting the drives in the servers. Blades would reduce this issue but come with issues of their own (e.g. vendor loc

  • So ... 100 TB / $1 million ==> 1 TB / $10,000.

    A 1 TB drive is 60-100 dollars.
    The energy required to run a 60-watt drive 24/7 = 0.060 kW x 24 hours x 365.25 days = 526 kWh.
    At $0.12 per kWh, that's $63.12 per year.

    Even if we double or triple the hardware costs, they will only make up a few percent of the 10 grand per TB cited here.

    The labor to maintain 100 or 200 or 400 drives is going to be relatively constant. In fact, with a little more reasonable monitoring software (just reporting drive fai
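
    Redoing that arithmetic with consistent units (energy is kWh, not "KW/h") supports the point; the drive price is an assumption, and 60 W is well above what a real drive draws, which only strengthens it.

```python
# Electricity and hardware cost per drive, using the comment's 60 W figure.
# (A real 3.5" drive draws more like 5-10 W, so the true power cost is lower.)
drive_power_w = 60.0
hours_per_year = 24 * 365.25
kwh_per_year = drive_power_w / 1000 * hours_per_year     # ~526 kWh
power_cost_per_year = kwh_per_year * 0.12                # at $0.12/kWh -> ~$63
drive_price = 80.0                                       # assumed ~1 TB drive

print(f"{kwh_per_year:.0f} kWh/year, ${power_cost_per_year:.2f}/year in electricity")
print(f"drive plus one year of power: ${drive_price + power_cost_per_year:.2f} per TB")
# A couple of hundred dollars at most, so raw hardware and power can explain
# only a small fraction of the $10,000/TB figure in the summary.
```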

  • Instead of a medium number of large systems, I wonder whether it would make more sense to have a larger number of mini-ITX type units that could be:
      - easily replaced
      - put in standby when not being accessed (a smart load balancer would decide when to wake up sleeping units)
      - simpler to cool?

    It would also be nice for a universal back-plane design to support plugging in boards from any company, with minimal or zero cabling.
     

  • Billions of dollars are also wasted every year in the manufacturing and transporting of fire extinguishers, 99% of which will probably never be used.

  • No... (Score:3, Interesting)

    by rickb928 ( 945187 ) on Wednesday July 28, 2010 @02:11PM (#33059242) Homepage Journal

    "It's a bit of a paradox. Users don't seem to be willing to spend the money to see what they have,"

    I think he meant users don't seem willing to spend the money to MANAGE what they have.

    As many have pointed out, you need 'excess' capacity to avoid failing for unusual or unexpected processes. How often has the DBA team asked for a copy of a database? And when that file is a substantial portion of storage on a volume, woopsie, out of space messages can happen. Of course they should be copying it to a non-production volume. Mistakes happen. Having a spare TB of space means never having to say 'you're sorry'.

    Aside from the obvious problems of keeping volumes too low on free space, there was a time when you could recover deleted files. Too little free space pretty much guarantees you won't be recovering deleted files much older than, sometimes, 15 minutes ago. In the old days, NetWare servers would let you recover anything not overwritten. I saved users from file deletions over the span of YEARS, in those halcyon days when storage became relatively cheap and a small office server could never fill a 120MB array. Those days are gone, but without free space, recovery is futile, even over the span of a week. Windows servers, of course, present greater challenges.

    'Online' backups rely on delta files or some other scheme that involves either duplicating a file so it can be written intact, or saving changes so they can be rolled in after the process. More free space here means you actually get the backup to complete. Not wasted space at all.

    Many of the SANs I've had the pleasure of working with had largely poor management implementations. Trying to manage dynamic volumes and overcommits had to wait for Microsoft to get its act together. Linux had a small lead in this, but unless your SAN lets you do automatic allocation and volume expansion, you might as well instrument the server and use SNMP to warn you of volume space, and be prepared for the nighttime alerts. Does your SAN allow you to let it increase volume space based on low free space, and then reclaim it later when the free space exceeds threshold? Do you get this for less than six figures? Seven? I don't know, I've been blessed with not having to do SAN management for about 5 years. I sleep much better, thanks.

    Free space is precisely like empty parking lots. When business picks up, the lot is full. This is good.

  • Comment removed based on user account deletion
  • Unlike in the movie "This Is Spinal Tap", there is no 11 on the volume control for storage capacity in a data center. We will not see proud proclamations from boards of directors: "today we are running our data storage at 115% of capacity!"

    Having been in the predicament many times of frantically trying to ration out disk storage space for some critical application at 3 AM on a Sunday morning, I think that running data centers at 80-90% is being conservative and may save your ass the next time you cannot get int

  • by natoochtoniket ( 763630 ) on Wednesday July 28, 2010 @03:03PM (#33060182)

    There are two numbers that matter for storage systems. One is the raw number of gigabytes that can be stored. The other is the number of IO's that can be performed in a second. The first limits the size of the collected data. The second limits how many new transactions can be processed per time period. That, in turn, determines how many pennies we can accept from our customers during a busy hour.

    We size our systems to hit performance targets that are set in terms of transactions per second, not just gigabytes. Using round numbers, if a disk model can do 1000 IO/second, and we need 10,000 IO/second for a particular table, then we need at least 10 disks for that table (not counting mirrors). We often use the smallest disks we can buy, because we don't need the extra gigs. If the data volume doesn't ever fill up the gigabyte capacity of the disks, that's ok. Whenever the system uses all of the available IO's-per-second, we think about adding more disks.

    Occasionally a new SA doesn't understand this, sees a bunch of "empty" space in a subsystem, and configures something to use that space. When that happens, we then have to scramble, as the problem is not usually discovered until the next busy day.
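
    A hedged sketch of watching for the "all of the available IO's-per-second are in use" condition on a Linux host: sample /proc/diskstats twice and compute per-device IOPS over the interval. The 1000-IOPS budget is borrowed from the round numbers above; a real array would report this through its own management interface.

```python
import time

def read_ios(path="/proc/diskstats"):
    """Return cumulative completed I/Os (reads + writes) per block device."""
    ios = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            name = fields[2]
            reads, writes = int(fields[3]), int(fields[7])
            ios[name] = reads + writes
    return ios

def iops_sample(interval=5.0):
    """Average IOPS per device over `interval` seconds."""
    before = read_ios()
    time.sleep(interval)
    after = read_ios()
    return {dev: (after[dev] - before[dev]) / interval
            for dev in after if dev in before}

if __name__ == "__main__":
    budget = 1000  # assumed per-device IOPS budget, as in the round numbers above
    for dev, rate in sorted(iops_sample().items()):
        flag = "  <-- near budget" if rate > 0.8 * budget else ""
        print(f"{dev:<12}{rate:>10.1f} IOPS{flag}")
```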

    • And that's not even the whole picture: When dealing with databases, not all IO operations are equal. Reading a million records on a sequential scan in a certain part of the disk is different than reading them on a different part of the disk, or reading said records in a random order.

      Large amounts of empty space are just the nature of data warehousing, and there's no way to go around that. In some cases, the RAM expense is even higher than the expense on disk, because for cases where a lot of throughput is n

  • Comment removed based on user account deletion
  • I can't be sure which one. On second thought, make that second one "stupidity". I still can't decide which one's really at work.

    I attend a daily conference call where I hear case after case of people waiting for SAN space to be reclaimed so that it can be reassigned to other systems. People are either being told their projects cannot proceed because there's not enough disk space, or their projects are held up while the space they've been allocated is scaled back to allow others to work.

    I'

    • The stupid mistake starts with the idea of using DEDICATED RESOURCES on a SAN. One of a SAN's few perks is central storage to minimize waste and therefore maximize return on the investment.
      Then again, all the SAN implementations I've seen in production so far have been idiotic, moronic and stupid (yes, I know ....)

      Some of the implementations I've seen have only caused headaches, downtime, poor performance, etc., with an insanely huge investment. Tell me again, what's the benefit of having a SAN when it makes the end-re

  • We happily manage about 75-90TB of total storage across our servers (really, I've forgotten exactly how much there is across all the different server types), with less than $1 million. Way less than $1 million.

    Then again, when you go the expensive route (say EMC), it's certainly expensive. But if you keep your wits about you and only acquire the features you require, storage doesn't need to be expensive. There are plenty of ways of running servers with storage costing almost as little as consumer storage, when some thought

  • We're growing so fast that we can barely keep up with the demand. Maybe I can run a few cross connects into my neighbor's cage and borrow some of their unused space?

  • Data centers already provide power and network - why don't they also provide shared storage?

    The data center can buy several big mean SANs, and then provision storage sets to customers. Customers then only have to pay for what they use, can add space without installing new hardware, and get speed and reliability.

  • In the world of storage area networks you must design to support the I/O load first, and capacity will typically never be an issue - at least in tier-one and tier-two storage.

    With cache cards and SSDs becoming cheaper this rule is changing, but many SANs have wasted space only because they needed more spindles to support the load.

"The four building blocks of the universe are fire, water, gravel and vinyl." -- Dave Barry

Working...