Android Hardware Technology

The Fight Against Dark Silicon

An anonymous reader writes "What do you do when chips get too hot to take advantage of all of those transistors that Moore's Law provides? You turn them off, and end up with a lot of dark silicon: transistors that lie unused because of power limitations. As detailed in MIT Technology Review, researchers at UC San Diego are fighting dark silicon with a new kind of processor for mobile phones that employs a hundred or so specialized cores. They achieve an 11x improvement in energy efficiency by doing so."
This discussion has been archived. No new comments can be posted.

  • by thisisauniqueid ( 825395 ) on Friday April 29, 2011 @11:59PM (#35982340)
    Language support for ubiquitous and provably threadsafe implicit parallelization -- done right -- is the answer to using generic dark silicon -- not building specialized silicon. See The Flow Programming Language, an embryonic project to do just that: http://www.flowlang.net/p/introduction.html [flowlang.net]
    • programmer-safe language.

      That's just asking for trouble; that's like saying a keyboard is safe from illiterate people because it has letters printed on the keys.

      • ...that's like saying a keyboard is safe from illiterate people because it has letters printed on the keys.

        Sadly, that statement is true. An illiterate person will shy away from a keyboard, an on screen (TV) menu, a newspaper, etc. the same way someone who is broke is embarrassed by the sight of a checkbook or wallet... it becomes a reflex. I know someone who is a good intuitive mechanic, but somehow managed to get to adulthood with less than third grade reading and writing skills. Left to himself, a typical 5 page job application takes a couple of hours and many phone calls to complete. Now he has a 2 year old

        • cats are illiterate, they walk all over the bloody keyboard causing all kinds of havoc.

          "I know someone who is a good intuitive mechanic, but somehow managed to get to adulthood with less than third grade reading and writing skills.",
          quite possible the way that he learns things (ergo... schools are crap)

          I have/had that problem, in that language is generally poorly designed and people like to fuck with other peoples heads. But I worked out how they do that now and it kind of, mostly, started to sort itself ou

        • by Anonymous Coward

          My Dad was a bit like that, Mum introduced him to Science Fiction and made him read enough to get him hooked on the story. He now has no problem with reading.
          Took me longer than most kids to pick up reading so she used the same technique on me, couple of years later my reading comprehension was far above my age group.
          Writing and spelling have never really caught up though. I still have problems with that at 30.

        • by Kjella ( 173770 )

          Left to himself, a typical 5 page job application takes a couple of hours and many phone calls to complete.

          Not so many phone calls, but job applications can take me a while. The "spray and pray" variety may be useful if you're unemployed, but if you already have a job and it's one of those rare opportunities I could easily spend 2 hours on it. Not because of language problems but for making the best possible application for the position. It's usually well spent time.

      • Comment removed based on user account deletion
      • by aix tom ( 902140 )

        Definitely. As a programmer myself, I can switch *language* pretty quick. There are even some pretty easy to use GUI tools out there where "normal non-programmers" can implement something.

        The problem is that very few people seem to be able to LOGICALLY solve a problem, that is define what should happen when certain conditions are met. Basically the definition of "what should the program do exactly?". Getting THAT defined is 90% of the "programming" problem. And that can't really be solved by different langu

        • ohhh.. often the problem is that they can define when logical conditions are met, they just can't then generalize and turn it into patterns etc... and well that's all too much like hard work when I can just hack and slash myself through the day...

          it's like they've written a function in C++ but the body of the function looks more like very bad Prolog.

          void foobar(int &a)
          {
          int tmp = a;
          if (a = 1)
          {
          a*=a;
          a=(int)sqrt(float)a));
          a++;
          if (a + 1 = 2 )
          {
          printf ("goofie%d", a);
          }
          a--;
          } else if (a

    • That would be so much better if it wasn't in "early design" stage. Their "no garbage collection" plan seems particularly worthwhile.
    • by Anonymous Coward on Saturday April 30, 2011 @01:06AM (#35982622)

      Uuum, no need to learn some obscure weird language that doesn't even exist yet, when you can learn a (less) obscure weird language that already exists. ;)

      Haskell already has provably thread-safe implicit parallelization. In more than one form even. You can just tell the compiler to make the resulting binary "-threaded". You can use sparks. And that's only the main implementations.

      Plus it is a language of almost orgasmic elegance on the forefront of research that still is as fast as old hag grandma C and its ugly cellar mutant C++.

      Requires the programmer to think on a higher level though. No pointer monkeys and memory management wheel reinventors. (Although you can still do both if you really want to.)

      Yes, good sir, you can *officially* call me a fanboy.
      But at least I'm a fan of something that actually exists! ;))

      (Oh, and its IRC channel is the nicest one I've ever been to. :)

      • You're right about Haskell being a beautiful language, but it is not as fast as C/C++. Even Java is usually faster. It's still pretty fast for a declarative language and has a C interface for when you need to speed up certain parts of code.
        • by m50d ( 797211 )

          You're right about Haskell being a beautiful language, but it is not as fast as C/C++.

          Depends on the problem. My previous company found the Haskell proxy we wrote for testing could handle 5x the load of the best (thread-based) C++ implementation.

        • by Intron ( 870560 )

          You're right about Haskell being a beautiful language, but it is not as fast as C/C++. Even Java is usually faster. It's still pretty fast for a declarative language and has a C interface for when you need to speed up certain parts of code.

          Who cares? CPUs are 1000X as fast as they were 12 years ago, but I/O speed has barely changed. There are no CPU-bound problems anymore.

          The only thing that matters is programmer efficiency. It takes 5X as long to write C code as to solve the same problem in a modern language.

      • Plus it is a language of almost orgasmic elegance on the forefront of research that still is as fast as old hag grandma C and its ugly cellar mutant C++.

        People always claim this. And always against C and C++. It's essentially never true except for a) FORTRAN and b) occasional synthetic benchmarks. While it is undeniably elegant, the lack of for-loops is anything but elegant in scientific computation, image processing etc.

        Does Haskell allow you to parameterize types with integers yet? It didn't last time I

        • the lack of for-loops is anything but elegant in scientific computation, image processing etc.

          What's the difference between imperative "for" and functional "map" for iterating through a collection? Python has both, and I end up using generator expressions (which use syntax not unlike "for" and the semantics of "map") at least as often as an ordinary for-loop.

          • the lack of for-loops is anything but elegant in scientific computation, image processing etc.

            What's the difference between imperative "for" and functional "map" for iterating through a collection?

            Perhaps the inelegance comes because, with for loops, you can:

            • break out of it early
            • iterate a subsequence without actually creating a subsequence
            • step across several elements on each iteration

            and you can't do any of that with a map operation. Not the ones I've seen, anyway.

            • I don't know Haskell or ML, but I do know the itertools module in Python [python.org], which represents a bunch of lazy-evaluated iteration concepts borrowed from Haskell and ML.

              with for loops, you can: * break out of it early

              Some cases of breaking early can be represented as composition of iterator operations: "Find the first ten elements that meet these criteria" is something like islice(ifilter(criterion, seq), 10) where criterion is a function returning nonzero for elements that match. "Find all elements until the first not meeting the criteria" is takewhile(crit

              • Yeah, itertools seems like a really nice library. I wish I had something similar in other languages.

                But, you know, the itertools functions are just inefficient applications of a normal map operation. Islice, for example, iterates from the start, skipping everything until it gets to the elements you want to process. A proper for loop does not need those nop iterations, and is thus more elegant (for a certain definition of elegant).

                I will grant that breaking out of the iteration works just peachy; I figure Py

                • by tepples ( 727027 )

                  Islice, for example, iterates from the start, skipping everything until it gets to the elements you want to process. A proper for loop does not need those nop iterations

                  If you know all the elements up front, in a numbered sequence of some sort (such as an array), you can use a regular slice (e.g. some_list[10:20]).
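For anyone following along, here is a minimal sketch of the iterator compositions discussed in this subthread, assuming Python 3 (where the built-in lazy `filter` takes the place of `itertools.ifilter`). The helper names are made up for illustration; only `islice`, `takewhile`, `count` and `filter` come from the standard library.

```python
from itertools import count, islice, takewhile

def first_n_matching(criterion, seq, n):
    """First n elements of seq that satisfy criterion; stops consuming seq early."""
    # filter() is lazy in Python 3, so islice() ends the iteration after n hits.
    return list(islice(filter(criterion, seq), n))

def leading_run(criterion, seq):
    """All elements up to (not including) the first one that fails criterion."""
    return list(takewhile(criterion, seq))

def strided_window(seq, start, stop, step):
    """Elements at positions start, start+step, ... below stop, without copying seq."""
    # On a plain iterator islice() still has to skip the first `start` items.
    return list(islice(seq, start, stop, step))

if __name__ == "__main__":
    # count() is infinite, so this only terminates because of the early stop.
    print(first_n_matching(lambda x: x % 3 == 0, count(1), 4))
    print(leading_run(lambda x: x < 5, range(1, 50)))
    print(strided_window(range(1, 50), 10, 20, 3))
```

The early exit and the stepping both work, but the objection about skipped elements still holds for generic iterators; a regular slice on a list or array avoids that cost.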

        • by drb226 ( 1938360 )

          Asking for "for loops" will make most functional programmers chuckle. Usually what you want is a fold (or a special fold like a filter or a map). Speaking of parallelization, the semantics of generalized for loops require that each iteration be performed sequentially. What if you want to perform each iteration in parallel?

          As for number-parameterized types, I haven't dealt with it myself, but I'll just leave this here: Number-parameterized types by Oleg [psu.edu]
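To make the "map and filter are special folds" remark concrete, here is a rough sketch in Python using `functools.reduce` as the fold. The function names are hypothetical, and the list concatenation is quadratic; it is meant to show the shape of the idea, not to be fast.

```python
from functools import reduce

def total(seq):
    """A plain fold: combine every element into one accumulated value."""
    return reduce(lambda acc, x: acc + x, seq, 0)

def map_via_fold(fn, seq):
    """map expressed as a fold whose accumulator is the output list."""
    return reduce(lambda acc, x: acc + [fn(x)], seq, [])

def filter_via_fold(pred, seq):
    """filter expressed as a fold that only extends the accumulator on a match."""
    return reduce(lambda acc, x: acc + [x] if pred(x) else acc, seq, [])

if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(total(data))
    print(map_via_fold(lambda x: x * x, data))
    print(filter_via_fold(lambda x: x % 2 == 1, data))
```

Note that in the map case each element is transformed independently of the others, which is what leaves room for running the iterations in parallel; a general fold carries the accumulator from step to step and does not give that away for free.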

        • I think you're behind the curve a bit - some review of the features of functional languages may be in order. Haskell really is fast, from what I've read and seen. Some of the 'program some standard thing in a zillion languages' websites have example Haskell implementations that are pretty performant.

          The key to the 'no loops' issue is simple - tail recursion [wikipedia.org]. I quote excerpts:

          In computer science, a tail call is a subroutine call that happens inside another procedure and that produces a return value, which is then immediately returned by the calling procedure.
          [...]
          Tail calls are significant because they can be implemented without adding a new stack frame to the call stack. Most of the frame of the current procedure is not needed any more, and it can be replaced by the frame of the tail call, modified as appropriate. The program can then jump to the called subroutine.
          [...]
          in functional programming languages, tail call elimination is often guaranteed by the language standard, and this guarantee allows using recursion, in particular tail recursion, in place of loops.
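As a small illustration of the quoted idea, here is a tail-recursive function next to the loop it corresponds to. It is written in Python only to show the shape: as far as I know CPython does not eliminate tail calls, so the recursive version is limited by the recursion depth there, whereas a language that guarantees tail call elimination would run it in constant stack space. Both function names are made up for the example.

```python
def sum_to_tail(n, acc=0):
    """Sum 1..n with a tail call: the recursive call is the last thing done."""
    if n == 0:
        return acc
    # Nothing remains to do after this call returns, so the current frame
    # could be reused by an implementation that eliminates tail calls.
    return sum_to_tail(n - 1, acc + n)

def sum_to_loop(n):
    """The same computation after replacing the tail call with a jump."""
    acc = 0
    while n != 0:
        n, acc = n - 1, acc + n  # same argument update as the recursive call
    return acc

if __name__ == "__main__":
    print(sum_to_tail(100), sum_to_loop(100))
```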

          I personally like Erlang better - it's more of a 'real world' rather than 'ivory tower' language in its approach and I find it easi

      • Haskell reminds me of glossolalia. Every time I take a look at it all I see is gobbledygook with the proponents of it claiming they have seen God.
      • by tepples ( 727027 )

        Haskell [...] is as fast as old hag grandma C and its ugly cellar mutant C++.

        As I understand it, purely functional languages use a lot of short-lived immutable objects and therefore generate a lot more garbage than languages that rely on updating objects in place. If your target machine is a handheld device with only 4 MB of RAM, this garbage can mean the difference between your design fitting and not fitting. And for a design on the edge of fitting, this garbage can mean the difference between being able to keep all data in RAM and slowing down to read the flash over and over.

        • I wouldn't use Haskell (or Erlang) to write a device driver; I wouldn't use C to write much of anything else. Just as we have automotive vehicles that range from scooters to huge Terex earthmovers, it's important to use the right tool for the job. A friend of mine used to have a Volkswagen bug that had been converted to a pickup truck. It wasn't pretty! :D I wouldn't run a million row relational database on your 4MB device either.

      • by JamesP ( 688957 )

        I like Haskell but it has its warts.

        The main problem of Haskell is going "full functional", with monads, etc. Monads are very difficult to understand and master.

        Still, I think Haskell is much closer to "the solution" than Lisp, for example. (or maybe Scala gets better)

        Not to mention it's great to play with Hugs/GHC with its interactive console.

        • The main problem of Haskell is going "full functional", with monads, etc. Monads are very difficult to understand and master.

          Monads are what makes Haskell interesting. Take a look at some of the stuff like the STM implementation from Simon Peyton Jones's group, for example. Functional programming without monads is just imperative programming with a bunch of irritating and pointless constraints.

        • Frankly, I was thinking the other way around. A pattern matching Lisp macro library would really hit the spot, though OCaml style grammars would be the nail in the coffin for other functional langs.
    • No need to reinvent the wheel. Plenty of stuff out there, based on the functional programming model, which by design can be set up to parallelise well. I know some folk messing around with this: not my particular area of interest, but it demonstrates that this is a well understood problem space, with a lot of clever people already having committed a lot of hours of brainwork over long periods of time to progress solutions in it. Mercury Programming Language [wikipedia.org]
    • What's your plan of attack on GC? Reference counting doesn't pause; but fails if you create cyclic references. Mark and Sweep doesn't have that problem; but creates the dreaded pause. The state of the art, AFAIK, is to check for recently created objects and kill them early (generational GC). There are heuristics to avoid a full mark-and-sweep; but AFAIK there aren't any airtight algorithms.

      Now I wonder, is it possible to do a static analysis on a parse tree for some language and determine whether or not

      • You can create memory in "arenas" (an overloaded word, unfortunately) where the entire arena is freed at once. e.g. you can create a graph as a collection of nodes that can have arbitrary inter-connectivity. When the last reference to anything in the collection is dropped, the whole collection is freed. This will cover a lot of cases with circular deps.
      • Immutable objects don't allow cycles to form without notice - in Haskell, and Tcl interestingly, it's a non-issue.
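The cyclic-reference problem raised at the top of this subthread can be observed directly. Below is a minimal sketch, assuming CPython, which combines reference counting with a separate cycle collector in the `gc` module; `Node` and the function name are hypothetical. With the cycle collector switched off, two objects that point at each other stay alive after the last outside reference is dropped, and only an explicit tracing pass reclaims them.

```python
import gc
import weakref

class Node:
    """Hypothetical graph node that can point at one other node."""
    def __init__(self):
        self.other = None

def cycle_survives_refcounting():
    """Return (alive after del, alive after gc.collect()) for a two-node cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector from running on its own
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a     # a -> b -> a forms a reference cycle
        probe = weakref.ref(a)      # observes a without keeping it alive
        del a, b                    # drop the only external references
        alive_after_del = probe() is not None
        gc.collect()                # tracing pass finds the unreachable cycle
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()

if __name__ == "__main__":
    print(cycle_survives_refcounting())
```

On CPython I would expect the cycle to be alive after the `del` and gone after the collection; other Python implementations manage memory differently and may behave otherwise.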
    • I checked it out and it sounds really interesting, but at the moment all they seem to have is the idea. Not to say it isn't a very good idea, but I think the main challenge will be making such a language intuitive and human readable whenever they get that far.
    • I don't think that's an answer to the same problem. The problem is that it simply isn't possible to make a general purpose processor arbitrarily small due to power dissipation. You can parallelize all you want, you still might not hit the same performance for specific tasks that optimizing the processor architecture itself will. Quite clever if chips customized to particular phones can be cost effective.

    • by NoSig ( 1919688 )
      That helps, but it will still require powering up all the silicon you are using. In this approach you only power up the part of the special-purpose silicon you need, but in return get much greater speed out of that piece of silicon. This is more power-efficient if you need some of what is on the chip.
    • Actually, no. Dark Silicon is not about having too many cores to effectively use them all at the same time. It's about maintaining a power envelope when your number of transistors is going up.

      Currently we lower the voltage when we increase the number of transistors, which keeps power usage and heat generation in check. But we're at the limit of what voltage will work, especially since electron leakage becomes more of a problem the smaller your transistor. So dark silicon will be necessary to ensure reasonab

    • by drb226 ( 1938360 )
      Wake me up when it's released.
  • Not required.. (Score:5, Informative)

    by willy_me ( 212994 ) on Saturday April 30, 2011 @12:11AM (#35982382)
    The CPU in a cell phone does not use much power so there is little to gain. Now if you can make more efficient radio transceivers - that would be something. Or the display, that would also significantly reduce power consumption. But adopting a new, unproven technology for minimal benefits.... That's not going to happen.
    • Agree 100%

      The two biggest power draws are the screen and radios. This is what needs to be made more efficient.

      With proper GUI design and AMOLED screens, the screen power draw can be drastically reduced but things like the 3G radio drain power like mad if the signal isn't perfectly strong (while 4G radios gobble power under all circumstances).
      • Hell, even in laptops that's the case. I've got a high-end (well, medium-end now, but two years have gone by) gaming laptop. I've noticed that the biggest power draw is unquestionably the display - just turning the brightness down triples my battery life. Then comes turning off the wifi/bluetooth (there's a handy switch to do so), which gives me an extra half-hour. And this is while the Crysis-running CPU and graphics card are on normal gaming settings. Setting the CPU to half the clock speed barely gives m
        • So, what we really need to be working on is connecting the output directly to your brain and skipping all that wasted light.
          • by artor3 ( 1344997 )

            Well, some companies are working on HUD goggles for personal computers, so I guess that's a step in the right direction, even if it does make you look like a total dork.

            • WANT!

              Staring into someone's eyes, you'll be able to see the porn they are viewing reflected off their cornea.
              • Of course CSI Miami would do this through a surveillance camera, zooming into the reflection and determining the eye color of the porn actress.
    • by artor3 ( 1344997 )

      Unfortunately, there's a pretty fundamental problem with making more efficient transceivers. They have to operate, by their very nature, at high frequencies. High frequency signals inevitably draw more current, because they see capacitors as having a low impedance. Basic EE stuff: Z = 1/(jwC). And how do we generate the radio frequency? With a VCO that invariably involves big capacitors (big for an IC, at any rate). Those VCOs typically end up drawing at least 50-60% of your operating current.
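To put numbers on the Z = 1/(jwC) point, here is a quick worked sketch in Python, with w taken as the angular frequency 2*pi*f. The 1 pF capacitance and the frequencies are arbitrary illustrative values, not figures from any real transceiver.

```python
import math

def capacitor_impedance(freq_hz, capacitance_f):
    """Impedance of an ideal capacitor, Z = 1/(j*w*C) with w = 2*pi*f."""
    omega = 2 * math.pi * freq_hz
    return 1 / (1j * omega * capacitance_f)

if __name__ == "__main__":
    c = 1e-12  # 1 pF, chosen only for illustration
    for f in (100e6, 1e9, 2.4e9):
        z = capacitor_impedance(f, c)
        # |Z| falls as 1/f, so the same capacitor passes more current at higher f.
        print(f"{f / 1e9:.1f} GHz: |Z| = {abs(z):.1f} ohm")
```

The magnitude falls in proportion to 1/f, which is the sense in which a high-frequency signal "sees" a capacitor as a low impedance.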

      Another

      • I wonder if you could make a high gain steerable antenna to track the dish on the cell tower while you're transmitting? Or, if it *really* tracked accurately, how much power would be needed to transmit the signal over the same range with a laser?
        • by artor3 ( 1344997 )

          Lots of problems with the laser... how does the phone know where the closest tower is, especially if you turn it off while flying across the country or the world? What happens if something gets in the way of the laser? Can you even use it inside?

          A directional, directable antenna might be possible, but it still presents some problems. You'd need a moving part with three axes of motion that can respond as quickly as a person swings a phone from one ear to another, and you'd need the motors behind it to ope

          • You can do a directional, steerable antenna with no moving parts using software radio and an array. But it'd still be impractical for phones - too big, and it'd take even more power. Perhaps at the base stations it might be of more use. The better the SNR you receive by excluding sources of interference, the less power the phone needs to transmit. With an array and the right software, a base station could have a thousand virtual directional antennas, all turning to track an individual handset in real time.
        • I wonder if you could make a high gain steerable antenna to track the dish on the cell tower while you're transmitting?

          What you described could be done without physically moving the antennas. Read a paper on it a few years ago (sorry, no link) where some researchers built an antenna on a chip that consisted of hundreds of different physical antennas. By applying the signal to different antennas at different times, a directional beam can be formed, much like a Yagi. But unlike a Yagi, the beam can be sent in any direction; one just has to alter the timing. I believe it is similar to how modern RADAR works.

        • Google Phased array antenna, it's a non-mechanical directional antenna. Also suggest looking up ceramic resonators, they allow building much smaller radio transceivers with lower power use. If the VCO uses half the power then don't use one, just have fixed frequencies using crystals (trade larger size for lower power).
          • by artor3 ( 1344997 )

            You can't really get around using a VCO. Ceramic resonators lack the accuracy needed for complex modulation. You need the sub-100 ppm accuracy of a crystal, but crystals can't operate in the GHz range. So you use the crystal as a reference for a PLL and multiply the frequency up to the desired level, and VCOs are an integral part of PLLs.

            Also, using a ceramic resonator wouldn't allow for "smaller" transceivers. Transceivers are already integrated circuits, with the only external components being a cryst

      • by Anonymous Coward

        And how do we generate the radio frequency? With a VCO that invariably involves big capacitors (big for an IC, at any rate). Those VCOs typically end up drawing at least 50-60% of your operating current.

        I can tell you from direct experience that the VCO (voltage controlled oscillator) used to generate the high frequency carrier is not the issue. The radio I'm messing with right now, in standby with the xtal and VCO running, draws 200uA; 50uA with just the xtal. The problem with the radios is

        1) The demodulation circuitry is computationally expensive. Though as dies shrink this becomes less of an issue.
        2) Transmit power, here you are up against a wall. You need to transmit a high enough power so that th

        • by artor3 ( 1344997 )

          What frequency are you working in? Obviously something like AM/FM radio, RFID, or TV will draw low current because the frequency is relatively low. But once you start getting up to the GHz range, there's no way a VCO could draw that little current, unless your entire radio had less than 30 fF of capacitance. Are you sure it's the RF VCO that's running, and not some IF one?

      • by Agripa ( 139780 )

        Basic EE stuff: Z = 1/(jwC). And how do we generate the radio frequency? With a VCO that invariably involves big capacitors (big for an IC, at any rate). Those VCOs typically end up drawing at least 50-60% of your operating current.

        The capacitor in a VCO is in a tuned circuit so the circulating power can indeed be high but the actual power draw is much much lower. If this was not the case, then the tuned circuit Q would be low leading to high oscillator noise.

        Close in phase noise is actually a significant

    • The CPU in a cell phone does not use much power so there is little to gain.

      Except when it's running Flash video or similar crap.

      • I found two things will drain my phone quickly: Angry Birds and ebuddy. The latter I assume because it keeps the radio on continually.
  • by file_reaper ( 1290016 ) on Saturday April 30, 2011 @12:11AM (#35982386)

    http://cseweb.ucsd.edu/users/swanson/papers/Asplos2010CCores.pdf [ucsd.edu]

    They call the specialized cores "c-cores" in the paper. I took a quick skim through it. C-cores seem like a bunch of FPGAs, and they take stable apps and synthesize them down to FPGA cells with the use of the OS on the fly. The C-core to hardware chain has Verilog and Synopsys in it.

    Cool tech, guess they could add gated clocking and all the other things taught in the classroom to further turn off these c-cores when needed.

    cheers.

  • A couple of thoughts:

    1. The common functionalities surely would include OS API's, as they seem pretty stable. But would they include common applications such as social networking apps, office apps, etc.?

    2. If a patch is necessary, then upgrading hardware might be a little tricky. This will become a serious issue with the invasion of malware.

  • by Anonymous Coward

    The Sinclair ZX81 replaced fourteen of the chips used on the ZX80 with one big programmable logic array chip that was only supposed to have 70% of the gates programmed in it. However, Sinclair used up all the gates on the chip and it ran nice and hot because of that. I suppose that the design could have used two chips instead, leading to lots of dark silicon and a cost implication.

  • I realise OpenJDK's is a stack-based VM and Dalvik is register-based. But aren't they essentially mapping virtual machine instructions to hardware instructions? In a rudimentary manner this was tried a decade ago with Java. It was found that general purpose processors would spank a Java-CPU in performance due to the way that a VM would interact with a JIT instead of processing raw instructions.

    [Aside - ARM does include instructions for JVM-like code - Jazelle/ThumbEE. Can/does Dalvik even take advantage ?]

    Th

    • by jensend ( 71114 )

      A quick Google search only turns up one serious discussion [mail-archive.com] about the possibility of a ThumbEE-oriented Dalvik. The only reply wasn't very optimistic about it, saying that a 16-cycle mode switch between ThumbEE and regular instructions makes it unlikely to be worth it.

      More's the pity - I really think VM guys and processor design folks need to get their heads together.

      • Cheers. I'm assuming the original instructions were concocted for Sun's proprietary Java ME/SE embedded platforms - AFAIK, none of that support has made it into phoneME or OpenJDK.

        Maybe if MIPS had 'won' on phones we'd have greater synergy, e.g. the reverse of NestedVM.

  • - then I'll be impressed. Currently I sit at 3 days with very heavy usage, and 5-6 days with low to moderate usage. If this sort of multi-core stuff breaks the all-important one week barrier, then it'll be a welcome technology.
    • by RobbieThe1st ( 1977364 ) on Saturday April 30, 2011 @02:45AM (#35982872)

      They can, they just don't want to. All they have to do is make it slightly thicker and double the size of the battery.
      Heck, I want to see a phone where the battery is the back cover (like the old Nokia dumbphones), and also has a small second battery inside it, something that can power the RAM/CPU for 5 minutes.
      Then, you can just yank the dead battery, plug a new one in /without rebooting/.
      It would also allow for multiple battery sizes: Want a slim phone? Ok, use a small battery. Need two weeks of life? Use a large battery.

      Easy solution.

      • by Sloppy ( 14984 )

        They can, they just don't want to.

        This is one of the great mysteries of the phone market, a situation where it seems to my ignorant amateur eyes that they're doing the same thing as the MPAA companies: saying, "No, we don't want your money. Fuck off, customers. Go find someone else to do business with."

        Wouldn't a 2011 phone whose battery lasts as long as a 2006 phone sell like hotcakes? Is "slim" really all that "cool?"

        • Honestly, yes. Go look on the Maemo forum: There's a "mugen" aftermarket battery with double the capacity of stock. It makes the phone about a quarter-inch thicker, though.
          Some people like it, others complain that the phone's /already/ too fat!

    • long ago i saw phones from philips(!) that they claimed had a battery life measured in months. the main problem with your need is that most people are ok with 5-6 days of battery on a smartphone.

  • guess using unused space is a good thing, but will it be cost effective to make these huge low nm chips? it might be more cost efficient to include two higher process chips. also batteries are always getting a little better (albeit very slowly). i think android phones especially would benefit from more cores. there are hundreds of threads running on that OS with just a few apps open.
  • Dark Silicon: Luke, I am your father.

  • the claim is that this is the most power efficient design route.

    the problem is that there just aren't the sophisticated tool sets you need for design and analysis.

    of course I've never been clear on why you couldn't just use the asynchronous design ideas and substitute
    very low clock speeds in place of disables or some such thing.

    not a digital designer so can't get too far into the details.

  • by Animats ( 122034 ) on Saturday April 30, 2011 @03:21AM (#35982972) Homepage

    Specialized CPU elements have been tried. The track record to date is roughly this:

    • Floating point - huge win.
    • Graphics support - huge win, mostly because graphics parallelizes very well.
    • Multiple parallel integer operations on bytes of a word - win, but not a huge win.
    • Hardware subroutine call support, such as register saving - marginal to negative. Many CPUs have had fancy CALL instructions that were slower than writing out the register saves, etc.
    • Hardware call gates - in x86, not used much.
    • Hardware context switching - in some CPUs, not used much.
    • Array instructions - once popular at the supercomputer level, now rare.
    • Compression/decompression (MPEG, etc.) - reasonable win, but may be more useful as part of a graphics device than a CPU.
    • List manipulation, LISP support, Java stack support - usually a lose over straight code.
    • Explicit parallelism, as in Itanium - usually a lose over superscalar machines.
    • Filter-type operations (Fourier transform, convolution, wavelets, etc.) - very successful, but usually more useful as part of a signal processing part than as part of a CPU.
    • Inter-CPU communication - useful, but very hard to get right. DMA to shared memory (as in the Cell) seems to be the wrong way. Infiniband, which is message passing hardware, is useful but so far only seen in high end machines.
    • Hardware CPU dispatching - has potential if connected to "hyperthreading", but historically not too successful.
    • Capability-based machines - successful a few times (IBM System/38 being the best example) but never made it in microprocessors.

    A lot of things which you might think would help turn out to be a lose. Superscalar machines and optimizing compilers do a good job on inner loops today. (If it's not in an inner loop, it's probably not being executed enough to justify custom hardware.)

    • One thing that I think *would* be a win for scientific calculation programs of many sorts:

      A program that takes two arrays of doubles,

      A1, A2, A3, A4 ... A98, A99, A100 ...
      B1, B2, B3, B4 ... B98, B99, B100 ...

      And given the start of A, start of B, and number of elements in each, parallelizes the sum-product
      A1*B(N) + A2*B(N-1) + A3*B(N-2) + ... + A(N-1)*B2 + A(N)*B1.

      The reason for this is that many, many differential-equation initial-value problems can be solved exactly using the Parker-Sochacki solution to the Picard itera

      • Well...*puts on SSE hat*

        Depending on the precision that you require, you could bolt those either 2 doubles at a time, 4 floats at a time, or 8 fixed-point 16-bit integers at a time into SSE registers and operate on them in parallel. There are some bizarrely cool byte-order-rearranging assembly instructions that you could use to help you get the list of B values flipped around. Couple that with a couple of cache hints and I think you could compute that reasonably quickly.
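
        For reference, the flipped sum-product from the parent comment can be sketched in plain Python. This is only an illustration of the arithmetic, not SSE code: the function names and the `lanes` parameter are made up here, and the second function merely imitates the idea of keeping several independent partial sums (as a SIMD register with 2 double lanes would) before combining them at the end.

```python
def reversed_sum_product(a, b):
    """Return a[0]*b[n-1] + a[1]*b[n-2] + ... + a[n-1]*b[0]."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    # Walk a forwards and b backwards, multiplying pairwise.
    return sum(x * y for x, y in zip(a, reversed(b)))


def reversed_sum_product_lanes(a, b, lanes=2):
    """Same sum, accumulated in `lanes` independent partial sums.

    Hypothetical stand-in for SIMD lanes: element i goes to lane i % lanes,
    and the lanes are added together once at the end.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    b_flipped = b[::-1]          # flip B once up front
    partial = [0.0] * lanes      # one accumulator per lane
    for i in range(len(a)):
        partial[i % lanes] += a[i] * b_flipped[i]
    return sum(partial)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [4.0, 5.0, 6.0]
    print(reversed_sum_product(a, b))
    print(reversed_sum_product_lanes(a, b, lanes=2))
```

        Note that with real floating-point data the lane-wise version can differ from the straight sum in the last bits, because the additions happen in a different order.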
    • This reminds me of a funny experience I had in school a long time ago. The school had a brand new 'Harris 220' (formerly Perkin Elmer) timesharing machine. It used a timesliced architecture where every task gets its little slice of the CPU every so many milliseconds. I was helping a math student implement a small program to produce ten values of a particular function - I recall it was a Bessel function but I'm not sure - to 7 or 8 decimal places. The function converges very slowly, so we kept adding ite

  • So here is the power usage breakdown from my Samsung Galaxy S running Froyo:

    Display: 89%
    Maps: 5%
    Wifi: 3%
    Cell Standby: 2%

    So how is enabling "Dark Silicon" going to help the power usage on my phone when the display uses the
    vast majority of the power?
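
    A rough upper bound on that, as a sketch only: assume (generously, and this is an assumption, not something the breakdown above states) that everything other than the display, about 11% of the total, is processor power and that all of it gets the 11x efficiency gain from the summary. The function name is made up for illustration.

```python
def remaining_power_fraction(improved_share, efficiency_gain):
    """Fraction of the original power still drawn after one share improves.

    improved_share:  fraction of total power that benefits (0.0 to 1.0)
    efficiency_gain: factor by which that share's power drops (e.g. 11.0)
    """
    untouched = 1.0 - improved_share           # e.g. the display
    improved = improved_share / efficiency_gain
    return untouched + improved


if __name__ == "__main__":
    # Assumed: display = 89% (from the breakdown), everything else = 11%.
    remaining = remaining_power_fraction(0.11, 11.0)
    print(f"power still drawn: {remaining:.2%}")
    print(f"battery life multiplier: {1.0 / remaining:.2f}x")
```

    Under those assumptions the phone still draws about 90% of its original power, so battery life improves by roughly 10% at best while the screen is on.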

  • We can't see it, we can only detect it by its power draw, and it makes up 95% of your chips!

    • No, we can detect its mass but it doesn't interact electromagnetically with the rest of the device.
  • "If you fill the chip with highly specialized cores, then the fraction of the chip that is lit up at one time can be the most energy efficient for that particular task,

    You can't win, because when a performance hacker reads this, he thinks, "Ooh, such waste! I need to parallelize all my stuff to increase utilization. Light 'em up!"

  • Why do we need ever more powerful phones? I don't think people are going to want to run CFD, protein folding, or SETI@home on their phones.
    On the other hand, if the phone consumed 11 times less power, you could go a few months without charging it, which would be good.

    • If you can minimize standby current, you could have a thermocouple generate power from your body heat to charge the phone, giving unlimited battery life.
      • Ah...okay then how about something that converts brainwaves into phone-charging energy so when people are talking on the phone constantly while they're driving it's being charged. Oh, wait, that would assume that they have brainwaves. My bad.

  • This is definitely an interesting approach they're taking.

    In my research group, we're looking at a different tactic called near-threshold computing. Say you have a 32nm device that uses 100W at 1V. If you were to run it at 400mV, it would use about 1W, but logic slows by a factor of 10. So that 100X reduction in power translates into a 10X reduction in energy.
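
    The step from 100X power to 10X energy is just energy = power x time. A minimal sketch using the figures above; the 1-second baseline runtime is a made-up number for illustration, and only the ratio matters.

```python
def energy_per_task(power_watts, runtime_seconds):
    """Energy in joules to finish one task at a given power and runtime."""
    return power_watts * runtime_seconds


# Nominal: 100 W at 1 V, task takes a hypothetical 1 second.
nominal = energy_per_task(100.0, 1.0)

# Near-threshold: about 1 W at 400 mV, but logic is 10x slower,
# so the same task takes 10 seconds.
near_threshold = energy_per_task(1.0, 10.0)

energy_reduction = nominal / near_threshold

if __name__ == "__main__":
    print(f"nominal: {nominal} J, near-threshold: {near_threshold} J")
    print(f"energy reduction: {energy_reduction}x")
```

    The power drops 100X, but the task runs 10X longer, which leaves a 10X saving in energy per task.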

    Fast-forward to 11nm, where the transistor density is 10X what it is at 32nm. Nominal voltage won't go down much, so without doing something drast

  • What are the editing commands for forcing a new line in text? I know about bold, and I thought a new line was a less-than character followed by a p. Where is the help/reminder for us responders?
  • So, with all those suggested energy savings, the battery could become smaller, and the number of apps could increase. Is there a Moore's law about the number of semi-useful applications?

