Intel Hardware

Hyper-Threading Explained And Benchmarked 245

John Martin writes "2CPU.com has posted an updated article about Hyper-threading performance. They discuss the technology behind it, provide benchmarks, and make observations on what the future holds for hyper-threading. It's actually an easy, interesting read. Of note, they'll be publishing Part II in the near future which will detail hyper-threading performance under Linux 2.6. Hardware geeks will probably appreciate this."
This discussion has been archived. No new comments can be posted.

  • SMT (Score:2, Troll)

    Simultaneous Multithreading (SMT) is not a new idea, although to my knowledge no one had implemented it in a shipping processor before. Intel just calls it "Hyperthreading"...it is essentially SMT.

    And yes, this is a very good idea. A modern superscalar out-of-order processor, like the Athlon and Pentium Pro (and later), can issue and retire multiple instructions per clock cycle. However, it can *only* do this if there is enough instruction-level parallelism (ILP). Turns out, there is not enough ILP in current programs to take full a
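A toy sketch may make the slot-filling idea concrete. The simulation below models a hypothetical 2-wide core running threads that are each a fully dependent instruction chain, so a single thread can never issue more than one instruction per cycle; the model and the numbers are invented purely for illustration, not taken from any real core:

```python
# Toy model of a 2-wide superscalar core (not any real microarchitecture).
# Each thread is a fully dependent chain, so on its own it can issue at most
# one instruction per cycle; independent threads can share the issue slots.

ISSUE_WIDTH = 2

def cycles_needed(threads):
    """threads: list of instruction counts, one per hardware thread."""
    remaining = list(threads)
    cycles = 0
    while any(remaining):
        issued = 0
        # SMT: take at most one instruction from each runnable thread,
        # up to the issue width.
        for i, n in enumerate(remaining):
            if n > 0 and issued < ISSUE_WIDTH:
                remaining[i] -= 1
                issued += 1
        cycles += 1
    return cycles

# One dependent chain wastes half the issue slots; a second, independent
# thread fills them for free.
print(cycles_needed([100]))       # 100 cycles for 100 instructions
print(cycles_needed([100, 100]))  # still 100 cycles, for 200 instructions
```

With a lone dependent chain the second issue slot is always empty; that wasted slot is exactly what SMT hands to the other thread.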
    • Re:SMT (Score:4, Interesting)

      by John Courtland ( 585609 ) on Wednesday January 07, 2004 @05:28AM (#7901395)
      Yeah, this is the idea behind the new Cell architecture in the PS3: dumping the old single-threaded model and doing everything in multiple threads, where global data can be dynamic with each thread containing its own local storage. Done properly, it's blazingly fast. Done poorly, and you end up with race conditions, blocking semaphores, and generally poor code and poor performance. The only problem is that, using the paradigms we have today, very few are capable of programming this style right now. The closest people I can think of are the Michael Abrashes, optimization zealots (not saying it's a bad thing), who know their processor upside and down and are not afraid of assembler, or of rescheduling instructions to get the most power out of each cycle instead of letting an optimizing compiler do it for them.
      • Re:SMT (Score:3, Interesting)

        by jtshaw ( 398319 )
        You're right, very few people can code a program that works well on an SMT processor. It is a lot to keep track of and, quite honestly, most of the code I have seen churned out at software companies was done in such a rush because of deadlines that the programmers didn't have time to optimize their code.

        However, there is no reason why you can't take two single-threaded processes and use one to fill the holes in the pipeline left by the other, so SMT should still have a decent benefit if the kernel scheduler is prepar
        • Re:SMT (Score:3, Insightful)

          by sql*kitten ( 1359 ) *
          most of the code I have seen churned out at software companies was done in such a rush because of deadlines the programmers didn't have time to optimize their code.

          I would argue that in the vast majority of cases, processor-specific microcode (as opposed to language and algorithmic) optimizations aren't the programmer's job - that's what a compiler is for. A professional-grade compiler like MIPSpro or ICC can generate code over twice as fast as GCC on the same processor, because it's smarter about process
          • Re:SMT (Score:3, Informative)

            by jtshaw ( 398319 )
            That is totally true. Processor-specific microcode optimizations are definitely the compiler's job. But you have to concede the fact that the compiler can only do so much. If the programmer doesn't choose a good method of solving the problem at hand there isn't much a good compiler can do to optimize the code, especially if the problem being solved is complex.

            Compilers simply can't be asked to pick up the slack for programs written with a poor logical flow. They can't be asked to figure out a completely di
            • Re:SMT (Score:3, Informative)

              What you wrote here is almost verbatim what Michael Abrash said in his book "Zen of Code Optimization". Dr. Dobbs Journal actually offered it up for free in PDF format at one point, I can only hope to find it amongst my mass of CD's.

              Smart code will do more for you than hand optimized assembler, unless you already have written smart code.
          • A professional-grade compiler like MIPSpro or ICC can generate code over twice as fast as GCC on the same processor, because it's smarter about processor-specifics.

            gcc 3.3.2 beats the pants off icc 8.0 on my SSE2 code. Up to a 50:1 ratio on speed tests, 4:1 on average. With earlier revisions of gcc and icc the ratio was 2:1 with icc being faster. This code is written with explicit parallelism so all the fancy loop unrolling icc does doesn't help, and the register allocation algorithm in gcc seems to be th
      • Re:SMT (Score:5, Interesting)

        by nikh ( 123374 ) on Wednesday January 07, 2004 @10:24AM (#7902429)
        Just to clarify here, this is not the same idea as the Cell architecture.

        The Cell architecture (which may or may not be used for the PS3) is a multi-processor system designed for scalability; It really does have several processors running at the same time. In contrast, 'Hyperthreading' runs multiple threads on a single processor's core.

        They both require multi-threaded code to achieve performance improvements, but fundamentally they're really quite different, and yield quite different price / performance trade-offs.
        • So then they are going to use separate silicon for each? I guess that would be better; if one unit fails you don't lose the computer. I'm sorry not to have made that distinction, but as you note, the programming model is the same, or at least similar, for both. You must compartmentalize your code into various non-blocking threads to yield a good amount of explicit parallelism to really see any benefit.
      • Re:SMT (Score:5, Interesting)

        by Radius9 ( 588130 ) on Wednesday January 07, 2004 @11:14AM (#7902724)
        Being a console programmer, and having done quite a bit of work on the PS2, there is something in your comment that is a common misperception. You say that hyperthreading works great when you have people who know their processor upside and down and are not afraid of assembler, well, I am not afraid of assembler, and have done quite a bit of it. The problem is that writing in assembler tends to be slow, especially when trying to do heavy optimization. This takes time, a luxury generally not available to those of us in video games who tend to have hard christmas deadlines to ship our product. For Sony to assume that people are going to learn how to program in assembly is a mistake, as learning assembly isn't the issue, having the time to optimize the code in assembly is the issue. This isn't helped by the fact that most of the tools made available to us are piss poor, which makes working on the code much more difficult. For example, the PS2 has the vector units that are generally programmed in assembly. Not only do you need to make sure that the processing done by the vector units synchronizes with your main CPU, but you don't have ANY sort of debugging capability on these. Because of this, programming vector unit code is incredibly slow.

        In addition, video games are things that don't always lend themselves particularly well to running in multiple threads. I have my artificial intelligence code, collision & physics code, and my rendering code. These 3 parts are the main parts of the code that take roughly 90-95% of the total CPU time available to me. I can't run collisions and physics until after the AI has run, and I can't run my rendering until the collision & physics have been run. I can multi-thread individual game objects, but even these constantly interact with each other. This isn't normally a problem if you double buffer it in a way that, for example, after the AI has run, I keep the current frame's AI output around somewhere while I run the next frame, but this requires additional memory, another resource that is scarce on consoles.
        • Not saying assembler is the end-all, be-all, but I don't know of another programming model that really does a good job of encompassing the scheduling necessary to program for a simultaneous multithreaded processor.

          I understand the need for single-threaded performance; it does seem hard to break a game down into enough parts to really benefit from massively multithreaded architectures. I mean, all you really have is input, video, sound, physics, AI and rules (I separate physics from rules because physics
        • Assembly sucks? (Score:3, Informative)

          by dmelomed ( 148666 )
          This isn't specific to SMT, but assembly too hard? You people haven't heard of Forth, right? Just use ficl, or some other embeddable Forth, instead of assembler; it will save you lots of time. Better debugging too, since Forth is interactive.
    • Re:SMT (Score:2, Informative)

      by at_18 ( 224304 )
      A short but informative article about SMT is on Wikipedia [wikipedia.org]
  • Interesting. (Score:5, Informative)

    by Anonymous Coward on Wednesday January 07, 2004 @04:58AM (#7901277)
    There was an interesting discussion on the Plan9 newsgroup about hyperthreading recently, read here [google.com]
    • by Gleng ( 537516 ) on Wednesday January 07, 2004 @05:32AM (#7901404)
      Cool, that explains it a little.

      I was actually trying to explain hyperthreading to someone today. I got about three minutes into the discussion and realised that I had absolutely no idea what I was talking about.

      The discussion arose because we were talking about stupid salesmen. I saw a salesman in a shop the other week, trying to explain hyperthreading to a lady with a glazed expression on her face.

      He was saying that hyperthreading makes it easier to use two monitors on your PC.
      • This could be analogous to two people in moderate shape being able to pile more wood in total, than a single person who's in great shape.
        hmm... in 6 years of architecture research i have never heard anyone talk about SMT like that. it's not even analogous :)
        • Re:From the article: (Score:5, Informative)

          by Glonoinha ( 587375 ) on Wednesday January 07, 2004 @10:38AM (#7902506) Journal
          How about two people in moderate shape being able to push more wood through a single wood chipper than a single person who is in great shape (assuming the wood is piled up 18 feet away = cache miss).

          The single wood chipper being analogous to the actual processing part of the core, is only going to be able to shred so much wood - but if two people fetching wood from the woodpile can keep it running at 100% capacity they will shred more wood than a single guy running back and forth to the wood pile by himself.
          • More correct:

            We start with one wood chipper, one wood chipper operator and a pile of wood. We can chip (whatever) per unit time.

            We make the chipper faster, and can do more (increase clock speed of processor), but at some point the operator can't bring us the wood. So, we use a wheelbarrow to transport more wood in a go, and we keep the stack next to the chipper (a cache).

            Now, there's plenty of wood, so we get a SECOND chipper. The operator can stick wood into whatever chipper is free (multiple ALU units,
      • ...is that the used-car salesman knows when he's lying.

        There's a really interesting philosophical point here, BTW. If you are chartered to explain (or are pretending to know) something that you don't really understand, can you really claim that you didn't lie (because you didn't realize what you said was false), or do you have a responsibility to be correct if you offer yourself as an authority on a subject?
  • Intel's Whitepaper (Score:5, Informative)

    by Cebu ( 161017 ) on Wednesday January 07, 2004 @05:02AM (#7901295)
    For those more technically inclined I would suggest reading Intel's Hyper-Threading Technology Architecture and Microarchitecture whitepaper [intel.com] instead.
  • by Anonymous Coward on Wednesday January 07, 2004 @05:10AM (#7901322)
    "they'll be publishing Part II in the near future"

    Part II should've been published concurrently, using idle time... tch!
  • by photonic ( 584757 ) on Wednesday January 07, 2004 @05:14AM (#7901332)
    The article claims to talk about the technical details of hyperthreading. At first glance, however, it seems more like yet another article in the series "Athlon beats Pentium at Doom by 1/2 frame per second".

    If you are really interested in the how and why of hyperthreading, I suggest you read through the lecture notes of Computer System Architecture [mit.edu] at MIT OpenCourseWare. This gives you enough background to race through all the articles at Ars Technica et al.

  • Celery (Score:4, Insightful)

    by Chris Siegler ( 3170 ) on Wednesday January 07, 2004 @05:25AM (#7901376)
    We saw a whopping 30% decrease in encoding time with HT enabled on the 3.2GHz P4C. We were using an application that is certainly multi-threaded in TMPGEnc, so each logical processor had plenty of work to do and they both had plenty of bandwidth available to share.

    That's pretty cool, but if your primary concern is encoding, then there are some things to keep in mind. A Celeron is much cheaper than a P4 with the hyperthreading ($90 for a 2.6GHz Celeron, and $170 for a P4 2.6C). And if the app you're using doesn't support HT, then a Celery will likely encode faster than a P4 with HT on. HT can also reveal nasty bugs in some drivers (my HDTV card is an example). So unless you're playing games, the P4 is just added expense.

    • Re:Celery (Score:5, Informative)

      by turgid ( 580780 ) on Wednesday January 07, 2004 @06:09AM (#7901507) Journal
      A Celeron is much cheaper than a P4 with the hyperthreading

      So it is, and it's not all that fast either [anandtech.com]. Then again, you shouldn't believe all that you read on the Intarweb.

    • I think the logic is wrong here. Even if HT is enabled with a program that doesn't take advantage of it, it usually isn't a noticeable liability.

      One can still turn off the HT. With only a 128k cache, IMO, it is too much of a performance liability to make it worth the lower cost.

      I just leave it on because the system seems to respond a little better under heavy load.
    • Re:Celery (Score:3, Insightful)

      by Glonoinha ( 587375 )
      $80 difference on a $700 machine (assumes a usable amount of RAM, a real video card, a usable performance hard drive, and a legit copy of XP Pro (XP Pro gives you the best performance on the SMT chips, I have seen roughly 5%-10% gains)) means that for every 8 P4 2.6GHz HT machines you were going to buy, you can buy 9 Celeron 2.6GHz machines. Even if you go display-less (no monitors) and use a free OS (Linux or recycled Win2000Pro CDs) you are talking $500 absolute minimum, you are talking 7 Celeron boxes f
  • Wrong percentages? (Score:5, Interesting)

    by OMG ( 669971 ) on Wednesday January 07, 2004 @05:29AM (#7901398)
    I think they made a mistake here.
    From the article:
    "Sandra's CPU benchmark is obviously quite optimized for hyperthreading at this point, and the numbers certainly show that. We see an average improvement of ~39% when hyper-threading is enabled on the P4 ..."

    The numbers are:
    4328 without HT
    7125 with HT

    You could say that disabling HT makes this benchmark 39% slower. But the increase from turning HT on is
    7125/4328-1 = 1.646 - 1 = 0.646 = 64.6 %

    Hrmpf.
    • This guy can't even calculate his percentages correctly, so I wonder what else might be screwed up in his analysis?

      If X is the lower number and Y is the higher number, he's figuring his percentage increases as (Y-X)/Y instead of (Y-X)/X .

      Or is this some kind of "New New Math" that they started teaching in the 10 years since I graduated?
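Worked out with the article's own numbers, the two formulas give exactly the 39% and 64.6% figures from this thread:

```python
without_ht = 4328  # Sandra CPU score, HT disabled
with_ht = 7125     # Sandra CPU score, HT enabled

# Speed-up from enabling HT: (Y - X) / X, relative to the slower score.
print(round((with_ht - without_ht) / without_ht * 100, 1))  # 64.6

# The article's ~39% is (Y - X) / Y, which answers a different question:
# how much slower the benchmark gets when HT is turned *off*.
print(round((with_ht - without_ht) / with_ht * 100, 1))     # 39.3
```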

    • by Glonoinha ( 587375 ) on Wednesday January 07, 2004 @11:02AM (#7902655) Journal
      Crap you are right - just by turning on HT on the same box he saw a 65% boost in performance.

      I think it was a case of -wanting- to see a specific number and juggling things in his head until he got the number he wanted. Intel touts the 30% range and if he initially got the 65% number he probably discarded it and kept juggling the books to get the number in the 30's that he wanted.

      As someone that has a P4 2.4 (not HT) box sitting right next to a P4 2.4 (HT) box I will assure you that in real life you are not going to see a 65% sustained boost in performance in day to day use. Not 30% sustained boost either, unless you are only running apps that are heavily optimized and multithreaded.
  • by keeboo ( 724305 ) on Wednesday January 07, 2004 @05:33AM (#7901410)
    I do believe that HT does have a future, perhaps not in its present form, but still.

    I do remember when there was that RISC vs CISC thing in the 80s: people were saying that CISC was obsolete, RISC being the future and so on. What we see today is not pure RISC processors but something in between. -- It's just that the answer was not as pure or clean as people thought at first.

    A few years ago there was the BeBox and its BeOS. Well, BeOS had the philosophy of a machine not having a single super-powerful-burning-hot processor but, instead, several low-power ones combined.
    Well, Hyper-Threading may push distributed processing technology to the desktop, to the masses, so we might have interesting changes in software and hardware philosophy in the future.

    Sort of romantic thinking... But one can dream. :)
    • by putaro ( 235078 ) on Wednesday January 07, 2004 @06:04AM (#7901495) Journal

      All things being equal, RISC gives you more bang for your buck. The difference is that Intel has pushed CISC, or specifically the x86 architecture, as fast or faster than RISC by using more bucks. The amount of R&D dollars poured into x86 vs the amount poured into PowerPC or Alpha is overwhelming.


      When I was at Apple our processor architect, Phil Koch, gave a talk in, I think, 1997, where he said that the PowerPC consortium had essentially optimized for power consumption and dollars spent on R&D. What was amazing at that time was that PowerPC was competitive with Intel given much lower power consumption and much lower investment of R&D dollars. However, no one really cared about lower power consumption, so it didn't translate into any real advantage. Without the R&D dollar leverage given by RISC, however, the PowerPC would not have been able to compete at all. Pushing the 68K architecture to be competitive with Intel with the same R&D dollars as PowerPC would have been impossible.

      • And nowadays it becomes more and more clear that there isn't much of an advantage anymore.
        All "CISC" chips are RISC cores with a decoder frontend, and the "cheaply developed" PowerPCs before the G5 were slaughtered by x86 in any bench but Photoshop Gaussian blur.

        And the G5 is only a side product of IBM's POWER4 program, which can't really be described with "low R&D expenses".
      • by Waffle Iron ( 339739 ) on Wednesday January 07, 2004 @11:19AM (#7902754)
        All things being equal, RISC gives you more bang for your buck.

        Maybe, maybe not. However, it's hard to tell because nobody makes RISC or CISC processors anymore. The RISC concept, implemented in CPUs like the MIPS R3000, originally meant very simple hardware without pipeline interlocks, instruction schedulers, or more than an absolute bare-bones set of instructions. The current Power PC does not match this at all; it is closer to the current X86.

        By the same token, CISC used to mean that many or most instructions were implemented in microcode on the processor. Once again, that's no longer the case. All X86s now have a RISC-like core and resemble the Power PC far more than the 80286.

        Pure RISC designs and pure CISC designs have both been superseded by a hybrid approach, and neither one would be competitive today outside the embedded device market.

        Basically, you were being fed a line of company FUD to get you all excited about their choice of CPU. Today, cache memory dominates the chip real estate, and CPU performance and power consumption are dictated almost exclusively by cache size and silicon process technology rather than these surface architectural details.

        • The RISC concept, implemented in CPUs like the MIPS R3000, originally meant very simple hardware without pipeline interlocks, instruction schedulers, or more than an absolute bare-bones set of instructions.

          Not true at all! RISC refers to the instruction set, not the internal architecture. Even the earliest RISC processors to carry that name included pipeline interlocks -- it was the simplicity of RISC that made such techniques feasible, especially at the chip densities of the 80's.

          There's a lot of con

      • CISC processors tend to have smaller code size, even if the execution units are similar. You can think of this as CISC having a decompression engine between icache and the execution units. When main memory is slow and far away, reducing the amount of memory needed for code can be helpful, especially in modern (bloated?) systems with zillions of bytes of shared libraries loaded.

        Here's [yarchive.net] a ref to a discussion of RISC's response to this problem.
    • I remember the period when Digital was developing the Alpha to replace the CISC cpus in Vaxen.

      Nice chip, but relegated to the history books now.
    • RISC was another buzzword, like microkernels, XML, Java, XP, etc. It was the wave of the future -- the magic bullet that would let you get "Mainframe performance on a desktop computer." Just like the 386 was going to finally give you "Mainframe performance on a desktop computer," and the 486 was going to give you "mainframe performance on a desktop computer." I have to wonder how many IT departments bought the hype and made a switch, only to discover that they weren't really running all that much faster.

      N

  • oh goodie! (Score:2, Funny)

    by Anonymous Coward
    an extra frame or two for Doom3!
  • Cache Contention (Score:4, Interesting)

    by Detritus ( 11846 ) on Wednesday January 07, 2004 @05:59AM (#7901477) Homepage
    Do any modern chips support per-process cache reservation? That would alleviate some of the problems reported in the article.
  • by obergeist666 ( 727955 ) on Wednesday January 07, 2004 @06:27AM (#7901553)
    ... I learned from this article [arstechnica.com].
  • Quick Q (Score:5, Interesting)

    by AvengerXP ( 660081 ) <jeanfrancois DOT ... mckesson DOT ca> on Wednesday January 07, 2004 @06:33AM (#7901567)
    Why would you want to have a virtual double processor when... you can actually get a second one? Both changes require that you change your motherboard (one for HT, one for dual sockets). Dual Celerons sound like a good cheap buy, or even dual Athlons. Why bother with this? Except for the coolness factor of having your POST screen littered with "Hyperthreading Enabled", and in most cases it's not even called that; I forget what they really write on the screen. Seriously, I wouldn't bet that HT will be copied by other manufacturers any time soon, unlike SSE or MMX.
    • Re:Quick Q (Score:5, Insightful)

      by renoX ( 11677 ) on Wednesday January 07, 2004 @08:36AM (#7901957)
      > Why would you want to have a virtual double processor when... you can actually get a second one?

      Because it is cheaper?
      SMT increases the size of the CPU very little and can give some good improvements (depending on the application, and on the OS, as said in the article).

      SMT can work in the same motherboard as a single CPU, contrary to what you said.

      And for the same price, the single-CPU performance of your dual-CPU setup will be lower.
      • Re:Quick Q (Score:3, Interesting)

        by iconian ( 222724 )
        It's not that simple. I believe the cheapest HT processor from Intel is the P4 2.4 GHz, priced at $161 [newegg.com]. You can buy one Athlon XP 2400+ for $75 [newegg.com]. A dual-processor Athlon motherboard probably costs more than a single-processor Pentium 4 motherboard, and you will probably have to pay for a bigger power supply unit. However, I don't think dual logical processors in a single Pentium 4 can beat 2 real Athlon XP 2400+ processors, performance-wise and in performance-price ratio. (Note: I do not work for NewEgg.)
    • Cost (Score:2, Informative)

      by Imperator ( 17614 )
      Cost, cost cost. Cost cost cost cost cost, cost cost cost cost cost cost. Cost cost cost--cost! Cost cost cost, cost cost cost cost cost cost...cost cost. Cost cost "cost" cost cost cost cost cost cost cost cost. Cost cost cost cost cost COST cost cost.....

      The lameness filter blows. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo conse
    • Re:Quick Q (Score:2, Interesting)

      by Ramze ( 640788 )
      I believe AMD has plans to incorporate more than one CPU on-die in the future. First 2, then 4, etc.

      It'll be interesting to see what happens to "hyperthreading" when dual and quad processors come standard on desktop systems for home users.

      I look at Hyperthreading as a quick hack to improve response times on a few things. It's a minor speed boost as well, but I think it has enough drawbacks that it's only a minor improvement, one that may not always be a good idea to have enabled. I doubt it will st

    • AMD claims to have the same idea in the works for the next Athlon 64.

      It was supposed to be put into the Alpha processor too, a lot of HT research was done on it and was transferred to Intel.

      Most of the CPU players are toying with dual full CPU on-die as well, but keep in mind that HT accounts for under 5% of the die, rather than just requiring a second die.

      So you _can_ also have two real processors and two more processors in virtual mode. If you know the Xeon line, the Xeon DP allows two real processor
    • It's more obvious if you scale it up to more CPU's.

      It actually makes more sense to build one chip that's, say, 8 logical processors and give it several execution units of each type (i.e. 6 integer math units, 4 floating point units, etc.) depending on instruction mix. Of course, that eats chip real estate, but if you have a multithreaded system to run, it will scream.

      If you put in 8 distinct processors, that's 8 integer math units, 8 floating point math units, etc. some percentage of which are idle mos

    • by joss ( 1346 )
      I could get a dual Athlon system, but then I wouldn't be able to hear the dog barking
    • Nope - I have a Gigabyte GA-8PE667 Ultra which can use a 3.06GHz HT P4, a 1.7GHz Celeron or anything in between.[*]

      Also, SMP boards seem to be 2-3x the cost of UP boards before the cost of the CPUs.

      [*] FSB speeds permitting. It does 400MHz and 533MHz FSB speeds, but not 800MHz.


  • by xyote ( 598794 ) on Wednesday January 07, 2004 @06:36AM (#7901580)
    Threads using hyperthreading or SMT share the cache. This can be a problem if the threads are from different processes and not sharing memory. Your cache is effectively halved (with 2 hyperthreads). On the other hand, it could be a real benefit if your threads were from the same process sharing the same memory. You don't have the cache thrashing which could occur on a multi-cpu system. Since cache misses can really kill performance, this could be quite a performance boost.

    To really exploit this, you'd need gang scheduling in the operating system. But it's unlikely that SMT would remain around long enough for any efforts to exploit it to be feasible. CMP with separate cache would likely take over before then since it would behave more like separate cpu's from a performance standpoint and thus offer more consistent behavior.
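On Linux, a first step toward that kind of gang scheduling (or toward pinning two threads of one process onto HT siblings by hand) is finding out which logical CPUs share a physical core; the kernel reports this in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list as strings like "0,4" or "0-1". A small sketch of parsing that format (the helper name is made up):

```python
def parse_cpu_list(text):
    """Parse a kernel cpu-list string such as "0,4" or "0-3,8" into a set."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:  # a range like "0-3"
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:            # a single CPU number
            cpus.add(int(part))
    return cpus

print(parse_cpu_list("0,4"))    # {0, 4}  -- a typical HT sibling pair
print(parse_cpu_list("0-3,8"))  # {0, 1, 2, 3, 8}
```

The resulting set could then be handed to something like sched_setaffinity to keep two cooperating threads on the same physical core.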

  • by sam0ht ( 46606 ) on Wednesday January 07, 2004 @06:48AM (#7901607)
    From the article: "As bus speeds increase, and more cache becomes available on die, hyper-threading is going to be more and more efficient. It appears to be somewhat of an engineering symbiotic relationship."

    Unfortunately, historically CPU speed has increased faster than memory bandwidth. That's why we've had ever more layers of cache added to our systems, to make up for the relative deficiency.

    Unless things change, a technology that works better with a higher ratio of memory bandwidth to CPU speed is likely to become progressively less, not more, effective.

    Of course, there's always the argument that marketing reasons have pushed CPU clockspeed faster than memory bandwidth, and that Intel et al will just shift their focus more towards memory in future. But defying the tide of 'what people think they want' is usually risky.

    • by sql*kitten ( 1359 ) * on Wednesday January 07, 2004 @07:42AM (#7901750)
      Unfortunately, historically CPU speed has increased faster than memory bandwidth. That's why we've had ever more layers of cache added to our systems, to make up for the relative deficiency.

      Aye. Sun has big plans [sun.com] for CMT, which one of their sales reps was quick to tell us all about, up to 32 SPARC cores on one chip. That'll work well in the lots-of-small-tasks model where you can take advantage of direct access (say between disk cache and network card) on FirePlane with very simple code (like a webserver) that can execute out of the processor's cache. But we're heavy database users, and the first question he got asked was: are you seriously telling us Sun is about to make its memory bandwidth an order of magnitude greater? He couldn't answer that question. Now, that means either he was clueless, or Sun is jumping on the Intel benchmark bandwagon.
    • by davecb ( 6526 ) * <davecb@spamcop.net> on Wednesday January 07, 2004 @09:53AM (#7902269) Homepage Journal
      One of the reasons for hyperthreading (aka chip multithreading) is the slowness of memory and cache.

      If you refer back to Marc Tremblay's CMT Article [aceshardware.com], you'll see that one of the approaches is to run one thread until it blocks on a memory read, then run another until it blocks and so on, repeating for as many threads as it takes to soak up all the wasted time waiting for the memory fetches.

      The Sun paper on their plans for it is here [sun.com]. Have a look at page 5 for the diagram.

      --dave (biased, you understand) c-b
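That run-until-stall scheme yields to a back-of-the-envelope utilization model; the cycle counts below are made up purely for illustration:

```python
# Each thread alternates: COMPUTE cycles of work, then a MISS-cycle memory
# stall. One thread leaves the core idle during every stall; with enough
# threads, someone always has work, and the fetch latency is hidden.

COMPUTE = 10  # cycles of useful work between memory stalls (invented)
MISS = 30     # cycles per memory fetch (invented)

def busy_fraction(n_threads):
    """Steady-state fraction of cycles the core spends computing."""
    demand = n_threads * COMPUTE / (COMPUTE + MISS)
    return min(1.0, demand)

for n in (1, 2, 4, 8):
    print(n, busy_fraction(n))
# With these numbers, 1 thread keeps the core only 25% busy; by 4 threads
# the memory latency is fully hidden and utilization saturates at 1.0.
```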

    • When parallelism is introduced you run the risk of "process inversion". If the system is loaded heavily enough, all of your execution units end up working only as fast as the slowest process, no matter how fast the execution units can run.

      The key to this effect is that the slowest execution unit is taking the most time forcing all other execution to wait on it. Other faster execution units must wait for one reason or another so they all appear to be as slow as the slowest.

      In software you can try to soften the blow by bum
    • Another way to increase apparent memory speed is a wider effective memory bus. It was, IIRC, 16 bits with the 286 and before, 32 bits with the 386, 64 bits with the Pentium, and with selected PIII, PIV, Athlon & A64 boards, dual-channel 64 bits, making it 128 bits.

      I think an Alpha board or two went as high as 512 bits wide.

      Now, the wider memory bus doesn't help x86 or A64 as much as one would think, but with hyperthreading, it might.
  • by ZombieEngineer ( 738752 ) on Wednesday January 07, 2004 @07:36AM (#7901725)
    I have found HyperThreading a real boost for developing operator training simulators (think giant custom computer games for process plant operators [e.g. oil refineries, gas plants, chemicals, etc.]) where a single thread will totally consume the resources of a single CPU (we call it "no-wait": the simulation calculates what happens in the next 2 seconds and then immediately jumps to the next timestep, fast-forwarding through slow parts of a process start-up such as warming a reactor).

    An issue we encounter is the DCS (Distributed Control System) interface (the bit that links the PC to the fancy membrane keyboards, touch screens and alarm annunciators that the operator uses on the real plant [to maximise training benefit]). Although the interface typically uses only 0.5 to 2% of the CPU, when the simulation goes flat out there is a noticeable impact on other threads, to the point where there are timeouts on data requests from the operator console.

    In summary, if you have a system where some threads are IO-bound (in our case, processing requests coming in via ethernet) and other threads are CPU-intensive (high-end numerical calculations), you will see a definite benefit. It allows us to give every team member a machine fit for the job at approximately 1/3 the cost (those of you who wish to argue that SMP machines are cheaper: we are bound by corporate purchasing agreements where SMP falls into the "Workstation" category while a uni-processor HT machine falls into the far cheaper "Desktop" category).

    If you are performing purely calculations and need to run two parallel threads, I would recommend an SMP or similar machine.

    As always, your mileage may vary.

    ZombieEngineer
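The mix ZombieEngineer describes can be sketched in Python: one CPU-bound thread (the simulator) plus one I/O-bound thread (the DCS interface). Note this only sketches the structure; because of CPython's GIL it won't itself demonstrate an HT speedup, and the workloads are stand-ins.

```python
# Sketch of a CPU-bound simulator thread plus an I/O-bound interface
# thread. On an HT processor the interface thread can keep servicing
# requests on the second logical CPU while the simulator saturates the
# first; here the workloads are placeholders.
import threading
import queue

requests = queue.Queue()
replies = []

def dcs_interface():
    """I/O-bound: mostly blocked waiting for console requests."""
    while True:
        req = requests.get()
        if req is None:          # sentinel: shut down
            break
        replies.append(f"ack {req}")

def simulator(steps):
    """CPU-bound "no-wait" timesteps: never blocks, just computes."""
    for _ in range(steps):
        sum(i * i for i in range(10_000))   # stand-in for plant model math

io = threading.Thread(target=dcs_interface)
io.start()
for i in range(3):
    requests.put(i)              # console requests arriving over ethernet
simulator(steps=50)              # simulation runs flat out meanwhile
requests.put(None)
io.join()
```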
    • OK, you are doing all this calculation in another thread, but you have to somehow synchronize with the GUI thread (PostMessage under Windows). If your calculation thread were to run faster than your GUI thread (GUI doing a lot of screen updating), you would get these PostMessages clogging up your GUI thread message queue because WM_PAINT is of very low priority (so frequent paints don't lock out key and mouse clicks).

      In the old single-processor days, your calc thread could do a Wait(0) -- according to th

  • HT is awesome (Score:5, Interesting)

    by Jeppe Salvesen ( 101622 ) on Wednesday January 07, 2004 @08:25AM (#7901907)
    In the app we develop here at work, we are highly conscious of performance and scalability. Simply put - the more transactions we can process, the bigger and happier the customers. And more money in our pockets.

    With HT-enabled Xeons, our performance has increased quite dramatically. We use Perl, so we simply fork off the jobs that do the processing. The result is that we fill all four virtual processors in Linux if we have a sufficient number of jobs running.
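The fork-per-job approach above can be sketched as follows (in Python rather than the poster's Perl; the job function is a hypothetical stand-in for a real transaction):

```python
# Fork one worker per logical processor and let the OS spread the jobs
# across them. With HT enabled, cpu_count() reports the virtual
# processors (e.g. 4 on the dual-Xeon box described above).
from multiprocessing import Pool, cpu_count

def process_job(job):
    """Stand-in for a CPU-heavy transaction."""
    return sum(i * i for i in range(job))

if __name__ == "__main__":
    jobs = [100_000] * 8
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(process_job, jobs)
```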
    • In the app we develop here at work, we are highly conscious of performance and scalability. [...] We use Perl [...]

      Huh? This is not meant as an offense, or a troll, but that really, really doesn't fit together. Have you considered using something faster (no, not C)? This should have a much bigger effect than a HT proc.

      • Absolutely. But Perl means we can produce more software with fewer manhours and fewer lines of code! Compared to our java-based competitors, we kick butt, both in terms of development team size and in terms of performance and TCO.

        We have profiled our code and optimized the code where we spend most of our time. On those critical sections, we use most of the tricks in the book - dynamically created code, extensive use of hashes, etc. We can even write functions in C using XS if we want to!

        Basically, Perl is
  • by Pivot ( 4465 ) on Wednesday January 07, 2004 @08:38AM (#7901962)
    I have a computer with dual Xeon 1.7GHz. Those apparently have HT capability built in, but it's not enabled in the BIOS. Anyone know a way to circumvent this to enable HT on these?
  • by Anonymous Coward on Wednesday January 07, 2004 @09:09AM (#7902072)
    When a process blocks because it is trying to access memory that is not loaded into the cache, it sits idle while the data is retrieved from the much slower main memory. If you can store two process contexts on the CPU instead of just one, whenever one process blocks on a memory read, the CPU can quickly switch to the other context which is waiting to run.

    I can't remember the name of the machine, but one parallel shared-memory machine used this exclusively. The CPU had 128 process contexts and would switch through them in order. The time between subsequent activations of each context was great enough that data could be fetched from main memory and loaded into a register. This eliminated cache coherency problems (no cache!) and all delays related to memory fetching.

    A P4 with hyperthreading is a simplified and much more practical version of that machine.
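A back-of-envelope check of the barrel scheme the parent describes, with an assumed main-memory latency (the real machine's numbers aren't given here):

```python
# With enough hardware contexts rotated in strict order, every memory
# fetch has completed by the time its context gets the CPU again, so no
# cache is needed to hide latency.
MEMORY_LATENCY_CYCLES = 100   # assumed main-memory latency, in cycles
CONTEXTS = 128                # hardware contexts on the machine described

# A context issues on cycle c and next issues on cycle c + CONTEXTS.
# Its outstanding fetch is done iff the gap covers the memory latency:
latency_hidden = CONTEXTS >= MEMORY_LATENCY_CYCLES
```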
  • by awol ( 98751 ) on Wednesday January 07, 2004 @09:53AM (#7902273) Journal
    I did comp sci (undergrad) in the days when we used unix/VMS to learn, and so I have a pretty good understanding of architecture and the basics of threads and processes. The one thing that never sat well with me was that as processor speed "exploded" in the last 5 years, I was under the impression that a "lot" of the performance increase was achieved by parallelising stuff in the execution core. (You can see that my knowledge is _limited_.) So as a result, unless your applications could somehow take advantage of this parallelism, a given bit of code would never really get the full benefit of today's uber processors, and all the speed gains were only really marginal improvements.

    I think the advent of SMT confirms that a given process cannot of itself (unless it is _real_ special) take full advantage of a modern processor, and so SMT is a way of reducing the problem by assuming that whilst one process ain't enough to take full advantage, two processes can make better use of it. It sure makes sense to me.

    But it also presents the very interesting question of the marginal benefit of execution pipelines compared to complexity in the front end to allow SMT. What I mean is, what are the trade-offs between having a "virtual" (for want of a better word) processor for each execution pipeline rather than using them to execute parts of a single instruction stream out of order? Is it simply a question of the nature of the work being undertaken by the machine? I.e., for a processor with 8 pipelines serving 20 users doing stuff, would it be better doing 1 bit of work from each of 8 users, or maybe 2-4 bits of stuff from 4-2 users? And can we answer that question heuristically, to allow the front end to make good use of each pipeline with a variable profile over the changing use of the machine? Fascinating (well, to me anyway).
  • Analogy (Score:4, Interesting)

    by attonitus ( 533238 ) on Wednesday January 07, 2004 @10:15AM (#7902377)
    This could be analogous to two people in moderate shape being able to pile more wood in total, than a single person who's in great shape

    Could be, but isn't. A better analogy would be two people using the same narrow corridor to chop and pile wood. If one piles wood whilst the other chops, they perform better than one person. If they both chop wood, and then both pile wood, they waste lots of time trying to squeeze past each other and accidentally hitting each other with axes.

    Okay, so it's not that much better an analogy. But at least it bears some relevance to HyperThreading.

  • by pw700z ( 679598 ) on Wednesday January 07, 2004 @10:30AM (#7902466)
    I use VMware Workstation extensively... and HT rocks. Ever have a virtual machine go to 100% CPU utilization, and your machine slow to a crawl? With the extra ~20% of CPU available, your system can still function and be responsive, letting you deal with whatever is going on. Or I can run two VMs and get much better performance out of them and the system as a whole.
  • HT Technology (Score:3, Informative)

    by sameerdesai ( 654894 ) on Wednesday January 07, 2004 @11:23AM (#7902785)
    I have some insight into this technology as I was part of a research group researching SMT. It is a really cool technology that fills issue slots that instruction-level parallelism (ILP) alone cannot, and increases performance. The basic HT implementation, however, partitions the processor's resources between the threads. The details of Intel HT are available at http://www.intel.com/technology/hyperthread/ along with the associated whitepapers. Now the catch is that the application should be multi-threaded. You can't just buy an HT processor, run a single-threaded application, and expect improved performance. The benefits come when an optimal number of threads is used: too few and resources go to waste; too many and they queue up and cause bottlenecks. The other thing that can hurt performance is an unbalanced workload, which leaves some threads unable to exploit the parallelism. This is a new technology, a lot of research is going on in this area, and it looks really promising.
  • With HT enabled I can run 2 copies of Folding@Home [stanford.edu].

    This is a significant boost in production over a non-HT processor because these programs are almost purely CPU-bound.

    I would assume this would also help other DC projects like Seti@Home.
  • To folks considering buying HT-enabled processors, be warned that not everything will work when HT is enabled!

    For one, burst!, my BitTorrent client, simply crashes on start-up. I've been in contact with Intel about the issue, and after some initial run-around, I seem to have finally found a tech who's looking into it. It probably has something to do with my compiler (the crash offset is within the Delphi RTL).

    My app is not alone, as others in this thread pointed out, hyperthreading can also tr
  • by glinden ( 56181 ) on Wednesday January 07, 2004 @12:46PM (#7903460) Homepage Journal
    AnandTech did an excellent article [anandtech.com] on hyper threading a while back. Well written and worth reading.

  • by fupeg ( 653970 ) on Wednesday January 07, 2004 @01:06PM (#7903712)
    IBM will have SMT in the Power5 [cbronline.com]. Their approach looks even better than Intel's, but part of that is the Power architecture and part of that is IBM learning from what Intel did. SMT is really the best way to get past the limiting reagent of modern processors: memory bandwidth.
  • by Animats ( 122034 ) on Wednesday January 07, 2004 @02:10PM (#7904362) Homepage
    The basic problem with hyperthreading is, of course, memory bandwidth. CPUs today are memory-bandwidth starved. 30 years ago, CPUs got about one memory cycle per instruction cycle. Since then, CPUs have sped up by a factor of about 1000, but memory has only sped up by a factor of 30 or so. The difference has been papered over, very successfully, with cache. The cache designers have accomplished more than seems possible. Compare paging to disk, which is a form of caching that hasn't improved much in decades.

    If you want to benchmark a hyper-threaded machine, a useful exercise is to run two different benchmarks simultaneously. Running the same one is the best case for cache performance; one copy of the benchmark in cache is serving both execution engines. Running different ones lets you see if cache thrashing is occurring. Or try something like compressing two different video files simultaneously.
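One way to set up that experiment, sketched in Python (the workloads here are placeholders; substitute your real benchmark commands):

```python
# Time two benchmarks run simultaneously versus back to back. Comparing
# the two totals hints at whether the logical CPUs help or just thrash
# the shared cache.
import subprocess
import sys
import time

def timed(cmds):
    """Launch every command at once and time until the last one exits."""
    start = time.perf_counter()
    procs = [subprocess.Popen(c) for c in cmds]
    for p in procs:
        p.wait()
    return time.perf_counter() - start

# Placeholder workloads: substitute your real benchmark commands.
bench_a = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
bench_b = [sys.executable, "-c", "sorted(range(10**6, 0, -1))"]

together = timed([bench_a, bench_b])             # contend for the HT core
back_to_back = timed([bench_a]) + timed([bench_b])
# 'together' near 'back_to_back' suggests cache thrashing; 'together'
# near half of it suggests the second logical CPU is pulling its weight.
```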

    If you're seeing significant performance gains with real-world applications using a "hyper-threaded" CPU, that's a sign that the operating system's dispatcher is broken. And, of course, hyper-threading dumps more work on the scheduler. There's more stuff to worry about in CPU dispatching now.

    Intel seems to be desperate for a new technology that will make people buy new CPUs. The Itanium bombed. The Pentium 4 clock speed hack (faster clock, less performance per clock) has gone as far as it can go. The Pentium 5 seems to be on hold. Intel still doesn't have a good response to AMD's 64-bit CPUs.

    Remember what happened with the Itanium, Intel's last architectural innovation. Intel's plan was to convert the industry over to a technology that couldn't be cloned. This would allow Intel to push CPU price margins back up to their pre-AMD levels. For a few years, Intel had been able to push the price of CPU chips to nearly $1000, and achieved huge margins and profits. Then came the clones.

    Intel has many patents on the innovative technologies of the Itanium. Itanium architecture is different, all right, but not, it's clear by now, better. It's certainly far worse in price/performance. Hyperthreading isn't quite that bad an idea, but it's up there.

    From a consumer perspective, it's like four-valve-per-cylinder auto engines. The performance increase is marginal and it adds some headaches, but it's cool.

    • by Brandybuck ( 704397 ) on Wednesday January 07, 2004 @02:47PM (#7904799) Homepage Journal
      If you're seeing significant performance gains with real-world applications using a "hyper-threaded" CPU, that's a sign that the operating system's dispatcher is broken. And, of course, hyper-threading dumps more work on the scheduler. There's more stuff to worry about in CPU dispatching now.

      That was my suspicion. Hyperthreading can't be much more efficient than threading via the OS unless the software is specifically compiled for it, or you use a scheduler specific to hyperthreading. Scheduling work STILL has to be performed, and hyperthreading STILL isn't parallel processing. So where are the performance improvements people are seeing coming from?

      I'm not using Linux, but FreeBSD. When I got my new HT P4, I considered turning it on. Then I read the hardware notes. Since FreeBSD does not use a scheduler specific to hyperthreading, it can't take full advantage of it. In some cases it might even result in sub-optimal performance. Just like logic would lead you to think.

      The OS cannot treat hyperthreading the same as SMP, because they are two different beasts.
  • The author benchmarks a 2.8GHz Xeon with a 533MHz FSB and 1MB of L3 cache, and a 3.2GHz P4C with an 800MHz FSB and 0.5MB of L2 cache. He claims he doesn't want to compare the two, but he does. Here are some other conclusions.

    The Xeon has a slower clock, and yet outperforms the higher-clocked P4C. This is further evidence that MHz isn't everything.

    The P4C has higher memory bandwidth (the FSB) yet slower performance. This shows that on-chip cache can be king over memory bandwidth too.

    Some of my historic

