
Next-Gen Intel Chip Brings Big Gains For Floating-Point Apps

timothy posted about a year ago | from the code-slower dept.


An anonymous reader writes "Tom's Hardware has published a lengthy article and a set of benchmarks on the new "Haswell" CPUs from Intel. It's just a performance preview, but it isn't just more of the same. While integer applications show the expected 10-15% gain at the same clock speed, floating point applications are almost twice as fast, which might be important for digital imaging applications and scientific computing." The serious performance increase has a few caveats: you have to use either AVX2 or FMA3, and then only in code that takes advantage of vectorization. Floating point operations using AVX or plain old SSE3 see more modest increases in performance (in line with the integer gains).


And that's why (-1)

Anonymous Coward | about a year ago | (#43207325)

That's why I bought a Saturn.

Re:And that's why (0)

noh8rz10 (2716597) | about a year ago | (#43207625)

"Tom's Hardware has published a lengthy article and a set of benchmarks on the new "Haswell" CPUs from Intel.

Yes, but will it blend?

Would that improve hashing speeds in, say, Bitcoin (1)

d33tah (2722297) | about a year ago | (#43207359)

Would that improve hashing speeds in, say, Bitcoin?

Re:Would that improve hashing speeds in, say, Bitc (0)

Anonymous Coward | about a year ago | (#43207397)

That does not sound like something that would benefit from faster floating point operations.

Re:Would that improve hashing speeds in, say, Bitc (4, Informative)

slashmydots (2189826) | about a year ago | (#43207459)

Slightly, but have you not been keeping up on the latest hardware? My pair of Sapphire 5830 graphics cards would top out at about 435MH/s at a total system wattage of around 520W. The new Jalapeno chips from Butterfly Labs will do 4500MH/s using 2 watts total system power. For comparison, my i5-2400 managed 14MH/s at 95W or so. So the Jalapeno is about 321x faster and about 47x more power efficient; combined, I believe that's 15,267.864x more efficient.

Re:Would that improve hashing speeds in, say, Bitc (1)

0100010001010011 (652467) | about a year ago | (#43207919)

Can the Jalapeno chips do anything else when the Bitcoin market crashes? At least with the video cards, I can still drive displays with them.

Re:Would that improve hashing speeds in, say, Bitc (0)

Anonymous Coward | about a year ago | (#43208217)

Calculate password hashes? Or collisions?

Re:Would that improve hashing speeds in, say, Bitc (-1)

Anonymous Coward | about a year ago | (#43208605)

What a cretin you are. You dribble your garbage at the self-same moment every citizen in Cyprus is denied access to their bank accounts, because their masters are about to debate how much money the banks will be allowed to steal from every investor.

Hundreds of billions have been spent since WW2 building investor confidence in the banking system of Europe. Now we are supposed to believe that for the sake of TEN BILLION (to the banking system, the equivalent of the loose change you'd find down the side of your settee), the politicians would destroy that confidence.

The type of currency represented by Bitcoin and Gold becomes vastly more valuable and desirable at times like these. Of course, betas are told NEVER to buy gold (gee, I so wonder why). Mouthy betas, like '0100010001010011', are proud to broadcast the propaganda messages of their 'betters'.

Bitcoin, what a joke- dribble, dribble. Obama, what a great leader, dribble, dribble. Our boys in uniform, what Humanitarian heroes, dribble, dribble.

Re:Would that improve hashing speeds in, say, Bitc (0)

viperidaenz (2515578) | about a year ago | (#43208823)

Bitcoins still hold no value to me. No one I deal with accepts them as currency, hence they hold no value.
I can't pay my taxes with bitcoins, I can't buy food, I can't repay my mortgage, I can't buy petrol. What can I do with a bitcoin?

Re:Would that improve hashing speeds in, say, Bitc (0)

Anonymous Coward | about a year ago | (#43209277)

Can I buy gold with bitcoins?
OMG yes! [coinabul.com]

Re:Would that improve hashing speeds in, say, Bitc (0)

Anonymous Coward | about a year ago | (#43208683)

And will you still be using your outdated video cards when that time comes? Perhaps, perhaps not. Sure, it could theoretically still drive video, but if it's not being used anymore, what's the difference?

Re:Would that improve hashing speeds in, say, Bitc (0)

Anonymous Coward | about a year ago | (#43208951)

I understand that a big question is whether the Butterfly Labs chips actually exist, let alone work.

Re:Would that improve hashing speeds in, say, Bitc (3, Insightful)

Anonymous Coward | about a year ago | (#43207611)

Would that improve hashing speeds in, say, Bitcoin?

Bitcoin is based on SHA256 hashing, which has zero floating point operations. So no, this will not impact Bitcoin mining at all.
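
For anyone wondering what the work actually looks like: Bitcoin mining is just a double SHA-256 of an 80-byte block header, all byte shuffles and 32-bit integer adds/rotates. A minimal sketch in C, assuming OpenSSL's SHA256() is available (link with -lcrypto); the zeroed header is dummy data, not a real block:

    /* Double SHA-256 of an 80-byte block header -- pure integer work, no floating point. */
    #include <stdio.h>
    #include <string.h>
    #include <openssl/sha.h>

    int main(void)
    {
        unsigned char header[80];   /* version, prev hash, merkle root, time, bits, nonce */
        unsigned char first[SHA256_DIGEST_LENGTH], second[SHA256_DIGEST_LENGTH];

        memset(header, 0, sizeof header);        /* dummy header; a miner varies the nonce */
        SHA256(header, sizeof header, first);    /* first pass */
        SHA256(first, sizeof first, second);     /* second pass, compared against the target */

        for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
            printf("%02x", second[i]);
        printf("\n");
        return 0;
    }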

Let's see... (5, Funny)

bluegutang (2814641) | about a year ago | (#43207361)

" Next-Gen Intel Chip Brings Big Gains For Floating-Point Apps "

How much of a gain? More or less than 0.00013572067699?

Re:Let's see... (0)

kimvette (919543) | about a year ago | (#43207389)

FTFS:

While integer applications show the expected 10-15% gain at the same clock speed, floating point applications are almost twice as fast

HTH

Re:Let's see... (5, Informative)

0100010001010011 (652467) | about a year ago | (#43207461)

It's a joke. The Intel P5 Pentium FPU had the infamous FDIV bug, where

4195835 / 3145727 came out as 1.333739068902037589; the correct answer is 1.333820449136241002.
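
For the record, the division itself is easy to check on any modern FPU; a tiny C program (operands taken from the classic FDIV test case above) prints the correct result:

    #include <stdio.h>

    int main(void)
    {
        double x = 4195835.0, y = 3145727.0;   /* the classic Pentium FDIV operands */
        printf("%.15f\n", x / y);              /* a correct FPU prints ~1.333820449136241 */
        return 0;
    }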

Less rounding of floating point numbers (4, Informative)

raymorris (2726007) | about a year ago | (#43207539)

While integer applications show the expected 10-15% gain at the same clock speed, floating point applications are almost twice as fast. HTH

Integer and floating point are separately implemented in the hardware, so an improvement to one often doesn't apply to the other. You can add integers by counting on your fingers. To do that with floating point, you have to cut your fingers into fractions of fingers - a very different process.
See: http://en.wikipedia.org/wiki/FMA3 [wikipedia.org]
It's common to have an accumulator like this:

X = X + (Y * Z)

To compute that in floating points, the processor normally does:

A = ROUND(Y * Z)
X = ROUND(X + A)

Each ROUND() is necessary because the processor only has 64 bits in which to store the endless digits after the decimal point. FMA can fuse the multiply and the add, getting rid of one rounding step, and the intermediate variable:

X = ROUND(X + (Y * Z))

That makes it faster. Since integers don't get rounded to the available precision, the optimization doesn't apply to them. The fused version above does Y*Z, then +X, then a single round, then the assignment. A CPU designer can make that faster by including an "add and multiply" circuit, an "add and round" circuit, or a "round and assign" circuit. Any set of operations can be done in two clock cycles, if the maker decides to include a hardware circuit for it.
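
For readers who want to see the single-rounding effect without touching assembly, C99 already exposes it as fma() in <math.h>; a minimal sketch (the test value is contrived to make the lost rounding error visible, and mapping fma() onto Haswell's FMA3 instruction depends on compiler flags such as -mfma):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double a = 1.0 + pow(2.0, -30);

        double p   = a * a;           /* multiply, then round to 53 bits             */
        double err = fma(a, a, -p);   /* exact a*a - p, rounded only once at the end */

        /* With two separate roundings that error is simply gone; fma() recovers it. */
        printf("rounded product = %.17g\n", p);
        printf("lost error      = %g\n", err);    /* 2^-60 here, not zero */
        return 0;
    }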

Re:Less rounding of floating point numbers (0)

Anonymous Coward | about a year ago | (#43207627)

Any set of operations can be done in two clock cycles, if the maker decides to include a hardware circuit for it.

Any set of operations can be done in any number of clock cycles, if your clock cycles are the appropriate length.

So in other words (0)

Anonymous Coward | about a year ago | (#43207403)

Certain kinds of apps will get a nice performance boost if they're running in house, or on a vendor managed server. If the customer installs the software, then no.

Hope it's going in the new Mac Pro (3, Interesting)

GlobalEcho (26240) | about a year ago | (#43207431)

I hope there's really a new Mac Pro coming [ibtimes.com] and that it has these chips in it! I do a heck of a lot of PDE solving, statistics and simulations, and would love to have a screamin' machine again.

Re:Hope it's going in the new Mac Pro (5, Insightful)

Anonymous Coward | about a year ago | (#43207527)

Do you really need a Mac for that? If not, it seems you're limiting your potential by having to wait for the holy artifacts to be released.

Re:Hope it's going in the new Mac Pro (-1, Troll)

Anonymous Coward | about a year ago | (#43207669)

He buys the special edition that comes with a dildo molded after Steve Jobs' cock that you mount into your seat. That way whenever he sits at his desk he can have Steve's cock fucking his ass like a good little iTard.

Re:Hope it's going in the new Mac Pro (1)

Anonymous Coward | about a year ago | (#43208369)

Thank you for that imagery :D

Re:Hope it's going in the new Mac Pro (1)

Anonymous Coward | about a year ago | (#43208549)

He buys the special edition that comes with a dildo

Oh, I get it! Because Mac owners are homosexuals...that's funny! Stupid homosexuals. Mod parent up!

Re:Hope it's going in the new Mac Pro (0)

Anonymous Coward | about a year ago | (#43209329)

Gay Mac user spotted! *whoop* *whoop* Gay Mac user spotted!

Re:Hope it's going in the new Mac Pro (0)

Anonymous Coward | about a year ago | (#43209575)

no. Just Steve Jobs was the homosexual. You can either willingly take his cock, or call it rape.

Re:Hope it's going in the new Mac Pro (5, Interesting)

semi-extrinsic (1997002) | about a year ago | (#43207555)

If you're doing numerics, what the fuck (if you'll pardon my French) are you doing buying Apple? I'm working on two-phase Navier-Stokes solvers myself, and I just bought a new rig consisting of 3 boxes, each with an Intel Core i7 @ 3.7 GHz, 12 GB RAM, an SSD and a big-ass cooling system. In total that cost less than the Mac Pro with a single Core i7 @ 3.3 GHz listed in that article. You're paying 3x more than you should, and what do you get extra? A shiny case? Puh-lease.

Re:Hope it's going in the new Mac Pro (1)

Anonymous Coward | about a year ago | (#43207805)

Most physics researchers (source: physics PhD) use Mac desktops/laptops and Linux servers. Macs are perfect environments for a mix of coding and general computing, with good support for *nix tools. Anything serious gets done on a cluster. I've seen this in several universities, all of them top tier (e.g. Oxford, Imperial, UCL, Warwick), so it's not isolated.

But hey, this is Slashdot.

Re:Hope it's going in the new Mac Pro (0)

Anonymous Coward | about a year ago | (#43208147)

Macs have their advantages, but in the places I visited (more experimental physics than theoretical) Windows PCs are still far more common. Roughly 60% Windows, 20% Mac and 20% Linux here. But this is not set in stone, as we only need two or three Windows-only software packages, and you can run those in a virtual machine.

Re:Hope it's going in the new Mac Pro (1)

newcastlejon (1483695) | about a year ago | (#43208289)

If your experimental labs are anything like our workshops you'll probably find them running a few ancient Win95/DOS tools that don't take kindly to being cooped up in a VM without direct access to hardware. As I think back, though, I do recall a lonely old G3 being used as a data logger.

Re:Hope it's going in the new Mac Pro (1)

IWannaBeAnAC (653701) | about a year ago | (#43209083)

Most of the people in the physics department here use Windows desktops, but pretty much all of the numerics people use Linux desktops. Naturally, all of the computing clusters are Linux. It seems that virtually all laptops are Macs though, which is curious. Possibly people would like to use Macs on the desktop but there is some barrier (e.g. purchasing or IT administration policies)? I'll have to find out!

Re:Hope it's going in the new Mac Pro (2)

LordLimecat (1103839) | about a year ago | (#43209499)

You're paying at least double for the same hardware on a Mac. The Mac cited in the article has 2x 6-core Xeons @ 2.4 GHz. Those (assuming E5645s) can be had for ~$575 each, with a motherboard at ~$275. Everything else is pocket change; a whole rig with SSDs etc. could be had for under $1700.

But I'm sure someone somewhere will explain why the aluminum makes the extra $2000 for the Mac worth it.

Re:Hope it's going in the new Mac Pro (0)

Anonymous Coward | about a year ago | (#43207811)

Your Navier-Stokes solutions aren't going to be anywhere near as hip and thin as his.

Re:Hope it's going in the new Mac Pro (0)

mozumder (178398) | about a year ago | (#43208337)

The Core i7s are consumer-grade processors and are slower than the Xeons the Mac Pros use; they don't even use ECC memory. Good luck running a week-long simulation job with one random bit error in your data. So yes, Core i7s are amateur junk, and using them in a pro workstation is a good way to get yourself fired, because you do not know professional requirements. "Herpaderp why can't we just use my overclocked Core i7 herpaderp! It's good for gaming! It should be good enough for this nuclear simulation! herpaderp!"

So if you want absolute speed, ECC reliability, & cheap prices, you have to go with Apple Mac Pro workstations. Even Dell & HP can't compete against Mac Pros. Have you actually spec'd out equivalent systems from Sun, IBM, Dell & HP? Go ahead, try it, and see how much you save. There's a reason smart people use Mac Pros. These are physicists, not noob morons like you dorks. Your best bet is to learn from them.

And don't make your boss laugh before he fires you when you tell him you actually want to build your own system...

Re:Hope it's going in the new Mac Pro (3, Insightful)

Aardpig (622459) | about a year ago | (#43208621)

Erm -- ECC memory is slower than non-ECC memory, I think.

Re:Hope it's going in the new Mac Pro (5, Informative)

KonoWatakushi (910213) | about a year ago | (#43209131)

ECC memory is only marginally slower. Considering error rates and modern memory sizes, it is far past time that it became a standard feature. The extra cost would be totally insignificant if it were standard and not used as an excuse to gouge people on Xeons.

Re:Hope it's going in the new Mac Pro (5, Informative)

washu_k (1628007) | about a year ago | (#43208879)

The Core i7's are consumer-grade processors and are slower than the Xeon's the Mac Pros use

This is completely incorrect. The current Mac Pros use Nehalem based Xeons which are two generations back from the current Ivy Bridge i7s. Xeons may have differences in core count, cache and/or ECC support but their execution units are the same as their desktop equivalents. The base Mac Pro CPU is equivalent to an i7-960 with ECC support. The current Ivy Bridge i7s are a fair bit faster.

Re:Hope it's going in the new Mac Pro (1)

viperidaenz (2515578) | about a year ago | (#43209117)

The current top of the line Mac Pro has a pair of 3 year old CPUs (2x 6 core E5645/50/75, released Q1, 2010). You can't compare to any current HP or Dell etc as they use newer generation Xeons.

A 12-core MacPro in NZ costs $6100.
A top of the line 12-core Dell costs $6200.

Dell has E5-2630 CPUs, Mac has E5645. Dell wins there: more cache, newer CPU.
Dell has 16GB ECC RAM, Mac has 12GB. Dell wins there too: 1600MHz and 128GB max, versus the Mac's 1333MHz and 64GB max.

I'm sure a 2 year old Dell is cheaper than a brand new Mac Pro.

Of course, with those same CPU and RAM specs, there are cheaper Dells, down to $5300. So you save $900 and get more performance.

Morons like this are why you lose shuttles, etc (0)

Anonymous Coward | about a year ago | (#43209145)

Imagine employing someone as stupid as 'mozumder' in any mission critical situation. He is like the embodiment of the phrase "no-one ever gets fired for buying IBM". A brain-dead clod who is the joy of every shark selling any over-priced brand.

Correctness in calculation is a computer science and maths discipline. It is NEVER achieved EVER, EVER by relying on the accuracy of any given piece of hardware. Indeed, any company doing real critical work would immediately FIRE any cretin like 'mozumder' who stated "we can trust this hardware".

"I don't have to know how to do my job properly- that's why I bought a Mac Pro."

For those that wonder, it is impossible to build a perfectly reliable CPU, and one shouldn't even try. Instead, you build 'good enough' hardware and use correctly composed software systems to compensate for statistically rare anomalies. ECC memory is largely a marketing gimmick. There are, sadly, hundreds of thousands of places where 'data' can become corrupt in a CPU, and most of these possible errors cannot feasibly be detected by built-in hardware. ECC is used simply because it is trivial to add to memory blocks- blocks that represent only the tiniest fragment of all possible logic errors.

The greatest vulnerability in a modern system is in serial data transports, where the transmission line is driven as fast as possible. However, error correction is always used on these interconnects to enable such high speeds. Ordinary logic is clocked at vastly less troubling speeds, so that the likelihood of failure is statistically very low indeed. Any hardware errors that do then happen can be considered as unavoidable- to be countered by proper software procedures.

Mission critical calculations MUST be subject to sanity tests. This may involve running the same calculation more than once- running different algorithms that should give the same result, using multiple computers, or calculating reasonable bounds for the expected results.

The idea that someone could say "Duh, I don't have to bother- we use a Mac Pro with ECC" is so terrifying, people expressing such opinions should probably be identified to ensure they aren't working somewhere where their idiocy and complete lack of maths skills may get someone killed.

Re:Hope it's going in the new Mac Pro (0)

Anonymous Coward | about a year ago | (#43209433)

http://store.apple.com/au/browse/home/shop_mac/family/mac_pro

2999 dollars (AU) for a Mac Pro with 6GB RAM and a single quad-core 3.2GHz Xeon.

Got a quote yesterday for an HP with a 3.4GHz Xeon, 8GB RAM, hardware RAID and a 1TB drive (2x 500GB, presumably RAIDed), including Windows 2008 R2 Standard (so you can't say I didn't pay for an OS),

for 2248 dollars.

What were you saying about Mac Pros being cheaper?

About the only thing "better" in the Mac Pro was the ATI 5770, which costs less than the 700 dollar price premium.

Re:Hope it's going in the new Mac Pro (1)

fyngyrz (762201) | about a year ago | (#43208417)

Not to put too fine a point on it, he gets OS X, the OS X ecosystem, the vast majority of the *nix ecosystem, the ability to VM several varieties of the Windows ecosystem *or* any one of a number of pure *nix ecosystems, all in parallel if he likes, the ability to drive a bunch of monitors (I've got six on mine), all manner of connectivity, and yes, perhaps last and even perhaps least, probably one of the best cases out there -- it's not just shiny, it's bloody awesome.

I don't even *like* Apple the company -- they piss me off more than I can adequately say for a list of reasons I won't bore you with -- but my Mac Pro was worth every penny for all the things it brings to the table. Could it be better? Yep. Will it be better next time around? Almost certainly.

Now go back to being happy with your stuff, and we'll go back to being happy with ours.

Re:Hope it's going in the new Mac Pro (0)

Anonymous Coward | about a year ago | (#43209473)

Other than OSX and the higher price tag; what was the point of the rest of your comment?

Don't other PCs provide you access to nix and windows VMs? or is that a Mac "feature". "all in parallel?" driving bunches of monitors? I NEVER SAW THAT ON A PC EVER! and connectivity! never seen that since macs.

I guess he gets older Xeon processors for his extra money. Cause that's better right. They don't make them like they used to right!!

Re:Hope it's going in the new Mac Pro (0)

Anonymous Coward | about a year ago | (#43208419)

Your argument fails due to facts. The Mac Pro line is not available with Core i* processors. It's only available with Xeon processors.

Re:Hope it's going in the new Mac Pro (1)

GlobalEcho (26240) | about a year ago | (#43208429)

If you're doing numerics, what the fuck (if you'll pardon my French) are you doing buying Apple?

Fair question. It turns out, PDE solving etc. isn't all I do, so while I like my machine to be reasonably fast at the numerics, I require it to work well as a general-purpose computer, too. To me, Windows, Linux and FreeBSD fail to meet that criterion.

I do small-to-medium problems locally without having to think about remote execution issues, and then farm truly heavy numerics out to parallel processing farms like anybody else (aside from the PDE solvers, much of what I do is embarrassingly parallel). It's really quite nice, say, running some giant calculation in Mathematica or Matlab and then being able to click-n-drag the output plot into presentation software. That workflow is unavailable in Linux, and probably full of pitfalls on Windows.

[Same answer to the poster who wonders why bother to wait. ]

Re:Hope it's going in the new Mac Pro (3, Interesting)

spire3661 (1038968) | about a year ago | (#43207567)

Why not just do that on real workstation hardware and tap into it remotely?

Re:Hope it's going in the new Mac Pro (0)

Anonymous Coward | about a year ago | (#43208013)

Because that doesn't allow you to show off your "Oooh, shiny!!". Duh.

Re:Hope it's going in the new Mac Pro (0)

Anonymous Coward | about a year ago | (#43208149)

Since when was the Mac Pro not "real" workstation hardware?

Re:Hope it's going in the new Mac Pro (1)

alen (225700) | about a year ago | (#43208237)

Since it sells with 2-year-old CPUs.
Or was it 2-generation-old CPUs?

Re:Hope it's going in the new Mac Pro (0)

Anonymous Coward | about a year ago | (#43208327)

How stupid. Most workstations companies buy are no different. Almost no company is buying bleeding edge hardware in their workstations.

Re:Hope it's going in the new Mac Pro (1)

petermgreen (876956) | about a year ago | (#43208361)

The Mac Pros currently ship with Westmere-based CPUs. The most recent comparable CPUs are Sandy Bridge-based. So even if you count both new core designs and die shrinks as "generations", it's still only one generation behind comparable CPUs.

Re:Hope it's going in the new Mac Pro (2)

mozumder (178398) | about a year ago | (#43208171)

The Mac Pros use Xeon chips, which are usually updated about 1 year after the mainstream Core processors are out.

Might be important, but probably not... (4, Interesting)

MasseKid (1294554) | about a year ago | (#43207449)

For problems that need floating point AND are not multithread-friendly AND need large computing power AND are specially coded, this will be of great use. However, most massive computing problems like this are multithread-friendly, and this will still be roughly an order of magnitude short of the speeds you can get from a GPU.

Re:Might be important, but probably not... (3, Insightful)

semi-extrinsic (1997002) | about a year ago | (#43207577)

The good thing about manufacturers speeding up SSE/AVX/etc. is that the linear algebra libraries (specifically the ATLAS implementation of BLAS and LAPACK) usually release code that makes use of the new hawtness within about six months of release. Do you know how much software relies on BLAS and LAPACK for speed?
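
"Relies on BLAS" in practice means the hot loops all funnel through calls like the one below, so a rebuilt ATLAS (or OpenBLAS, MKL, etc.) with AVX2/FMA3 kernels speeds this up with zero source changes. A minimal CBLAS sketch; the cblas.h header and the -lopenblas/-lcblas link line depend on which BLAS you have installed:

    #include <stdio.h>
    #include <cblas.h>

    int main(void)
    {
        /* C = 1.0 * A * B + 0.0 * C for 2x2 row-major matrices */
        double A[4] = { 1, 2, 3, 4 };
        double B[4] = { 5, 6, 7, 8 };
        double C[4] = { 0, 0, 0, 0 };

        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2,      /* M, N, K       */
                    1.0, A, 2,    /* alpha, A, lda */
                    B, 2,         /* B, ldb        */
                    0.0, C, 2);   /* beta, C, ldc  */

        printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);   /* 19 22 / 43 50 */
        return 0;
    }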

Also (1)

Sycraft-fu (314770) | about a year ago | (#43208583)

Intel's C/C++ and Fortran compilers are exceedingly good at vectorization, and are of course updated to use the new instructions. It does take a while for software to be recompiled with them, but you can see real gains in a lot of things without special work.

I also think people who do GPGPU get a little over-focused on it and think it is the solution to all problems. Some things, like graphics rendering, are extremely fast on the stream processors that make up a modern GPU. Other things, not so much; they can even be slower. Intel CPUs are very good at mixed tasks, and the better vector units only make that more true.
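
The kind of loop that benefits "without special work" looks like this; a vectorizing compiler (icc, or gcc with optimization and an AVX2/FMA-capable -march target) can turn it into 256-bit fused multiply-adds with no source changes. The flags and function name here are illustrative, not from the article:

    #include <stddef.h>

    /* Plain SAXPY-style loop: y[i] = a*x[i] + y[i].
     * With -O3 and a Haswell-class target this is a prime candidate for
     * auto-vectorization into 256-bit FMA instructions. */
    void saxpy(size_t n, float a, const float *x, float *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }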

Re:Might be important, but probably not... (0)

Anonymous Coward | about a year ago | (#43207697)

How good are GPUs for large matrix multiplications nowadays compared to CPUs? This sounds like something that could help a lot with linear algebra, which is a huge part of scientific computing. Also, I think you are overestimating the number of problems where a GPU can get its full performance. A problem needs to be much more parallelizable to get good performance on a GPU than on a multi-core CPU.

Re:Might be important, but probably not... (0)

Anonymous Coward | about a year ago | (#43207733)

Massive computing problems get specially coded and don't run on a single machine anyway. The GPU bus bottleneck severely restricts the set of problems where general-purpose scientific computing users on workstations can benefit from offloading work from the CPU.

Re:Might be important, but probably not... (0)

Anonymous Coward | about a year ago | (#43207761)

It's still relevant. If you're using OpenCL, then your GPU and CPU will both be pegged at 100% performing whatever multithreaded math you throw at them.

At least with older Mac Pros, there were some things that actually were faster on the CPU than the GPU, unless you threw massive money at your video card.

Re:Might be important, but probably not... (1)

GlobalEcho (26240) | about a year ago | (#43208461)

That's one of the nice things about OpenCL. I wish they would come up with more (and better) math libraries.

Re:Might be important, but probably not... (1)

Aardpig (622459) | about a year ago | (#43208633)

I wish NVIDIA would update their drivers to support OpenCL 1.1. Oh wait, that's not going to happen because they are trying to push CUDA instead...

Re:Might be important, but probably not... (1)

godrik (1287354) | about a year ago | (#43207867)

Intel Xeon Phi relies on AVX (version 1, I believe), and using AVX gets you a good improvement over not using it, for both sequential and parallel code. Of course, sequential code on Xeon Phi is typically slower than on a regular Sandy Bridge processor.

Many applications can use 16 float operations simultaneously; certainly many video codecs and physics engines.

GPUs can be good for many computations, but there are many cases where they are not so good. Most pointer-chasing applications tend not to be GPU-friendly. If you need to go back and forth between CPU and GPU, you pay some latency. GPUs also suffer from programming-abstraction problems (no CUDA on AMD, OpenCL is suboptimal on NVIDIA, OpenACC is only good for simple tasks).

Larger SIMD lanes on the CPU side will certainly be a good thing for performance.

Re:Might be important, but probably not... (1)

godrik (1287354) | about a year ago | (#43207981)

Replying to self: Xeon Phi uses wider lanes than AVX. It is 512 bits in Xeon Phi and 256 in AVX; I got the names mixed up.

Re:Might be important, but probably not... (2)

Bengie (1121981) | about a year ago | (#43208015)

http://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-codename-knights-corner [intel.com]

An important component of the Intel Xeon Phi coprocessor’s core is its vector processing unit (VPU), shown in Figure 5. The VPU features a novel 512-bit SIMD instruction set, officially known as Intel® Initial Many Core Instructions (Intel® IMCI). Thus, the VPU can execute 16 single-precision (SP) or 8 double-precision (DP) operations per cycle. The VPU also supports Fused Multiply-Add (FMA) instructions and hence can execute 32 SP or 16 DP floating point operations per cycle. It also provides support for integers.
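
Reading those figures back: 512 bits is 16 single-precision lanes, and an FMA counts as two floating-point operations per lane, hence 32 SP (or 16 DP) operations per cycle. A back-of-the-envelope peak, with the core count and clock as placeholder values rather than any particular SKU's spec:

    #include <stdio.h>

    int main(void)
    {
        double cores = 60.0, ghz = 1.0;            /* placeholders, not a real part's spec */
        double sp_lanes = 512.0 / 32.0;            /* 16 single-precision lanes            */
        double flops_per_cycle = sp_lanes * 2.0;   /* FMA = multiply + add = 2 flops       */

        printf("peak ~ %.0f SP GFLOPS\n", cores * ghz * flops_per_cycle);
        return 0;
    }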

Re:Might be important, but probably not... (1)

godrik (1287354) | about a year ago | (#43208141)

My bad, I realized later that AVX was the new instruction set for Sandy Bridge, not for Xeon Phi. AVX (version whatever) and the IMCI instructions are quite similar (gather/scatter, fused multiply-add, swizzling/permute). Their main difference is the SIMD width.

My overall point remains valid: doing floating point arithmetic in packs of 256 bits is useful.

Re:Might be important, but probably not... (1)

Bengie (1121981) | about a year ago | (#43207999)

Not all multi-threaded code is large-matrix friendly, and GPUs need large matrix math to become useful.

Re:Might be important, but probably not... (1)

pclminion (145572) | about a year ago | (#43209403)

Yeah, pretty much. Basically, they just doubled the width of the vector execution units. Obviously, that will double the FLOPS for vectorized code. In other news, 8 cores can do twice the work of 4 cores, if your code is multithreaded properly.

Awesome! (0)

Anonymous Coward | about a year ago | (#43207497)

I'm gonna buy some i5 "watchacally" chips soon and I'll wait for the price to come down.

With tech, unless you need it NOW, wait because the price will always come down.

And I win again!

Nearing complete integration (1)

bstrobl (1805978) | about a year ago | (#43207501)

The thing that interests me most about this generation is the progress towards a single-chip solution. Ultrabooks and tablets can get a multi-chip package with the PCH (the last remnant of the old chipset) soldered alongside the CPU/GPU die. It shouldn't take long until everything is fabbed onto one piece of silicon, reducing power requirements and gadget size.

wtf? fma3? (1, Offtopic)

convolvatron (176505) | about a year ago | (#43207595)

Could someone tell me how many separate instruction sets, pipelines and register files I get in a mainline CPU these days? I turned away for a second and completely lost track.

What happens with the 10 that you aren't using? Just sitting there reducing the yield?

Re:wtf? fma3? (0)

Anonymous Coward | about a year ago | (#43207765)

agree. those idiots at intel have no clue what they're doing.

Re:wtf? fma3? (0)

Anonymous Coward | about a year ago | (#43209215)

With respect to vector extensions, no they don't--the extensions and improvements have been totally half-assed at every step. To say nothing of yields, it is a total PITA as a developer having to check individually for 107 different features which all should have been standard a decade ago.

It is a total joke that it has taken this long to get FMA support, and they still don't have a proper vector permute instruction.

128 bit floats: when? (1)

rmstar (114746) | about a year ago | (#43207633)

While speed for single and double floats is all well and good, I wonder: when will there finally be hardware support for 128-bit (quadruple precision) floats? [wikipedia.org]

Re:128 bit floats: when? (1)

godrik (1287354) | about a year ago | (#43207753)

What is the use for them? For "personal" use, floats are all you will ever need. Many physics computations stay in single precision to avoid doubling the memory usage. I guess fluid mechanics computations use doubles, but is there really a use for quads? Who needs that kind of precision?

Re:128 bit floats: when? (0)

Anonymous Coward | about a year ago | (#43208653)

Define personal use. Most people use floats through scripting languages. A 32-bit floating point object w/ a 23-bit mantissa is worthless, because more often than not you're doing integer arithmetic. Not being able to represent more than 8 million is pretty limiting. And believe it or not, a 53-bit mantissa isn't that much better.

Re:128 bit floats: when? (1)

Twinbee (767046) | about a year ago | (#43207843)

I would have hoped more bits were given to the exponent in quad precision. It gets 15 bits, compared to double precision's 11.

So many bits, and almost all of them go to the fraction - a real shame.

Re:128 bit floats: when? (0)

Anonymous Coward | about a year ago | (#43208259)

[10^-4932,10^4932] isn't a big enough range?

Re:128 bit floats: when? (2)

Twinbee (767046) | about a year ago | (#43208405)

It would remove the need for some extra math when dealing with very large numbers - not just results that end up large, but cases where an intermediate calculation may be large (e.g. the factorials used to work out the probability of something, if I recall). Plus, 96 bits is more than enough for the fraction if you ask me; it's rather greedy to take that to 112 bits at the cost of 16 bits the exponent could well do with.

Re:128 bit floats: when? (0)

Anonymous Coward | about a year ago | (#43208729)

I'm computing the zeros of the Zeta function in the region Im(z) > 10^4932 you insensitive clod!

Re:128 bit floats: when? (2)

gnasher719 (869701) | about a year ago | (#43208063)

While speed for single and double floats is all well and good, I wonder: when will there finally be hardware support for 128-bit (quadruple precision) floats?

It was there on PowerPC for many years, and with Haswell it will be there for x86 as well. FMA is all you need for efficient 128 bit arithmetic.
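
The "FMA is all you need" point refers to double-double style arithmetic: with a fused multiply-add you can recover the exact rounding error of a product in one extra instruction and carry it as a second double, which is the building block for ~128-bit precision in software. A minimal sketch; the helper name is ad hoc, not from any particular library:

    #include <stdio.h>
    #include <math.h>

    /* Split a*b into a rounded head and the exact error term ("two-prod").
     * Without FMA this classically takes ~17 flops; with FMA it takes 2. */
    static void two_prod(double a, double b, double *hi, double *lo)
    {
        *hi = a * b;
        *lo = fma(a, b, -*hi);   /* exact a*b minus the rounded product */
    }

    int main(void)
    {
        double hi, lo;
        two_prod(1.0 + pow(2.0, -30), 1.0 - pow(2.0, -30), &hi, &lo);
        printf("head = %.17g, error = %g\n", hi, lo);   /* hi + lo is the exact product */
        return 0;
    }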

Floating point apps are almost twice as fast (0)

Anonymous Coward | about a year ago | (#43207671)

Link translated from the original Italian.

Hmmmm (0)

Anonymous Coward | about a year ago | (#43207709)

"As you see in the red bar, the task is finished much faster on Haswell. It’s close, but not quite 2x."

The RED bar is integer not floating point.

Confused? (1)

Narishma (822073) | about a year ago | (#43207785)

The serious [floating point] performance increase has a few caveats: you have to use either AVX2 or FMA3,

Isn't AVX2 just the integer version of AVX? Like SSE2 added integer versions of the SSE floating point instructions? If so, that sentence doesn't make sense.

Re:Confused? (1)

godrik (1287354) | about a year ago | (#43207975)

No, there is more to it:

        * Expansion of most integer AVX instructions to 256 bits
        * 3-operand general-purpose bit manipulation and multiply
        * Gather support, enabling vector elements to be loaded from non-contiguous memory locations
        * DWORD- and QWORD-granularity any-to-any permutes
        * Vector shifts
        * 3-operand fused multiply-accumulate support

source: wikipedia http://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2 [wikipedia.org]
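
To make two of those bullet points concrete, here is a minimal sketch of an AVX2 gather feeding an FMA3 fused multiply-add via the intrinsics in immintrin.h (compile with something like -mavx2 -mfma; the array contents are arbitrary):

    #include <stdio.h>
    #include <immintrin.h>

    int main(void)
    {
        float table[16], x[8] = {1, 2, 3, 4, 5, 6, 7, 8}, out[8];
        int   idx[8] = {0, 2, 4, 6, 8, 10, 12, 14};

        for (int i = 0; i < 16; i++)
            table[i] = (float)i;

        __m256i vidx = _mm256_loadu_si256((const __m256i *)idx);
        __m256  vx   = _mm256_loadu_ps(x);

        /* AVX2 gather: 8 floats loaded from non-contiguous slots of 'table' */
        __m256 vt = _mm256_i32gather_ps(table, vidx, sizeof(float));

        /* FMA3: out = x*t + x, 8 lanes per instruction, single rounding */
        __m256 vr = _mm256_fmadd_ps(vx, vt, vx);

        _mm256_storeu_ps(out, vr);
        for (int i = 0; i < 8; i++)
            printf("%g ", out[i]);
        printf("\n");
        return 0;
    }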

Re:Confused? (0)

Anonymous Coward | about a year ago | (#43207993)

Isn't AVX2 just the integer version of AVX? Like SSE2 added integer versions of the SSE floating point instructions? If so, that sentence doesn't make sense.

At a guess, the performance gains probably come from AVX2's introduction of fused multiply-accumulate, which can greatly speed up many common operations such as matrix multiplications.

AVX2 also adds support for loading SIMD vector elements from non-contiguous memory locations, which is a potential win for certain types of applications.

Re:Confused? (0)

Anonymous Coward | about a year ago | (#43208771)

AFAIK, you are correct and as usual, the summary is complete nonsense. It seems to me that the few new floating-point vector instructions would only improve performance in very isolated cases. E.g., if you have to load data from non-contiguous locations, you've mostly already lost the battle unless you are going to do a very large amount of work with that vector.

Re:Confused? (0)

Anonymous Coward | about a year ago | (#43208895)

In fact, for fractals, the article shows a 2x speedup for integers and less for floating-point (with some cryptic explanation that follows).

Error! (0)

Anonymous Coward | about a year ago | (#43207897)

"As you see in the red bar, the task is finished much faster on Haswell. It’s close, but not quite 2x." Sorry to ruin it for everyone but the RED bar is integer not floating point.

ERROR (1)

xlokix (2869115) | about a year ago | (#43207947)

"As you see in the red bar, the task is finished much faster on Haswell. It’s close, but not quite 2x." Sorry to ruin it for everyone but the RED bar is integer not floating point.

Poor AMD (0)

Billly Gates (198444) | about a year ago | (#43208349)

Their new Thunder and Durgango APUs are rumored to finally get close to the i7s!

This will crush them, as AMD's former strength was floating point; today it is multithreading, or rather getting close in performance through multithreading.

Re:Poor AMD (4, Insightful)

dshk (838175) | about a year ago | (#43209197)

AMD already has FMA3. They also published great results. Of course nobody read them; at least I haven't seen them mentioned in the usual generic benchmark articles people like to refer to (which don't use FMA3).

Tom's Hardware = official Intel PR outlet (-1)

Anonymous Coward | about a year ago | (#43208433)

Both Anandtech and Tom's Hardware receive vast amounts of money and favours from Intel to shill their products. Take this so-called 'preview'. No overclocking, no power consumption tests, no image quality tests on the games benchmarked. Why? Because Intel describes to the last detail exactly how Haswell is to be tested.

A number of points.
- the per-clock improvements over Ivy Bridge (the last Intel design) are fake. Modern CPUs no longer guarantee a particular clock speed, even when so-called power saving modes are turned off. CPU and GPU burst is ALWAYS active, and Intel makes the burst speeds on each new generation faster than the last. What you want to see (missing here) is performance per watt, performance per dollar, or maximum performance from either the most expensive part or the most overclocked part.
- Intel's GPUs plain stink. The drivers are rotten, and so is the hardware. There is a good reason Intel pays sites to ONLY use very low quality settings (where the game looks far worse than the console version) and to only quote the average frame rate. Even then, Tom's preview had to acknowledge over and over that even high FPS scores mislead, because the game was NOT fluid in use (Intel has massive latency issues in its GPU system).
- Haswell's so called FPU performance lead is a joke. All serious FPU work is done on the GPU these days, where AMD slaughters Intel. Intel was supposed to be the master of compatible code performance boosts, but Haswell's new FPU units are NOT used unless programs can be compiled to produce the new AVX instructions.

Here's how the con works. Anandtech reviewed the first notebook APU (Llano, the forerunner to Trinity) from AMD. They included dozens of graphs that were to Intel's benefit. The one graph they were paid NOT to include showed the AMD notebook gaming for TWICE the time on the same battery as the Intel notebook (similar class).

Tom's preview refers to a 'better' Haswell GPU, the GT3, that the bent preview describes as BGA only. What Tom's actually means is that GT3 is a notebook only option (they don't say this outright, because AMD's vastly better APU parts are desktop too). Now, in notebooks, when people buy a gaming machine, they expect first-class discrete GPUs from either AMD or Nvidia. Intel's mega-expensive Haswell + GT3 is NOT going up against cheapo APUs from AMD- it is going up against discrete GPU parts of vastly greater performance.

So, Intel expects gamers to lay down $1200+ for a gaming notebook with Intel's discrete GPU solution. Have you ever heard anything more hilarious in your life? Any OEM building such a notebook might as well save themselves the time, and ship them straight to the same landfill where all those old ET game cartridges were buried.

The real story of Haswell has yet to be told. Intel promises the core can work at tablet/phone levels of power consumption, and beat ARM and AMD's Jaguar core. These should be the Haswells getting the early preview if such (hilariously laughable) claims are true. On the desktop people want faster parts- which Haswell will not provide. On the desktop, people want cheaper parts, which Intel will never agree to. On the desktop, people don't care if another few watts have been shaved from the energy bill.

Where are Intel's affordable 6-core parts (each shrink makes the cost of a given core that much lower)? Where is Intel's improved memory bus (Intel is still stuck in the stone age, with a 2x64 bit bus and DDR3, while AMD has provided Sony with a 256-bit bus and GDDR5 for a bandwidth beyond Intel's wildest dreams)?

If Intel was releasing a 6-core part for $150, there would be cheering from the rooftops, even if it were still based on Ivy Bridge. Intel's pure greed is its greatest gift to its competitors today.

Re:Tom's Hardware = official Intel PR outlet (1)

Aardpig (622459) | about a year ago | (#43208663)

No overclocking? Ermahgerd, that's a showstopper for those wanting to do HPC!!!!!!!!!!!!!!!!!!

lies and bullshit (1)

decora (1710862) | about a year ago | (#43208493)

"hey kids, our CPU is twice as fast as the next guys!"*

*(you must rewrite your code to do twice as much stuff at once)
**(which has been true for like, 15 years ever since SSE + friends made it into the PC market)
***(which means developers have to spend time writing non-portable optimization code)

Re:lies and bullshit (0)

Anonymous Coward | about a year ago | (#43208959)

And you can gain ten times the performance already by using older and cheaper graphics hardware if all you care about is optimizing floating point performance.

GT3 (3, Interesting)

edxwelch (600979) | about a year ago | (#43208593)

AMD lost the CPU race a long time ago, but it still beats Intel at integrated graphics. Now it looks like Haswell could win that battle too.
The article shows GT2 to be 15% - 50% faster than the old HD 4000. That's still a bit slower than Trinity, but GT3 has double the execution units of GT2, potentially blowing away anything AMD could offer.

Meanwhile in AMD land... (-1)

Anonymous Coward | about a year ago | (#43209455)

...crickets... They wanted a war and now Intel is driving them into the ground with a constant tick-tock set of improvements and a race to 0 (nm). We should obviously put a stop to this to allow AMD to recover.

bs hype is what this is (1)

Anonymous Coward | about a year ago | (#43209475)

When AVX came out, it was supposed to be a major speedup. Guess what: lots of things are still faster in SSE2/3.

Many of the new registers appear to speed things up, but what isn't readily apparent is that there haven't always been matching improvements in the memory ports.

The major speedups are going to come from cleaning up the way instructions are handled and the memory lanes in the chip, not just from throwing more registers at us.

This guy (Agner Fog) is the best reference on the net for what's going on in these chips:
http://www.agner.org/optimize/blog/read.php?i=142
