AMD Designing All-New CPU Cores For ARMv8, X86 181
crookedvulture (1866146) writes "AMD just revealed that it has two all-new CPU cores in the works. One will be compatible with the 64-bit ARMv8 instruction set, while the other is meant as an x86 replacement for the Bulldozer architecture and its descendants. Both cores have been designed from the ground up by a team led by Jim Keller, the lead architect behind AMD's K8 architecture. Keller worked at Apple on the A4 and A4 before returning to AMD in 2012. The first chips based on the new AMD cores are due in 2016."
Keller worked at Apple on the A4 and A4 (Score:5, Funny)
Probably worked on the A4 and A4 and the A4, as well.
Re:Keller worked at Apple on the A4 and A4 (Score:5, Funny)
Ahh, you figured it out, the multiple A4s were referencing different things [wikipedia.org].
So pick two of those.
Re: (Score:2)
Are you sure? I heard he worked on the A4 not the A4.
Re: (Score:2)
I heard he worked on the A4 not the A4.
Classic Apple disinfo machine. :)
Re:Keller worked at Apple on the A4 and A4 (Score:5, Funny)
Re: (Score:2)
Keller worked at Apple on the A4 and A4
Either they meant that it was a dual-core CPU, or that Apple was churning them out like M&M's.
Re: (Score:2)
Either they meant that it was a dual-core CPU
Somebody has been spending too much time staring at /proc/cpuinfo !
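It really would read that way: a dual-core chip shows the same model line once per logical core. A minimal simulated excerpt (the "Apple A4" model string is obviously made up for the joke; on a real Linux box you would read /proc/cpuinfo itself):

```python
from collections import Counter

# Simulated /proc/cpuinfo excerpt for a hypothetical dual-core "A4";
# on real Linux you'd open("/proc/cpuinfo") instead.
cpuinfo = """\
model name : Apple A4
model name : Apple A4
"""

counts = Counter(cpuinfo.splitlines())
for line, n in counts.items():
    print(n, line)  # the same model line appears once per logical core
```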
Re: (Score:2)
Ah! The M4? It's close to the A4: http://en.wikipedia.org/wiki/M... [wikipedia.org]
Re:Keller worked at Apple on the A4 and A4 (Score:5, Informative)
You mean the A4 [wikipedia.org] on the A4 [audiusa.com] on the A4 [wikipedia.org]?
Couldn't one core... (Score:2)
Right, because that worked so well (Score:2)
How's Transmeta doing these days? Oh that's right they are defunct.
That kind of thing doesn't work well for performance.
Re: (Score:3)
Now, once Intel stopped pretending that Netburst was something other than a failure, and put some actual effort into lower power designs, it was Game Over; but they didn't do that overnight.
Re:Right, because that worked so well (Score:5, Interesting)
Transmeta was at the end of the era where decoding performance mattered. Keeping the translated code around was actually useful. These days decoding is approximately free on any CPU with half-decent performance -- the amount of extra die space for a complex decoder is not worth worrying about.
You can save a bit of power with a simpler decode stage, but you are unlikely to beat ARM Thumb-2 on power by software-translating x86 the way Transmeta did. Besides, most of the interesting code for low power applications is ARM or MIPS already, so what is the point?
Re: (Score:2)
These days decoding is approximately free on any CPU with half-decent performance
In what way? And what do you mean by "decoding"? Do you also include dependency solving, interlocking, reordering etc.? Because what I was thinking about was pushing even more to the SW component. The problem is, CPUs have been widening for quite some time because of our over-reliance on single-threaded SW. But even if it doesn't work nearly as well for eight-issue monsters, given that simple cores like Jaguar, which seem to be practicable if you have many more of them, push you back into the time of "quart
Re:Right, because that worked so well (Score:5, Interesting)
You cannot meaningfully do reordering and so on in software on a modern CPU. You do not know in advance which operands will be available from memory at which time. You have to redo that work every time you get to the code (unless it is in a tight loop, but modern x86's are REALLY good at tight loops) because circumstances will likely have changed -- and you cannot reorder in software every time, that is just too costly.
If you want to see an architecture which looks like it has a chance of breaking the limits on single-threaded performance, look at the Mill [millcomputing.com]. In theory you could software-translate x86 to Mill code and gain performance, but it would be really tricky and no Mill implementations exist yet.
Re: (Score:2)
You do not know in advance which operands will be available from memory at which time. ... If you want to see an architecture which looks like it has a chance of breaking the limits on single-threaded performance
Do I really need to know that, or can I just switch to a different thread of execution until then? And do we really need to care about single-threaded performance that much these days? What if I want to program in Go instead of C++? (E.g., what if Google wants 0.5M new servers for deploying Go services?) Perhaps some level of "outoforderiness" is desirable, but a lower one would do? I really don't care in what way the performance gets squeezed into my battery-powered devices
Re: (Score:3)
Do I really need to know that, or can I just switch to a different thread of execution until then?
Sun tried it, market penetration near zero. You can get 12 threads per socket on a desktop Intel CPU, good luck keeping 12 threads busy on mainstream workloads.
Single threaded performance is everything for a CPU; it is cheap to add sockets and cores for parallel workloads. For real parallel work you use the GPU anyway.
Re: (Score:2)
Sun tried it, market penetration near zero.
That just might have something to do with the ridiculous price they were asking for it, doesn't it?
You can get 12 threads per socket on a desktop Intel CPU, good luck keeping 12 threads busy on mainstream workloads.
Doesn't seem like that much of an issue to me. I can't think of an application where they wouldn't come in very handy. And I really tried, but nothing came out of it.
Single threaded performance is everything for a CPU; it is cheap to add sockets and cores for parallel workloads. For real parallel work you use the GPU anyway.
Well, that's good for certain kinds of data parallelism, but probably not for all of them. At least, right now, even though AMD is trying to stretch its usefulness as much as they can.
Re:Right, because that worked so well (Score:4, Insightful)
> And do we really need to care about single-threaded performance that much these days?
Not every task is parallelizable.
Second, are you going to pay for an engineer to make their code multi-threaded that shows X% run-time performance?
Re: (Score:2)
Not every task is parallelizable.
That's a red herring. Many more tasks probably are than most people would think. See Guy Steele's work. I think I even came up with a scheme to run TeX passes using speculative execution (results always correct, and most of the times faster) the other day (the state to keep around fortunately isn't very large).
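A rough sketch of that always-correct speculative scheme (hypothetical names, nothing to do with TeX's real internals): kick off the next pass on a guessed intermediate state in parallel, and keep its result only if the guess turns out to match, so the output is identical either way.

```python
from concurrent.futures import ThreadPoolExecutor

def two_passes_speculative(pass_fn, initial_state, guessed_state):
    """Run two passes of a pure pass_fn; pass 2 runs speculatively on a
    guessed intermediate state while pass 1 computes the real one."""
    with ThreadPoolExecutor(max_workers=2) as ex:
        real = ex.submit(pass_fn, initial_state)   # pass 1, for real
        spec = ex.submit(pass_fn, guessed_state)   # pass 2, speculative
        state = real.result()
        if state == guessed_state:                 # prediction held: reuse it
            return spec.result()
        return pass_fn(state)                      # mispredicted: redo pass 2
```

When the guess is right the two passes overlap in time; when it is wrong you pay one extra pass, but the result never differs.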
Re: (Score:3)
Not to mention that on most 'desktop' or 'server' machines, the OS is constantly juggling hundreds or thousands of processes, so while an individual program may be single threaded, the operating system can be spread across all available processors. The hard thing is knowing, for an individual process and core, when it is worth switching context - shunting it off to wait for I/O and shoveling a different process onto that core - or just idling that core for a while. IIRC (from _long_ ago), I/O typically cos
Re: (Score:3)
But that was a part of the very concept of VLIW, which both Crusoe & Efficeon were. But those processors were somewhat more RISC than VLIW, except that their integer units were 128-bit and 256-bit, as opposed to 32-bit or 64-bit. Essentially, the idea here was that the bottom core would be constant, and any time there was an instruction set upgrade in a CPU from Intel or AMD, the Transmeta CPU would implement those new instructions in terms of their own native instructions, which would presumably eith
Re: (Score:2)
Actually Intel has recently returned to that. They now keep a small microinstruction cache of decoded instructions around so that loops can be executed more efficiently.
Re: (Score:2)
Fair enough, but they still choose to have all decoding done in hardware, so they still pay the (rather small) die-space penalty of a complex decoder.
Re: (Score:2)
Ahh, the good old days. This reminded me of the Motorola 6800's Halt and Catch Fire [wikipedia.org] instruction. :D Tight loops can be ... interesting.
Re: (Score:2)
The Transmeta chip was not a smash hit so probably not.
The really cool thing is that you will see ARM and X86 share parts. GPU cores are a no-brainer. Throwing in things like cache and memory controllers could be a big deal.
ARM sharing a socket with x86 will be really cool IMHO.
Re: Couldn't one core... (Score:2)
There are hardly any ARM CPUs or MCUs around that will ever get inserted in a socket. They are all mostly SMD chips.
Re: (Score:2)
AMD is targeting this at the server market. It is supposed to be socket compatible with the new X86 they are working on. It is not strictly targeting the mobile market.
stumbling over progress (Score:2)
Double facepalm!! That's one version of the story. In other news, the day after the first Prius was available for sale, there was a global recall on internal combustion engines—the kind of recall where they don't give back.
The hump where protected mode starts to drive real productivity benefit is somewhere above a 486SX/25 with 8 MB of RAM and a 120 M
Re: (Score:2)
As the other poster said, you should have tried OS/2. I had pretty good multi-tasking on a 486DLC at 33Mhz on a good 386 board with 8MBs, localbus and a 120MB drive. It seemed to fly on a 486/100 with 32MBs, even X was fast and it was nice having 3 desktops (X, Win16 and WPS) running at once though you had only one displayed (actually Windows ran seamless with each program in its own session so when they crashed only the program died instead of the system).
Re: Couldn't one core... (Score:2)
I don't see how SSE is anything like it. Either you have a SSE or AVX unit or you don't. If you do, you use it exclusively. With a hybrid x86+ARM+GPU chip, you need to give work to at least all 3 of them, and it's nearly impossible to predict which unit will be the best for each task or even to schedule the damn thing dynamically.
Re: (Score:2)
It took a long time (ten years?) to get just a basic 32-bit protected mode operating system out to people at large after the hardware (80386) was out.
Ah, this is exactly one of the reasons why the MS OS/2 2.0 fiasco is one of my favorite topics!
Re: (Score:2)
As I recall, the shared registers were a real problem with MMX. The MMX registers were aliased onto the x87 floating-point registers, so there was a big latency cost whenever the chip switched between MMX and x87 FP code. It made for a penalty that frequently negated the MMX performance benefit.
Re: (Score:3)
But unixisc, that's a solved problem. We don't write software in Assembly Languages anymore.
See, we can simply compile the program on the chip we want to use it on.
The problem is that humans are stupid. Languages at the Human interface level should never compile down into machine code. All languages should compile down into bytecode. You should NEVER distribute programs as binaries (that would be dumb). Then the hardware abstraction layer (your OS) can compile the bytecode INTO OPTIMIZED machine code f
Re: (Score:2)
Except that when you do this, you have the opportunity to effectively turn a hardware interpreter into a software compiler, reducing control logic (and its constant switching during code execution) and improving efficiency in the same way in which software compilers are better than software interpreters, even if the gap won't be nearly that wide. You can turn the same hardware interpreter into a hardware compiler, but then you have something like a trace cache and the logic has actually increased.
^^^^
that
doesn't support this,
Would the SW solution decrease performance per thread? Quite likely. Would it improve performance per watt, which is what will really matter in the future? Well, what if it will?
HSA (Score:2)
Did it say anywhere if they're going to be juiced with HSA?
nanosecond latency to some bigass stream processors and no risk of memory-scribbling is too much to not want.
Been a long time since I cared (Score:2)
The last time I truly got excited about AMD was when the K6-2 came out. These days, I just wish AMD would put a focus on power consumption and high quality rather than simply trying to out-core Intel.
Re:Been a long time since I cared (Score:5, Insightful)
The last time I truly got excited about AMD was when the K6-2 came out.
What? During the P4 days AMD was ahead in almost every category in the benchmarks... did you miss that whole era? No denying the picture today is far less exciting, though.
Re: (Score:2)
But that's only because Intel let the marketing department make engineering decisions and kept making chips with higher and higher clock frequency. As soon as they regained their sanity, they once again dominated the benchmarks.
I do love how AMD brilliantly capitalized on the blunder. By labeling their chips according to the clock speed of the performance equivalent Intel chip - every time Intel put insane engineering effort into ratcheting the clock up 10% and only getting 1% better performance, AMD simply
Re: (Score:2)
Yup, on the server side AMD was ahead from the first Opteron until Shanghai, and then Intel launched Nehalem and they've been ahead ever since. On the desktop Intel got competitive again with the Core2, but on a performance-per-$ metric it wasn't until Nehalem that they dominated.
Re: (Score:2)
On a performance per $ metric, AMD are arguably still competitive, at the expense of selling cheaply and barely breaking even financially. They are currently not competitive in performance per watt and absolute performance (both on the desktop, mobile looks a bit better).
AMD really fucked up with the Bulldozer, and while there have been modest improvements to that with Vishera and Steamroller, they were insufficient to close the gap to Intel.
Re: (Score:2)
I didn't say the K6-2 was the peak of AMD; just that it was the last time I really got excited about anything they came out with. AMD did some good stuff during the mid-2000s, but there were other computer upgrades that had more impact on performance -- particularly RAM. Those were the days when adding a stick of RAM was a legitimate means of being able to do amazing things like browse the internet while listening to music... at the same time! Upgrading from 512MB to 2GB was a huge boost in productivity.
Re: (Score:2)
"During the P4 days AMD was ahead in almost every category in the benchmarks"
It was ahead in many categories off the benchmarks too.
Like how quickly it heats your room up, and how much power it drained.
I had one but my god in the summer months did I wish I'd gone Intel as I was sat sweltering from the heat of that computer on top of the already high ambient temperature.
Re: (Score:2)
Certainly in power consumption and temperature :D Though it trailed far behind in performance.
Re: (Score:3)
I had an AMD 486 80MHz. It was cheaper than an i486 66MHz and performed great. The Pentium had just come out at the time but was super expensive. I was able to find a late-model 486 board with PCI slots, though, and with the awesome value of the AMD chip was able to have a nice "budget" system for the time. It was even able to run Quake playably (a game which "required" the Pentium and its baller FPU).
Re: (Score:2)
I had an AMD 486 DX4 - 120 MHz. It beat the pants off the contemporary Pentium processors.
Re: (Score:2)
I have a really hard time believing this, and would state that your memory does not serve you very well. A 33MHz 486 couldn't handle more complex scenes in DOOM, and definitely not in Quake. I gamed actively at the time when Quake came out, and recall that only much later, on a P233MMX, I could get an fps amount rivaling the screen refresh rate. Any 486 is so much behind that machine, that it's not even funny.
A low ID number username like you probably won't believe a brat like me, so here's some proof: http [youtube.com]
Re: (Score:2)
I have a really hard time believing this, and would state that your memory does not serve you very well. A 33MHz 486 couldn't handle more complex scenes in DOOM, and definitely not in Quake. I gamed actively at the time when Quake came out, and recall that only much later, on a P233MMX, I could get an fps amount rivaling the screen refresh rate. Any 486 is so much behind that machine, that it's not even funny.
A low ID number username like you probably won't believe a brat like me, so here's some proof: https://www.youtube.com/watch?... [youtube.com]
The fps is abysmal. The machine would need to be 10-20 times faster to reach a decent fps.
Never mind what some random YouTube link says, I ran Quake reliably on a 486 DX4 100 (33x3 IIRC) with a 1MB Trident graphics card and 8MB of RAM.
Re: (Score:2)
K6-2 was good, but the K6-III was much better. It was the first consumer-level CPU with on-die L2 cache. It scared Intel enough that they renamed the PII to PIII (because anything with a 3 in the name is clearly better than anything with a 2 in the name). The down side was that the K6-III overclocked for shit.
Serious Question (Score:4, Interesting)
Is AMD just around so Intel doesn't get bogged down by anti-monopoly or antitrust penalties?
Re: Serious Question (Score:4, Interesting)
64 cores per U, 80% of Intel's per-core performance, at 12% of Intel's price.
Re: Serious Question (Score:5, Insightful)
Well, something of an oversimplification/exaggeration.
64 'cores' is 32 Piledriver modules. That was a gamble that by and large did not pan out as hoped. For a lot of applications, you really have to count those as 32 cores. Intel is currently at 12 cores per package versus AMD's 8 per package. Intel is less frequently found with their EP line in a 4-socket configuration because the performance of dual socket can be much higher with Intel's QPI than 4 socket. AMD can't do that topology, so you might as well do 4 socket. Additionally, Intel's memory architecture tends to allow more DIMM slots on a board. AMD's thermals are actually a bit worse than Intel's, so it's not that AMD can be reasonably crammed in but Intel cannot. The pricing disparity is something that Intel chooses at their discretion (their margin is obscene), so if Intel ever gets pressure, they could halve their margin and still be healthy margin-wise.
I'm hoping this lives up to the legacy of the K7 architecture. K7 architecture left Intel horribly embarrassed and took years to finally catch up with when they launched Nehalem. Bulldozer was a decent experiment and software tooling has improved utilization, but it's still rough. With Intel ahead in both microarchitecture and manufacturing process, AMD is currently left with 'budget' pricing out of desperation as their strategy. This is by no means something to dismiss, but it's certainly less exciting and perhaps not sustainable since their costs are in fact higher than Intel's cost (though Intel's R&D budget is gigantic to fuel that low-cost per-unit advantage, so the difference between gross margin between Intel and AMD is huge, but net margin isn't as drastic). If the bulldozer scheme had worked out well, it could have meant another era of AMD dominance, but it sadly didn't work as well in practice.
Re: (Score:2)
Intel is less frequently found with their EP line in a 4 socket configuration because the performance of dual socket can be much higher with Intel's QPI than 4 socket.
I've not heard of this before. Do you have a link? I'm guessing it's something about just cross-connecting all QPI lines between two sockets rather than 4? Also, can't HT do that?
Intel quad sockets do seem less popular recently. I think AMD are most competitive on that end of servers, especially after memory prices came down so you can get a 5
Re: (Score:2)
Unfortunately I do not have a link. I do however know some system designers.
They designed a 4 socket Opteron system, and did not make a dual socket. It was peculiar to me so I asked why not a dual socket and they said there was no point in a dual socket because there was no performance advantage.
They also designed both a 4 socket EP system and a 2 socket EP system. I asked why and they said that they could gang up the two QPI links between two sockets for better performance.
I admittedly did not ask point
Re: (Score:2)
Somehow these days, I think it's yes. And I think Intel's lobbing customers AMD's way to ensure that AMD survives. E.g., the current generation of consoles now sport AMD processors. I'm sure Intel would be more than happy to have the business, but not only do they not need it, they see it as a way to give AMD much needed cash for the next few years.
Hell, I'm sure part of the whole Intel letting others use their fabs
Re: (Score:3)
Somehow these days, I think it's yes. And I think Intel's lobbing customers AMD's way to ensure that AMD survives. E.g., the current generation of consoles now sport AMD processors. I'm sure Intel would be more than happy to have the business, but not only do they not need it, they see it as a way to give AMD much needed cash for the next few years.
Consoles are primarily about graphics, not CPU power. While Intel's integrated graphics suck somewhat less than they used to, the PS4 has 1152 shaders backed by 8GB of GDDR5 and Intel has never had anything remotely close to that, maybe a third or quarter of that tops. An Intel CPU with AMD dedicated graphics would be very unlikely since AMD would almost certainly price it so their CPU/GPU combo came out better. So realistically it was AMD vs Intel+nVidia, neither of which like to sell themselves cheap. I don't
Re: (Score:3)
Well, in the *desktops*, core marked an end to AMD dominance in most practical terms, but architecturally they still were not very good for scalability. Basically, they turned back the clock to pentium iii on modern processes and that was enough to recover the desktop space.
Nehalem is the point at which Intel basically overtook AMD again and AMD has not come back since that point. So Intel's had the ball for 3 of their 'tocks'. AMD prior to K7 was pretty weak for a lot longer than that and I don't think
Re: (Score:2)
AMD, assets: $4.3bn, employees: ~10,000, profit: -$83million
Intel, assets: $92.4bn, employees: ~107,000, profit: $9.62bn
Intel is about ten times as big as AMD by every metric (except the negative profit metric - Intel actually makes $10bn profit a year, AMD is just losing money).
AMD is tiny, it's an irrelevance in the grand scheme of things. Pretending no one would notice Intel's demise whilst AMD will be around long after is comical. Anyway, AMD doesn't even make half the chips you're on about, that's comp
Best of luck to them (Score:5, Interesting)
Re:Best of luck to them (Score:4, Insightful)
I don't get it. Do you, and just about everyone else who has posted in this discussion, only buy chips that cost > $200? Because AMD is, and always has been, competitive with Intel in the sub-$200 price range.
Sub $200 chips have, for a very long time, been very fine processors for the vast majority of desktop computer tasks. So for years now, if you're anything close to a mainstream computer user, there has been an AMD part competitive with an Intel part for your needs.
Of course, once you get to the high end, AMD cannot compete with Intel; but that's only a segment of the market, and it is, in fact, a much smaller segment than the sub $200 segment.
I personally have a Phenom II x6 that I got for $199 when they first came out (sometime in 2011 I believe) that was, at the time, better on price/performance than any Intel chip for my needs (mostly, parallel compiles of large software products) and absolutely sufficient for any nonintensive task, which is 99% of everything else I do besides compiling.
Anyway, if you only think of the > $200 segment, why stop there? I'm pretty sure that for > $10,000 there are CPUs made by IBM that Intel cannot possibly compete with.
Re: (Score:2)
From AMD's end, that's a critically important segment since it's where the most money is, and chip design and manufacturing are exceptionally expensive.
Re: (Score:2)
I don't get it. Do you, and just about everyone else who has posted in this discussion, only buy chips that cost > $200?
AMD are also competitive in the quad socket server end of things. Those CPUs cost more like $1000 a pop.
Re: (Score:2)
I don't know how much of a profit they're making on their APUs
Last quarter, they lost $3 million on CPUs/APUs so in practice they're breaking even, but revenue is going down, which means less and less goes to R&D. Their profits last quarter came partly from dedicated graphics cards but mostly from console chips. Which is of course better than a loss, but consoles have a very special life cycle with high launch and Christmas sales and little in between, so it's unclear how long that'll last.
Why are people designing cores? (Score:2)
It seems that it would be fertile territory for genetic algorithms to design the die. Sure, humans need to define the features, but run everything through a genetic algorithm, simulate and let the computer grow its own chips. Perhaps whole chips are not practical, but sub-processing units could do it.
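Just for flavor, here is what that looks like at toy scale (everything below is made up for illustration: the netlist, the 1-D placement, the cost model; real floorplanning problems are astronomically larger): a genetic algorithm shuffling block positions to shrink total wire length.

```python
import random

random.seed(0)  # deterministic toy run

# Hypothetical netlist: pairs of blocks that must be wired together.
NETS = [(0, 3), (1, 2), (2, 5), (4, 7), (5, 6), (0, 7)]

def wirelength(order):
    """Total wire length for blocks placed along a 1-D row."""
    pos = {block: i for i, block in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in NETS)

def mutate(order):
    """Swap two block positions -- the GA's only move here."""
    child = list(order)
    i, j = random.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def evolve(generations=200, pop_size=20, n_blocks=8):
    pop = [random.sample(range(n_blocks), n_blocks) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=wirelength)
        pop = pop[: pop_size // 2]                              # keep fittest half
        pop += [mutate(random.choice(pop)) for _ in range(pop_size // 2)]
    return min(pop, key=wirelength)

best = evolve()
print(best, wirelength(best))
```

This works fine for 8 blocks; the reason nobody "grows" a whole core this way is that the search space explodes combinatorially and evaluating fitness on anything realistic means a full timing/power simulation per candidate.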
Re: (Score:2)
Pretty sure that firing all of the hot shot CPU designers and having such algorithms design their CPUs for them is how they wound up with the Bulldozer fiasco.
Looky here. [xbitlabs.com]
Re: (Score:2)
Nonsense. Code does routing and floor planning, it doesn't design two-core modules.
Oh and in the current designs the automatic layout saved significant real estate and power compared to hand layouts.
The article you refer to is utter bullshit.
Re: (Score:2)
Specifically, they don't scale well to large problems, which is exactly the opposite of what we need to be able to automate the design of an entire core.
Well, that's why one should try it with small problems instead! The core I've mentioned above is barely VLSI by modern standards; it has something like 30k gates. Is this still above the limit you mention?
And RISC slowly rediscovers that CISC is better (Score:2)
Meanwhile... (Score:2)
Re: (Score:2, Troll)
Which is why consoles don't use AMD at all. Oh wait...
Re: (Score:2, Troll)
Excuse me for injecting a note of reality into your rant, but I thought consoles care about heat. Also, aren't "thermals" and "power efficiency" the same thing? Or does that get in the way of your rhetoric?
Re: (Score:2)
Which is why consoles don't use AMD at all. Oh wait...
So... intel paying astromods now?
amd needs pci-e 3.0 / faster HyperTransport (Score:2)
or at least give all CPUs 2-3 HT links so you can have 2 or more HT-to-chipset / HT-to-PCIe bridges on a 1-CPU board.
Re: (Score:3)
I'm still waiting for an upgrade to my AMD FX-6300. I bought it on the promise that there would be an upgrade. I've liked AMD for a long time, but getting burned on the first processor I buy from them is no way to keep customers.
So you've been here longer than I have (UID), liked AMD for a long time yet never bought one in the golden years from 1999 (launch of Athlon) - 2006 (Intel launching Core) or relative competitiveness up to 2010 (with Phenom II x6 still giving Intel a fair fight) but waited until October 2012 when they were clearly well into a decline? Pardon me but your story smells worse than shrimps left out in the sun for a week.
Re: (Score:2)
There's an upgrade, FX-9590.
Re:Steamroller/Excavator ??? (Score:4, Funny)
Re:Steamroller/Excavator ??? (Score:4, Insightful)
I still think Intel's business agreements in the mid 2000s that put AMD in its current position were immoral if not illegal, so I buy AMD anyway. But I don't buy because the product is better, I buy because the competition were assholes even though they're currently assholes with better products.