Intel Announces New Enterprise Xeons, More Powerful Xeon Phi Cards

Posted by Unknown Lamer on Tuesday June 18, 2013 @03:07AM from the send-me-ten dept.

MojoKid writes "Intel announced a set of new enterprise products today aimed at furthering its strengths in the TOP500 supercomputing market. As of today, the Chinese Tianhe-2 supercomputer (aka Milky Way 2) is the fastest supercomputer on the planet at roughly 54 PFLOPS. Intel is putting its own major push behind heterogeneous computing with the Tianhe-2. Each node contains two Ivy Bridge sockets and three Xeon Phi cards. Each node, therefore, contains 422.4 GFLOP/s in Ivy Bridge performance, but 3.43 TFLOP/s worth of Xeon Phi. In addition, we'll see new Xeons based on this technology later this year in the 22nm E5-2600 V2 family. The new chips will be built on Ivy Bridge technology and will offer up to 12 cores / 24 threads. The new Xeons, however, aren't really the interesting part of the story. Today, Intel is adding cards to the current Xeon Phi lineup: the 7120P, 3120P, 3120A, and 5120D. The 3120P and 3120A are the same card; the 'P' is passively cooled, while the 'A' integrates a fan. Both of these solutions have 57 cores and 6GB of RAM. Intel states that they offer ~1 TFLOP/s of performance, which puts them on par with the 5110P that launched last year, but with slightly less memory and presumably a lower price point. At the top of the line, Intel is introducing the 7120P and 7120X; the 7120P comes with an integrated heat spreader, the 7120X doesn't. Clock speeds are higher on these cards, which have 61 cores instead of 60, 16GB of GDDR5, and 352GBps of memory bandwidth. Customers who need lots of cores and not much RAM can opt for one of the cheaper 3100 cards, while the 7100 family allows for much greater data sets."
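
To put the per-node figures in the summary side by side, here is a small sketch that works out what share of a node's peak throughput comes from the Xeon Phi cards. It uses only the two numbers quoted above; the constant and function names are made up for illustration, and the result is a peak-rate ratio, not a measured one.

```cpp
#include <cassert>

// Per-node peak figures quoted in the summary, in GFLOP/s.
constexpr double kIvyBridgeGflops = 422.4;   // two Ivy Bridge sockets
constexpr double kXeonPhiGflops   = 3430.0;  // three Xeon Phi cards (3.43 TFLOP/s)

// Fraction of a node's peak throughput supplied by the coprocessors.
// Hypothetical helper, named here only for this example.
constexpr double phi_share(double cpu_gflops, double phi_gflops) {
    return phi_gflops / (cpu_gflops + phi_gflops);
}
```

With the quoted figures, `phi_share(kIvyBridgeGflops, kXeonPhiGflops)` comes out at roughly 0.89, i.e. close to nine tenths of each node's peak sits on the cards.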

Programmers will be happy. (Score:5, Interesting)

by SuricouRaven ( 1897204 ) writes: on Tuesday June 18, 2013 @03:16AM (#44037019)

The x64 Phi cards are a lot easier to program than GPUs. No need to jump through hoops with memory mapping, keep things in sync for SIMD processing, or worry about running out of stack space when doing recursion.

- Re:Programmers will be happy. (Score:4, Interesting)
  
  by Anonymous Coward writes: on Tuesday June 18, 2013 @03:36AM (#44037063)
  
  If you are an assembly junkie I guess you are right. But I prefer the implicitly vectorized CUDA programming model to having to use vector intrinsics by hand. If you want to avoid explicit data transfers, take a look at ADSM (https://code.google.com/p/adsm/). Moreover, the performance of current Xeon Phi boards is not on par with Kepler GPUs. But, finally, NVIDIA is facing some competition.
  
  - Re: (Score:3)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
- Re: (Score:2)
  
  by mwvdlee ( 775178 ) writes:
  
  How does the performance measure up to GPUs for TFLOPS/$$$?
  - Re: (Score:3)
    
    by pla ( 258480 ) writes:
    
    How does the performance measure up to GPUs for TFLOPS/$$$?
    
    If you need double precision FP, you don't have a lot of alternatives.
    
    If you only need single or half precision, the Radeon 7990 rates at 7x the TFlops for about 15% of the price.
- Re: (Score:2)
  
  by CSMoran ( 1577071 ) writes:
  
  Does Intel's MKL support the Phis out of the box? It would be very convenient if, instead of having to re-write code for them, we could just use phi-capable BLAS and LAPACK.
  - Re:Programmers will be happy. (Score:4, Informative)
    
    by dargaud ( 518470 ) writes: <slashdot2@@@gdargaud...net> on Tuesday June 18, 2013 @04:02AM (#44037137) Homepage
    
    Using Intel Math Kernel Library on Intel Xeon Phi Coprocessors [intel.com]
    
    - Re: (Score:2)
      
      by CSMoran ( 1577071 ) writes:
      
      Excellent. Thank you.
    - Re: (Score:2)
      
      by _merlin ( 160982 ) writes:
      
      The "no-work" option is only useful if the bulk of the time in your code is in well-known algorithms that are implemented in Intel's library. Even going up to the "minimal work with Intel compiler" approach will require you to wrangle vector intrinsics manually to take advantage of these cores.
      - Re: (Score:2)
        
        by CSMoran ( 1577071 ) writes:
        
        Yes, "well-known algorithms" is my use case: massive LAPACK generalized diagonalizations that take forever on a single CPU, almost forever when threaded with an OpenMP-capable BLAS across, say, 8 cores, and do not scale at all to distributed-memory clusters (ScaLAPACK with MPI) because communication becomes the bottleneck.
        
        Thus I'm hoping for a solution where the vendor themselves wrangles those intrinsics in their BLAS or LAPACK implementation in MKL with me oblivious to all that mess. Assuming the computation time s
        
        - Re: (Score:2)
        
        by _merlin ( 160982 ) writes:
        
        Yeah, sure. I'm glad it works for your use case, and I'm sure it's great for a lot of others, too. Unfortunately it doesn't work for me - there will never be an off-the-shelf library for vol models developed in-house.
- - Re: (Score:2)
    
    by bill_mcgonigle ( 4333 ) * writes:
    
    which don't deal with mass amounts of data that exceeds the memory limits by orders of magnitude (which is roughly the point it becomes a mild pain)
    I was talking to an HPC friend this weekend at the ice cream parlor and he was telling me how their problem had no advantage on GPU processing because they were really memory-bound, not processing-bound.
    He has quad-rate InfiniBand going into each machine (40Gbps) and a couple of CPUs, and keeps them saturated (say 5Gbps per core).
    Looking at TFA's expansion card,
- Some SIMD required (Score:2)
  
  by Ottibus ( 753944 ) writes:
  
  You won't get full performance from a Xeon Phi without using the SIMD instructions, so it is not as easy to program as you might hope.
  - Re: (Score:3)
    
    by robthebloke ( 1308483 ) writes:
    
    ispc [github.com], OpenCL [intel.com], and LLVM on the way [haskell.org]. Failing that, you could of course use C++ and AVX intrinsics (which would be a good choice if you already have a load of SSE4/AVX optimised code lying about).
    - - Re: (Score:3)
        
        by robthebloke ( 1308483 ) writes:
        
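        // Same vec3 layout at each SIMD width: one lane (FPU) up to sixteen (Phi).
        struct vec3_FPU { float x, y, z; };
        struct vec3_SSE { __m128 x, y, z; };
        struct vec3_AVX { __m256 x, y, z; };
        struct vec3_PHI { __m512 x, y, z; };
        
        // Component-wise add; relies on add() overloads for float, __m128, __m256 and __m512 declared earlier.
        template<typename T>
        T add(const T& a, const T& b)
        {
            T r;
            r.x = add(a.x, b.x);
            r.y = add(a.y, b.y);
            r.z = add(a.z, b.z);
            return r;
        }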
        
        Porting existing SSE4/AVX code to Phi is usually just a case of changing a typedef (or template type param), and overloading a bunch of low level functions (e.g. add, sub, etc). If it's not that simple
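
        A minimal, portable sketch of the pattern described in this comment, under stated assumptions: `Lanes<N>` is a hypothetical stand-in for a hardware register type (`__m128` would correspond to N=4, `__m256` to N=8, `__m512` to N=16), and `Vec3<T>` collapses the four vec3 structs into one template. Real code would implement the low-level `add` overloads with the matching intrinsics rather than a scalar loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD register holding N float lanes.
template <std::size_t N>
struct Lanes {
    std::array<float, N> v;
};

// Low-level overloads: one per "register" type. These are the functions
// that would be re-implemented when targeting a different SIMD width.
inline float add(float a, float b) { return a + b; }

template <std::size_t N>
Lanes<N> add(const Lanes<N>& a, const Lanes<N>& b) {
    Lanes<N> r;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// One template in place of vec3_FPU / vec3_SSE / vec3_AVX / vec3_PHI.
template <typename T>
struct Vec3 {
    T x, y, z;
};

// Component-wise add; overload resolution selects the low-level add()
// that matches T, so the high-level code is unchanged across widths.
template <typename T>
Vec3<T> add(const Vec3<T>& a, const Vec3<T>& b) {
    return Vec3<T>{add(a.x, b.x), add(a.y, b.y), add(a.z, b.z)};
}

// Switching targets is then a one-line change of this alias.
using Wide = Lanes<16>;  // e.g. Lanes<4> or Lanes<8> for narrower targets
```

        Calling `add` on two `Vec3<Wide>` values processes sixteen independent vectors per call, while `Vec3<float>` gives the plain scalar path through the same source.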
        
        - Re: (Score:1)
        
        by Ottibus ( 753944 ) writes:
        
        This supports the argument that porting SSE/AVX code to Xeon Phi is easier than porting SSE/AVX code to GPU. It does little to support the original claim that "x64 Phi cards are a lot easier to program than GPUs", which is more general and seems to be about original programming rather than porting.
        
        - Re:Some SIMD required (Score:2)
        
        by robthebloke ( 1308483 ) writes:
        
        Well, if you've got an NVidia card + Xeon (which happens to be what I have available at work), then any newly written code is going to be in OpenCL or LLVM IR (via C++ or a custom language). If you're going that route, any code you write will more or less work on Phi with little modification (although I have not got a Phi on which I can actually test my hypothesis here, so I may be talking BS!). So in theory at least, it won't be any harder to write code for Phi than for NVidia/AMD. The thing that appeals to
    - Re: (Score:1)
      
      by Ottibus ( 753944 ) writes:
      
      ispc, OpenCL, and LLVM on the way. Failing that, you could of course use C++ and AVX intrinsics (which would be a good choice if you already have a load of SSE4/AVX optimised code lying about).
      Having to use specialist languages like ispc to get performance does not support the claim that Xeon Phi is "a lot easier to program than GPUs". OpenCL is no easier to write on x64 than on a GPU and is arguably harder. And you certainly can't rely on LLVM (or any compiler) to turn your scalar code into high-performance optimised vector code without a significant amount of work.
      So the original claim that "x64 Phi cards are a lot easier to program than GPUs" needs a lot more evidence before it will stand up.
- Re: (Score:3)
  
  by JanneM ( 7445 ) writes:
  
  Here's a preliminary "best practice" guide: http://www.prace-project.eu/Best-Practice-Guide-Intel-Xeon-Phi-HTML?lang=en [prace-project.eu]
  It seems OpenMP and Open MPI are both available, so typical hybrid systems should at least run out of the box, though you'll of course need a fair bit of tuning to make full use of the thing. It should be less work than adapting a system to run on a GPU, though.
- Re: (Score:1)
  
  by Steve_Ussler ( 2941703 ) writes:
  
  Whatever happened to AMD?
I do a lot of CGI Rendering (Score:1)

by Silpher ( 1379267 ) writes:

Will this be interesting for me? Price/value wise?
- Re: (Score:2)
  
  by bill_mcgonigle ( 4333 ) * writes:
  
  I was expecting 32 cores minimum in desktop CPUs by the start of this decade. All this new supercomputer stuff is well and good, but what about lots of cores for us mere mortals too?
  You wouldn't like the speed of typical software on a 32-core CPU using the same transistor count (i.e. at the same cost) of the machine you're running now.
  Cache sharing, NUMA access, etc. turn out to be tricky to get fast, right, and cheap. In the meantime, much of the existing software library can't even properly take advantag
- - Re: (Score:1)
    
    by Fyzzler ( 1058716 ) writes:
    
    Uh... that's what these Xeon Phi cards are. Lots of cores. FYI, that 80-core research chip wasn't x86.
    Actually Larrabee was exactly 80 486DX cores on one die. They just couldn't figure out how to get them to do useful work (they were thinking graphics processing, of all things). So they rethought their approach and canceled that project.
Why bother? (Score:2)

by pla ( 258480 ) writes:

In addition, we'll see new Xeons based on this technology later this year, in the 22nm E5-2600 V2 family, with up to 12 cores.

...And yet, because of corporate policies on running the shittiest AV on the planet (Symantec) cranked to the max, my desktop PC will still have the responsiveness of a sloth on 'luudes.

Seriously, I already have 8 cores worth of Xeon (2x4) and the load meter never even twitches, enough RAM to load my entire system drive into, and an SSD system drive. More cores won't help at t
- - Re: (Score:3)
    
    by Muad'Dave ( 255648 ) writes:
    
    Here's a nickel, kid, go buy yourself a better OS.
    Here's the best part - after 'buying' that better OS [linux.org], you'll still have the nickel!
- - Re: (Score:2)
    
    by NatasRevol ( 731260 ) writes:
    
    Because that's a mid range machine these days.
    - no it's not (Score:2)
      
      by Chirs ( 87576 ) writes:
      
      An 8-core Xeon (not i7) is not a mid-range desktop. Nor is "enough RAM to load my entire system drive into", or an SSD system drive.
      Even now, those are all higher-end in the general scheme of things. More common on enthusiast machines, sure, but far from "mid-range" in a business system.
How many "Intel Inside" stickers on Tianhe-2? (Score:5, Funny)

by elwinc ( 663074 ) writes: on Tuesday June 18, 2013 @07:25AM (#44037757)

How many "Intel Inside" stickers will they be posting on Tianhe-2? I can see an argument for a mere 16000 (one per node), 32000 (one per Ivy Bridge chip), or 80000 (one per Intel core-carrying chip). But I think Intel's marketing dept should hold out for 3.12 million stickers: one per core!
It's too bad Thinking Machines Incorporated never had a sticker policy, because the "Fat Tree" routing topology is straight out of TMI (the prior TMI topology, hypercube, didn't allow the customer as much choice to balance cores vs interconnect).
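
The sticker counts in the comment above can be reproduced with a few lines of arithmetic. This sketch assumes 16,000 nodes (the comment's figure), two Ivy Bridge sockets and three Xeon Phi cards per node (from the summary), and further assumes 12 cores per Ivy Bridge chip and 57 cores per Phi card; those two per-chip core counts are assumptions chosen to match the 3.12 million total, and all type and function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Per-node hardware layout. Core counts are assumptions for illustration.
struct NodeConfig {
    std::uint64_t cpu_sockets;    // Ivy Bridge chips per node
    std::uint64_t phi_cards;      // Xeon Phi cards per node
    std::uint64_t cores_per_cpu;  // assumed 12
    std::uint64_t cores_per_phi;  // assumed 57
};

// Total Ivy Bridge chips in the machine.
constexpr std::uint64_t cpu_chips(const NodeConfig& n, std::uint64_t nodes) {
    return nodes * n.cpu_sockets;
}

// Total Intel chips (CPUs plus coprocessors) in the machine.
constexpr std::uint64_t all_chips(const NodeConfig& n, std::uint64_t nodes) {
    return nodes * (n.cpu_sockets + n.phi_cards);
}

// Total cores across CPUs and coprocessors.
constexpr std::uint64_t all_cores(const NodeConfig& n, std::uint64_t nodes) {
    return nodes * (n.cpu_sockets * n.cores_per_cpu +
                    n.phi_cards * n.cores_per_phi);
}

constexpr NodeConfig kTianhe2Node{2, 3, 12, 57};
constexpr std::uint64_t kNodes = 16000;
```

Under those assumptions the helpers give 32,000 Ivy Bridge chips, 80,000 chips in total, and 3,120,000 cores, which lines up with the numbers in the comment.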

It's a gas! (Score:4, Funny)

by Impy the Impiuos Imp ( 442658 ) writes: on Tuesday June 18, 2013 @08:44AM (#44038191) Journal

Xeon, Itanium. I think I've figured out the real genius at Intel.
1. Pick a cool element.
2. Remove a letter.
3. ?????
4. Profit!!!
2015 Arbon
2018 Heliu
2023 Litium
2024 Silion
2026 Eon

- nah (Score:2)
  
  by nten ( 709128 ) writes:
  
  2015 Ron
  2018 Aluminum
  - Re: (Score:2)
    
    by ColdWetDog ( 752185 ) writes:
    
    2013 Old
- Re: (Score:3)
  
  by elwinc ( 663074 ) writes:
  
  Nope.
  AltiVec was Motorola's 1999 SIMD instructions & hardware, a response to the SIMD instructions & hardware released by AMD in 1998 (AMD called theirs 3DNow!). Intel also released SIMD instructions & hardware in 1999, called SSE. 3DNow!, AltiVec & SSE were all 128 bit wide pipes that could handle 4 single precision floating point operations simultaneously in parallel. Some of them may have also been able to do two double precision floats also (not AltiVec though), and they all did va
12 core Xeon (Score:2)

by Plumpaquatsch ( 2701653 ) writes:

Where did I hear that before? [apple.com]
