17

I was recently watching a great Computerphile video on passwords in which Mike Pound brags of his company's supercomputer having 4 graphics cards (Titan X's, to be exact).

As a numerical simulation enthusiast, I dream of building a desktop solely for simulation work. Why does Mike Pound measure his computer's computational ability by its graphics cards and not its processors? If I were building a computer, which item should I care about more?

jwodder
  • 124
Ra31513
  • 191
  • see Gorilla vs. Shark -- "if you... don't want your question to get instantly closed... - try to keep Gorilla vs. Shark in mind." – gnat Oct 05 '17 at 04:33
  • 11
    I don't think this is necessarily a Gorilla vs. Shark question... There's a simple question: "Why does Mike Pound measure his computer's computational ability by its graphics cards, and not its processors?" which can be answered and its answer has constructive value for future readers. – Maybe_Factor Oct 05 '17 at 04:50
  • 6
    @gnat: not even close. Of course, the question, in its current form, is not really about software engineering. But I guess it could be interpreted as a question about system's engineering, where system = "combination of hardware + software". – Doc Brown Oct 05 '17 at 06:27
  • 2
    Learn more about OpenCL and you'll understand for what class of problems and programs GPGPUs can be useful. – Basile Starynkevitch Oct 05 '17 at 07:12
  • 10
    A computer with 4 graphics cards does not amount to a supercomputer (and neither does a cluster of 10 Raspberry Pis for that matter). – Matti Virkkunen Oct 05 '17 at 08:59
  • 10
    That's just a very expensive PC setup, not a supercomputer... – Bakuriu Oct 05 '17 at 10:14
  • Try telling that to a few Commodore-64. :) – Michael Viktor Starberg Oct 05 '17 at 11:32
  • 3
    Isn't the simple answer to "Why does Mike Pound measure his computer's computational ability by its graphics cards" because the context is password cracking? If your problem space is something else, what you need to care about might be something else entirely. – JimmyJames Oct 05 '17 at 16:46
  • Since I don't want to go into technical details, I'll just make this comment. Consider what main purpose most graphics cards are designed for: playing games. All games are basically highly complex simulations. It's not surprising, then, that graphics cards would be the main factor for calculating computational power in certain cases. – John Smith Oct 05 '17 at 18:33
  • It's a supercomputer by 1990s or early 2000s standards. I can't view the video currently but was he using "supercomputer" in that sense? (In that sense my cellphone is also a supercomputer) – user253751 Oct 05 '17 at 20:49

3 Answers

32

Mike Pound obviously values the computational ability of the graphics cards higher than the computational ability of the CPUs.

Why? A graphics card is essentially made up of MANY simplified processors that all run in parallel. For some simulation work, a lot of the computation can easily be parallelised across the thousands of cores available on the graphics cards, reducing the total processing time.

which item should I care about more?

It really depends on the workload you care about, and how well that workload can be (or already is) parallelised for a graphics card. If your workload is an embarrassingly parallel set of simple computations, and the software is written to take advantage of available graphics cards, then more graphics cards will have a far greater performance impact than more CPUs (dollar for dollar).
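To make "embarrassingly parallel" concrete in the video's own context: checking password candidates against a hash is a textbook example, because each guess is tested independently of every other. Here's a minimal CPU-side sketch of that shape of computation (real GPU cracking tools use CUDA/OpenCL kernels, and the candidate list here is just illustrative):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# The hash we are trying to reverse (SHA-256 of a known toy password).
target = hashlib.sha256(b"hunter2").hexdigest()

def try_candidate(candidate):
    # Each unit of work is fully independent: hash one guess, compare, done.
    # This independence is exactly what lets the work spread over thousands
    # of GPU cores with no coordination between them.
    if hashlib.sha256(candidate.encode()).hexdigest() == target:
        return candidate
    return None

candidates = ["password", "letmein", "hunter2", "qwerty"]
with ThreadPoolExecutor() as pool:
    matches = [c for c in pool.map(try_candidate, candidates) if c]

print(matches)  # ['hunter2']
```

The thread pool stands in for the GPU's cores: because no guess depends on any other, throughput scales almost linearly with the number of workers.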

Maybe_Factor
  • 1,391
  • 5
    Adding some numbers. Let's say your main computer would be an AMD Epyc server: 64 cores, 128 with hyperthreading. Let's also say that a graphics card "core" is only 10% as fast. ONE TitanX still has 3072 CUDA cores, roughly 12000 for the setup. Get the idea? IF you can run the problem on the graphics card, it is not merely "faster" - it is like comparing the speed of a horse carriage to a formula 1 car. – TomTom Oct 05 '17 at 11:00
  • 3
    +1 for 'embarrassingly parallel set of simple computations', Very well written. Short and to the point. – Michael Viktor Starberg Oct 05 '17 at 11:24
  • 11
    @TomTom: Actually my preferred comparison is comparing a formula 1 car (your CPU) with a bullet train. Sure, the train and the car is approximately the same speed. But the train can move 1000 people from A to B faster than the formula 1 car. – slebetman Oct 05 '17 at 12:18
  • 2
    @slebetman the point is the CPU is typically much faster in single-core performance (not approximately the same speed). Maybe we can compromise, and compare a supersonic jet airplane with a steam locomotive. – Darren Ringer Oct 05 '17 at 13:19
  • 2
    If I have to choose an analogy based on vehicle, I'd say the CPU is like a fighter jet (it's much faster for point-to-point transport and have many tricks up its sleeve that other vehicles can't, but can only carry very small load) while the GPU is like a cargo ship (it can carry significantly more load in parallel, but have much slower turnaround). – Lie Ryan Oct 05 '17 at 18:19
  • 1
    @MichaelViktorStarberg, "embarrassingly parallel" is a term of art -- it's not something coined here. – Charles Duffy Oct 05 '17 at 19:55
  • @DarrenRinger The original analogy compared PC class CPUs with supercomputers and the comparison was a formula 1 car with a cargo ship. Supercomputers are really bad at doing real-time simulations like playing 3D games but are good at doing batch jobs. The comparison with GPU isn't as bad because both CPUs and GPUs are designed for real-time processing. Which is why I changed the comparison to a bullet train. GPUs may not be as good as your CPU at single core performance but it is not THAT bad – slebetman Oct 05 '17 at 22:55
5

Check out https://developer.nvidia.com/cuda-zone (and search for "CUDA NVIDIA" for lots more info). The CUDA architecture and high-end graphics cards are pretty widely used for desktop supercomputing. You can typically put together a several-TFLOP box for under $10K (USD) using off-the-shelf whitebox components.

So...

As a numerical simulation enthusiast, I dream of building a desktop solely for simulation work

...CUDA's pretty much far-and-away the best game in town for you. Maybe try asking again on https://scicomp.stackexchange.com/ or another Stack Exchange site more directly involved with this kind of thing.

(By the way, I assume you're comfortable with the idea that we're talking about massively parallel programming here, so you may need to get familiar with that paradigm for algorithm design.)

2

If I were building a computer, which item should I care about more?

From a practical standpoint you should probably pay quite a bit of attention to the motherboard and CPU given the relative difficulty of upgrading compared to the GPU. After purchase is an awful time to discover you don't have space for four GPUs or a fast enough processor to keep them all busy.

You should also be aware that GPU performance is most often reported in single-precision FLOPs, and drops quite a bit for double precision. If you need the extra precision in your simulations you'll end up well below the advertised speed.
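The single- vs double-precision gap matters because float32 simply cannot represent values that float64 can. A quick stdlib-only demonstration (round-tripping through IEEE-754 single precision with `struct`, since Python's own `float` is double precision):

```python
import struct

def to_float32(x):
    # Round-trip a Python float (64-bit) through IEEE-754 single precision,
    # the format whose FLOPs most GPU spec sheets advertise.
    return struct.unpack("f", struct.pack("f", x))[0]

a = 16777216.0  # 2**24: beyond this, float32 can't represent every integer

print(to_float32(a + 1) == a)  # True  -> the +1 is silently lost in float32
print((a + 1) == a)            # False -> float64 still keeps it
```

In a long-running simulation, losses like this accumulate, which is why double precision is often non-negotiable and why the advertised single-precision TFLOP figure can overstate what you'll actually get.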

Off to the software engineering races

There are really two primary concerns from a software standpoint: the von Neumann bottleneck and the programming model. The CPU has fairly good access to main memory; the GPU has a large amount of faster memory on board. It's not uncommon for the time spent moving data into and out of the GPU to completely negate any speed win. In general, the CPU wins for moderate computation on large amounts of data, while the GPU excels at heavy computation on smaller amounts. All of which brings us to the programming model.
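A back-of-the-envelope cost model makes the transfer bottleneck concrete. The bandwidth and throughput numbers below are illustrative assumptions, not measurements of any particular card:

```python
def gpu_worthwhile(n_bytes, flops_per_byte,
                   pcie_bps=12e9,      # assumed effective PCIe bandwidth, bytes/s
                   gpu_flops=10e12,    # assumed GPU arithmetic throughput
                   cpu_flops=0.5e12):  # assumed CPU arithmetic throughput
    # GPU pays for moving the data over the bus (in and out) plus compute;
    # the CPU already has the data in main memory.
    transfer = 2 * n_bytes / pcie_bps
    gpu_time = transfer + n_bytes * flops_per_byte / gpu_flops
    cpu_time = n_bytes * flops_per_byte / cpu_flops
    return gpu_time < cpu_time

# Light computation on a lot of data: the transfer dominates, CPU wins.
print(gpu_worthwhile(1e9, flops_per_byte=1))     # False
# Heavy computation on the same data: GPU wins despite the transfer.
print(gpu_worthwhile(1e9, flops_per_byte=1000))  # True
```

The crossover point is the "arithmetic intensity" (FLOPs per byte moved) at which the GPU's compute advantage outweighs the bus cost; below it, shipping the data simply isn't worth it.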

At a high level the problem is the ancient and honored MIMD/SIMD debate. Multiple-Instruction/Multiple-Data systems have been the big winners in general and commercial computing. In this model, which includes SMP, there are multiple processors, each executing its own individual instruction stream. It's the computer equivalent of a French kitchen, where you direct a small number of skilled cooks to complete relatively complicated tasks.

Single-Instruction/Multiple-Data systems, on the other hand, more closely resemble a huge room full of clerks chained to their desks following instructions from a master controller: "Everybody ADD lines 3 and 5!" It was used in its pure form in the ILLIAC and some "mini-super" systems, but lost out in the marketplace. Current GPUs are a close cousin: they're more flexible, but share the same general philosophy.
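The clerk analogy can be sketched directly. One instruction ("add your pair!") is applied in lockstep to every element of the data; on a GPU, each list position would be its own hardware lane:

```python
def simd_add(a, b):
    # One conceptual instruction, executed by every "clerk" at once on
    # their own pair of operands. A GPU does this in hardware lanes.
    return [x + y for x, y in zip(a, b)]

line3 = [1, 2, 3, 4]
line5 = [10, 20, 30, 40]
print(simd_add(line3, line5))  # [11, 22, 33, 44]
```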

To sum up briefly:

  • For any given operation the CPU will be faster, while the GPU can perform many simultaneously. The difference is most apparent with 64-bit floats.
  • CPU cores can operate on any memory address, data for the GPU must be packaged into a smaller area. You only win if you're doing enough computations to offset the transfer time.
  • Code heavy in conditionals will typically be happier on the CPU.
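The last bullet follows from the SIMD model above: when lanes disagree on a branch, the hardware typically executes both sides and masks out the inactive lanes, so branchy kernels are often rewritten in a branch-free select/where style. A small sketch of the two equivalent formulations:

```python
def branchy(xs):
    # Natural CPU style: take one side of the branch per element.
    return [x * 2 if x > 0 else -x for x in xs]

def branch_free(xs):
    # SIMD-friendly style: compute BOTH sides for every element, then
    # select per element with a mask -- which is roughly what divergent
    # SIMD execution costs anyway. (In Python, True/False multiply as 1/0.)
    return [(x > 0) * (x * 2) + (x <= 0) * (-x) for x in xs]

xs = [-2, -1, 0, 1, 2]
print(branchy(xs) == branch_free(xs))  # True
```

On a GPU the branch-free form avoids divergence entirely; on a CPU the branchy form is usually fine, which is one more reason conditional-heavy code is happier there.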