Can OpenGL And OpenCL Overhaul Your Photo Editing Experience?

Transcription

http://www.tomshardware.com/reviews/photoshop-cs6-gimp-aftershot-pro,3208.html
12:00 AM - June 11, 2012 by William Van Winkle
Source: Tom's Hardware US
Table of contents
1 - Fast Action Behind Still Photos
2 - Q&A: Under The Hood With AMD
3 - Q&A: Under The Hood With AMD, Cont.
4 - Test Platforms
5 - Applications: GIMP, AfterShot Pro, And Musemage
6 - Applications: Adobe Photoshop CS6
7 - Q&A: Under The Hood With Adobe
8 - Q&A: Under The Hood With Adobe, Cont.
9 - Q&A: Under The Hood With Adobe, Cont.
10 - Benchmark Results: GIMP
11 - Benchmark Results: AfterShot Pro
12 - Benchmark Results: Musemage
13 - Benchmark Results: Photoshop CS6
14 - The Picture Is Changing
Fast Action Behind Still Photos
The processing loads common in video editing are well known; it doesn’t take more than a couple of 1080p tracks and a filter or two to soak up 100% of CPU
resources in many systems. However, not as many people appreciate the significant compute burden imposed by modern digital photography workloads.
Adding a sepia filter to an eight-megapixel (MP) image may be no big deal, but how about a complex blur to an 18 MP RAW image?
Even if your editing tasks don't swamp your CPU, they can still take a significant amount of time to execute, especially for multi-image batch jobs. More time
means more waiting. For professionals, that translates to a loss of income. And pauses in your workflow prevent you from operating at the pace of your
creativity, stopping you dead in your tracks. You want to edit as the creative options stream through your mind.
The object of the game, of course, is to devise new ways of getting more processing work done in less time. In two prior articles, we examined how modern
GPUs, including those embedded within CPUs and APUs (compute engines with on-die graphics capabilities), can be leveraged by industry standard APIs to
accelerate highly parallel operations within video post-processing and games. The same is now increasingly true within the world of photo editing.
Adobe, for example, has a long and impressive history of adopting hardware technologies that lend themselves well to accelerating media operations. The
company has used OpenGL to accelerate certain functions through the GPU within Photoshop since version CS4, and such use has expanded with each
subsequent release. Now, with the introduction of Photoshop CS6, the program opens its doors to OpenCL as a means to achieve even broader GPU-based
acceleration.
Adobe is not alone. An intriguing but little-known photo workflow tool called Musemage now leverages OpenGL and OpenCL throughout its many workflow
stages. The ever-popular open source photo editor GIMP now similarly employs these APIs in several areas. We also acquired a pre-release version of Corel’s
AfterShot Pro, tweaked by Corel to bring early OpenCL support online specifically for our testing here.
We will test each of these applications across five system configurations and look for patterns in the results. How much does open standards-based GPU
acceleration really help in these image-oriented tasks? Is there a difference in how much APUs and GPUs leverage accelerated features? Does acceleration
scale evenly with graphics horsepower? Let’s find out.
Q&A: Under The Hood With AMD
One of our core objectives in this series on heterogeneous computing is to get a better understanding of some of the decisions surrounding OpenCL and
DirectCompute. Why would a vendor choose to utilize them instead of other APIs, such as OpenGL or DirectX? What are these programming interfaces doing
with data behind the scenes? What are their limits and how much untapped potential do they leave on the table?
Those are the questions you don’t see answered in marketing materials. Fortunately, we were able to corner two excellent authorities for this article and start
gathering some answers. First up is Alex Lyashevsky, a performance application engineer at AMD and a senior member of the technical staff brought in from
AMD’s acquisition of ATI in 2006. Lyashevsky is no talking head from marketing. He holds patents on parallel lossless image compression and the world’s first
GPU-based H.264 decoder. Few people understand GPGPU computing as well as Lyashevsky, and fewer still can discuss it in the same breath as OpenCL
acceleration.
Tom's Hardware: Photoshop CS6 is our headliner benchmarking app for this article, and Photoshop is no stranger to OpenGL. So why are we now getting
OpenCL added into the mix?
Alex Lyashevsky: OpenGL is pretty widely used, and it actually has many of the same compute capabilities as OpenCL. However, OpenGL is targeted more
towards graphics. When you run OpenGL, you usually assume there is some kind of an image or buffer you are trying to draw upon. OpenCL actually provides
much more of a generic programming platform, more in the sense of computational domain. You can have an absolutely free way of defining your own
computational domain instead of being attached to some kind of image or two-dimensional, pixel-based guesstimation. Other than that, frankly, I sometimes encourage people to use OpenGL, because it has very good hardware-supported input buffer filtering, for example, and very efficient color buffer compositing on output.
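
To make that point about computational domains concrete, here is a minimal sketch of our own (not AMD code) in OpenCL C. The host is free to launch this kernel over any one-dimensional index space it likes; nothing ties the work to a framebuffer, quad, or texture:

    /* Hypothetical OpenCL C kernel: one work-item per sample in a flat
       buffer. The "domain" is just an index space the host defines. */
    __kernel void scale_samples(__global float *data, const float gain)
    {
        size_t i = get_global_id(0);  /* this work-item's position */
        data[i] *= gain;
    }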
Tom's Hardware: For developers, is there a significant difference between the two APIs in coding?
Alex Lyashevsky: Programming the OpenGL shader language is a difficult thing to get on top of. OpenCL may be a bit easier for developers. You see, OpenGL
assumes that you have to set up some graphic context, meaning you have to set up viewpoint, model matrix transformations, and so on. OpenGL is a
graphical language, and this works well for some types of operations where graphics are related to the computation problem. But from a general
programmer’s point of view, OpenGL is kind of nonsense. If they want to do data manipulation, why should they set up a triangle, viewpoint, or matrix? A
more general way to program the GPU, which is enabled by OpenCL, is necessary for more widespread adoption. For example, it’s probably not very useful to
use OpenGL to accelerate something like deflate and encryption in compression apps, but it is probably useful for image processing apps.
Q&A: Under The Hood With AMD, Cont.
Tom's Hardware: How much enthusiasm have you found in the developer community? Are they coming to you saying, “Help us out—get us running with
OpenCL”?
Alex Lyashevsky: Well, there is no resistance. Frankly, using something like OpenCL is not easy, but I try to break the psychological barrier. People start using
OpenCL and often, because it is cumbersome, they won’t get the performance they expect from the marketing claims, so they assume it doesn’t work. Users
think it’s a marketing ploy and they won’t use it. My goal is to break the barriers and show that there is a lot of benefit. How to do that is another matter. The
problem is that it is not exactly C. It’s not that it’s quite different from C; it’s the problem of mindset. You have to understand that you are really dealing with
massively parallel problems. It’s more a problem of people understanding how to run parallel 32 or 64 synchronous threads, and this prevents wider, easier
adoption.
From there, you probably know there are architectural problems. There are system-wide performance concerns, because we need to move data from the CPU
or system sides, or “system heaps” as we say, to the video heap to get the best performance. That movement, first of all, causes a problem and resistance.
Second, on a system level, it decreases performance because you need to move the data. The question is how efficiently you move it. That’s what we explain
how to do, even without optimal efficiency. The future will have a fully unified memory as part of HSA, and it’s physically unified on our APUs, but not unified
on our discrete offerings. So, to get the best performance from our device, you have to use specialized memory or a specialized bus. This is another piece of
information that people miss when they start developing with OpenCL. People come with some simplified assumption and feel that they cannot or should not
differentiate CPU architectures from GPU architectures. And fortunately or unfortunately—it’s difficult to say—while OpenCL is guaranteed to work on both
sides, there is no guarantee that CPU and GPU performance will be equal on both sides. You have to program knowing that you are coding for the GPU, not
the CPU, to make the best use of OpenCL on the GPU.
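
As an illustration of both points, meaning the explicit hop from the system heap to the video heap and the wavefront-sized thinking Lyashevsky describes, here is a hedged host-side sketch in C. It is our example with hypothetical names, using standard OpenCL 1.x calls; error checking is omitted for brevity:

    #include <CL/cl.h>

    /* ctx, q, and kernel are assumed to exist already; the kernel takes
       one buffer argument and bounds-checks against npixels itself. */
    void run_filter(cl_context ctx, cl_command_queue q, cl_kernel kernel,
                    float *pixels, size_t npixels)
    {
        cl_int err;
        size_t bytes = npixels * sizeof(float);

        /* Explicit copy from the system heap into the video heap. On a
           discrete card this crosses PCIe, and it is not free. */
        cl_mem dbuf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
        clEnqueueWriteBuffer(q, dbuf, CL_TRUE, 0, bytes, pixels, 0, NULL, NULL);

        /* Launch with a local size of 64, one full AMD wavefront (32
           would match an Nvidia warp), rounding the global size up. */
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &dbuf);
        size_t local  = 64;
        size_t global = (npixels + local - 1) / local * local;
        clEnqueueNDRangeKernel(q, kernel, 1, NULL, &global, &local, 0, NULL, NULL);

        /* The copy back is the other half of the round trip. */
        clEnqueueReadBuffer(q, dbuf, CL_TRUE, 0, bytes, pixels, 0, NULL, NULL);
        clReleaseMemObject(dbuf);
    }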
Tom's Hardware: In a nutshell, what is your mission? What message must get through to developers?
Alex Lyashevsky: First of all, you must understand that you have to move data. Second, understand that your programming must massively parallelize the
data. And third, almost the same as the second, you have to understand that optimization is achieved through parallel processing. You have to understand the
architecture you are programming for. That’s the purpose of the help we are providing to developers, and it’s starting to really pay off. We have surveys of
developers in multiple geographies showing that OpenCL is gaining a lot of momentum.
Tom's Hardware: Let’s talk about APU in the context of mobile computing and power. Have the rules of power-savings algorithms changed? Does an APU
throttle differently depending on AC or battery power?
Alex Lyashevsky: On CPU, I’m afraid this is the same thing. The memory and power management system is very sophisticated, and sometimes it depends on the operating system. The operating system has most of the rights to decrease activity. But our GPU is pretty adaptive, and its activity sometimes even affects
our performance management. If you don’t put reasonable load pressure on our hardware, it will try to be as low-power as possible. It basically slows itself
down when it doesn’t have enough work to do. So yeah, it is adaptive. I couldn’t say that it’s absolutely great, but we try to make it as power-efficient as
possible.
Test Platforms
As in our prior stories, we used the following test configurations:
Test Hardware

Test System 1
Processor: AMD FX-8150 (Zambezi), 3.6 GHz, Socket AM3+, 8 MB Shared L3 Cache, Turbo Core enabled, 125 W
Motherboard: Asus Crosshair V Formula (Socket AM3+), AMD 990FX/SB950
Memory: 8 GB (2 x 4 GB) AMD Performance Memory AE34G1609U2 (1600 MT/s, 8-9-8-24)
SSD: 240 GB Patriot Wildfire SATA 6Gb/s
Graphics: AMD Radeon HD 7970 3 GB; AMD Radeon HD 5870 1 GB
Power Supply: PC Power & Cooling Turbo-Cool 860 W
Operating System: Windows 7 Professional, 64-bit

Test System 2
Processor: AMD A8-3850 (Llano), 2.9 GHz, Socket FM1, 4 MB L2 Cache, 100 W, Radeon HD 6550D Graphics
Motherboard: Gigabyte A75-UD4H (Socket FM1), AMD A75 FCH
Memory: 8 GB (2 x 4 GB) AMD Performance Memory AE34G1609U2 (1600 MT/s, 8-9-8-24)
SSD: 240 GB Patriot Wildfire SATA 6Gb/s
Graphics: AMD Radeon HD 7970 3 GB; AMD Radeon HD 5870 1 GB
Power Supply: PC Power & Cooling Turbo-Cool 860 W
Operating System: Windows 7 Professional, 64-bit

Test System 3: Gateway NV55S05u
Processor: AMD A8-3500M (Llano), 1.5 GHz, Socket FS1, 4 MB L2 Cache, 35 W, Radeon HD 6620G Graphics
Motherboard: Gateway SJV50-SB
Memory: 1 x 2 GB Hyundai HMT325S6BFR8C-H9 PC3-10700 (667 MHz); 1 x 4 GB Elpida EBJ41UF8BCS0-DJ-F PC3-10700 (667 MHz)
Hard Drive: Western Digital Scorpio Blue 640 GB, 5400 RPM, 8 MB Cache, SATA 3Gb/s
Graphics: AMD Radeon HD 6620G (integrated)
Operating System: Windows 7 Home Premium, 64-bit

Test System 4: HP Pavilion dv6
Processor: Intel Core i5-2410M (Sandy Bridge), 2.3 GHz, Socket G2, 3 MB Shared L3 Cache, 35 W, HD Graphics 3000
Motherboard: Hewlett-Packard 1658
Memory: 4 GB (2 x 2 GB) Samsung M471B5773CHS-CH9 PC3-10700 (667 MHz)
Hard Drive: Seagate Momentus 7200.4 500 GB, 7200 RPM, 16 MB Cache, SATA 3Gb/s
Graphics: Intel HD Graphics 3000
Operating System: Windows 7 Professional, 64-bit
With luck, we will be adding a new Trinity (next-gen AMD APU)-based notebook to the above heterogeneous compute platforms. Unfortunately, review units
weren’t available in time for our testing here.
Applications: GIMP, AfterShot Pro, And Musemage
For this article, we tested with four applications: GIMP, Corel AfterShot Pro, Musemage, and Adobe Photoshop CS6.
GIMP (GNU Image Manipulation Program), an open source project since 1995, is pretty much the go-to app for anyone who doesn’t want to pay for an
image/graphics editor. The title is brimming with advanced features ranging from channels and paths to animation and pattern tools. While we tested under
Windows, there are versions for several operating systems, including Mac OS X, Linux, FreeBSD, and even AmigaOS 4. Specifically, we tested with the version
2.8 build released on April 2, 2012, that incorporated OpenCL support for 19 filters. We accessed three of these filters (Gaussian blur, bilateral, and motion
blur) through the Generic Graphics Library (GEGL) menu list that first started appearing in GIMP 2.6. AMD described GEGL to us like so:
“GEGL is a floating-point-based processing pipeline that will be the foundation for the next upcoming major release of GIMP. GEGL requires more
computational power than the baseline GIMP pipeline, which is 8-bit-based. While the computation requirements are high, floating-point does provide
flexibilities that 8-bit processing just can’t match. Since GEGL is going to be the future of GIMP processing, we have focused our OpenCL work with GIMP on
accelerating the GEGL pipeline (as opposed to the baseline GIMP pipeline). As such, only GEGL operations will experience OpenCL acceleration. GEGL is being
integrated piece by piece into GIMP, and that’s why you see special menus for GEGL operation in this build and in the near future, until it’s fully integrated in
GIMP, at which point there will not be a special menu for GEGL since everything will be GEGL.”
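
To give a feel for what driving that pipeline looks like, here is a small hedged sketch in C against GEGL's public graph API. The operation and property names ("gegl:gaussian-blur", "std-dev-x") are our best guesses for this era of the library, not code from the GIMP build we tested:

    #include <gegl.h>

    int main(int argc, char **argv)
    {
        gegl_init(&argc, &argv);

        /* Build a tiny load -> blur -> save graph of GEGL nodes. */
        GeglNode *graph = gegl_node_new();
        GeglNode *load  = gegl_node_new_child(graph,
                              "operation", "gegl:load",
                              "path", "input.png", NULL);
        GeglNode *blur  = gegl_node_new_child(graph,
                              "operation", "gegl:gaussian-blur",
                              "std-dev-x", 20.0, "std-dev-y", 20.0, NULL);
        GeglNode *save  = gegl_node_new_child(graph,
                              "operation", "gegl:png-save",
                              "path", "output.png", NULL);

        gegl_node_link_many(load, blur, save, NULL);
        gegl_node_process(save);  /* pull the result through the graph */

        g_object_unref(graph);
        gegl_exit();
        return 0;
    }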
In speaking with GIMP/GEGL developer Victor Oliveira, we gained an interesting insight. OpenGL is inherently made for graphics processing, and I’d long
assumed that OpenCL was much the same, only aimed at somewhat different graphical tasks. However, it turns out that the API is more robust and flexible
than most people appreciate.
“OpenCL not only gives GPU acceleration, but we can also use OpenCL in the CPU to provide good multi-threading, which GIMP lacked, and vectorization
support,” says Oliveira. “Notice that GIMP's audience is very heterogeneous, so if we want to, for example, support the AVX instruction set in our code, we
would have to generate two builds, because it wouldn't work on older machines, or detect it on runtime. Either option is bad. With OpenCL, we can do that
while distributing just one build. It's a really interesting technology.”
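
A hedged sketch of what that single-binary flexibility looks like in host code (our own fragment, not GIMP's): the application asks the runtime for a GPU at startup and quietly falls back to the OpenCL CPU driver, which still supplies the threading and vectorization, when no GPU device is present.

    #include <CL/cl.h>

    /* Returns a device to run kernels on: GPU if available, else CPU. */
    cl_device_id pick_device(void)
    {
        cl_platform_id plat;
        cl_device_id dev;
        clGetPlatformIDs(1, &plat, NULL);
        if (clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL) != CL_SUCCESS)
            clGetDeviceIDs(plat, CL_DEVICE_TYPE_CPU, 1, &dev, NULL);
        return dev;  /* same kernels, one build, either device */
    }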
Our GIMP build has command line options for running with or without OpenCL support, as well as displaying a debug window that shows benchmarking
results. For a test file, we used a 4096x2048 30 MB bitmap image.
Corel’s AfterShot Pro is a non-destructive photo workflow application. From a technical perspective, this means that every time an image is opened, edited, or
output, ASP reapplies all of the image processing, starting at the very earliest step of decoding the image content all the way through rendering the final image
on-screen. Throughout this process, no data is eliminated. A better metaphor may be to say that changes are stacked, and users can modify any change
made within that stack. Of course, if that stack is flattened or merged, then the non-destructive workflow vanishes.
We obtained a special preview build of ASP from Corel that implements OpenCL in such a way that it helps accelerate file conversions. So we gathered a batch
of 50 RAW images, each measuring 6048x4032 and roughly 37 MB, and used ASP to batch convert them into JPG with and without OpenCL assistance.
Musemage is a newer and lesser-known photo editor deserving of a larger audience. As we hear from AMD, this application was the work of several engineers
in China, which explains a lot. Technically, the app is a marvel of utility and convenience for photo manipulation, especially for batch editing, but it seems to
have received practically no marketing here in America. This is unfortunate because, unlike so many editing tools that have bolted-on GPU-based acceleration
as an iterative afterthought, Musemage was built from the ground up with such acceleration.
In our Musemage batch test, we used eight JPEG images supplied by JLucasPhoto.com, each of 10 to 12 MP, totaling 35.4 MB of data. Musemage hosts
dozens of adjustments, color effects, lens effects, distortions, resizing options, and so on. During the batch run, we applied eleven of these processes onto
each image: Auto Contrast, Auto White Balance, Gaussian Blur (4.0 pixels), Color Denoising (0.8 pixel), Negative Film (highlight 50, shadow 50), Advanced
Defog (threshold 0.10, strength 0.65), Vignetting (FoV angle 45), Soften Skin (radius 3.3 pixels, strength 0.05, whiten 0.02), Horizontal Flip, Resize (150%
fixed width and height), and Add Text (40% opacity).
Applications: Adobe Photoshop CS6
One common definition of heterogeneous computing is the integration of graphics capabilities on the same die as the CPU (central processing unit). This isn’t
quite how Adobe uses the term. Rather, Adobe takes a more systemic view, looking for all the ways in which its software can leverage computing resources,
and then discovering which areas of the system best provide those resources. As the company told us, heterogeneous computing is “multiple instruction sets
and types of code in the same computer, and utilizing all those resources to give the users a better experience.”
The new Photoshop CS6 illustrates this very well. From its years-ago roots as a multi-threading pioneer to its just-added entrance into OpenCL, Photoshop
looks for ways to accelerate features wherever they can be feasibly found. The object of the game is to never delay the user’s experience. Ideally, filters
should apply in real-time, and a system should handle 30 layers on a 30-megapixel image with ease. Traditionally, we would have counted on CPU advances
to make this dream a reality, but as we've seen from both AMD and Intel lately, expecting major leaps forward in processor performance will likely set you up
for a bad time. Today, progress is being facilitated through the integration and utilization of many different subsystems (including the GPU), faster pipes,
more parallelism, and so on.
Because different software features and disparate workloads benefit from a variety of hardware design decisions, it’s impossible to have a complete discussion
of GPU-based acceleration without examining the ways speed-ups are achieved. Photoshop CS6 introduces support for OpenCL in its Blur Gallery filter tools.
Meanwhile, Photoshop continues to build on its OpenGL capabilities by accelerating many functions, including Liquify, adaptive wide angle, transform, warp,
puppet warp, lighting effects, and pretty much every 3D feature (shadows, vanishing point, sketch rendering, etc.), except the ray tracer. Additionally,
OpenGL powers the oil paint filter, while new features like background saving are threaded, which is really handy on very large jobs.
Despite Adobe being a large company and Photoshop being one of the industry’s most renowned applications, the Photoshop team is surprisingly small: fewer
than 60 software and quality engineers. In our discussions, Adobe quipped at one point, “Throwing more bodies at a problem doesn't always make software
better.” Especially given this relatively small head count, it’s doubly remarkable that Adobe has tightened its revision times from 18 to 24 months down to 12.
Just keep in mind that faster revision cycles will likely mean a slower apparent adoption of a given new technology from version to version.
Adobe provided us with two scripts for testing its latest OpenCL and OpenGL performance. In the first case, the choice was inevitable: Blur Gallery, the only
family of OpenCL-accelerated filters in the application. We ran four iterations of a general blur effect being applied to a 60.2 MB, 5615x3744 PSD file. Tests
were either RGB or CMYK, and they applied a blur value of either 25 or 300, giving us four tests per blur group. The script ran the effect seven times and
output a CSV file, from which we pulled the average time to complete the filter effect.
Adobe supplied a very similar script for its OpenGL-based Liquify filter, which had no varying parameters.
Note that we controlled GPU acceleration through Photoshop’s Preferences > Performance > Graphics Processor Settings options. In this area, unchecking the
Use Graphics Processor box will disable all OpenGL and OpenCL acceleration, which turned out to be the only way to get accurate results for non-accelerated
tests. The Advanced Settings button will spawn a pop-up with check boxes for OpenGL (unintuitively called Use Graphics Processor to Accelerate Computation)
and OpenCL. Apparently, unchecking both of these still leaves OpenGL enabled.
Q&A: Under The Hood With Adobe
No vendor commands more respect or has a longer pedigree in the photo editing world than Adobe, so we felt it important to get an in-depth developer’s look
at OpenCL from someone with extensive knowledge of how Photoshop is designed and how it evolves across versions. Russell Williams is the principal
scientist and architect on Adobe’s Photoshop team. Look closely and you’ll see that his is the third name listed on the Photoshop CS6 splash screen. His job is
to handle the program’s technical direction and make decisions on which technologies should be used, how much focus should be put on them, and how the
team is going to make it happen.
Tom's Hardware: From dual-thread support to OpenGL, Photoshop has a long history of adopting new acceleration optimizations. But you can’t adopt
everything. What decision process do you use when weighing new acceleration options?
Russell Williams: We don't have a single process. One important thing is that we have a fixed amount of resources. We have a certain amount of time. The
team is a certain size. Each time we go through what will be in the next version, performance is one of the things that has to be traded off. Just like any new
feature, we evaluate how much bang for the buck we can get. For instance, improving start-up time is not a sexy performance issue, but it affects every user
and has a huge impact on perceptions of how fast Photoshop is. It also affects productivity—you don't need to have a break in your thought process while
waiting for the app to launch.
Tom's Hardware: With CS 6, we’re seeing the first bits of OpenCL creeping into Photoshop. From the development side, how long did it take to integrate that
support? Hasn’t OpenCL been around long enough for it to have at least started in 5.5?
Russell Williams: We usually don’t implement major new features or functions in most dot releases (such as CS 5.5). So, if you go back to CS5, that was
shipping in spring 2010, but that means we were designing and implementing features starting in late 2008. OpenCL wasn't mature and didn't have cross-platform drivers until at least late 2009, when we were already locking down into final testing mode. Thus CS6 is really the first opportunity we had to
implement it.
Tom's Hardware: You have several APIs to choose from for accelerating features. Why pick OpenCL?
Russell Williams: One, it's an open standard and an alternative to other GPU-compute technologies like CUDA or DX Compute. The significance is that open
standards work on all platforms. In particular, on Windows, you have the choice to buy one card or another card in many cases. But laptops typically don't
offer that flexibility. I don't like leaving platforms behind, and it’s not feasible to rewrite each feature for each platform. If we wrote a feature in CUDA, we'd
get only Nvidia cards. If we did DX Compute, we'd only get Windows and not Macs, and even both combined would still miss a big chunk of customers. So
cross-platform support for both OS and graphics platforms is very important to us.
The other reason why we’d choose OpenCL and not OpenGL is—well, there's two reasons. With GPU compute languages like OpenCL, they unlock or permit
the use of some parts of the hardware’s functionality, such as low-level algorithmic compute capabilities that aren't accessible through OpenGL. And secondly,
OpenGL and DirectX are very much designed to do a certain thing: render 3D graphics scenes. And if you want to do anything other than render 3D graphics
scenes, you sort of have to think of your problem as "how can I make this look like rendering a 3D graphics scene?" You have to do a lot of graphics setup for
that. For programmers trying to accelerate some algorithm on the GPU, trying to make something like CS6’s blur feature, or any arbitrary
function that will go faster with GPU, the people writing those things don't have experience working with 3D graphics rendering. They aren't thinking of their
problems in that way, so it's a lot more of a hurdle to use OpenGL. OpenCL gives us the promise of more widespread adoption of GPU technologies across our
developers.
Tom's Hardware: There are a bunch of new features and optimizations in CS6, but only one OpenCL-enhanced feature: Blur Gallery. Not to be tactless, but
why only one? Is this just a toe in the water? Or are other factors involved?
Russell Williams: You have to start somewhere. The OpenCL ecosystem is just getting there, and we don't have historical expertise in this. OpenCL availability
and maturity has only recently become a reality. We intend to do a lot more with it in CS7. Also, we pay close attention to relative returns when looking for
features to accelerate. Like, the eye dropper is already fast enough. You have to speed up features that really are disruptively slow to common workflows. You
want to make an impact to users, not just speed uncommon things up simply because they're there.
I should add that it's not just us at the beginning of the OpenCL path. The whole industry is at the beginning of having and working with drivers that support
OpenCL. When we began CS6, that support was still quite limited.
Q&A: Under The Hood With Adobe, Cont.
Tom's Hardware: Within Photoshop, what limits exist in terms of what you can do with these APIs?
Russell Williams: With some things, we can look at them and know they're not suited for OCL or the GPU in general. In other cases, it's only after we expend
some effort, by implementing it, that we discover we're not going to get the speed-up that will justify the effort. It's well known that the GPU is completely
suited for certain kinds of things and completely unsuited for others. I believe it was AMD that told us that while a GPU can speed up the things it is suited to by several hundred times compared to the CPU, a problem ill-suited to GPUs, something inherently sequential, might run 10 times slower.
Some people think, “If the GPU is so much faster, then why not do everything on the GPU?” But the GPU is only suited for certain operations. And for every
operation you want to run on there, you have to re-implement it. For instance, we accelerated the Liquify filter with OGL, not OCL, and that makes a
tremendous difference. For large brushes, it goes from 1 to 2 FPS to being completely fluid, responsive, and tracking with your pen. That kind of
responsiveness for modifying that much data could only be done on a GPU. But it took one engineer most of the entire product development cycle for CS6 to
re-implement the whole thing.
Tom's Hardware: Which gets us back to why only one feature was implemented in OCL this time around. You don't have an infinite number of developers and
only one year between versions.
Russell Williams: That's right. And, of course, we have an even more limited supply of developers that already know OCL.
Tom's Hardware: Did graphics vendors play a role in your OpenCL adoption? Education, tools, and so forth?
Russell Williams: We didn't have much input on creating the tools they had. But they gave us a tremendous amount of help in both learning OpenCL and in
using the tools they have given us. Both Nvidia and AMD gave us support in prototyping algorithms, because both of their interests are to make more use of
the GPU. For us, the big issue is where the performance is. We can't count on a particular level of GPU being in a system. Many systems have Intel integrated
graphics, which have more limited GPU and OpenGL support, and no OpenCL support. A traditional C-based implementation has to be there, and it’s the only
thing we can count on being there. On the other hand, if something is performance-critical, the GPU is really where most of the compute power is in the box.
Beyond that, AMD had their QA/engineering teams constantly available to us. We had weekly calls, access to hardware for testing, and so on. Nvidia and Intel
helped, too, but AMD definitely stepped up.
Tom's Hardware: So which company has the better products, AMD or NVIDIA? [laughs]
Russell Williams: You're Tom's Hardware. You know that depends on what you're running and which week you ask the question—and how much money you
have in your pocket. That's why it is so critical for us to support both vendors. Well, three if you include Intel integrated graphics, which is starting to become
viable.
Tom's Hardware: At some point, performance bottlenecks are inevitable. But how far out do you look when trying to avoid them? Do you say, “Well, we’re
already getting five times better performance—that’s good enough!” Or do you push as far as possible until you hit a wall?
Russell Williams: We do think about that, but it’s very hard. It’s impossible to know in a quantitative way, ahead of time, what those bottlenecks will be. We know qualitatively that we should spend a lot of time on bandwidth issues. Photoshop is munging pixels, so the number of times that pixels have to be moved from here to there is a huge issue, and we pay attention every time that happens in the processing pipeline. Quite often, that’s more the limiting factor than just
computation being done on the pixels. In particular, when you have discrete graphics, there is an expensive step of moving the pixels to the graphics card and
back.
Tom's Hardware: So the bus is usually your bottleneck?
Russell Williams: Yes, the PCI bus. I really expect in the future that APUs will require us to rethink which features can be accelerated. Particularly once APUs
start using what some people call zero-copy. When you have an APU, you don't have to go across that long-latency PCI bus to get to the GPU. Right now, you
still have to go through the driver, and it still copies from one place in main memory to another place in memory—the space reserved for CPU and another
place reserved for GPU. They're working on eliminating that step. And as they make that path more efficient, it becomes more and more profitable to do
smaller and smaller operations on the APU, because the overhead on each one is smaller.
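
In OpenCL 1.x terms, that zero-copy direction looks roughly like the following fragment, a sketch under the assumption that the driver honors host-visible allocation (behavior varies by vendor):

    /* Allocate the buffer in host-visible memory and map it, instead of
       staging through clEnqueueWriteBuffer. On an APU this can remove
       the extra copy; ctx and q are an existing context and queue. */
    cl_int err;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                bytes, NULL, &err);
    void *p = clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_WRITE,
                                 0, bytes, 0, NULL, NULL, &err);
    /* ... write pixels directly into p ... */
    clEnqueueUnmapMemObject(q, buf, p, 0, NULL, NULL);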
On the other hand, APUs are not as fast as discrete GPUs. In some ways, it comes down to on-card memory bandwidth versus main memory. But you can also
think of it as power budgets with discrete cards that are sucking down several hundred watts by themselves. You have to keep copying large things across
this small pipe to this hairdryer of a compute device. It just depends. There has to be enough computation involved to pay for copying it out across the PCI
bus and bringing it back.
Q&A: Under The Hood With Adobe, Cont.
Tom's Hardware: Are we anywhere close to saturating 16 lanes of second-gen PCIe for image editing operations?
Russell Williams: I don't have numbers off the top of my head, but think of a 16-megapixel DSLR image. Say you want to do something, like modifying the tilt
of the blur plane in the blur gallery, and you want to get feedback in real-time—30 to 60 FPS. Then you have to composite the result with 50 other layers, and
that compositing needs to be done back on the CPU, because the entire compositing engine isn't done on the GPU. So copying data back at 60 FPS, you're
copying the full image that's being processed two or three times per frame. Suddenly, that PCIe doesn't look as fast as you originally thought.
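
A rough back-of-the-envelope check (our arithmetic, not Adobe's, assuming 8-bit RGBA): a 16 MP image is roughly 64 MB per pass. Copying it three times per frame at 60 FPS works out to 64 MB × 3 × 60 ≈ 11.5 GB/s, already beyond the roughly 8 GB/s that 16 lanes of second-generation PCIe can move in each direction.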
Or look at it from a different point of view. Regardless of whether PCIe is fast enough, what matters is how fast it is compared to how fast the computation
out on the card is. If the on-card computation takes half as long as before, the trip across the bus can mean that you only sped up the entire thing by 10% or
so. I have a pithy metaphor: it's like driving to New York to make a sandwich.
Tom's Hardware: Say what?
Russell Williams: If you want to make a sandwich, and you invent a machine that can make your sandwich in two seconds, it still doesn't make sense to drive
to New York to use the machine when you live in California. The shorter latency of the APU empowers us to use the GPU in all sorts of ways that don't make
sense for discrete graphics. Really, the APU is a new kind of compute device. In the future, it's likely our code will have quite a few cases where it says "if
discrete GPU, use discrete" but quite a few more that say "if APU, use APU."
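
That branch is something an application can already take at run time; a hedged sketch (our code, not Adobe's) queries whether the device shares physical memory with the host and adjusts its strategy:

    /* CL_DEVICE_HOST_UNIFIED_MEMORY (OpenCL 1.1) is CL_TRUE on devices
       that share memory with the host, such as APUs. */
    cl_bool unified = CL_FALSE;
    clGetDeviceInfo(dev, CL_DEVICE_HOST_UNIFIED_MEMORY,
                    sizeof(unified), &unified, NULL);
    if (unified) {
        /* APU path: dispatch is cheap, so even small operations pay. */
    } else {
        /* Discrete path: batch work so each PCIe round trip pays off. */
    }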
Tom's Hardware: What about the future of shaders in a time of OpenCL and similar APIs? Adobe has taken a proprietary approach with Pixel Bender, but do
you see this continuing as the market shifts to open standards?
Russell Williams: Shaders have a very solid future. Graphics APIs like OpenGL and DirectX are not going anywhere. OpenGL with custom shaders still provides
the best solution for problems that are similar to 3D rendering, like 3D rendering in Photoshop or the Liquify filter. Now, I can’t speak for Adobe on this, but
my own opinion is that GPGPU programming has come a long way since Adobe started Pixel Bender, and now that there's an industry standard—OpenCL—that
addresses this area, we're adding more emphasis to that. We're members of Khronos, and we'll be contributing the experience we gained designing and
building Pixel Bender to help improve future versions of OpenCL.
Tom's Hardware: My own impression is that many people still view CPUs with integrated graphics—APUs—as a budget solution. Maybe it’s just a habit from so
many years of suffering with graphics-equipped Intel northbridges, I don’t know. But today...has the market shifted? Are APUs and heterogeneous architecture really a game-changer?
Russell Williams: There are different sources of compute power in the box. It used to be there was just one—the CPU—and you wrote in C to use that
resource. Now, a great deal of power is in the GPU, but it’s only suited for some problems. And a great deal of the CPU is in multiple cores and compute units,
like vector units, which are only good at certain problems. In order to use the compute resources and utilize the performance of the machine, you have to use
all the different kinds of units and resources in the machine. You have to "light up" all these things at the same time, with the CPU, GPU, vector units, and so
on all doing the things they're best at. We're trying to use them all at once to give the user the most responsive experience. We're trying to move away from
“fill out a dialog box, click OK, and watch the progress bar” to a more game-like, cinematic FPS experience, where you modify the image directly and get
immediate feedback. The only way to do that is to utilize all the compute resources.
The significance of having integrated performance plus highly capable graphics is that it moves this capability into many more platforms, ones that don't have the space, cost, or power budget for discrete graphics. APU-based solutions give you a tremendous potential performance boost in those environments. The other
critical impact of APU is performance. We have a fixed power budget, and we don't know how to make a CPU go faster in a significant way on that power
budget. We've seen the last of the 50% per year performance boosts on the CPU side. And we're not going to just keep scaling cores—it’s too difficult to make
use of them. The number of programs that could really take advantage of a 24-core single-socket CPU is near zero. So the GPU is essentially the path to bring
that transistor budget to users in a way that can be used.
I think that GPGPU and APUs are just beginning to deliver on the promise that many people have seen in them for many years. We'll see a lot more advantage
taken of that, not just in Photoshop, but in other Adobe apps over the next couple of versions.
Benchmark Results: GIMP
Our early GIMP testing threw us a bit of a curve ball. We originally set out to test with the GEGL effects bilateral filter, edge-laplace, and motion-blur.
However, in repeated testing, we found that the edge-laplace and motion-blur tests were coming back with identical results on the A8-based desktop platform
when running with OpenCL enabled, regardless of whether we were testing with APU graphics or our discrete Radeon HD 7970 card. The 7970 should have
blown the APU out of the water, or at least been decisively faster.
Discussions with AMD and developers confirmed our suspicions: we were hitting a CPU bottleneck on the A8. There simply wasn’t enough compute work
happening for the GPU to make its presence felt. This raises an interesting value point: if your workloads aren’t sufficiently demanding, depending on how
your app is coded, you may not realize as much GPU-assist benefit as expected.
For our purposes, we had to modify our tests in order to increase the processing load to demonstrate GPU compute scaling. We replaced edge-laplace with
Gaussian blur, cranking up the Size X and Y variables to 20.0 each. We kept the motion-blur filter, but increased the Length parameter to 100 and Angle to
45. This gave us the following GIMP results.
In these and subsequent tests, you’ll notice the obvious results gap next to our HP notebook where OpenCL results should be—because today’s Sandy Bridge-based HD Graphics engines don’t support OpenCL (and we still haven't been able to get our hands on any Ivy Bridge-based Core i5 machines). Still, we left
the Intel platform in this mix for comparison, because there are some cases in which the performance of Intel’s CPU working only in software makes for an
interesting counterpoint to GPU-based acceleration. After all, with GPU-assist still in its toddler stage and many applications not yet optimized for the new
technology, it’s important to keep one eye on how non-accelerated platforms behave.
In these GIMP tests, though, the benefits of OpenCL-based GPU acceleration are glaring. Even stating the difference as a percentage or multiple seems
irrelevant. The point is that without acceleration, these filters are nearly unusable on any system. Workflow comes to a complete stop as the system creeps
through adding the blur one block at a time. With OpenCL turned on, suddenly we see very even, expected performance scaling as we edge up from mobile to
desktop APU and APU into discrete. Note how it’s not just the graphics processor doing all of the work. Depending on the test, the CPU side still contributes
another 20% to 40% to the end result.
Of course, this is true when a suitable workload is present. Remember that we had to modify our original testing in order to expose more noticeable scaling
from the GPU. Without that deliberate pressure, AMD's x86 cores stand in the way of greater utilization of graphics resources. We're certain that software
developers know what they're up against when it comes to balancing resource utilization, and we have to assume that what we're presenting here will become a more prevalent condition as developers code for highly parallel operations. Today, expect the impact to be somewhat more muted.
Benchmark Results: AfterShot Pro
As we indicated earlier, Corel was kind enough to supply us with a tech preview of AfterShot Pro featuring OpenCL support. The company really went out of its
way to accommodate us, and we’re grateful. Up to present, AfterShot has focused on CPU-based optimization, as described by Jeff Stephens, head of product
development for AfterShot Pro and former president of Bibble Labs.
“In AfterShot Pro 1.0, there is zero GPU utilization,” says Stephens. “Years ago, Bibble Labs, which Corel acquired about one year ago, decided to focus on
multi-core processing instead of relying on graphics processing. That decision paid off for years. Bibble 4, followed by Bibble 5, and now AfterShot Pro have
been and are the fastest RAW conversion applications we've ever tested or seen. That's true not only on a single CPU, but as the user adds CPUs, our scaling
is very near linear up to eight CPUs and continues to excel up to 16 cores and beyond. What OpenCL is allowing us to do is to squeeze up to a 2x performance
gain out of existing computers by utilizing the GPU that was otherwise idle while continuing to fully utilize as many CPU cores as are available.”
As promised, Corel squeezes twice the performance from our test systems when using GPU acceleration compared to running only through the CPU in
software. Actually, OpenCL bestows just over a 100% benefit.
You might notice how our A8-based desktop platform's results are nearly identical, regardless of whether we use the APU's shader cores or a discrete Radeon
HD 7970. Sound familiar? But before we jumped to the same conclusion regarding a bottleneck, we went back to Corel with our findings. Working with a
platform very similar to our A8 configuration, Corel reported 27.35 seconds using OpenCL and the Tahiti-based 7970 (as opposed to our 32.02) and 36.30
seconds with OpenCL on APU (as opposed to our 31.50). That seems much more in line with expectations. Unfortunately, after confirming identical BIOS
builds, driver versions, OpenCL versions, and everything else we could think of, no amount of persuasion would shift our numbers significantly, so we had to
publish our own results. Caveat given. Your mileage may vary.
The nature of our preview build of AfterShot Pro resulted in our HP notebook kicking out a “program can’t start because OpenCL.dll is missing” error—hence no HP/Intel results in this benchmark.
Benchmark Results: Musemage
The amateur family photographer in us was most interested to see how Musemage testing would turn out. Inevitably, we find ourselves at the end of the
weekend with dozens of images snapped at some family function or a beach trip or birthday party. While every shot is different, many need bounce flashes
toned down, saturation improved, sizes scaled down for emailing, or any number of other alterations. Most often, we blow off this sort of editing because it’s
simply too time-intensive. But Musemage offers the promise of reducing such jobs to mere seconds—if it works as promised.
First, we turned to the program’s integrated benchmarking module, a clever nod by the designers toward enthusiasts and reviewers like us. The benchmarking
tool loads a sample image pre-stocked inside the application and cycles it through roughly 80 effects. The better the overall processing performance, the
higher the score. We can see the huge performance gap between the Radeon HD 7970 card and APU. Clearly, the application does an admirable job of
leveraging the GPU for scaling, and circumventing the bus-imposed bottlenecks mentioned by Adobe.
We discovered during testing that Musemage, like Photoshop CS6, does nearly all of its GPU-based acceleration via OpenGL. The only feature Musemage
currently codes for OpenCL is HDR processing.
With current drivers, Intel’s HD Graphics 3000 is OpenGL 3.0-compatible (although still lacking OpenCL support), which is why our lowly Intel Core i5
notebook is able to beat every configuration here in software-based HDR processing, even AMD’s FX-8150.
Turn OpenCL back on, though, and results practically drop off the left edge of the chart. It’s a bit odd that our FX-based system with the Radeon HD 7970
card is slightly slower than the A8 running the same card, but with such fast processing times, 60 milliseconds is probably within an acceptable variance
range.
And last up, the test we really wanted to see. As expected, performance scales fairly well up the AMD stack, with the FX/Radeon HD 7970 combo taking about
half the time to crunch our eight-image batch as the APU-based notebook did. We were a little surprised to see the Intel notebook slip into the middle of the
results, even edging past the desktop A8 configuration leveraging its integrated graphics. This tells us that Musemage is likely coding its OpenGL support for
the 3.1 or prior generation, rather than the current 4.x, in order to maximize compatibility with Intel’s large installation base. Note that Intel HD Graphics
4000 supports OpenGL 4.0 and OpenCL 1.1. Still, when you want top OpenGL performance, it’s clear that discrete graphics is the way to fly.
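
For illustration (our fragment, not Musemage code), targeting a particular OpenGL generation usually starts with querying the driver's version string after context creation and selecting a code path from it:

    #include <GL/gl.h>
    #include <stdio.h>

    /* Call with a current GL context; prints the driver's GL version,
       which an app can parse to pick a 3.x or 4.x render path. */
    void report_gl_version(void)
    {
        const char *ver = (const char *)glGetString(GL_VERSION);
        printf("OpenGL version: %s\n", ver ? ver : "(no context)");
    }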
Benchmark Results: Photoshop CS6
Finally, the crown jewel of our benchmark apps: Photoshop CS6.
Again, with no hardware-based support, Intel’s Core i5 puts in a remarkably fine showing, losing out only to our FX/Radeon HD 7970 configuration. But with
GPU-based OpenGL enabled, we see performance increase by roughly 200% to 500%. Surprisingly, our A8/APU config turns in the best GPU-accelerated time,
and we did not see the same sort of scaling expected when we moved to testing on the Radeon HD 7970.
Not unexpectedly, CMYK takes 10% to 20% longer to process than RGB, but otherwise, the response patterns are almost identical. Adobe’s first swing with
OpenCL in Photoshop exhibits very clear scaling across our platforms. From the “low” of our mobile A8 APU to the best-of-breed FX/Radeon HD 7970, we see
over a 2x gap, which, in our opinion, speaks fairly highly of the APU’s capabilities. When you can get 50% of the performance of a top-end card essentially for
free, that’s a good deal. The desktop A8 APU lands smack between these two, making an even more persuasive case for budget buyers with plans to use this
application.
When we step up the blur load to 300, we see CMYK suddenly finishing much faster—go figure. Nevertheless, the scaling pattern remains similar, although
we’re now approaching a 200% difference between low- and high-end with GPU-accelerated OpenCL enabled. Moreover, check out how much more benefit
GPU acceleration delivers across the board compared to running in software with the workload increase to 300—up to a 15x benefit in our A8/Radeon HD 7970
configuration.
The Picture Is Changing
Add it all up, and the results definitively show that GPU-based acceleration of some sort should be mandatory for anyone with a significant amount of editing
work to process. Not only do content creators need to keep an eye out for hardware able to accelerate their favorite applications, but they also need to pay
attention to how their software of choice utilizes that hardware. And if the tools you're using don't yet take advantage of GPU-based acceleration, it's worth
finding out why. Not all workloads are ideally suited to the sort of parallelism that a GPU introduces. But when it comes to media-oriented tasks, many do, in
fact, benefit. The question now becomes how quickly vendors will make this support widely available throughout their wares.
“I don't think the OpenCL API in itself is hard,” says GIMP/GEGL developer Victor Oliveira. “In fact, in my opinion, it is cleaner for general-purpose
computation than other APIs like OpenGL and CUDA. Things can get hairy when you have to integrate OpenCL in an existing application that doesn't take
performance and parallelism into account. Especially when data processing is split in many functions and you have to put all this in a kernel, that can
complicate things and may explain why OpenCL adoption is slow.”
Oliveira expects proprietary acceleration APIs to keep their current footholds in niche vertical markets, such as HPC, because such organizations tend to be
more forgiving of proprietary systems. In the consumer world, though, he expects open source APIs to become the dominant, superior paradigm. The more
vendors that step up their efforts in this space, the faster the transition will happen.
“I think it's very positive that AMD pushes open standards,” says Oliveira. “It really helps to make developers—at least me—more confident about OpenCL,
especially in the open source world. As OpenCL support becomes commonplace, we’ll see more applications like GIMP using it, starting with areas that can
easily take advantage of the GPGPU parallel programming model: image/video/audio editing, machine learning, games, and so on.”
“We'll continue to look at all new GPGPU advances, as well as CPU advances, for ways to make our products faster,” adds Corel’s Jeff Stephens. “Faster
doesn't just mean doing the same thing in less time; it also means opening up new options and opportunities that would otherwise be too slow to consider.”
For us, these changes can’t come fast enough. As long as users keep their processing local rather than in the cloud, we see a new breed of need for current-gen systems with GPGPU support, especially in the mobile arena. Previously, we never would have dreamed of throwing these sorts of graphics loads at
notebooks, and now heterogeneous platforms are able to knife through them more elegantly. By the time our next heterogeneous compute story is on deck
(anticipate a focus on media transcoding tools), we should have the next generation of GPU and APU parts ready, and then...well, we expect awesomeness.
But there’s only one way to know for sure.