Archives for: April 2010

So, at Mozilla we've been looking into more ways to improve our performance in the area of complex graphics. One area where Direct2D is currently not giving us the kind of improvements we'd like is drawing complex paths. The problem is that a path gets re-analyzed on the CPU every frame it is drawn, so these scenarios are bound mainly by the speed of the CPU. This is something we'd like to address in order to improve the performance of, for example, dynamic SVG images. After all, once you have analyzed a certain path, you want to retain as much as you can from that analysis and re-use it when drawing a new frame with only small changes.

Path Retention Support in Cairo

One of the things that needs to happen is that cairo needs to support retaining paths, in such a way that a cairo surface can choose to associate and retain backend-specific data related to a path, much like is already possible in cairo for surface structures. That task has been taken up by Matt Woodrow and has been coming along nicely (see bug 555877), so I'm not going to spend a lot of time talking about it. What I am going to talk about is my investigation into how to put this to good use from a Direct2D perspective.
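To make the idea a bit more concrete, here is a minimal sketch of a path that retains backend-specific data. This is not cairo's API and not the design from the bug: `RetainedPath`, `backendData` and the key/builder scheme are hypothetical names made up for illustration. The only point is that the expensive analysis runs once per backend and is thrown away when the path changes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <vector>

struct PathPoint { float x, y; };

// Hypothetical retained path: each backend can attach opaque data
// (for example a tessellation) that survives across frames.
class RetainedPath {
public:
    using Builder =
        std::function<std::shared_ptr<void>(const std::vector<PathPoint>&)>;

    // Changing the path invalidates everything the backends cached for it.
    void lineTo(PathPoint p) {
        points_.push_back(p);
        cache_.clear();
    }

    // Returns the data cached for backendKey, calling build only on a miss.
    std::shared_ptr<void> backendData(const void* backendKey,
                                      const Builder& build) {
        assert(backendKey != nullptr);
        auto it = cache_.find(backendKey);
        if (it == cache_.end()) {
            it = cache_.emplace(backendKey, build(points_)).first;
        }
        return it->second;
    }

private:
    std::vector<PathPoint> points_;
    std::map<const void*, std::shared_ptr<void>> cache_;
};
```

A backend would pass its own address (or any unique pointer) as the key and do its tessellation inside the builder, so drawing the same unchanged path in the next frame skips the analysis entirely.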

Tessellation Caching in Direct2D

When I started my investigation, I was hoping that perhaps ID2D1Geometry would have some level of internal caching. In other words, I hoped that filling the same ID2D1Geometry every frame would be significantly faster than re-creating the geometry each frame. For testing this I chose the following geometry. It is fairly simple, but it has some intersections and some nice big curves, so tessellation should be non-trivial:

// sink is the ID2D1GeometrySink of the path geometry being built.
sink->BeginFigure(D2D1::Point2F(600, 200), D2D1_FIGURE_BEGIN_FILLED);
// Three cubic Bezier segments: the first two form a closed loop,
// the third sweeps out a large curve that overlaps it.
D2D1_BEZIER_SEGMENT seg[3];
seg[0].point1 = D2D1::Point2F(1100, 200);
seg[0].point2 = D2D1::Point2F(1100, 700);
seg[0].point3 = D2D1::Point2F(600, 700);
seg[1].point1 = D2D1::Point2F(100, 700);
seg[1].point2 = D2D1::Point2F(100, 200);
seg[1].point3 = D2D1::Point2F(600, 200);
seg[2].point1 = D2D1::Point2F(1400, 300);
seg[2].point2 = D2D1::Point2F(1400, 1400);
seg[2].point3 = D2D1::Point2F(600, 1000);
sink->AddBeziers(seg, 3);
// A straight line to the top left, then close the figure.
sink->AddLine(D2D1::Point2F(30, 130));
sink->EndFigure(D2D1_FIGURE_END_CLOSED);

Sadly there seemed to be no caching going on. The only speed improvement I could see came from not re-creating the geometry; the actual rendering showed no performance benefit. However, as we are determined to see if it is possible to do something else to get the desired effect, our eye was caught by another D2D interface.

The ID2D1Mesh and its limitations

So Direct2D has a Mesh object. This is a device-dependent object which can be created on a render target and then filled with the tessellation of an existing geometry (with a certain transformation applied). I should note here that since this Mesh is a collection of triangles, the level of detail is determined by the transformation passed into Tessellate. This means that if you simply zoom in on the mesh, at some point curves will no longer look like curves. This is the first limitation of Meshes. For the purposes of this investigation, however, I'm going to assume we will not scale, and I'm simply going to draw the same untransformed geometry over and over again. In any case, more often than not we won't be scaling up significantly, so this isn't really a limitation; it just means we have to re-tessellate in some cases.
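To get a feeling for how the level of detail depends on the scale, here is a small sketch that estimates how many line segments a cubic Bezier needs to stay within a given tolerance. It uses Wang's formula, a standard flattening bound. This is not what Direct2D does internally (I don't know its tessellator), and `segmentsForCubic` is a name made up for this example. The tolerance of 0.25 used below matches Direct2D's default flattening tolerance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Point { double x, y; };

// Control points of one cubic Bezier segment.
struct Cubic { Point p0, p1, p2, p3; };

// Length of the second difference a - 2b + c of three control points.
static double secondDifference(Point a, Point b, Point c) {
    return std::hypot(a.x - 2.0 * b.x + c.x, a.y - 2.0 * b.y + c.y);
}

// Wang's formula for a degree-3 curve: the number of equal parameter steps
// that keeps the polyline within `tolerance` of the curve after the curve
// has been uniformly scaled by `scale`.
int segmentsForCubic(const Cubic& c, double scale, double tolerance) {
    assert(scale > 0.0 && tolerance > 0.0);
    const double m = scale * std::max(secondDifference(c.p0, c.p1, c.p2),
                                      secondDifference(c.p1, c.p2, c.p3));
    // n(n-1) = 6 for a cubic.
    const double n = std::sqrt(6.0 * m / (8.0 * tolerance));
    return std::max(1, static_cast<int>(std::ceil(n)));
}
```

The required segment count grows with the square root of the scale, so a tessellation made at one zoom level has too few segments for a much larger one. That is the reason a Mesh has to be re-tessellated after zooming in significantly.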

Now there's another limitation which is more problematic: Meshes only work with Direct2D render targets which have Per-Primitive Anti-Aliasing (from here on PPAA) disabled. PPAA is an analytical anti-aliasing routine, which is most likely part of the reason why tessellations are not cached by Geometries internally. Anti-aliasing is important to us: non-AA drawing in Mozilla is rare, and without it things would truly not look so good! There is another option though. When drawing to DXGI surfaces, as we do, you can set the GPU to use Multi-Sample Anti-Aliasing (from here on MSAA) to do anti-aliasing.

MSAA vs. PPAA

So, the quality of MSAA is worse than that of PPAA; however, it is also faster than PPAA on decent graphics hardware. We'll get to analyzing the performance of several different solutions later, so let's look at the quality first. First of all, with no scaling:


[Image: MSAA 8x]

[Image: PPAA]

Now for a bit more detail:


[Image: MSAA 8x]

[Image: PPAA]

Notice the smoother transition from white to red on the left edge in the PPAA version. So there's most certainly a difference in quality, although MSAA isn't that bad either! (On some hardware the quality may be higher or lower, depending on the hardware's MSAA capabilities.)

Another Limitation of MSAA

So at this point, we would be about ready to look at performance differences, except for one thing: MSAA is no longer used when you use PushLayer! The intermediate surface that gets created with PushLayer appears not to inherit the original surface's MSAA settings. Since we use Layers to do geometric clipping, this poses another problem. We need to be able to do geometric clipping while continuing to use our retained mesh, and with MSAA.

To overcome this problem, in my investigation I've optionally used another method of clipping. I've created a texture with MSAA enabled (much like CreateLayer), and then I've created a non-MSAA texture, around which a SharedBitmap was created (so that it can be drawn to the main render target). When clipping, the geometry is drawn to the MSAA texture, which is then resolved to the non-MSAA texture, which in turn is drawn into the clipping area using FillGeometry. The clipping area was chosen to be a single triangle: non-rectangular, to prevent any optimizations using scissor rects, but also trivial to tessellate, so that the FillGeometry call for the clipping would not poison the measurement. (Optionally we could use FillMesh for the clipping area as well with this approach, if we had a complex clipping path!)
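The "resolve" step collapses the samples of each pixel into a single value. In the test application the GPU does this, so the sketch below is only an illustration of the arithmetic on the CPU. It assumes a plain box filter and a single coverage channel, which may not match what a given piece of hardware does, and `resolve` is a name made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Box-filter resolve: `samples` holds sampleCount consecutive values per
// pixel; the result holds one averaged value per pixel.
std::vector<float> resolve(const std::vector<float>& samples,
                           std::size_t sampleCount) {
    assert(sampleCount > 0 && samples.size() % sampleCount == 0);
    std::vector<float> pixels(samples.size() / sampleCount);
    for (std::size_t p = 0; p < pixels.size(); ++p) {
        float sum = 0.0f;
        for (std::size_t s = 0; s < sampleCount; ++s) {
            sum += samples[p * sampleCount + s];
        }
        pixels[p] = sum / static_cast<float>(sampleCount);
    }
    return pixels;
}
```

Under these assumptions an edge pixel at MSAA 8x can only take 9 distinct coverage values (0/8 up to 8/8), which would go some way towards explaining the coarser edge transition compared to an analytical approach like PPAA.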

Testing Conditions

- Core i7 920
- ATI Radeon HD5850
- Stand-alone skeleton D2D application
- MSAA x8 where MSAA is specified
- Surface 1460x760 pixels
- Drawn 100 times per frame
- 10 draws per clip where clipping is enabled
- All D3D multithreaded optimizations disabled
- Rendering as often as possible, no VSync, clearing once per frame
- No Mesh Measurements with PPAA (since it doesn't work)

CPU Usage

As we can see, there's a very consistent pattern: the CPU is saturated whenever we draw the Geometry without a cached tessellation. When we draw our existing Mesh, we see a significant reduction in CPU usage and we supposedly become GPU bound.

Rendering Speed

We can see that using the retained tessellation through an ID2D1Mesh can offer a significant performance benefit over using an ID2D1Geometry. Also note that drawing to a clipping layer appears to be somewhat faster than drawing to the backbuffer surface directly.

What do we see?

So these are the numbers. The cause of drawing to a clipping layer being slightly faster is most likely that a DXGI surface render target needs to do some degree of syncing that an internal D2D render target (created by PushLayer) does not.

We can clearly see that we can free up a lot of CPU when retaining tessellations of some complexity, even while we produce higher framerates.

One thing I've noticed is that BeginDraw and EndDraw take a lot of CPU. Not doing these calls when using the intermediate clipping render target seemed to significantly reduce CPU usage. Of course the results are then no longer guaranteed to be correct, since EndDraw ensures that all rendering commands are flushed, hence this method wasn't used. Additionally, using Flush on the render target rather than EndDraw before resolving the MSAA surface (which should in theory produce correct results) also seemed to lower the CPU usage to some degree. However, since correctness is hard to judge in these cases, I chose not to do the latter either. There is room for further analysis here, and perhaps for an even further decrease of CPU usage in the Mesh rendering with manual clipping approach.

Any Conclusion?

Well, I can't really draw any conclusions from this at this point; there's a clear trade-off between performance and quality. It's certainly worth investigating further, and possibly a 'mixed' approach could be used depending on the complexity of the path and the quality requirements of the user. I realize this was a pretty long and technical post :) But I hope that for those of you interested in this sort of stuff I've been able to provide some interesting initial measurements in the area of complex geometry rendering in Direct2D. I'm looking forward to any opinions, criticisms, hints or other forms of input on my methods and ideas!

So, I talked to you all a while ago about layers, and how we are going to be using them to accelerate composition of web pages across all platforms. There's more news on that front! Recently we landed a first version of the OpenGL layers backend onto trunk (see bug 546517). That backend includes all the necessary code to use OpenGL for both image upscaling and YUV to RGB color space conversion.

In general, the code using layers is not yet at a point where it can benefit from the hardware layers backend for all rendering. For this reason the OpenGL backend is not used yet for your normal browsing. However, as many of you may know, the performance of fullscreen HTML5 video is not fantastic at the moment on lesser CPUs. Since fullscreen video in particular needs that extra push over the cliff, we decided to enable the OpenGL layers backend by default specifically for the fullscreen video case. What that means is that we upload the Y, Cb and Cr planes to your GPU, draw them to a fullscreen quad and then combine them to create the RGB image on your monitor. Ultimately what matters is that for those of you using compatible hardware and software, it should lead to a big improvement in fullscreen video performance when using our latest Nightlies (get them here) and the upcoming Alpha!
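For those wondering what "combining" the planes means: it is a per-pixel color space conversion. The sketch below does it on the CPU purely as an illustration, since the layers backend does this work on the GPU. It assumes BT.601 video-range coefficients and chroma planes at the same resolution as the luma plane; neither is stated above, and real video usually has subsampled chroma. The names `yCbCrToRgb` and `convertPlanes` are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Clamp to [0, 255] and round to the nearest integer.
static std::uint8_t toByte(double v) {
    return static_cast<std::uint8_t>(
        std::lround(std::min(255.0, std::max(0.0, v))));
}

// Assumed BT.601 video range: Y in [16, 235], Cb and Cr centered on 128.
Rgb yCbCrToRgb(std::uint8_t y, std::uint8_t cb, std::uint8_t cr) {
    const double luma = 1.164 * (y - 16);
    const double u = cb - 128;
    const double v = cr - 128;
    return { toByte(luma + 1.596 * v),
             toByte(luma - 0.392 * u - 0.813 * v),
             toByte(luma + 2.017 * u) };
}

// Converts three equally sized planes into one RGB image.
std::vector<Rgb> convertPlanes(const std::vector<std::uint8_t>& yPlane,
                               const std::vector<std::uint8_t>& cbPlane,
                               const std::vector<std::uint8_t>& crPlane) {
    assert(yPlane.size() == cbPlane.size() && yPlane.size() == crPlane.size());
    std::vector<Rgb> rgb(yPlane.size());
    for (std::size_t i = 0; i < rgb.size(); ++i) {
        rgb[i] = yCbCrToRgb(yPlane[i], cbPlane[i], crPlane[i]);
    }
    return rgb;
}
```

Doing this for every pixel of every frame is exactly the kind of work a GPU handles well, which is why moving it off the CPU helps fullscreen video so much.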

Compatible Hardware And Software?

So, currently there's a little bit of work that still needs to be done to make OpenGL layers work on Mac OS X and Linux, so first of all you'll need Windows. Second, you will need OpenGL 2 compatible drivers for your graphics hardware. Performance may vary across both hardware and drivers.

Eep! I'm running into issues

If you do run into issues with fullscreen video, do go to http://bugzilla.mozilla.org/ and check if your issue has already been reported. If it hasn't, we're very interested in hearing from you so we can address it as quickly as possible. Don't forget to note your graphics hardware and driver version, as this is invaluable information when trying to diagnose issues.

If you're interested in hearing more about layers, Robert O'Callahan has a very informative blogpost here.
