So, at Mozilla we've been looking into more ways to improve our performance in the area of complex graphics. One area where Direct2D is currently not giving us the kind of improvements we'd like, is in the case of drawing complex paths. The problem is that drawing paths will re-analyze the path on every frame using the CPU, causing these scenarios to be bound mainly by the speed of the CPU. This is something we'd like to address in order to improve performance of for example dynamic SVG images, after all once you have analyzed a certain path once, you want to retain as much as you can from that analysis, and re-use it when drawing a new frame with only small changes.
Path Retention Support in Cairo
One of the things that needs to happen is we need to support retaining paths in cairo, in such a way that a cairo surface can choose to associate and retain backend specific data related to that path. Much like is already possible in cairo for surface structures. That is a task which has been taken up by Matt Woodrow and has been coming along nicely (see bug 555877) and I'm not going to spend a lot of time talking about this. What I am going to talk about is my investigation into how to put this to good use from a Direct2D perspective.
Tessellation Caching in Direct2D
When I started my investigation, I was hoping that perhaps ID2D1Geometry would have some level of internal caching. In other words, if I'd just fill the same ID2D1Geometry every frame, this would be significantly faster than re-creating the geometry each frame. For testing this I chose the following geometry, the geometry I chose here is fairly simple, but it has some intersections and some nice big curves, so tessellation should be non-trivial:
sink->BeginFigure(D2D1::Point2F(600, 200), D2D1_FIGURE_BEGIN_FILLED);
seg.point1 = D2D1::Point2F(1100, 200);
seg.point2 = D2D1::Point2F(1100, 700);
seg.point3 = D2D1::Point2F(600, 700);
seg.point1 = D2D1::Point2F(100, 700);
seg.point2 = D2D1::Point2F(100, 200);
seg.point3 = D2D1::Point2F(600, 200);
seg.point1 = D2D1::Point2F(1400, 300);
seg.point2 = D2D1::Point2F(1400, 1400);
seg.point3 = D2D1::Point2F(600, 1000);
Sadly there seemed to be no caching going on, the only speed improvement I could see was from not creating the geometry, the actual rendering showed no performance benefits. However, as we are determined to see if it is possible to do something else to get the desired effect, our eye was caught by another D2D interface.
The ID2D1Mesh and its limitations
So Direct2D has a Mesh object, this is a device dependent object which can be created on a render target, and then filled with the tessellation of an existing geometry (with a certain transformation applied). I should note here that since this Mesh is a collection of triangles, the level of detail is determined by the transformation passed into Tessellate. This means that if you simply zoom in on the mesh, at some point curves will no longer be curves. This is the first limitation of Meshes, however for the purposes of this investigation I'm going to assume we will not scale and I'm simply going to be drawing the same untransformed geometry over and over again. In any case, more often than not we won't be scaling up significantly, and this isn't really a limitation, it just means we have to re-tessellate in some cases.
Now there's another limitation which is more problematic, Meshes only work with Direct2D render targets which have Per Primitive Anti-Aliasing disabled (From here on PPAA). PPAA is an analytical anti-aliasing routine, which is most likely part of the reason why tessellations are not cached by Geometries internally. Anti-Aliasing is important to us, non-AA drawing in Mozilla is rare, and without it things would truly not look so good! There is another option though, when drawing to DXGI surfaces, as we do, you can set the GPU to use Multi-Sample Anti-Aliasing(From here on MSAA) to do anti-aliasing.
MSAA vs. PPAA
So, quality of MSAA is worse than that of PPAA, however it is also faster than PPAA on decent graphics hardware. But we'll get to analyzing the performance of several different solutions later, let's see about the quality. First of all, with no scaling:
Now for a bit more detail:
Notice the smoother transition from white to red on the left edge in the PPAA version. So there's most certainly a difference in quality, although MSAA isn't that bad either! (On some hardware it may be higher or lower quality due to hardware MSAA capabilities)
Another Limitation of MSAA
So at this point, we would be about ready to see about performance differences, except for one thing: MSAA is no longer used when you use PushLayer! The intermediate surface that gets created with PushLayer appears to not inherit the original surface's MSAA settings. Since we use Layers in order to do geometric clipping this poses another problem. We need to be able to do geometric clipping, while continuing to use our retained mesh, and with MSAA. To overcome this method in my investigation I've optionally used another method of clipping, I've created a texture with MSAA enabled (much like CreateLayer), and then I've created a non-MSAA texture, around which a SharedBitmap was created (so that it can be drawn to the main render target). When clipping, the geometry would be drawn to the MSAA texture, which could then be resolved to the non-MSAA texture, which was drawn into the clipping area using FillGeometry. The clipping area was chosen to be a single triangle, non-rectangular as to prevent any optimizations using scissor rects, but also to be trivial to tessellate so that the FillGeometry call for the clipping would not poison the measurement (optionally we could use FillMesh for the clipping area as well using this approach if we had a complex clipping path!)
- Core i7 920
- ATI Radeon HD5850
- Stand-alone skeleton D2D application
- MSAA x8 where MSAA is specified
- Surface 1460x760 pixels
- Drawn 100 times per frame
- 10 draws per clip where clipping is enabled
- All D3D multithreaded optimizations disabled
- Rendering as often as possible, no VSync, clearing once per frame
- No Mesh Measurements with PPAA (since it doesn't work)
As we can see there's a very consistent pattern: The CPU is consistently saturated for drawing the Geometry without cached tessellation. When we draw our existing Mesh, we can see a significant reduction in CPU usage and we supposedly become GPU bound.
We can see that using the retained tessellation through a ID2D1Mesh can offer a significant performance benefit over using an ID2D1Geometry. Also note that drawing to a clipping layer appears to be somewhat faster than drawing to the backbuffer surface directly.
What do we see?
So these are the numbers. The cause of drawing to a clipping layer being slightly faster is most likely that a DXGI surface render target needs to do some degree of syncing that an internal D2D render target (created by PushLayer) does not.
We can clearly see that we can free up a lot of CPU when retaining tessellations of some complexity, even while we produce higher framerates.
One thing I've noticed is that BeginDraw and EndDraw take a lot of CPU, not doing these calls when using the intermediate clipping render target seemed to significantly reduce CPU usage (although of course the results are no longer guaranteed to be correct since EndDraw ensures that all rendering commands are flushed, hence this method wasn't used). Additionally using Flush on the render target rather than EndDraw before resolving the MSAA surface (which should in theory produce correct results) seemed to also lower the CPU usage by some degree, however due to the correctness being hard to judge in these cases I chose not to do the latter either. However there is room for further analysis here and perhaps an even further decrease of CPU usage in the Mesh rendering with manual clipping approach.
Well, I can't really draw any conclusions from this at this point, there's a clear trade-off between performance and quality. It's certainly worth investigating further and possibly a 'mixed' approach could be used depending on the complexity of the path and quality requirements of the user. I realize this was a pretty long and technical post But I hope that for those of you interested in this sort of stuff I've been able to provide some interesting initial measurements in the area of complex geometry rendering in Direct2D. I'm looking forward to any opinions, criticisms, hints or other form of input on my methods and ideas!
So, I talked to you all a while ago about layers, and how we are going to be using it to accelerate composition of web pages across all platforms. There's more news on that front! Recently we landed a first version of the OpenGL layers backend onto trunk (See bug 546517). That backend included all the necessary code to use OpenGL for both image upscaling and YUV to RGB color space conversion.
Currently in general the code using layers is not at a point yet where it can benefit from the hardware layers backend for all rendering. For this reason the OpenGL backend is not used yet for your normal browsing. However as many of you may know the performance of fullscreen HTML5 video is not fantastic at the moment for lesser CPUs. Since fullscreen video in particular needs that extra push over the cliff, we decided to enable the OpenGL layers backend by default specifically for the fullscreen video case. What that means is that we upload the Y, Cb and Cr planes to your GPU, draw them to a fullscreen quad and then combine them to create the RGB image on your monitor. Ultimately what matters is that for those of you using compatible hardware and software it should lead to a big improvement in fullscreen video performance when using our latest Nightlies (get them here) and the upcoming Alpha!
Compatible Hardware And Software?
So, currently there's a little bit of work that still needs to be done to make OpenGL layers work on Mac OS X and Linux, so first of all you'll need Windows. Second of all you will need OpenGL 2 compatible drivers for your graphics hardware. Performance may vary across both hardware and drivers.
Eep! I'm running into issues
If you do run into issues with fullscreen video, do go to http://bugzilla.mozilla.org/ and check if your issue has already been reported. If it hasn't, we're very interested in hearing from you so we can address it as quickly as possible. Don't forget to note your graphics hardware and driver version, as this is invaluable information when trying to diagnose issues.
If you're interested in hearing more about layers, Robert O'Callahan has a very informative blogpost here.
So, the moment is finally there! I realize I've not posted on here for a while, my main reason being I wanted to be able to actually have something to tell you guys the next time I would talk about Direct2D. And it seems that now I do! After a lot of hard work from myself and many others at Mozilla who have been assisting me in getting Direct2D and particularly DirectWrite stable and functional enough for people to start trying it out, I believe our Direct2D support is now ready for the world to try their hands at!
If you download the latest nightly here you will be the proud owner of an official experimental build that has support for Direct2D. This build will automatically update to our new experimental nightly builds, giving you all the bugfixes we're working on for various problems that are still around.
Does that mean if I download the latest nightly I get Direct2D support?
Well, no. Since we don't want to regress any functionality in our browser for the majority of our users, by default your build will not use DirectWrite. There's a couple of steps you'll need to execute to get it working:
So once I've done this, surely I have this Direct2D support going!
Erm... sorry here folks, but no. There's several reasons why you could still not be having Direct2D support. First of all it's possible you have an extension interfering with Direct2D. Several extensions are known to somewhat disrupt our initialization order, meaning the preferences you've just set are not accessible during initialization of the render mode, which will cause GDI default render-mode.
Then there is another issue, if you're actually on Windows XP and not on Windows Vista or Windows 7, Windows XP actually has no support for Direct2D or DirectWrite, so it will fallback to being just a normal build. I suggest you upgrade .
Finally, if you do not have a high-end DirectX 9 graphics card, or a DirectX 10 graphics card, insufficient hardware support will be detected. This will cause the renderer to fallback to using GDI as well, in order to prevent you from suffering a slow browser, or even worse, a crash!
Now, all this reading, by now I must have Direct2D support?!
Right, if you satisfy all the above conditions, you're probably there! We don't have any way to verify at the moment (we're working on something), but one way is to go here. If it runs nice and smooth when you size photos up to fullscreen, it's working!
It is working! Now what?
You should enjoy your new browsing experience, in most cases you should have a noticeably faster and more responsive browser, particularly when using graphically intensive websites (not using flash, which will still be slow). If you find any bugs, please use our traditional reporting method of going to bugzilla and see if they've already been filed. If they haven't, file it, and we'll try to get to it and fix it as soon as possible!
There's a very cool forum thread that tracks the currently known issues, you should have a look there as well.
Well, that's about it!
I apologize to everyone for taking so long to get this ready for putting it out there, and for not providing experimental builds over the last months. The fact is its surprising how much you can do on the web that at least I didn't know about! And in how many interesting ways it can break . But I hope you can now all enjoy the latest and greatest of Direct2D builds. And no longer suffer any manual updating of your build! Happy Browsing!
So, it's been a while since I've given everyone here an update, and for that I do apologize. First of all, Direct2D support is coming well, we're working hard to land it on trunk. I've been getting swift reviews on all my code from a lot of different people, which is great, but it will still take some time since the code is complex and touches a lot of the Mozilla tree. Once it lands it will be disabled by default, but people will be able to easily enable it in their firefox nightly builds! Ofcourse you can follow the status of this work by keeping track of bug 527707. It's not perfect yet, but once again, thanks to all who've given me all the great feedback to get this to a point where it's quite usable!
So, now to the actual title of this post. Layers. Layers are another API we are designing for Mozilla, which can be used to hardware accelerate certain parts of website rendering. Normally this is where I would start a long rant on why hardware acceleration is such a good thing, but since I've already done that 2 posts ago, I'll just refer you there.
First of all it is important to point out that Layers is by no means a replacement for Direct2D. Direct2D accelerates all rendering, from fonts to paths and all such things, layers is intended to allow accelerated rendering and blending of surfaces. A layer could be accessible as many different type of cairo surfaces, possible D2D, but also any other. What this means if the layer system is designed to be easily implemented on top of different APIs, Direct3D (9 or 10) but also OpenGL. This means that we hope this to provide a performance increase for users on Mac and Linux as well.
So what are these layers?
Essentially, they're just that, layers. Normally a website will directly be rendered, as a whole, to a surface (say a window in most cases). This single surface means that unless the surface itself accelerates certain operations internally, it is hard to apply any hardware acceleration there. The layer system starts out with a LayerManager, this layermanager will be responsible for managing the layers in a certain part of screen real estate(say a window). Rather than providing a surface, the layer manager will provide layers to someone who wants to draw, and the API user (in this case our layout system) can then structure those in an ordered tree. By effectively rendering specific parts of a website (for example transformed or transparent areas) into their own layer, the layermanager can use hardware acceleration to composite those areas together effectively. There can be several types of layers, but two layers really form the core of the system: ThebesLayers and ContainerLayers.
So, container layers also live up to their name. They are a type of layer which may contain other layers, their children. These types of layers basically form the backbone of our tree. They do not themselves contain any graphical data, but they have children which are rendered into their area. It can for example be used to transform a set of children in a single transformation.
Now, these sound a bit more exotic. Thebes is our graphics library which wraps cairo. A thebes surface is a surface which can be drawn to using a thebes context. ThebesLayers are layers which are accessible as a thebes surface, and can therefor be easily used to render content to. This could for example also be a Direct2D surface. ThebesLayers form leafs of our tree, they contain content directly, and do not have any children. However, like any other layer, they may have transformations or blending effects applied to them.
Other types of layers
Why would we have other types of layers you might ask? Well, there's actually several reasons. One of the layers we will be having are video layers, the reason is that video data is generally stored not in the normal RGB color space, but rather as YUV data(luminescence and chromatic information, not necessarily at the same resolution). As with any pixel operation, our graphics hardware is especially good at converting this to the RGB type of pixels we need on the screen, therefor this layer can be used specifically for video data. Another layer we will have is a hardware layer, this type of layer contains a surface which exists, and is accessible, solely on the hardware. These would be useful for example for WebGL, where we currently have to do an expensive readback from the graphics hardware to get a frame back into software. Using the hardware layer could then blend that WebGL content directly on the graphics hardware, skipping the intermediate copy to the normal memory.
So, now that we know that, a layer tree for a website with a div in it, in which there's a video and some other content, may look like this:
Well, that explains the very basics of the layers system we're working on. Robert O'Callahan has already been doing all the hard work of writing the very first API for layers, as well as a preliminary integration into our layout system! See bug 534425 for our progress there. Additionally I've been working on an OpenGL 2.1 and D3D10 implementation of the API so far. I had a lot of feedback of people who were disappointed we were not offering something like Direct2D on other platforms. We've not given up on bringing something like that to other platforms. Layers should however, with less risk and work, bring a significant amount of hardware acceleration to other platforms already! I hope you enjoyed reading this and I've informed you a little more about our latest work in this area, and ofcourse reassured you that we're doing everything we can to bring performance improvements to as many users as possible.
So, it seems my little demo of a pre-alpha firefox build with Direct2D support has generated quite some attention! This is good, in many ways. Already users trying out the builds have helped us fix many bugs in it. So I'm already raking in some of the benefits. It has also, understandably, led to a lot of people running their own tests, some more useful than others, some perhaps wrong, or inaccurate. In any case, first of all I wanted to discuss a little on how to analyze D2D performance with a simple firefox build.
Use the same build
If you want to properly compare performance, use my build, and switch on the forcegdi pref. Go to 'about:config' and look for the font.engine.forcegdi pref, and set that to true. After that, you will have a build using GDI only.
Obviously I should have mentioned this in my previous post, so people would not have wasted their time on inaccurate performance analysis. I apologize for that, it is partially because I had not expected so much publicity.
Focus on what (should be) different
Know what you're measuring, and how you're doing it
If you are using an independent measurement benchmark. Be sure you understand well exactly what it measures, and how it does it. This is a very important step. Something spewing out a number and then listing 'higher is better' for a certain part of functionality is great, but it only becomes useful information when you know how the measurement is executed, what it's margin of error is, and what the overhead is it adds to the whatever it is testing. For a lot of all-round browser benchmarks, rendering is only a small part of what's tested. And the total test result differences between D2D enabled and disabled, may not reflect the actual difference in user experience.
Considerations on static page-loading measurements
Since obviously just a rendering improvement if it doesn't actually function better for the end user is practically useless, keep in mind that when you're timing how long it takes to load a page, you're timing all the aforementioned overhead. When interacting with the page without switching to another page, a lot of this overhead does not occur, and a large part of time might be spent in actual rendering. This means that during dynamic interaction (like scrolling), improvements may be more noticeable, although harder to measure.
And finally, thanks!
It's been great to see how many people are trying this, and as I mentioned earlier in this post, it has already greatly assisted us. It's also great that people are working on their own tests of performance and such things, it is always a good thing to have independent performance tests ran. And this too, will help us improve on the build in areas where our own testing may have been lacking. So, thanks, to everyone who is directly or indirectly helping to hopefully provide a great new feature!
|<< <||> >>|