Aeon Emulator Blog

September 24, 2009

Performance Considerations – Garbage Collection

Filed under: Aeon, Performance — Greg Divis @ 9:06 pm

I’m going to start my series on performance with something that other people have covered many times already, and it’s something that almost always comes up when you’re talking about performance in C# – garbage collection. From my experience, lots of developers don’t understand it, and some think they understand it but don’t. I don’t claim to be an expert on the subject, but I at least have a high-level understanding of how the .NET garbage collector works. If you’re developing something that may be CPU-bound in .NET, do yourself a favor and read a little bit about the garbage collector if you haven’t already.

Ok, now we should be on the same page. To sum up the parts relevant to this post: we have a generational GC with 3 generations and a large object heap. Allocations less than about 85k are taken from Gen 0 and are very fast. The GC suspends all of your application threads when it runs and compacts the generational heap (or part of it) to make it contiguous, and it’s adaptive so it will run more often if you allocate memory on the heap frequently.

This is a great design, because most applications only use the CPU for short periods, then sit idle. The GC works behind the scenes and you don’t even notice it. Aeon, however, doesn’t sit idle unless there’s no program loaded. During emulation, it’s constantly decoding instructions and emulating them, so I had to be careful not to create any garbage along this path.

The Garbage Collector and Aeon

One of the nice things about .NET is all of the instrumentation you get for free. I fired up PerfMon and set it to display the total number of Gen 0-2 collections as Aeon is running Wolfenstein 3D:
Aeon GC count
 

This is about as close to never running as it gets. There’s a Gen 0 collection every 10 seconds, and that’s about it. A look at heap size confirms this:
Aeon memory usage

Too Much Garbage

Next we’ll have a look at what would happen if I had created something on the managed heap for each emulated instruction. I added this line to the critical path so it runs on every instruction:

var rubbish = new object();

Now, System.Object is about as small as it gets for an individual allocation on the managed heap, but this is called millions of times per second. What effect does this have on performance? Let’s have a look at those Gen 0-2 collections again:
Too much rubbish

Hey, that doesn’t look so bad, right? I mean, it’s going up constantly sure, but not too quickly. Actually, take a look at the scale – it’s dividing by 1000 before plotting the graph. If it’s not scaled, the red line would go straight up. Now let’s look at the heap sizes again:
Lots of collections

Clearly there’s a lot of activity, and this isn’t the kind of thing you’d like to see happen in a tight loop. There’s one more bit of data to take a look at before we move on, and that’s the percentage of time spent in garbage collections:
GC Time

It’s a horizontal line, which we should expect since we’re creating garbage at a constant rate, but it seems a little too low to worry about, right? It’s only averaging 3.6% (without the inserted line of code, it stays at 0). Well… maybe, and maybe not. Here’s the actual instruction throughput in the normal and extra garbage cases of Wolfenstein 3D:

  Normal Garbage Mode % difference
Aeon MIPS 22 15 32%

So the GC running 4% of the time actually caused a decrease in throughput of 32% – pretty significant, but not necessarily a show-stopper. After all, before a bunch of optimizations I made a while ago, Aeon ran this game at around 15 MIPS and it played at full speed. But wait – when I run it in “garbage mode,” the game is noticeably sluggish and the music even plays back in slow-motion. What’s going on?

Nasty Side-effects

We really have two unsolved issues here. First of all, why does a GC time of 3.6% cause such a large decrease in instruction throughput? Look again at the graphs above – the number of Gen 0 collections is rapidly increasing at a constant rate, but the % time in GC is pretty low. So… we’re collecting a lot of garbage but not spending much time doing it. I guess time spent in the garbage collector isn’t our problem. Actually, the fact that it’s only 3.6% is pretty amazing given all the garbage being thrown at it. (One of the reasons it’s so quick in this case is because all of this garbage is short-lived and never makes it out of Gen 0).

This is where I have to enter guessing territory. I know that all of the managed .NET threads are suspended while the GC runs. Suspending and resuming a thread has some overhead which could add up if it’s happening constantly. Another consequence of the GC running and accessing different regions of memory than our program is concerned with could be polluting the CPU cache. When the collector runs infrequently, this isn’t a big deal, but we’re hammering on it here and could be causing way more cache misses than before. Nothing kills CPU performance like cache misses.

As for the game running sluggish, I suspect this is also due to the fact that all of the managed threads are getting constantly suspended – including input handling, WPF, FM music synthesis and everything else. I’m open to corrections on any of this.

Lessons Learned

I may be way off base in my wild speculations for why the performance penalty for excessive collections is so severe, but really that doesn’t even matter. This is an extreme case, and one look at the graph showing GC collections shooting up like that should be enough to indicate that there’s some excessive garbage produced by this code.

The cause of the garbage problem was pretty obvious in this case, not least because I deliberately added it. In real-world code with this problem it’s usually not that easy. Maybe the program is calling some library code that generates garbage; maybe there’s a line of code left in there from debugging that was never removed and all of the sudden your program takes twice as long to run. I’ve even seen cases where the author of a program generated a massive text file via string concatenation, not realizing that garbage was produced every time that + operator was used – just changing it to use a StringBuilder turned a 5 minute task into a 1 second task (it was a really big file).

Bottom line: If you have some managed code that’s running slowly and you’re not able to profile it, check its garbage collection rate with PerfMon, especially if you’re calling into some external code in a tight loop. Maybe whoever wrote that external code likes string concatenation.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: