Hitching and Jitters In Early Alpha AI War 2

We’re headed for alpha of AI War 2 on Monday, and that’s for the people who want to get into the game “earlier than early,” which I’ll write more about later.

One of the things I wanted to talk about right now, though, is some hitching and jitter that will likely still be present to some degree in that first version.  This is an issue that is cropping up this week due to two factors:

1. The simulation is becoming more complex, and thus more costly to run.  It runs every 100ms, so any frame that also has to run a simulation step comes out extra-long, and that’s the hitch.

2. The visualization code runs every frame, so usually 60-140 times per second on my machine, and so there’s a huge difference in the timing between the two sides.  It’s only recently that this has become efficient enough for me to notice such a huge difference between the two groups of frames, though.

In other words, things were more balanced-out previously because we hadn’t finished certain optimizations AND because the simulation was still growing.  Now the simulation has grown more and other optimizations are in, and so the jitter rears its head.
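To make the timing mismatch concrete, here is a rough sketch of a single-threaded loop where most frames only draw, but roughly every 100ms one frame also pays for a simulation step.  It’s written in Python purely for illustration (the game itself is C#), and the 8ms and 12ms costs are made-up assumptions, not measurements from AI War 2.

```python
SIM_INTERVAL_MS = 100.0   # simulation cadence, taken from the post
RENDER_COST_MS = 8.0      # assumed cost of drawing one frame
SIM_COST_MS = 12.0        # assumed cost of one simulation step


def frame_times(total_ms):
    """Return each frame's duration when sim and rendering share one thread."""
    now = 0.0
    next_sim = SIM_INTERVAL_MS
    frames = []
    while now < total_ms:
        cost = RENDER_COST_MS
        if now >= next_sim:
            # This frame also has to pay for a simulation step.
            cost += SIM_COST_MS
            next_sim += SIM_INTERVAL_MS
        frames.append(cost)
        now += cost
    return frames


frames = frame_times(1000.0)
slow = [f for f in frames if f > RENDER_COST_MS]
print(f"{len(frames)} frames, {len(slow)} long; {max(frames)} ms vs {min(frames)} ms")
```

The faster the ordinary frames get, the more the occasional long one stands out, which is the effect described above.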

I’m writing this so that hopefully nobody freaks out on Monday. 😉  It won’t take us long to resolve this, but that’s one of the casualties of being in “earlier than early.”  This is part of how the hotdog is made that you get to see.

Last post, I wrote about some improvements made to the fixed-int math as well as RNG calculations for noncritical bits.  Those help the jitters noticeably.

Overall what we’re finding, though, is that we simply need to move the main simulation off the main thread of the game (it’s already using a lot of worker threads, but now the core loop also needs to go).
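As a very loose sketch of what “moving the simulation off the main thread” means structurally: a worker thread steps the simulation and publishes its latest result, and the render loop just reads whatever was published last, so no single frame has to wait on a full step.  This is Python for illustration only; the `Simulation` class, its methods, and all the timings are hypothetical, and this is not how Arcen’s code is actually organized.  (Python threads also don’t run CPU-bound work in parallel, so this shows the shape of the idea, not the speedup.)

```python
import threading
import time


class Simulation:
    """Hypothetical stand-in for the game simulation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshot = 0              # latest completed step, read by the renderer
        self._stop = threading.Event()

    def _step(self, step_number):
        time.sleep(0.005)               # pretend this is the expensive simulation work
        return step_number

    def run(self, interval_s):
        step_number = 0
        while not self._stop.is_set():
            step_number += 1
            result = self._step(step_number)
            with self._lock:            # publish the result; the lock is held only briefly
                self._snapshot = result
            self._stop.wait(interval_s)

    def latest(self):
        with self._lock:
            return self._snapshot

    def stop(self):
        self._stop.set()


def render_frames(sim, frame_count, frame_s):
    """Main-thread loop: draws whatever the simulation published last."""
    seen = []
    for _ in range(frame_count):
        seen.append(sim.latest())
        time.sleep(frame_s)
    return seen


sim = Simulation()
worker = threading.Thread(target=sim.run, args=(0.01,), daemon=True)
worker.start()
seen = render_frames(sim, frame_count=20, frame_s=0.005)
sim.stop()
worker.join()
print("steps seen by the renderer:", seen)
```

The render loop sees the same step number several frames in a row and then a newer one, rather than stalling while a step runs.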

That was already on Keith’s todo list, but it got bumped up in priority in the last day or so.  I don’t think either of us has any idea if that will be ready by Monday or not, but for the sake of safety I’m going to just assume not.

Occlusion Culling!?

The other thing that is causing periodic hitching is Unity’s own occlusion culling logic.  It occasionally jumps in with a 28ms frame, once or twice every two seconds.  That’s incredibly frustrating.

That said, I found something kinda-sorta similar when I was coding Release Raptor, and I implemented my own frustum culling solution that I had intended to port over to AI War 2, anyway.  So that’s coming up soon, though likely not prior to Monday, either.

Occlusion culling, in this case, is basically just saying “hey, the camera isn’t pointed at me, so don’t send me to the GPU to render.”  Unity has to calculate that sort of thing for every individual mesh that is rendered if I use their version.  In my own version I can calculate that for things like entire squads.  That cuts down the number of calculations enormously, and I can also calculate AABBs for the squads vastly faster than Unity can for individual meshes.
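Here is a minimal sketch of the squad-level idea: build one axis-aligned bounding box (AABB) around all the ships in a squad, then test that single box against the camera’s frustum planes instead of testing every mesh.  It’s Python for illustration only; `Plane`, `squad_aabb`, `squad_visible`, the padding radius, and the box-shaped “frustum” are all assumptions made up for this example, not the code from Release Raptor or AI War 2.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    # Points with nx*x + ny*y + nz*z + d >= 0 are on the inside of the plane.
    nx: float
    ny: float
    nz: float
    d: float


def squad_aabb(positions, radius):
    """One box around every ship in the squad, padded by an assumed ship radius."""
    xs, ys, zs = zip(*positions)
    lo = (min(xs) - radius, min(ys) - radius, min(zs) - radius)
    hi = (max(xs) + radius, max(ys) + radius, max(zs) + radius)
    return lo, hi


def aabb_outside_plane(lo, hi, plane):
    # Pick the box corner farthest along the plane normal; if even that
    # corner is behind the plane, the whole box is.
    x = hi[0] if plane.nx >= 0 else lo[0]
    y = hi[1] if plane.ny >= 0 else lo[1]
    z = hi[2] if plane.nz >= 0 else lo[2]
    return plane.nx * x + plane.ny * y + plane.nz * z + plane.d < 0


def squad_visible(positions, radius, frustum_planes):
    lo, hi = squad_aabb(positions, radius)
    return not any(aabb_outside_plane(lo, hi, p) for p in frustum_planes)


# A simplified, box-shaped view volume: -10 <= x <= 10 and 0 <= z <= 100.
view = [Plane(1, 0, 0, 10), Plane(-1, 0, 0, 10), Plane(0, 0, 1, 0), Plane(0, 0, -1, 100)]
squad_a = [(0, 0, 50), (2, 0, 52), (-1, 1, 49)]
squad_b = [(40, 0, 50), (42, 0, 51)]
print(squad_visible(squad_a, 1.0, view), squad_visible(squad_b, 1.0, view))
```

The test is conservative: a box near a frustum corner can be reported visible when it is actually just outside, which only costs a little extra rendering.  The win is that it is one cheap test per squad rather than one per mesh.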

That right there is a good example of the differences between generalized solutions (which must work in any game without knowledge of context) versus specialized solutions (which really wouldn’t work in general cases, but are incredibly efficient in certain specialized ones).  It’s always about choosing a mixture of the two approaches, and right now the generalized approach is biting us in the rear more than I expected.

Particle System Updates!?

This one I don’t even know what it is, and I haven’t had time to investigate it yet.  But every so often – I’m not even sure with what frequency – Unity is throwing a slow frame (again just under 30ms) where it is doing a “particle system update.”

Yes, we’re using their Shuriken particle system, but why it suddenly has to go through some sort of cleanup or… whatever it’s doing… I have no idea.  So that’s another thing for me to investigate in the coming weeks.

GC? Nuh-uh!

In AI War Classic, the garbage collector running periodically was the biggest source of hitching.  Thanks to our massive refactoring, we’re generating very little waste in AI War 2.  There are some errant GC allocs that we’ll clean up later (you’d like an actual game first, we presume, heh), but those are all very minor and would fall under the banner of premature optimization right now.

At the moment, we’re keeping things clean simply by using clean coding practices and our hard-won knowledge of strange things that cause Mono (but not .NET itself) to generate garbage (see: foreach loops).

TLDR

Figured this was worth a writeup since some folks are going to be getting into the game soon, and the first question I would have is “what’s with this hitching?”  Ironically, the worse your computer is, the less hitching you’ll have right now. 😉

But anyway, I figured an explanation of what it is, as well as why it’s going to be going away very soon, was warranted.  Cheers!

Et tu, Fixed-Int?

Spent a fair bit of time yesterday, and then this morning, looking at the speed of some of our most core alternate-math work.  Floating-point math is too slow for our purposes, and it can also give slightly different results on different computer architectures.

So back around 2009, I switched to fixed-int math in AI War on the advice of my friend Lars Bull.  There weren’t any C# implementations at the time, so I wound up porting some professor’s Java implementation over for my purposes.

Over the years, Keith LaMothe (Arcen’s other programmer) and I have made a variety of improvements to it.  We also had to re-implement a lot of trigonometry functions and things like square root, but that’s okay because performance stinks on those anyway in general.  So we took a middle-ground between performance and precision.

Our versions give 100% reliable results, but they only give a certain number of significant digits – which is perfect for fixed-int math anyhow, because that inherently has a digit limit of (in our case) three places after the decimal.
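For readers who haven’t met fixed-point math before, here is a toy sketch of the idea: store every number as an integer count of thousandths, so all arithmetic is plain integer math and comes out identical on every machine.  This is Python for illustration, and the `Fixed` class, its decimal scale of 1000, and its method names are assumptions for this example; the real FInt class is C# and its internal representation may well differ.

```python
from math import isqrt

SCALE = 1000  # three decimal places; an assumption for illustration


class Fixed:
    """Toy fixed-point number stored as an integer count of thousandths."""

    __slots__ = ("raw",)

    def __init__(self, raw):
        self.raw = raw

    @classmethod
    def from_string(cls, text):
        # Parse without ever touching a float, so there is no rounding drift.
        sign = -1 if text.startswith("-") else 1
        whole, _, frac = text.lstrip("+-").partition(".")
        frac = (frac + "000")[:3]
        return cls(sign * (int(whole or "0") * SCALE + int(frac)))

    def __add__(self, other):
        return Fixed(self.raw + other.raw)

    def __mul__(self, other):
        # The raw product carries the scale twice, so divide one out.
        return Fixed(self.raw * other.raw // SCALE)

    def sqrt(self):
        # sqrt(raw / SCALE) * SCALE equals sqrt(raw * SCALE); isqrt is exact integer math.
        return Fixed(isqrt(self.raw * SCALE))

    def __str__(self):
        sign = "-" if self.raw < 0 else ""
        whole, frac = divmod(abs(self.raw), SCALE)
        return f"{sign}{whole}.{frac:03d}"


print(Fixed.from_string("0.1") + Fixed.from_string("0.2"))
print(Fixed.from_string("1.5") * Fixed.from_string("2.25"))
print(Fixed.from_string("2").sqrt())
```

Results are reproducible but limited to three decimal places, which is the same trade of precision for reliability described above.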

Anyway, we haven’t really touched that code for years; we squeezed all the performance we could out of it long ago, and so we’ve been focused on multithreading, algorithm optimization, and a variety of other things.

Turns out that now things are SO efficient that some of the conversions and implicit-operators that we created for FInt (our fixed-int class) are now actually the bigger enemies again.  There’s always a bottleneck somewhere, and these still aren’t the biggest one, but I managed to shave 2ms per frame off of a 12ms process just by working on the fixed-int math and then also a new random number generator, too.

That’s very worth it!  The only reason it was that high is that we’re talking about a lot of math calls in these particular frames: tens or hundreds of thousands, depending on what is happening.

Regarding random number generation, there’s always a war there between accuracy (how random is it, really) and speed.  I’ve never been all that happy with System.Random in .NET, and in Mono the performance is even worse.  Though benchmarks suggest that the Unity 3D version of it is actually more performant than either the basic .NET or Mono versions.  So… way to go, guys!

Still – we’ve been using Mersenne Twister (algorithm by Makoto Matsumoto and Takuji Nishimura, with some credit to Topher Cooper and Marc Rieffel, and with a C# implementation by Akihilo Kramot) since around 2009, and that’s both slightly faster and a lot more random.  In the intervening years we made a few small tweaks to make it run even faster, but they’re incredibly minor and mostly don’t help much in the grand scheme.  Still, every little bit, right?

The problem is, sometimes you need quality randomness, and sometimes you need ultrafast randomness.  There are certain things (like delays on special effects appearing, or shots emitting) that really don’t need all that much quality.  We’re talking about a delay that is on the order of a hundred or thousand ms that we’re calculating randomly, and there’s just only so much randomness that you can see there.  It’s not like AI decisions, which need to be quality.

So today I ported in a version of Xorshift (discovered by George Marsaglia) based on an implementation by Colin Green.  I implemented that as an alternative for places where we want superspeed and don’t care if we’re getting super-duper high quality randomness as an output.
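To show why Xorshift is so cheap, here is a sketch of the basic 32-bit variant from Marsaglia’s paper: each new number is just three shift-and-XOR operations on the previous one.  This is Python for illustration only, and it is not necessarily the variant used in the game or in Colin Green’s implementation; the `XorShift32` class and `next_below` helper are names made up for this example.

```python
MASK32 = 0xFFFFFFFF  # keep values in 32 bits, since Python ints are unbounded


class XorShift32:
    """Basic xorshift32 generator with the 13/17/5 shift triple."""

    def __init__(self, seed):
        # The state must never be zero, or every output would be zero.
        self.state = (seed & MASK32) or 1

    def next_u32(self):
        x = self.state
        x ^= (x << 13) & MASK32
        x ^= x >> 17
        x ^= (x << 5) & MASK32
        self.state = x
        return x

    def next_below(self, bound):
        # Modulo introduces a slight bias, which is fine for noncritical uses.
        return self.next_u32() % bound


rng = XorShift32(12345)
# Something like a random delay of 100-999 ms before a special effect appears.
delays = [100 + rng.next_below(900) for _ in range(5)]
print(delays)
```

Because the generator is fully determined by its seed, the same seed always yields the same sequence, but the quality is lower than Mersenne Twister, so it only makes sense for things like effect delays and not for AI decisions.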

Well – it delivers!  As noted, between that and the FInt shifts that was 2ms off a 12ms process, which is huge in terms of percentages.