Code-Monkeys Case Study

Downloads


Code Monkeys Case Study [PDF 501KB]

By Edward J. Correia

Perceptual Computing For the First Time


When Code-Monkeys entered the Ultimate Coder Challenge: Going Perceptual, the 12-year-old software development company had no experience with perceptual computing. But that was okay; part of the idea behind the seven-week contest was to educate and to encourage innovation around the use of non-touch gestures—mainly of the hands and face—as inputs for computer programming and control.

With the Intel® Perceptual Computing SDK in beta, Intel engaged with early innovators and thought leaders to encourage collaboration and build a community and knowledge base. Code-Monkeys took Stargate Gunship*, an orbital shooter game built a few months earlier, and modified it to accept movements of the head and hands, as well as voice commands, as inputs. "It felt like the perfect opportunity to expand into perceptual computing," said Chris Skaggs, who founded Code-Monkeys in 2000. "The controls already relied on a fairly basic touch interface using a single finger, so mapping that to head and/or hand tracking felt like a doable project."

Skaggs called on Gavin Nichols, main programmer of the original game that would eventually become Stargate Gunship. "My main priority in this contest was to redesign the game itself for the camera," said Nichols. "This meant writing code so that our GUI system, which was meant for touches and clicks, also worked with perceptual input."

The Control Schema Issue


Stargate Gunship’s pre-existing control schema, which used touch input to move a target reticule over the area to be fired upon, proved unstable for perceptual input. The original control scheme was based on standard orbital-shooter paradigms, and the weapon always fired toward the center of the screen, explained Nichols; changing where you aimed meant changing where you looked. "The first time we started playing with the camera, it was shaky," said Nichols. Even marginal shaking of a constantly moving camera was not going to make for an enjoyable playing experience. The team needed to decouple the camera angle from the firing angle so the player could fire anywhere on the screen at any time without changing camera angles. But sampling the data frame by frame created jitter: the camera was too sensitive to motion even with the steadiest of hands, and the problem worsened as the player's arms got tired.

John McGlothlan, who has been programming since age 7, was called in to tackle the issue. He created a buffering system that averages the target hand's position and gesture over a certain number of frames. This resulted in a smoothing of motion as the program displayed a moving average of the hand's position. If the average reflects no movement, the reticule isn't moved. Using McGlothlan's nickname, the team dubbed the algorithm “Lennie's Custom Averages”; sampling takes place at 60 frames per second (FPS) with no perceptible lag.
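The article doesn't include the averaging code itself, but the idea can be sketched as a fixed-size buffer of recent hand samples whose mean drives the reticule. The class, field names, window size, and dead zone below are illustrative assumptions, not the Code-Monkeys implementation.

using UnityEngine;
using System.Collections.Generic;

// Minimal sketch of a frame-averaging buffer in the spirit of "Lennie's
// Custom Averages." Names and thresholds are hypothetical.
public class SmoothedHandPosition {
	
	private readonly Queue<Vector3> samples = new Queue<Vector3>();
	private readonly int windowSize;      // number of camera frames to average
	public float deadZone = 0.01f;        // ignore averaged movement smaller than this
	
	public Vector3 Average { get; private set; }
	
	public SmoothedHandPosition(int windowSize = 15){
		this.windowSize = windowSize;
	}
	
	// Called once per camera frame with the raw hand position. Returns true
	// only when the averaged position moved enough to justify moving the reticule.
	public bool AddSample(Vector3 rawPosition){
		samples.Enqueue(rawPosition);
		if(samples.Count > windowSize) samples.Dequeue();
		
		Vector3 sum = Vector3.zero;
		foreach(Vector3 s in samples) sum += s;
		Vector3 newAverage = sum / samples.Count;
		
		bool moved = (newAverage - Average).sqrMagnitude > deadZone * deadZone;
		if(moved) Average = newAverage;
		return moved;
	}
}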


Figure 1: Developers discovered that players have an easier time controlling the action using head and hand gestures if their perceptual inputs are displayed on-screen during play.

Perceptual Input Class

Lennie’s Custom Averages was a major breakthrough. It led to the development of the Perceptual Input Class—a full input class to process perceptual input—and ultimately to an event- and variable-driven input class that could be accessed at any time and in any script. "This was done at first to match how Unity input interaction was done," said McGlothlan, who's responsible for most of the program debugging. But the ultimate benefit was the ability to contain all perceptual coding in a single file that, when inserted into other programs, would make them "perceptual-ready."

Modular code development is a huge benefit for efficient multi-platform porting. "I can translate all of my different inputs into game-related terminology," said Nichols, standardizing on terms such as "jump," "shoot," or "look along this vector." This lets all of a game's code work from the same set of inputs. "My camera is looking for a vector. Whether or not I'm firing is a single boolean." All of the different control schemes feed into this single place—a uniform data type—so the program can deal with each input system's unique strengths and weaknesses in its own space without disturbing another system's code.

To be portable, the new input class had to be verbose, exposing more data than any one consumer needs. That way, if a different control schema were adopted, only simple changes would be required rather than a rewrite of major portions of the Perceptual Input Class. The code below shows half of the computed public variables used across the application. They can be accessed in real time and represent only those for face tracking; there are about 30 accessible variables in all.

public Vector3 facePosition;
public Vector3 facePositionTemp;
public Vector3 leftEyePosition;
public Vector3 rightEyePosition;
public Vector3 deltaEyePosition;
public Vector3 leftMouthPosition;
public Vector3 rightMouthPosition;
public Vector3 deltaMouthPosition;

public Vector3 faceDelta;
public Vector3 leftEyeDelta;
public Vector3 rightEyeDelta;
public Vector3 deltaEyeDelta;
public Vector3 leftMouthDelta;
public Vector3 rightMouthDelta;
public Vector3 deltaMouthDelta;
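For context, a consumer of this uniform layer might look like the following minimal sketch, which reflects Nichols' description of feeding every control scheme into one place; the UnifiedInput class and its members are hypothetical stand-ins, not part of the Perceptual Input Class shown above.

using UnityEngine;

// Illustrative consumer of a uniform input layer: the camera asks for a
// vector, firing is a single boolean. UnifiedInput is a hypothetical stand-in.
public class GunnerController : MonoBehaviour {
	
	void Update(){
		// Whether the data came from touch, mouse, or the perceptual camera
		// is resolved inside the input layer, not here.
		Vector3 aimDirection = UnifiedInput.LookVector;   // "look along this vector"
		bool firing = UnifiedInput.Firing;                // "shoot"
		
		transform.rotation = Quaternion.LookRotation(aimDirection);
		if(firing){
			// fire the currently selected weapon...
		}
	}
}

// Minimal stand-in so the sketch is self-contained.
public static class UnifiedInput {
	public static Vector3 LookVector = Vector3.forward;
	public static bool Firing = false;
}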

It wasn't long before Code-Monkeys realized that there was more to perceptual interface building than changing the inputs; the visual feedback also needed to change. "Gavin [Nichols] had such a hard time representing where his head was," said McGlothlan. "Through many conversations, Gavin decided to add the code to see the current inputs in the GUI. With that, we knew we made a huge stride."


Figure 2: A debug view designed to reveal the results of a raycast and the expected orientation of the head. The information it provided to the developers proved to be even more valuable to the player.

According to Nichols, his aha moment came while he was taking a mental break, experimenting with shaders on the characters, and saw a ghost head. "I made some changes to the models and suddenly I had icons that showed what they were meant to represent." Nichols still had much more to do, including developing new control paradigms and reticules and tweaking practically everything else related to gameplay. Some of his biggest challenges were reworking the GUI feedback icons, making the reticule stand out by adding rotating crosshairs to its edge, and adding functionality to the GUI so that hand gestures could activate buttons. Here's how he dealt with some of these things programmatically:

using UnityEngine;

// Places the aiming reticule by raycasting along the current firing vector
// and keeps the guns pointed at the resulting target point.
public class ReticlePlacement : MonoBehaviour {

	public CameraStuff myCameraStuff;
	public AimGuns myAimGuns;
	public Gun myGun;
	public LayerMask myLayerMask;
	public MeshRenderer myRenderer;
	public bool showing;
	public float damping;
	
	private Ray ray;
	private RaycastHit hit;
	
	void OnEnable(){
		Fire.changeWeapon += changeWeapon;
		if(myCameraStuff == null) myCameraStuff = (CameraStuff)GameObject.FindObjectOfType(typeof(CameraStuff));
		if(myAimGuns == null) myAimGuns = (AimGuns)GameObject.FindObjectOfType(typeof(AimGuns));
	}
	
	void OnDisable(){
		Fire.changeWeapon -= changeWeapon;
	}
	
	public void changeWeapon(Gun newWeapon){
		myGun = newWeapon;
		transform.localScale = Vector3.one * Mathf.Max(myGun.damageRadius, 5f);
	}
	
	public void LateUpdate(){
		if(Application.platform == RuntimePlatform.IPhonePlayer){
			if(Input.touchCount == 2) showing = true;
			else showing = false;
		}
		if(showing && myRenderer != null){
			myRenderer.enabled = true;
			if(myCameraStuff != null) ray = myCameraStuff.firingVector;//new Ray(myGun.transform.TransformPoint(Vector3.zero),  myGun.transform.TransformDirection(Vector3.forward));
			if(Physics.Raycast(ray, out hit, 10000f, myLayerMask)){
				transform.position = Vector3.Lerp(transform.position, hit.point, Time.deltaTime * damping);
				myAimGuns.transform.LookAt(hit.point);
			}else{
				Vector3 oldposition = transform.position;
				transform.position = ray.origin;
				transform.LookAt(ray.origin + ray.direction);
				transform.Translate(Vector3.forward * 150f);
				if(myAimGuns != null) myAimGuns.transform.LookAt(transform.position);
				transform.position = Vector3.Lerp(oldposition, transform.position, Time.deltaTime * damping);
				//myRenderer.enabled = false;
			}
		}else if(myRenderer != null){
			myRenderer.enabled = false;
		}
	}
}




using UnityEngine;

// Mirrors the current hand gesture in the GUI by switching sprites, and
// color-cycles the icon when the hand goes missing.
public class PerceptualStateVisualizer : MonoBehaviour {
	
	public SwitchSprite spriteSwitcher;
	public SpriteColorCycle myCycler;
	
	public void Update(){
		Vector3 outputPosition = Vector3.zero;
		string newSprite = "";
		if(PerceptualInput.instance.collectingData){
			myCycler.Cycling = false;
			if(PerceptualInput.instance.currentGesture == PerceptualInput.gesture.Closed){
				newSprite = "ClosedHand";
			}else if(PerceptualInput.instance.currentGesture == PerceptualInput.gesture.Open){
				newSprite = "OpenHand";
			}else if(PerceptualInput.instance.currentGesture == PerceptualInput.gesture.Peace){
				newSprite = "HandPeace";
			}else if(PerceptualInput.instance.currentGesture == PerceptualInput.gesture.Missing){
				newSprite = "HandPeace";
				myCycler.Cycling = true;
			}else if(PerceptualInput.instance.currentGesture == PerceptualInput.gesture.Unrecognized){
				newSprite = "OpenHand";
			}
		}
		if(spriteSwitcher != null){
			if(!spriteSwitcher.CompareSpriteName(newSprite)){
				spriteSwitcher.SwitchTo(newSprite);
			}
		}
		
	}
	
}




using UnityEngine;

// Raises an alternate "click" event when the tracked hand transitions from
// open to closed.
public class PerceptualGuiClick : MonoBehaviour {
	
	public Camera myGUICamera;
	public delegate void broadcastClick(Vector3 clickPosition);
	public static event broadcastClick alternateClick;
	private bool lastFrameOpen;
	
	public void Update(){
		//if(PerceptualInput.instance.currentGesture == PerceptualInput.gesture.Closed || PerceptualInput.instance.currentGesture == PerceptualInput.gesture.Open){
			if(lastFrameOpen && PerceptualInput.instance.currentGesture == PerceptualInput.gesture.Closed){
				if(alternateClick != null) alternateClick(myGUICamera.WorldToScreenPoint(this.transform.position));
			}
			lastFrameOpen = !(PerceptualInput.instance.currentGesture == PerceptualInput.gesture.Closed);
		//}
	}
}




using UnityEngine;

// Listens for alternate clicks and raycasts them through the GUI camera to
// trigger OnClick on whatever GUI element is hit.
public class AlternativeGuiClicks : MonoBehaviour {
	
	public Camera GUICamera;
	public LayerMask GuiLayerMask;
	
	public void OnEnable(){
		PerceptualGuiClick.alternateClick += Click;
	}
	
	public void OnDisable(){
		PerceptualGuiClick.alternateClick -= Click;
	}
	
	public void Start(){
		if(GUICamera == null){
			GUICamera = this.GetComponent<Camera>();
		}
	}
	
	public void Click(Vector3 screenPoint){
		Ray myRay = new Ray();
		if(GUICamera != null) myRay = GUICamera.ScreenPointToRay(screenPoint);
		RaycastHit hit = new RaycastHit();
		if(Physics.Raycast(myRay, out hit, 100f, GuiLayerMask)){
			//Debug.Log(hit.transform.name);
			hit.transform.SendMessage("OnClick", SendMessageOptions.DontRequireReceiver);
		}
	}
}

 

From a UI perspective, one of the key takeaways for the team was the importance of visual feedback about what the user's movements were doing within the program. When testing the app on first-time users, people initially had a difficult time making the mental connection that their hand was moving the on-screen target. "The core conversation while people were playing was that they weren’t quite sure what was happening and they found that frustrating," said Skaggs. That kind of user feedback led directly to a hand-and-head calibration tool at the start of the game and the ghosted head and hand images throughout gameplay. "Although I think we’ll make that a temporary thing that will fade out and only reassert itself if we lose track of the user," Skaggs said.


Figure 3: When starting a game, the player is presented with a short series of tasks and instructions that allow the camera to calibrate its input parameters to the player. But equally important is the player’s experience of calibrating his or her physical motions with the ghosted image on the screen.

The Bandwidth Problem


In the real world, militaries have for decades used lasers to track eye movement for weapons sighting in the heads-up displays of fighter jets. Computers controlled by hands waving in the air have been part of sci-fi pop culture for years; one film that often springs to mind alongside perceptual computing is “Minority Report,” in which police use perceptually controlled computers to solve crimes before they happen. But first-time users on the Code-Monkeys team met with a few surprises. First were the physical limitations, such as arm fatigue and the limited head movement available when playing on a small screen. Then came the technical restrictions, such as the raw computing power needed to drive perceptual systems alongside the graphics that surround the UI.

The amount of data collected by the camera is enormous and can consume a large percentage of the CPU just to process and identify as game inputs. Yet only about 5 percent of that data was relevant to the controls of Stargate Gunship; the rest was wasted. The next challenge was to parse the stream, determine which parts to keep, and turn it on and off or otherwise filter it as needed. That knowledge, said Skaggs, became one of the team's key insights for saving CPU cycles: identify the player's hand and keep track of where it is. "For example," he said, "a typical person’s hand [usually] winds up in the lower-right corner of the screen. As soon as we can identify that, we just ignore the rest of the screen." They ended up ignoring roughly three-quarters of the pixel data coming in from the two cameras, which he said equates to a savings of about 300 percent. "If it’s not where the hand is we don’t care. That is a key insight."
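The SDK calls involved aren't shown in the article, but the region-of-interest idea can be sketched independently of them: keep a padded box centered on the last-known hand position and skip everything outside it. The class, field names, and padding value below are illustrative assumptions, not the team's code.

using UnityEngine;

// Sketch of "ignore everything that isn't near the hand." The frame layout
// and hand-tracking call are left abstract; this is not the Intel Perceptual
// Computing SDK API.
public class HandRegionFilter {
	
	public int padding = 40;        // pixels of slack around the hand
	private Rect region;            // last-known hand region in pixel coordinates
	
	// Re-center a padded box on the most recent hand position and clamp it
	// to the frame, so later passes can skip pixels outside it.
	public void UpdateRegion(Vector2 handPixel, int frameWidth, int frameHeight){
		float x = Mathf.Clamp(handPixel.x - padding, 0, frameWidth);
		float y = Mathf.Clamp(handPixel.y - padding, 0, frameHeight);
		region = new Rect(x, y,
			Mathf.Min(padding * 2, frameWidth - x),
			Mathf.Min(padding * 2, frameHeight - y));
	}
	
	public bool ShouldProcess(int px, int py){
		return region.Contains(new Vector2(px, py));
	}
}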

"Further challenges arose when decoding what the user's hand was doing. Because of the way the camera interprets bones in the hand, certain gestures are more prone to misidentification under a broad range of circumstances. For example, one of the gestures they tried to use for calibration was a simple thumbs-up. But for some reason it was often misunderstood. They brainstormed ideas for gestures they thought might be easier to recognize, settling on the peace sign, which delivered a better and much faster success rate.


Figure 4: Pass or Fail - A key to hand gestures that Code-Monkeys developers found to work best and worst.

McGlothlan's biggest challenge was trimming and optimizing the data stream, and the epiphany came in removing the m_Texture.Apply(); call. “With the SDK as a starting point we naturally assumed that every line was important. But when we hit a wall in squeezing out performance we started looking deeper and trimming fat.” According to McGlothlan, that single line of code caused the input class to consume 90 percent of the processor's time. "In a game, any input class has about 3 percent available to it, maybe 10 percent if the graphics are small," he said. Ninety percent was laughable. Removing that one line, McGlothlan said, dropped the class's CPU usage from 90 percent to just 3 percent, and the app went from running at between 5 and 15 FPS to a steady 60 FPS. The test environment was a high-powered laptop with an Intel® Core™ i7 processor and a dedicated GPU.
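The SDK sample that contained the call isn't reproduced here, but the general pattern behind the fix can be sketched: Texture2D.Apply() uploads modified pixels to the GPU, which is wasted work when the camera image is only being analyzed rather than drawn, so it can be gated behind a flag (or removed outright, as the team did). The class and field names below are illustrative, not the SDK sample's.

using UnityEngine;

// General pattern: only pay for the GPU upload when the camera image
// actually needs to be displayed. Names are hypothetical.
public class CameraFrameSink : MonoBehaviour {
	
	public bool showDebugPreview = false;   // toggle the on-screen camera preview
	private Texture2D m_Texture;
	
	void Start(){
		m_Texture = new Texture2D(320, 240, TextureFormat.RGBA32, false);
	}
	
	// Called with each camera frame's pixels (320x240 in this sketch).
	public void OnFrame(Color32[] pixels){
		// Gesture and position analysis can read the raw pixel array directly;
		// nothing below is needed for that.
		if(showDebugPreview){
			m_Texture.SetPixels32(pixels);
			m_Texture.Apply();   // GPU upload: skip it unless the preview is visible
		}
	}
}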


Figure 5: A selection of accepted gestures, their graphical cues, and the code that identifies them.

Development Process


Stargate Gunship was developed using the Unity game development system from San Francisco-based Unity Technologies. Unity provides a development ecosystem complete with a 3D rendering engine, terrain and object editors, thousands of pre-made objects, and tools for debugging and process workflows. The environment targets 10 major computing, mobile, and console platforms.

A key tool in Unity's debugging arsenal is its profiler. "We used this to track any sort of performance hang-ups and see where [the program] was spending most of the time," said Nichols. "Unity's built-in debugger has a wonderful deep analysis tool showing which lines in [which] classes are using the most processing time. The company creates a development environment that makes processing outside inputs so much easier," he said.

In the context of processing perceptual inputs, Unity also proved adept. The environment permitted the creation of control objects with constraints and filters on the object itself, giving Skaggs and his team a way to throttle the huge incoming data streams without having to build that logic themselves. "Tools inside Unity helped us get what we wanted at a relatively cheap cost in a processor sense and in terms of man hours," said Skaggs. The Unity user forum was also a good source of developer insight and was visited most often after brainstorming new concepts or approaches.

Voice Processing Fail

In addition to its visual inputs, Stargate Gunship also accepts spoken commands. As the game's main sound designer, John Bergquist, a confirmed MacGyver nut, was ready to introduce the game to a wider audience. And what better set of beta testers could there be than the hordes of developers roaming the aisles of the Game Developers Conference 2013 (GDC)? But as Skaggs recalls, things didn't go quite as planned. "At GDC, we just never really thought about what happens when you’re on a really busy [show] floor and it’s really noisy."

“Ambient noise is the killer,” said Skaggs. In a quiet room with limited background noise, most words were recognized about equally well. But the app couldn't pick commands out of the cacophony, or cope when the player spoke too rapidly, and Skaggs was ready to yank the whole feature. Cooler heads prevailed, though, and the team began exploring possible reasons for the failure. On the evening after the first day of GDC, John McGlothlan reflected on the day’s experience and proposed a tweak that proved to be brilliant. "John had a trick," recalled Skaggs. "He said, 'Wait a second, if I do a simple change, we’d shave off the big problem of [vocal command] input getting confused.'”

 

They ultimately discovered that plosives—words that begin with P, B, or other popping consonants—were far more recognizable than words starting with vowels. So they began to cull their verb set to include only words that were easily recognizable—ideally to limit the set to single-syllable plosives. This one change improved recognition not only in crowded settings, but also in quiet ones.
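The article doesn't show how the culled verb set was wired up, but the consuming side can be sketched: a small dictionary of single-syllable, plosive-initial command words mapped to actions, with anything outside the set ignored. The command words and the OnWordRecognized hook below are illustrative assumptions, not the game's actual vocabulary or the SDK's voice API.

using UnityEngine;
using System;
using System.Collections.Generic;

// Sketch of a culled verb set: a few single-syllable, plosive-initial commands
// mapped to game actions. The word list and recognizer hookup are hypothetical.
public class VoiceCommandMap : MonoBehaviour {
	
	private Dictionary<string, Action> commands;
	
	void Awake(){
		commands = new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase){
			{ "bomb",  () => Debug.Log("drop ordnance") },
			{ "pause", () => Debug.Log("pause game") },
			{ "guns",  () => Debug.Log("switch weapon") }
		};
	}
	
	// Called by whatever layer wraps the speech recognizer with each word it reports.
	public void OnWordRecognized(string word){
		Action action;
		if(commands.TryGetValue(word.Trim(), out action)){
			action();   // recognized verb: fire the mapped action
		}
		// Anything outside the culled set is ignored rather than guessed at.
	}
}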


Figure 6: Unity 3D’s IDE has been a “go to” technology for Code-Monkeys over the last four years and saved dozens of man-hours with its built-in physics and performance tools.

Conclusion


Despite about a dozen years building software and five years building games, the developers at Code-Monkeys were in uncharted territory when it came to perceptual computing. Faced with the time pressures of the Ultimate Coder Challenge, the team tapped into a deep pool of resources where it could and learned the rest on the fly. Some of the team’s most daunting challenges included coping with the enormous amounts of data coming from the perceptual camera, deciphering head and hand gestures against busy backgrounds, fine-tuning voice recognition and audio commands, eliminating jitter when displaying visual feedback of gestures, and making that feedback more useful to the player. Beneath it all was Intel’s support for grassroots app development through coding initiatives, and the resources and support network it makes available to developers.

Resources


The team relied heavily on the Unity Forum, a community of thousands of developers the world over. The team also tapped extensively into the NGUI forum; the Next-Gen UI for Unity includes an event notification framework developed by Tasharen Entertainment. Code-Monkeys also engaged the help of Nat Iwata, a long-time art director and visual resource for Code-Monkeys and Soma Games. Another key partner and Flex/Flash* expert was Ryan Green, whose current Unity 3D project, “That Dragon, Cancer,” will touch your heart. And of course there are the Intel Perceptual Computing SDK and Intel Perceptual Online Documentation, which above all else were the team's most useful resources. The documentation provided enough insight to allow the team to move forward using their own backgrounds as a guide. The cumulative knowledge that results from the Ultimate Coder Challenge: Going Perceptual is intended to enrich and supplement Intel's own documentation and help improve its SDK.

 

Intel does not make any representations or warranties whatsoever regarding quality, reliability, functionality, or compatibility of third-party vendors and their devices. For optimization information, see software.Intel.com/en-us/articles/optimization-notice/. All products, dates, and plans are based on current expectations and subject to change without notice. Intel, the Intel logo, and Intel Core are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright © 2013. Intel Corporation. All rights reserved

 

  • ultimate coder challenge
  • ultrabook
  • Developers
  • Microsoft Windows* 8
  • Perceptual Computing
  • Sensors
  • Laptop
  • Desktop