Voice Quality and Mobile Phone Position
White Paper, May 30th, 2012
In a cellular network, the user’s environment and equipment have a significant impact on the quality of the communication. The same can be said of a VoIP communication. Although packet loss and network bandwidth are important, the user’s environment and equipment directly impact voice quality in many ways.
Figure 1‑1 – Cellular Phone Call
In addition, mobile phone users are becoming less tolerant of voice problems as the whole market becomes more sophisticated. There is ample evidence that consumer expectations of voice quality are increasing as the communications industry matures. The mobile phone is now seen and used by many as a replacement for the fixed phone. This has resulted in a marked reduction in tolerance for voice quality issues such as noise, echo, and level fluctuations.
This reduction in tolerance is directly related to churn. Service providers and manufacturers have become acutely aware of this trend and are now investing significant resources to (a) evaluate and understand voice quality and (b) improve it at every step in the communication path.
Recognizing the importance of these trends, the industry has deployed new technology that addresses some of the quality issues inside the mobile phone. Specifically, technology providers and mobile phone manufacturers have implemented a slew of signal processing features that reduce ambient noise, eliminate acoustic echo, apply automatic volume adjustments and equalize the signal, and so on.
These new features utilize advanced identification/discrimination algorithms to determine which parts of the signal should be preserved or enhanced (e.g. user’s voice, music on hold), and which parts should be removed or attenuated (e.g. ambient noise, echo, reverberation).
This increase in signal-manipulation complexity has led to a new concern: making sure that these sophisticated new features in mobile phones are stable and robust, i.e. that they never degrade quality instead of improving it.
2. New Problems
When mobile phones transmitted the signal as captured at the microphone, there was little potential for “artificial” signal degradation. The full content of the signal was sent as captured, including echo, noise, and volume fluctuations. The only thing manufacturers needed to do was to correctly select and assemble the audio path components (e.g. microphones, amplifiers, etc.).
Now, as intelligent functions in the mobile phone attempt to improve the signal by manipulating components of it, the algorithms that power them are growing in complexity, and relying on an increasingly complex combination of information streams to make their decisions, including:
- A priori geometric information
- In single or multi-microphone devices, it is generally safe to assume that the user’s voice is directed toward the main microphone, and that the user’s mouth is close to that microphone.
- In multi-microphone devices, it is possible to estimate the expected arrival-time differences between microphones for the user’s voice and for ambient noise, knowing the relative positions of the microphones.
- Similarly, in multi-microphone devices, it is possible to estimate the expected volume differences between microphones for various signal sources, knowing the relative positions of the microphones.
- Signal properties, e.g. frequency content, volume, periodicity, etc. Typically, algorithms look for characteristics that can be associated with certain types of signals in order to classify them. For example, voice signals are periodic most of the time and have a characteristic spectral shape and power envelope. By contrast, most noise sources have broader frequency content and are less periodic.
- Context information, e.g. long-term inferences on the stage of the communication, the ambient noise environment, the main user’s voice characteristics, etc.
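As an illustration of the geometric cues above, the arrival-time difference between two microphones can be estimated from the peak of their cross-correlation. The sketch below is a minimal, synthetic example, not taken from any particular handset implementation:

```python
import numpy as np

def estimate_tdoa(a, b, fs):
    """Estimate the arrival-time difference between two microphone signals
    from the peak of their cross-correlation. A positive lag (in samples)
    means signal `a` arrives later than signal `b`."""
    corr = np.correlate(a, b, mode="full")
    lag = int(np.argmax(corr)) - (len(b) - 1)
    return lag, lag / fs

# Synthetic check: the same burst reaches mic 2 eight samples after mic 1.
fs = 16000
burst = np.random.default_rng(0).standard_normal(1024)
mic1 = np.concatenate([burst, np.zeros(8)])
mic2 = np.concatenate([np.zeros(8), burst])
lag, seconds = estimate_tdoa(mic2, mic1, fs)  # lag == 8 samples
```

In a real device the measured lag would be compared against the lag expected from the known microphone geometry; a large mismatch suggests the phone is no longer held in the assumed position.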
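The signal-property cues can be illustrated similarly. Spectral flatness is one common discriminator: broadband noise has a nearly flat power spectrum, while voiced speech is harmonic and far less flat. The following simplified sketch uses synthetic signals; real classifiers combine many such features:

```python
import numpy as np

def spectral_flatness(frame):
    """Spectral flatness = geometric mean / arithmetic mean of the power
    spectrum: close to 1 for broadband noise, near 0 for tonal/voiced sound."""
    psd = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12  # floor avoids log(0)
    return np.exp(np.mean(np.log(psd))) / np.mean(psd)

fs = 8000
t = np.arange(fs) / fs
voiced_like = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
noise = np.random.default_rng(1).standard_normal(fs)  # broadband

sf_voice = spectral_flatness(voiced_like)  # very low: energy in a few bins
sf_noise = spectral_flatness(noise)        # much higher: energy spread out
```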
In this context, the potential for classification errors has become a major concern. Competition and the natural drive for higher performance are pushing these algorithms to become more and more aggressive, which means that the wrong processing decisions can lead to severe malfunctions, such as amplifying the noise instead of reducing it, or worse, removing the user’s voice (white-out) from the signal. For instance:
- An automatic gain adjustment algorithm could trigger on background noise fluctuations and cause the user’s voice to fade in and out.
- An acoustic echo canceller could mistakenly identify the user’s voice as an echo and remove it from the signal.
- Similarly, a noise reduction algorithm could mistakenly classify the user’s voice as noise and attempt to eliminate it from the signal.
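As a toy illustration of the first failure mode, the sketch below implements a deliberately simplified automatic gain control with two common safeguards: a level gate, so that low-level background frames do not drive the gain, and slow gain smoothing, to avoid audible pumping. All parameter values are illustrative assumptions, not values from any shipping product:

```python
import numpy as np

def agc(signal, fs, target_rms=0.1, frame_ms=20, attack=0.05, gate_rms=0.02):
    """Toy automatic gain control: per-frame gain toward a target RMS,
    smoothed over time, and frozen on low-level frames so that background
    noise fluctuations do not drive the gain (avoiding volume pumping)."""
    frame = int(fs * frame_ms / 1000)
    gain = 1.0
    out = np.empty_like(signal)
    for i in range(0, len(signal), frame):
        chunk = signal[i:i + frame]
        rms = np.sqrt(np.mean(chunk ** 2))
        if rms > gate_rms:                     # adapt only on likely-speech frames
            desired = target_rms / rms
            gain += attack * (desired - gain)  # slow smoothing of the gain
        out[i:i + frame] = chunk * gain
    return out

fs = 16000
t = np.arange(fs) / fs
x = 0.05 * np.sin(2 * np.pi * 200 * t)  # quiet "speech", RMS ~0.035
out = agc(x, fs)                        # gain ramps the level up over time
```

Without the gate and smoothing, the gain would track every noise fluctuation between words, producing exactly the fade-in/fade-out artifact described above.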
A majority of catastrophic classification errors occur when the a priori geometric information changes. Examples of scenarios that can cause this to happen:
- A person is on the phone in a crowded train and turns to the side to look out the window. Suddenly her mouth has moved away from the microphone, and her neighbor’s voice may seem louder (and closer) than hers, causing the algorithm – which expects the main user’s mouth to be close to the primary microphone – to select the neighbor’s voice as the main signal to be transmitted.
- A noise reduction feature uses a directional 2-mic beamforming algorithm to focus on the user’s mouth and capture his voice. If the user moves the phone up or down as he’s talking, the directional beam may now “lose” the mouth, causing the user’s voice to become badly attenuated or even completely muted.
- An automatic gain adjustment function expects the user’s voice to be one of the loudest components of the signal, and to originate from a certain angle (where it expects the mouth to be). If the talker starts whispering while pulling the phone away from her face, the function may “lock” onto a different component of the captured signal, thus applying the wrong gain curve.
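The second scenario can be reproduced in a few lines. The sketch below is a minimal two-microphone delay-and-sum beamformer steered with a fixed expected delay; when the true inter-microphone delay changes (the user moves the phone), the steered sum no longer adds coherently and the source is attenuated. The signals and delays are synthetic:

```python
import numpy as np

def delay_and_sum(mic1, mic2, steer_delay):
    """Two-microphone delay-and-sum beamformer: shift mic2 by the expected
    inter-mic delay (in samples) for the look direction and average.
    A source matching that delay adds coherently; off-axis sources do not."""
    aligned = np.roll(mic2, -steer_delay)
    return 0.5 * (mic1 + aligned)

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 500 * t)  # stand-in for the user's voice

# Source in the beam: true delay matches the steering delay.
on_axis = delay_and_sum(tone, np.roll(tone, 4), steer_delay=4)
# Phone moved: true delay no longer matches, signals add destructively.
off_axis = delay_and_sum(tone, np.roll(tone, 20), steer_delay=4)

on_rms = np.sqrt(np.mean(on_axis ** 2))    # ~0.71, voice preserved
off_rms = np.sqrt(np.mean(off_axis ** 2))  # ~0, voice nearly muted
```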
Due to the highly unpredictable and complex nature of human interaction, it is not realistic to address most of these issues at the design stage. Instead, comprehensive test environments must be set up, in order to fully evaluate the mobile device.
As the mobile device is subjected to “real world” usage, where the user will move the phone around, speak under her breath at times, and go in and out of crowded (and noisy) environments, algorithms that attempt to classify signal components are stretched to their limits. It is absolutely crucial to evaluate and test them thoroughly throughout their lifecycle – early design, development, implementation, and deployment. Tests must be designed that:
- Provide useful feedback to designers, in order to quickly improve performance.
- Are repeatable, allowing test teams to demonstrate issues and re-test for them once they are addressed.
- Are deterministic and precise, allowing test labs to compare results across multiple devices and/or algorithms.
- Are easy and quick to run (automated) – otherwise they will be regularly skipped.
3. Test Strategy
Current audio test setups are based on Head and Torso Simulators (HATS) mannequins. These mannequins include an active “mouth”, which contains a small speaker and simulates a user’s voice. Mobile phones are placed on the HATS using a device called a positioner, which allows the test operator to place the phone at a specific angle and distance relative to the HATS’ mouth.
Test files are then played out through the mouth, and the output of the phone is recorded and analyzed.
If the test calls for testing at multiple phone positions relative to the mouth, the operator has little choice but to manually rotate the positioner to the next specified position. He must then play the files again and record the output. This goes on until all specified test positions have been executed.
This procedure is flawed in multiple respects:
- It forces the operator to interrupt the testing for every different phone position, in order to modify the setup.
- It relies on manually placing the phone in the specified position, which can lead to measurement errors and inconsistencies.
- It is not automated, and does not allow the execution of the full test in one “push button” run.
- It makes the operator’s workflow inefficient, as he must sit around waiting for one phase to complete before he can re-position the phone on the HATS.
- It makes it difficult to reproduce an issue seen during a test run, as the phone must be manually re-positioned in exactly the same spot.
The solution to all these issues is to modify the test environment – and more specifically the positioner – to allow for automated, deterministic, and repeatable testing.
By automating the positioner and making it controllable through a standard PC interface, it can simply become another set of program lines in a fully integrated test script. Its position can be adjusted programmatically, through a precise motorized drive. Angles can be selected precisely, and since tests follow a series of programmed angles and positions, they are fully repeatable, deterministic, and automated.
With this change, an operator can use small test scripts to run sanity checks at a few selected positions, or run a full test suite that cycles through a large number of different positions, without ever needing to manually position the mobile phone.
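Such a script might look like the sketch below. Note that the class and method names are hypothetical stand-ins for illustration only – they are not the actual DHP control API – and the play/record/analyze step is stubbed out:

```python
# Illustrative only: `DHPositioner`, `move_to`, and the measurement stub are
# hypothetical names, not the actual Voice Quality Labs control interface.

class DHPositioner:
    """Hypothetical PC-controlled positioner: records the commanded angles
    so a script can cycle through positions with no manual intervention."""
    def __init__(self):
        self.angle = (0, 0)  # (azimuth, elevation) in degrees
        self.log = []

    def move_to(self, azimuth, elevation):
        self.angle = (azimuth, elevation)
        self.log.append(self.angle)

def run_position_sweep(positioner, positions, run_measurement):
    """Cycle the phone through a list of positions, running one measurement
    at each; results are keyed by position for later comparison."""
    results = {}
    for az, el in positions:
        positioner.move_to(az, el)        # replaces manual re-positioning
        results[(az, el)] = run_measurement()
    return results

# Example sweep with a stubbed measurement in place of play/record/analyze.
pos = DHPositioner()
sweep = [(0, 0), (15, 0), (30, 0), (30, 10)]
results = run_position_sweep(pos, sweep, run_measurement=lambda: "MOS=4.1")
```

Because the sweep is just data, the same script can serve as a quick sanity check (a few positions) or a full characterization suite (hundreds of positions), and any failing position can be replayed exactly.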
The Voice Quality Labs DHP-5xx series of Dynamic Handset Positioners provides exactly this combination of automated, highly controllable, and precise functionality. The DHP products allow a test operator to set up a test environment once, then run many tests or one large test suite programmatically, from a remote PC, with no need for manual intervention.
VQL’s DHP products allow technology providers, mobile phone manufacturers, telecom operators and audio test labs to run fully automated, deterministic, repeatable tests in a very efficient manner. With this state-of-the-art technology, it is possible to fully characterize and evaluate mobile phones, easily compare performance across multiple devices, and accelerate algorithm development.
As the mobile market continues to grow, the variety of devices and their capabilities keeps expanding. In order to make sure users are satisfied, it is crucial to improve the breadth and quality of audio testing at every stage of the production and deployment process.
VQL’s Dynamic Handset Positioners allow major industry players to dramatically accelerate and improve test cycles for mobile devices. By allowing position robustness to become a simple programmatic step, the DHP products make it possible to run fully automated tests across a variety of devices, positions and test conditions.
Voice Quality Labs
© Voice Quality Labs 2012