Abstract

Motivation

Since the advent of computers, innovators have looked to them as a new medium for communication. Over the last 40 years, the world has witnessed many remarkable technologies, from TCP/IP, the bread and butter of most networks, to streaming video and VOIP. With such changes in backend technologies, one would expect corresponding changes in the front-end systems that expose them. However, a closer inspection shows that front-end technologies for multi-user communication have changed little in the last decade beyond basic implementations of backend multimedia capabilities.

Long Term Mission Statement

Our research team aims to investigate the shortcomings of the status quo in multi-user communication and to implement front-end changes that take advantage of new backend capabilities. A key backend technology we've chosen to focus on is VOIP (Voice over IP), which has revolutionized user-to-user communication by carrying real-time voice over ordinary data networks. On the front end, however, most companies have resigned themselves to simply re-implementing a telephone, with the ability to dial a contact, leave voicemail, and forward calls. Such designs made sense for physical hardware, where the market limited which features users were willing to pay for; in software, new features can be delivered at little to no additional cost to the end user. We therefore choose to build on existing backend technologies such as XML and NAT and to create a better software front end for them.

Short Term Goals

Innovation for Listeners

For the fall semester, we will focus on making VOIP more intuitive in multi-user environments by adding spatial information. The status quo is to show a list of the users on the call, perhaps with an on-screen indicator of who is talking, and to present the audio in a single mono channel. If two people have similar voices, especially after compression on the network, it is extremely hard to tell them apart when they speak at the same time. However, previous research has shown that human beings localize sound using their two ears, which lets them separate similar sounds based on their relative positions. This ability is at work in meetings: even if people in a room have similar voices, a listener can process them as separate audio sources, distinguished by where each voice comes from. By implementing this in software, and presenting spatial information through both graphical and auditory cues, we hope to bring similar benefits to the virtual multi-user environment.
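The proposal does not commit to a particular spatialization method. As a minimal sketch, assuming simple constant-power stereo panning as a stand-in for full 3D (for example HRTF-based) rendering, each participant can be assigned a direction on a frontal arc, and their mono voice stream can then be split into left and right channels according to that direction. The function names, the 120-degree spread, and the sign convention (negative azimuth = left) are hypothetical choices for illustration only.

```python
import math
from typing import Dict, List, Tuple


def assign_azimuths(participants: List[str], spread_deg: float = 120.0) -> Dict[str, float]:
    """Spread participants evenly over a frontal arc (hypothetical layout).

    Azimuths are in degrees: 0 is straight ahead, negative is to the left.
    """
    n = len(participants)
    if n <= 1:
        return {p: 0.0 for p in participants}
    step = spread_deg / (n - 1)
    return {p: -spread_deg / 2 + i * step for i, p in enumerate(participants)}


def pan_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power pan law: map an azimuth in [-90, 90] to (left, right) gains."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    # 0 rad = hard left, pi/2 rad = hard right; cos^2 + sin^2 = 1 keeps power constant.
    theta = (clamped + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(theta), math.sin(theta)


def mix_to_stereo(
    frames: Dict[str, List[float]], azimuths: Dict[str, float]
) -> List[Tuple[float, float]]:
    """Mix each participant's mono samples into one stereo buffer at their azimuth."""
    length = max((len(s) for s in frames.values()), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for speaker, samples in frames.items():
        # Speakers without an assigned azimuth are placed at the center.
        gain_l, gain_r = pan_gains(azimuths.get(speaker, 0.0))
        for i, sample in enumerate(samples):
            left[i] += gain_l * sample
            right[i] += gain_r * sample
    return list(zip(left, right))


if __name__ == "__main__":
    layout = assign_azimuths(["alice", "bob", "carol"])
    print(layout)  # {'alice': -60.0, 'bob': 0.0, 'carol': 60.0}
    stereo = mix_to_stereo({"alice": [0.2, 0.4], "carol": [0.1, 0.0]}, layout)
    print(stereo)
```

Constant-power panning keeps left² + right² = 1, so a voice does not get louder or quieter as it is moved across the arc. Panning only models level differences between the ears; a fuller implementation would likely also model interaural time differences or use HRTFs.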

Innovation for Speakers

While layering spatial information onto the output gives the listener useful feedback, how can we also enhance the environment for the speaker? If the listener has spatial information available, the speaker should as well. By using sensors to read the orientation of the speaker's head, we can identify their intended audience within the room. If speakers can designate private sub-spaces of users, each represented spatially on screen, the program can automatically detect from head orientation when the speaker intends to address only one sub-group. This lets the speaker choose whether to address the entire audience or only a sub-group, mirroring real-life situations in which side conversations between members of a room are not heard by the whole room. Possible applications include business meetings between multiple sub-groups, or a lawyer conferring with their legal team rather than addressing the entire courtroom.
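One way this detection might work, sketched minimally under stated assumptions, is to give each private sub-space a direction in the speaker's view and to route the speaker's audio to that sub-group only while the head sensor reports a yaw within some tolerance of that direction. The `SubSpace` and `AudienceTracker` names, the degree-based yaw convention (0 = facing the screen center, negative = left), and the dwell requirement (several consecutive sensor readings before switching, so a brief glance does not start a private conversation) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class SubSpace:
    """A hypothetical private sub-group placed at a direction in the speaker's view."""
    name: str
    center_deg: float      # direction of the sub-group; 0 = screen center, negative = left
    half_width_deg: float  # how far the head may be off-center and still count as facing it
    members: Set[str] = field(default_factory=set)


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def match_subspace(head_yaw_deg: float, subspaces: List[SubSpace]) -> Optional[SubSpace]:
    """Return the sub-space the head faces most directly, or None for the whole room."""
    best: Optional[SubSpace] = None
    best_dist = float("inf")
    for s in subspaces:
        dist = angular_distance(head_yaw_deg, s.center_deg)
        if dist <= s.half_width_deg and dist < best_dist:
            best, best_dist = s, dist
    return best


class AudienceTracker:
    """Switch audiences only after the head stays on one target for `dwell` readings."""

    def __init__(self, subspaces: List[SubSpace], everyone: Iterable[str], dwell: int = 5):
        self.subspaces = subspaces
        self.everyone = set(everyone)
        self.dwell = dwell
        self.current: Optional[SubSpace] = None  # None means the whole room hears the speaker
        self._candidate: Optional[str] = None
        self._count = 0

    def update(self, head_yaw_deg: float) -> Set[str]:
        """Feed one head-yaw sensor reading; return who should receive the speaker's audio."""
        target = match_subspace(head_yaw_deg, self.subspaces)
        name = target.name if target is not None else None
        if name == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = name, 1
        if self._count >= self.dwell:
            self.current = target
        return set(self.current.members) if self.current is not None else set(self.everyone)


if __name__ == "__main__":
    legal = SubSpace("legal team", center_deg=-45.0, half_width_deg=15.0,
                     members={"co-counsel", "paralegal"})
    tracker = AudienceTracker([legal], everyone={"co-counsel", "paralegal", "judge", "jury"},
                              dwell=3)
    for yaw in [-44.0, -46.0, -45.0, 0.0]:
        print(yaw, sorted(tracker.update(yaw)))
```

In the courtroom example, the legal team would be one `SubSpace`: turning toward it for the dwell period narrows the audience to the team, and facing the center again for the same period restores the whole room.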