2. Motivation

Usability is a critical factor in the design of an audio editor on any personal computer, because the user interface of such a machine is strongly geared towards visual manipulation of data objects. Digital audio editors in general exhibit a number of usability problems. Firstly, the visualisation of audio data is often poor, providing little indication of the audio content. Secondly, although the data being edited is audio, existing software provides few opportunities to actually hear it beyond fixed-speed playback that cannot be invoked during editing or even navigation. Lastly, the lag between audio and its visualisation introduced by buffering in the audio device is often poorly managed by editing software, introducing a disconcerting delay between user interaction and audible response. The user interface for Sweep was designed to avoid such shortcomings.

Computer editors for text and visual media such as video allow the user to visually scan and navigate through a work, and to immediately see the outcome of any editing operation. Audio editors, on the other hand, generally provide only a rough outline of the waveform in the form of a graph depicting peak values. This is sufficient to discern major differences, such as that between silence and loud sounds, and to notice the effect of large edits, such as cutting. Such a graph can be used to perform simple editing operations such as topping and tailing, the removal of silence at the start and end of a recording. This representation, however, is completely inadequate for depicting the effect of more subtle operations like noise reduction or reverberation, which can greatly change the sonic texture of a sound with little effect on loudness.

Whereas navigation through a text document in a visual text editor implicitly provides an indication of the content at the cursor position, in an audio file it commonly takes a number of seconds of listening simply to find the context of one's place. Precise placement of the audio cursor often requires tedious juggling with fixed-speed playback and transport controls such as fast-forward and rewind. However, in the world of analogue audio editing, such as with tape reels, the user experience is far more tactile. The tape can be moved at an arbitrary speed back and forth past the playback head, allowing the user a detailed scan of the material being edited, and a precise search for suitable edit points such as the onset of musical pauses or the completion of syllables in speech.

The latency introduced by a non-realtime multi-tasking operating system is another crucial factor in the design of interactive audio software. Due to the requirement of fair scheduling, it is not possible for such a system to guarantee that data written to the audio device will be heard at exactly the right moment; if scheduling delays cause an audio application to be starved of access to the audio device up to the time when sound is due to be played, an audible glitch will be heard. Although very brief, this sound is often extremely jarring, may cause damage to speakers and, if not detected in software, can cause a loss of synchronisation between audio and video or with other applications. In order to compensate for unpredictable scheduling, applications can increase the size and number of the audio driver's buffers. Larger buffers can go a long way towards ensuring that no glitches are heard; however, this degrades interactivity. The size of the buffers is directly proportional to the time delay between the application writing to the audio device and the sound being heard, and for sounds triggered by interactive events this introduces a delay between user input and the expected sound. For audio editors this delay manifests itself during playback as a discrepancy between the cursor position on screen and the sound heard by the user. A delay in responsiveness of more than about 10 ms is easily noticed by the human ear, and can be quite off-putting in musical applications as it interferes with the rhythm of a musician's performance.
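The proportionality between buffer size and delay can be illustrated with a small calculation. The following is a minimal sketch, not Sweep's implementation: the function names and parameters are hypothetical, and it assumes that the driver's queue is kept completely full and that no further latency is added by the hardware.

```python
def buffer_latency_ms(frames_per_buffer: int, num_buffers: int,
                      sample_rate_hz: int) -> float:
    """Delay in milliseconds between writing a frame and hearing it.

    Assumption: every buffer in the driver's queue is already full, so a
    newly written frame waits behind frames_per_buffer * num_buffers frames.
    """
    if frames_per_buffer <= 0 or num_buffers <= 0 or sample_rate_hz <= 0:
        raise ValueError("all arguments must be positive")
    queued_frames = frames_per_buffer * num_buffers
    return 1000.0 * queued_frames / sample_rate_hz


def audible_frame(frames_written: int, frames_per_buffer: int,
                  num_buffers: int) -> int:
    """Estimate which frame is currently audible, under the same assumption.

    An editor could draw its cursor at this frame rather than at the frame
    most recently written to the device.
    """
    return max(0, frames_written - frames_per_buffer * num_buffers)


if __name__ == "__main__":
    # Two buffers of 1024 frames at 44.1 kHz: roughly 46 ms of delay.
    print(round(buffer_latency_ms(1024, 2, 44100), 1))
    # Two buffers of 128 frames at 44.1 kHz: roughly 6 ms of delay.
    print(round(buffer_latency_ms(128, 2, 44100), 1))
```

Under these assumptions, two buffers of 1024 frames at 44.1 kHz give a delay well above the 10 ms threshold, whereas two buffers of 128 frames fall below it, at the cost of a smaller margin against scheduling delays.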

These shortcomings are not present when editing or navigating audio in the analogue domain, as is done with recording on analogue tape reels or cueing songs on vinyl records; in fact the responsiveness of vinyl is so precise that it is regularly used as a performance artform in its own right. Thus the motivation in improving Sweep's usability was to make it comparable to editing in the analogue domain, and in turn to extend its usefulness as a tool for live performance.