This project visually displays the frequency content of an audio signal using an 8×3 grid of LEDs. A microphone and amplifier collect analog audio input, which is sampled and digitized using the MCU’s analog-to-digital converter (ADC). The samples are buffered, and a 32-point discrete cosine transform is performed in real time to obtain 32 frequency coefficients. These coefficients are then displayed in real time on the LED grid, with the eight columns corresponding to increasing frequency and the three rows corresponding to increasing energy at that frequency. Several modes are available to customize how the frequency content is displayed. Our goal was to let the user interact with audio in a visual way, which has practical applications in fields like audio engineering and sound editing, as well as aesthetic appeal and potential for decorative applications.
High Level Design
Sound is a vibration that propagates as a typically audible mechanical wave of pressure and displacement through a medium such as air or water. Because of this, sound can be heard but not seen. Light is the opposite: it can be seen but not heard. Is there a way to combine both? This question served as our rationale for this project: to design a system that would present a visual representation of sound.
To accomplish this, our group decided to separate the project into three subsystems: sound acquisition, frequency analysis of the sound, and conversion to light. After the sound was acquired through a microphone, the signal was cleaned and amplified to a usable range. We then used the discrete cosine transform (DCT) to analyze each 32-point block of samples. The DCT helps separate the audio signal into parts (spectral sub-bands) of differing importance with respect to the sound’s volume and frequency. The DCT is similar to the discrete Fourier transform because it transforms a signal from the time domain to the frequency domain. We focus on the DCT coefficients because they depict the energies of the signal at different frequencies.
The DCT results are then translated into control signals for the array of LEDs. Upon receiving the 32 DCT coefficients of a new sample block, we partition the coefficients into groups, each corresponding to one of the 8 rows of LEDs. First, the values of the coefficients in a group are summed. The sum represents the total “energy” for that group of coefficients. Then the 3 LEDs in that row are turned on according to certain thresholds: the greater the sum, the more LEDs are turned on. No lit LEDs signifies almost no energy in that group, while 3 lit LEDs signal that almost all frequencies in that group have equally high energy. When we repeat this grouping and LED toggling across the other groups simultaneously, we can visually observe the frequency content of a given sample block, with the LEDs representing increasingly greater frequencies from left to right. By changing the mode of operation, we can use our 8 rows of LEDs to display all 32 coefficients (8 groups of 4 coefficients), the upper or lower 16 (8 groups of 2 coefficients), or the odd or even coefficients (also 8 groups of 2 coefficients). The 16-coefficient modes effectively increase the “resolution” with which each coefficient is visualized.
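The grouping described above can be sketched in C as follows. This is only an illustration, not the project's source: the names `display_mode_t` and `group_energy` are hypothetical, summing magnitudes (rather than signed values) is an assumption, and pairing the odd or even coefficients in index order is one plausible reading of "8 groups of 2 coefficients".

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_COEFFS 32
#define NUM_GROUPS 8

/* Hypothetical mode identifiers, in the order the modes are listed in the text. */
typedef enum {
    MODE_ALL, MODE_UPPER16, MODE_LOWER16, MODE_ODD, MODE_EVEN
} display_mode_t;

/* Sum the magnitudes of the coefficients belonging to group g (0..7).
 * Using magnitudes is an assumption; the text only says the values are summed. */
int32_t group_energy(const int16_t coeffs[NUM_COEFFS], display_mode_t mode, int g)
{
    int32_t sum = 0;
    int i;

    switch (mode) {
    case MODE_ALL:      /* coefficients 4g .. 4g+3 */
        for (i = 0; i < 4; i++) sum += labs(coeffs[4 * g + i]);
        break;
    case MODE_LOWER16:  /* coefficients 2g, 2g+1 */
        for (i = 0; i < 2; i++) sum += labs(coeffs[2 * g + i]);
        break;
    case MODE_UPPER16:  /* coefficients 16+2g, 16+2g+1 */
        for (i = 0; i < 2; i++) sum += labs(coeffs[16 + 2 * g + i]);
        break;
    case MODE_EVEN:     /* coefficients 4g, 4g+2 */
        sum = labs(coeffs[4 * g]) + labs(coeffs[4 * g + 2]);
        break;
    case MODE_ODD:      /* coefficients 4g+1, 4g+3 */
        sum = labs(coeffs[4 * g + 1]) + labs(coeffs[4 * g + 3]);
        break;
    }
    return sum;
}
```

In every mode the 8 groups cover their coefficients exactly once, so each LED row always has a well-defined sum to compare against the thresholds.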
The button located on the device allows the user to toggle among the various modes of operation. Each time the button is pressed, the program advances to the next mode in the set and displays the current mode on the terminal. The modes of operation are: all 32 coefficients, the upper 16 coefficients, the lower 16 coefficients, the odd coefficients, and the even coefficients.
The hardware/software tradeoffs of this implementation mainly centered on feedback to the user. While it was a definitive decision to use software to perform the DCT, there were many options for displaying the frequency transform in hardware and for how the user could change and display the mode of operation. The team weighed the possibility of using an LCD screen to perform the same function as the array of LEDs, with each block in a 16×4 LCD being analogous to a single LED. We ultimately chose not to use an LCD because of the additional software overhead required to set up and control it, whereas direct port control to toggle LEDs would make our device more responsive and more visually appealing. We also considered using the serial connection/terminal both to display the mode of operation and to take in user input to change the mode. While this would have been an intuitive way of unifying all of the user’s interaction and feedback with the device in one medium, taking in user input while simultaneously running the frequency transforms and LED output on the MCU would have required implementation of concurrent processes. We instead decided on the more efficient approach of polling and debouncing a button as the user interface while using the UART to display the current operation mode.
We obtained sound samples through the use of an omnidirectional microphone (CMA-6542PF). Based on the data sheet, we connected this microphone to an RC circuit (R3 and C2), which served as a low-pass filter to help reduce background noise. This signal was then sent through an LM358 op-amp with a gain of 200 to get a clear and distinct output from the microphone. The gain was calculated by dividing resistor R1 (on the output of the op-amp) by R5 (on the inverting terminal of the op-amp). The output of the op-amp was connected to the ADC on the MCU (pin A0). The gain amplified the voltage of the microphone output, while resistors R2 and R4 biased the output so that it swings by up to ±2.5 V about the midpoint of the supply, which maps directly onto the 0-5 V input range of the ADC. The ADC converts the analog signal to a digital signal, which is then processed by the frequency subsystem.
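The gain and bias relationships can be illustrated with a small C sketch. The function names are hypothetical, the 10-bit ADC resolution is an assumption (the report does not state it), and the resistor values in the usage note are placeholders chosen only because their ratio is 200.

```c
#include <stdint.h>

#define ADC_BITS 10                       /* assumption: 10-bit converter */
#define ADC_MAX  ((1 << ADC_BITS) - 1)    /* largest ADC code, 1023 */
#define VREF_MV  5000                     /* 0-5 V input range, in millivolts */

/* Gain magnitude as described in the text: feedback resistor over input resistor. */
double amp_gain(double r1_ohms, double r5_ohms)
{
    return r1_ohms / r5_ohms;
}

/* Convert a raw ADC code to the voltage at the ADC pin, in millivolts. */
int32_t adc_to_millivolts(uint16_t code)
{
    return (int32_t)code * VREF_MV / ADC_MAX;
}

/* Remove the mid-supply bias so that silence maps to roughly zero. */
int16_t adc_to_signed(uint16_t code)
{
    return (int16_t)((int16_t)code - (1 << (ADC_BITS - 1)));
}
```

For example, `amp_gain(200000.0, 1000.0)` gives 200 for placeholder values of 200 kΩ and 1 kΩ, and `adc_to_signed(512)` gives 0 for a signal sitting at the 2.5 V bias point.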
We used the discrete cosine transform (DCT) method (written by K. R. Rao and P. Yip and revised by Bruce Land) to analyze the samples in the buffers. The DCT allows us to express a sequence of data points (obtained from the ADC) in terms of cosine functions oscillating at different frequencies. We decided to use a DCT rather than a Fast Fourier Transform (FFT) for two reasons. First, the DCT uses only cosine basis functions, whereas the FFT uses both sines and cosines, and fewer cosine functions are needed to approximate a typical signal. Second, the DCT uses strictly real numbers, which made processing easier.
Since our project was designed to run in real time, deadlines had to be met. Analyzing the entire sound signal at once was not possible, as it would have required more memory and time than were available. Also, since DCT calculations take time, there is a possibility that the ADC begins to overwrite data before the DCT calculation has completed, causing an error in the calculation. We therefore created two 32-sample buffers (bufferIn and a32) to hold the incoming audio signal. Upon each conversion, we store the output of the ADC in bufferIn. As soon as bufferIn is filled, we copy the data into a32, normalize it to DCT standards, and perform the DCT on a32. This method allows us to analyze all the data in bufferIn without worrying about overwriting data.
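A minimal sketch of this two-buffer scheme is shown below. The buffer names `bufferIn` and `a32` come from the text; everything else (the element type, `store_sample`, `take_block`, and the flag and index variables) is assumed for illustration, and the normalization step is left out.

```c
#include <stdint.h>
#include <stdbool.h>

#define BUF_LEN 32

static volatile int16_t bufferIn[BUF_LEN];  /* filled one sample per conversion */
static int16_t a32[BUF_LEN];                /* stable copy handed to the DCT */
static volatile uint8_t in_index = 0;
static volatile bool buffer_full = false;

/* Called once per ADC conversion, e.g. from a timer interrupt. */
void store_sample(int16_t sample)
{
    bufferIn[in_index] = sample;
    in_index++;
    if (in_index == BUF_LEN) {
        in_index = 0;          /* wrap and start refilling bufferIn */
        buffer_full = true;    /* tell the main loop a block is ready */
    }
}

/* Called from the main loop. Returns true when a fresh block was copied to a32. */
bool take_block(void)
{
    if (!buffer_full) {
        return false;
    }
    for (uint8_t i = 0; i < BUF_LEN; i++) {
        a32[i] = bufferIn[i];
    }
    buffer_full = false;
    return true;
}
```

Because the DCT only ever reads `a32`, new samples arriving in `bufferIn` during the transform cannot corrupt the block being analyzed. On real hardware the copy itself would need to finish before the next sample arrives, or run with interrupts briefly disabled.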
The DCT-to-light process operates within an overarching state machine whose state indicates the current mode of operation (all, upper 16, lower 16, odd, even). By pressing the button (the mechanism of which is explained in more detail later), the user advances to the next state in the state machine. See Figure 2 for the diagram of the mode-of-operation state machine.
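The mode state machine amounts to a five-state cycle, which could be written roughly as follows. The identifiers are hypothetical, and the order simply follows the order in which the modes are listed in the text.

```c
/* Hypothetical mode identifiers; MODE_COUNT is a sentinel, not a real mode. */
typedef enum {
    MODE_ALL, MODE_UPPER16, MODE_LOWER16, MODE_ODD, MODE_EVEN, MODE_COUNT
} display_mode_t;

/* Advance to the next mode, wrapping from the last mode back to the first. */
display_mode_t next_mode(display_mode_t mode)
{
    return (display_mode_t)((mode + 1) % MODE_COUNT);
}

/* Text that could be sent over the UART when the mode changes. */
const char *mode_name(display_mode_t mode)
{
    switch (mode) {
    case MODE_ALL:     return "all 32 coefficients";
    case MODE_UPPER16: return "upper 16 coefficients";
    case MODE_LOWER16: return "lower 16 coefficients";
    case MODE_ODD:     return "odd coefficients";
    case MODE_EVEN:    return "even coefficients";
    default:           return "unknown";
    }
}
```

Each debounced button press would call `next_mode()` once, so five presses return the device to its starting mode.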
Once the incoming signal has been processed and its frequency coefficients are placed in the output array, the spectrum is analyzed by calling the function DCT_light(). This function determines how to group the coefficients depending on the current mode of operation. After determining the mode using switch and case statements, DCT_light() sums each of the groups, sets the thresholds for that mode that determine which sums cause which LEDs to light, turns off all the LEDs, and then calls turn_on_LED(). turn_on_LED() compares the sum of each group to the thresholds to determine how many LEDs in its row to turn on, and then turns on those LEDs based on hard-wired port values. The function then exits, and the whole process repeats for the next 32-point sample block.
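The threshold comparison inside turn_on_LED() might look something like the sketch below. The helper names `led_count` and `led_bits` are hypothetical, the use of `>=` at each threshold is an assumption, and the actual port and pin assignments are in the project source rather than reproduced here.

```c
#include <stdint.h>

/* Number of LEDs (0..3) to light in one row for a given group sum. */
uint8_t led_count(int32_t sum, int32_t low, int32_t mid, int32_t high)
{
    if (sum >= high) return 3;
    if (sum >= mid)  return 2;
    if (sum >= low)  return 1;
    return 0;
}

/* Bar-graph bit pattern for a row: 0 -> 000, 1 -> 001, 2 -> 011, 3 -> 111.
 * Which physical port bits these map to depends on the wiring. */
uint8_t led_bits(uint8_t count)
{
    return (uint8_t)((1u << count) - 1u);
}
```

With thresholds of 128/256/1024, a group sum of 300 would light two LEDs, giving the pattern 011 for that row.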
Implementing the button used to rotate among the modes of operation was a challenging part of the project, mainly due to the real-time nature of the implementation. The button is polled at a frequency of around 4 kHz: on every other timer 0 overflow interrupt, the button’s state is read and fed through a state machine that debounces the push and release. This is the same one-button debounce code provided by Bruce Land in our cricket call generator lab. On the transition from a possible push (MaybePush) to pushed (Pushed), the mode changes, and a message is printed to the console when the program returns to main(). A flag is set when the transition occurs and cleared when the message is printed, so the message is printed only once per transition. Figure 3 shows the debounce state machine that the button follows.
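The following sketch shows a four-state debounce machine of the kind described. MaybePush and Pushed are named in the text; the NoPush and MaybeNoPush states, the function names, and the flag variable are assumptions about how the remaining states are organized, and this is not the lab code itself.

```c
#include <stdbool.h>

typedef enum { NO_PUSH, MAYBE_PUSH, PUSHED, MAYBE_NO_PUSH } button_state_t;

static button_state_t button_state = NO_PUSH;
static volatile bool mode_change_flag = false;

/* Call at a fixed rate (e.g. from a timer interrupt) with the raw button
 * reading, where true means the button currently reads as pressed. */
void debounce_step(bool pressed)
{
    switch (button_state) {
    case NO_PUSH:
        if (pressed) button_state = MAYBE_PUSH;
        break;
    case MAYBE_PUSH:
        if (pressed) {
            button_state = PUSHED;
            mode_change_flag = true;   /* confirmed push: request one mode change */
        } else {
            button_state = NO_PUSH;    /* it was only a glitch */
        }
        break;
    case PUSHED:
        if (!pressed) button_state = MAYBE_NO_PUSH;
        break;
    case MAYBE_NO_PUSH:
        button_state = pressed ? PUSHED : NO_PUSH;
        break;
    }
}

/* Called from main(): returns true exactly once per confirmed push. */
bool consume_mode_change(void)
{
    if (!mode_change_flag) return false;
    mode_change_flag = false;
    return true;
}
```

A push is only accepted after two consecutive pressed readings, and holding the button down or bouncing on release does not generate additional mode changes.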
Overall, this approach worked well because of the consistent timing at which the button was checked for a state change. Previous unsuccessful attempts at debouncing and recognizing button pushes included polling the button and attempting a debounce on every iteration of the main while loop. This was not effective because it required the button press to be caught at a very specific part of the while loop (which is dominated by the DCT function), and it could only recognize a very short button press without affecting the performance of the rest of the device’s functionality.
The hardware schematic for the connections among the LEDs, UART, and MCU is shown in Figure 4 below. The aforementioned sound acquisition circuit is shown as a block. Only part of the LED array is shown for clarity, but the same pattern repeats for all 24 LEDs. See the source code for specific port wirings.
Threshold selection was done mainly on a heuristic basis, based on testing done with different audio samples (including our own voices and test sine waves) at various distances from the microphone. We initially set the low, medium, and high thresholds on a logarithmic scale that also models how humans perceive sound (such as 256/512/2014 or 128/256/1024). After determining a reasonable distance from the microphone (around 8-12 inches), we played various audio tracks to get the best representation of a range of frequency content. This meant that for very low-volume (low-amplitude) content, we should see few to no LEDs lit. We refined our thresholds with methods such as subtracting threshold values from the sums of the lower-frequency groups when the device ran in an ambient-noise environment (no deliberate input). For example, if the first group of LEDs displayed 1 lit LED with no input, we would subtract the “low threshold” term from that group’s sum. This technique worked well in tuning the performance of the device.
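The ambient-noise correction can be sketched as a per-group offset applied before the threshold comparison. The function name and the offsets array are hypothetical, and clamping the result at zero is an added assumption to keep sums non-negative.

```c
#include <stdint.h>

#define NUM_GROUPS 8

/* Subtract a per-group ambient-noise offset from each group sum.
 * Offsets would be chosen by observing the display with no deliberate input. */
void apply_noise_offsets(int32_t sums[NUM_GROUPS], const int32_t offsets[NUM_GROUPS])
{
    for (int g = 0; g < NUM_GROUPS; g++) {
        int32_t adjusted = sums[g] - offsets[g];
        sums[g] = (adjusted > 0) ? adjusted : 0;   /* clamp at zero (assumption) */
    }
}
```

For instance, if group 0 shows one lit LED in a quiet room with a low threshold of 128, setting its offset to 128 pulls the ambient sum back below the threshold while leaving the higher-frequency groups untouched.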
Our project is easily able to keep up with real-time requirements without any flickering or audio hiccups. Our buffered design does introduce a small amount of lag between the audio and the display on the LEDs. At any given time, the current audio samples are being entered into a buffer, while the DCT coefficients of the most recently filled buffer are being displayed. Since each 32-sample buffer takes 4 ms to fill at a sampling rate of 8 kHz, the LEDs only update at 250 Hz, and the DCT coefficients being displayed will always lag the audio input by up to 4 ms. We found that this delay was small enough that it was not noticeable for any audio signals that we tested.
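The timing figures above follow directly from the sample rate and block length, as the short sketch below shows. The function names are hypothetical. The last function gives the nominal frequency of each coefficient and assumes the transform is a standard DCT-II, for which coefficient k corresponds to k·fs/(2N).

```c
#include <stdint.h>

/* Display updates per second: one update per filled buffer. */
uint32_t update_rate_hz(uint32_t sample_rate_hz, uint32_t block_len)
{
    return sample_rate_hz / block_len;
}

/* Time needed to fill one buffer, in microseconds. */
uint32_t block_period_us(uint32_t sample_rate_hz, uint32_t block_len)
{
    return (uint32_t)(1000000UL * block_len / sample_rate_hz);
}

/* Nominal frequency of DCT-II coefficient k: k * fs / (2N). */
uint32_t coeff_freq_hz(uint32_t k, uint32_t sample_rate_hz, uint32_t block_len)
{
    return k * sample_rate_hz / (2u * block_len);
}
```

With an 8 kHz sample rate and 32-sample blocks this gives 250 updates per second and a 4000 µs fill time, matching the numbers in the text. Under the DCT-II assumption the coefficients are spaced 125 Hz apart, so the 32 coefficients span 0 Hz to 3875 Hz.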
Our project, as a fairly rough visualization of frequency content, did not require pinpoint accuracy in order to be effective. We verified the effectiveness of our microphone circuit by comparing the results of feeding in CPU-generated sine waves directly with the results of playing those same sine waves into the microphone. We found significant agreement between them, with very little circuit noise. Our project is, however, very sensitive to ambient noise, which does lead to some discrepancies in measured frequency content when performing tests in a noisy environment such as a busy lab. Overall, we found that our project accurately displays the frequency content of the audio that it receives, although that audio may contain unintentional background noise.
The design is extremely easy to use. The ideal volume and range for the microphone correspond to a user sitting in front of the device and speaking at a normal volume, although the device is also functional at lower and higher volumes. The device could easily be used by somebody with special needs. In particular, it could be of interest to somebody with a hearing impairment as a way of interacting with and experiencing sounds and music.
Several representative tests can be viewed below.
There were few safety concerns with our project as our hardware design was fairly simple and involved only a standard 5V supply for power. We did ensure that wires were not exposed and that all components had their leads trimmed to avoid falling off and potentially becoming choking hazards. We did use several fairly large capacitors, and we made sure to place them in locations where their leads were not exposed to avoid any potentially dangerous short circuits.
Our design met our basic expectations, and it also revealed some features of the DCT that would be interesting to explore further. We saw strong linear correlation between coefficients, which manifested itself in higher-pitched signals lighting up more LEDs overall and not just the LEDs corresponding to higher frequencies. Building on the device itself, it would be very interesting for further projects to explore how the visualization of DCT coefficients can be used for actual analysis of the audio signal or for gaining information about the DCT process itself. Further projects could also focus on increasing the scale of the project, perhaps using more lights or incorporating motorized elements to create an even more aesthetically appealing way of visualizing music.
Our project met all applicable standards and requirements. We were well within budget with a total cost of $44, almost all of which was related to rentals of lab stock. All software design was done in C using the GCC compiler, and all calculations were performed on the MCU.
Our design does include code written by others, which we have cited both in the code itself and within this report. We used code provided in class for communicating with PuTTY via UART, written by Joerg Wunsch. We adapted code written by Bruce Land for debouncing a single push button. Our DCT process makes use of code originally written by Yann Guidon and adapted for GCC by Bruce Land, making use of previous work done by Luciano Volcan Agostini, Ivan Saraiva Silva, and Sergio Bampi (for the 8-point DCT) and K. R. Rao and P. Yip (for combining two 8-point DCTs into a 16-point DCT and combining two 16-point DCTs into a 32-point DCT). We did not modify this code at all, although we did write code to incorporate it into the structure of our software design. All other code and all hardware design are of our own creation. We do not anticipate any patent or publishing opportunities for this project, as it is intended as a fun and interesting visualizer rather than an analyzer. It could certainly be used to demonstrate mathematical principles or features of an audio signal, but does not lend itself directly to publication.
We made every effort to remain consistent with the IEEE code of ethics, and we believe we were very successful in this regard. During our entire design process we remained aware of section 7.8.1 and made sure not to endanger ourselves or other students in the lab, or to make any design decisions that could endanger the user or the public. We were very open to criticism and advice from course staff and peers, and we have credited all aspects of our design that relied on work done by others, in accordance with section 7.8.7. We also believe that this design could be extremely useful in achieving the goals of section 7.8.4 and improving the user’s understanding of the DCT and the characteristics of audio signals.
We do not believe that our design could incur any legal issues, other than perhaps copyright claims which could arise if a user chooses to film the device in action with copyrighted music playing.
Source: Frequency Visualizer