Sound-localizing Camera

Wednesday May 11, 2022 / Muhammad Bilal

What did we do?

The iPhone camera platform designed in this project turns to face wherever a nearby hand clap or similar sharp impulse comes from. If a person claps more than once from the same direction relative to the phone, the platform instructs the camera to take a picture for each clap detected. If the person moves to a different location and claps, the platform adjusts its direction accordingly. The system can distinguish hand claps from most background noise, such as normal talking.

Why did we do it?

When we travel alone, it is inconvenient to take pictures that include ourselves. Often we have to hold the camera in our hands and capture a close-up with barely any background. At best, we set a short timer and then rush back from the camera. We therefore came up with a platform that uses microphones to detect the clap direction, an ATmega1284p microcontroller with a servo motor to turn the camera, and an Apple EarPods remote to trigger the shutter. This design makes selfies much easier and more convenient, since all you need to do is choose your favourite scene and clap your hands. It also helps with friends or family, so that no one is left out of a group photo. Additionally, the platform could be modified into an “intelligent” monitor that automatically turns on, adjusts its direction and records upon suspicious sounds.

High Level Design

Rationale

The direction of the clapping source can be calculated from the time delays between microphones, which are long enough for the microcontroller to detect and measure accurately.

Three microphones are placed at the three vertices of an equilateral triangle. The first two microphones that detect the impulse are used to calculate the direction the clap came from. The calculated angle is then compared with the current direction of the platform. If the difference is within six degrees, the microcontroller sends a shutter signal to the iPhone camera through the EarPods connection. Otherwise, the chip drives a servo motor mounted in the center of the triangle to turn the camera towards the clapping source.

Background Math

Approximate speed of sound in dry air, in meters per second:

Vs = 331.3 + 0.606*T

where T is the temperature in degrees Celsius. We use 340 m·s−1 in our calculation.

Angle calculation in degrees:

Theta = (180/pi) * arcsin((delay * Vs) / microphone distance)

where Vs = 340 m·s−1 and microphone distance = 0.2 m. Taking zero degrees to be the central line between the two microphones, the calculated theta is within 90 degrees in either direction.
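The two formulas above can be checked with a short host-side sketch. It is written in Python purely to verify the arithmetic and is not the project's firmware. The function names are illustrative, and clamping the arcsin argument is an added assumption to guard against timing jitter.

```python
import math

SPEED_OF_SOUND = 340.0   # m/s, the value the text uses
MIC_DISTANCE = 0.2       # m, side of the microphone triangle


def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 + 0.606 * temp_c


def relative_angle_deg(delay_s, vs=SPEED_OF_SOUND, d=MIC_DISTANCE):
    """Angle from the central line of a microphone pair, in degrees."""
    ratio = delay_s * vs / d
    # Assumption: clamp so that timer jitter cannot push asin() out of its domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Largest possible delay between two microphones: sound travelling along the side.
max_delay_s = MIC_DISTANCE / SPEED_OF_SOUND
print(max_delay_s)                           # a little under 0.6 ms
print(relative_angle_deg(max_delay_s / 2))   # half the maximum delay is about 30 degrees
```

With these values the microcontroller has at most roughly 588 µs between the first and second arrival, which sets the range the delay timer has to cover.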

Servo motor control:

width = servo_min + Theta*(servo_max - servo_min)/360

where servo_min and servo_max are the minimum and maximum pulse widths of the servo motor, corresponding to 0 and 180 degrees of servo rotation. The divisor is 360 rather than 180 because the gearing makes the platform turn twice as far as the servo, so Theta spans 0 to 360 degrees. The formula gives the pulse width for any rotation angle.

Logic structure

The three microphones are wired into identical analog circuits. Each signal is amplified, filtered (high-pass and low-pass), amplified again, converted to a digital signal and connected directly to an external interrupt of the microcontroller. The RC filters remove low-frequency background noise and pass the clapping impulse, which sits at higher frequencies. Because an inverting Schmitt trigger is used at the last stage of the circuit, a falling edge followed by a rising edge represents a valid sound pulse detected by the microphone.

Three interrupt service routines handle the three external interrupts from the microphones. The first ISR to execute zeros timer2 and stores its microphone number. When the second one fires, it captures timer2 as the time delay, stores its microphone number and sets a flag to indicate that the angle is ready to be calculated. The third ISR is ignored, since the stored time delay and the two microphone numbers are the only data needed for the calculation.

The relative angle is calculated with the formula described above and then converted into an absolute angle between 0 and 360 degrees for the entire platform according to the stored microphone numbers. Finally, if the absolute angle is roughly equal to the previously calculated angle, the microcontroller toggles the output port to trigger the camera shutter. Otherwise, the angle is converted into a calibrated PWM signal that is output directly to the servo motor, which then points in the direction of the sound pulse.
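The shutter-or-turn decision can be sketched as follows, using the six-degree tolerance mentioned in the Rationale. This is a Python sketch of the logic only. The function names are illustrative, and treating 358 and 2 degrees as four degrees apart (wrap-around) is an assumption that the text does not state.

```python
SHUTTER_TOLERANCE_DEG = 6.0  # from the text: within six degrees counts as the same direction


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two headings, in degrees (0..180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)   # assumption: headings wrap around at 360


def decide(new_angle_deg, current_angle_deg):
    """Return ("shutter", current) or ("turn", new) for a newly computed angle."""
    if angular_difference(new_angle_deg, current_angle_deg) <= SHUTTER_TOLERANCE_DEG:
        return ("shutter", current_angle_deg)
    return ("turn", new_angle_deg)


print(decide(95.0, 90.0))   # small difference: take a picture
print(decide(200.0, 90.0))  # large difference: turn the platform
```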

Hardware and software Trade-offs

Instruction	Function
Jump N	Immediately jump to routine N
Outer Loop Header K	Serves as the top of a loop and performs initialization to do K iterations of the repetend
Inner Loop Header J	Same as above, but for the inner loop
Routine Header	Denotes the beginning of a routine (index inferred by parsing) and displays it for 1 second
Display Clock	Displays the time of day in hours:minutes format
Set Clock	Prompts the user to set the clock in hours:minutes format
Routine Prompt	Prompts the user to select a routine
Scheduled Start	Prompts the user to set a workout start time, displays clock until that time, then proceeds
Programmed Pause	Halts and blinks the colon until user advances
Outer Loop Footer	Handles looping back to the top of the outer loop if necessary
Inner Loop Footer	Same as above, but for the inner loop
Screen Test	Demonstrates the screen by cycling through numbers and the short segments
Display Cycles Remaining	Displays the number of cycles remaining in a loop for <1 second
Millisecond Timer	Starts the 10-millisecond timer (counts up only)
Minutes Data	Encodes minutes (should be 0..59) in BCD and a count-up/count-down bit
Seconds Data	Encodes seconds (should be 0..59) in BCD

Hardware Design

Microphone Circuit

A microphone is used to detect the impulse sound. Because this analog signal has to trigger an interrupt in the microcontroller, the signal processing is important. The microphone circuit is shown in Figure .. The signal from the microphone is processed in the following steps:

Step 1: Microphone bias

The 2 kΩ resistor provides a bias voltage to the microphone. A capacitor is connected between the microphone and the first amplifier because the output signal should be AC-coupled to the amplifier.

Step 2: Non-inverting amplifier with a gain of 100

Most microphones generate a very low signal, on the order of a millivolt or less, especially when the clap is some distance away, so the signal needs amplification before any other processing.

Step 3: Low-pass filter with a cutoff at 800 Hz

The low-pass filter removes unwanted frequencies, since we only want to detect the clapping sound and keep environmental sound from interfering with the signal. According to what we found on the Internet, the frequency of clapping is lower than 800 Hz, so we implement a low-pass filter with a cutoff frequency of 800 Hz.

Step 4: High-pass filter with a cutoff at 150 Hz

Together the two filters form a bandpass filter that passes signals from 150 Hz to 800 Hz.

Step 5: Tunable amplifier

Because some signal energy is lost in the two filters, a second amplifier increases the strength of the signal. We use a tunable amplifier so that we can adjust the gain of the final amplification. This is very useful when three microphones are used at the same time: the tunable amplifiers let us match the outputs of the three channels.

Step 6: Peak detection circuit

The peak detection circuit monitors a voltage of interest and retains its peak value as its output. The circuit generates a peak when it detects an impulse signal, which marks the time at which the impulse arrives. The diode sets the voltage threshold, and the capacitor and resistor determine the slopes of the rise and the decay. The larger the resistor, the slower the decay. To hold the peak for long enough, we use a 0.01 µF capacitor and a 1 MΩ resistor.
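As a rough check on the component values in Step 6, the hold time of the peak detector can be estimated from its RC time constant. The Python sketch below assumes an ideal RC discharge and ignores diode leakage and the load of the following stage. The function names are illustrative.

```python
import math

R_OHMS = 1e6         # 1 Mohm discharge resistor from the text
C_FARADS = 0.01e-6   # 0.01 uF hold capacitor from the text


def time_constant_s(r_ohms, c_farads):
    """RC time constant in seconds."""
    return r_ohms * c_farads


def remaining_fraction(t_s, r_ohms=R_OHMS, c_farads=C_FARADS):
    """Fraction of the peak voltage left after t_s seconds of RC discharge."""
    return math.exp(-t_s / time_constant_s(r_ohms, c_farads))


print(time_constant_s(R_OHMS, C_FARADS))   # about 0.01 s, i.e. 10 ms
print(remaining_fraction(588e-6))          # fraction left after the longest inter-microphone delay
```

Under these assumptions the time constant is about 10 ms, so the held peak decays only slightly over the sub-millisecond delays the system has to measure.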

Motor Control

To guarantee the accuracy of the turning angle, we need a motor whose turning angle and direction are controllable, so we use a servo motor to turn the platform. Three lines connect to the servo: one to the power supply (red), one to ground (black) and one for the control signal (yellow). The control signal is generated by the microcontroller.

Camera Control

To take a picture in the direction a sound impulse comes from, this design uses a smartphone as the image capture device. Most people now carry lightweight smartphones with fairly high-quality built-in cameras, so the platform might gain some popularity if it can flexibly support a large variety of mobile devices. With users providing the phone, the system design only needs to focus on the standard interface to smartphones rather than other image-capturing devices, and the cost of the camera can be left out of our system. In addition, communication with the phone can be implemented with the higher-level APIs provided by the Android or iOS operating system, so the camera functions can be easily extended.

Apart from wireless control of the camera, the iPhone provides another way to release the camera shutter using wired hardware, specifically the Apple EarPods. The EarPods include a built-in remote that is more often used to answer or end calls and to control music and video playback. It can also trigger the shutter: connect it to an iPhone, open the camera app and press the volume-up or volume-down button on the remote.

To “hack” an iPhone headset, it is important to note that the iPhone’s plug is a TRRS connector with 4 contacts rather than a TRS connector with only 3 (T stands for “tip”, R for “ring” and S for “sleeve”). The picture below shows which signal each contact carries. The two contacts used to control the camera are the sleeve, for the microphone, and the ring, for ground. After opening up the switch circuit of an iPhone headset (the picture is hard to retrieve), we identified the two leads connected to microphone and ground by eliminating the two others, which connect to the left and right earbuds. We noticed that the buttons mechanically short these two leads, producing zero voltage between microphone and ground.

Based on this knowledge, it is simple to simulate a button press with the microcontroller. To take a picture, pull the voltage on the microphone lead down to zero; for the rest of the time, hold it at a constant 2.5 volts. 5 volts, double the standard microphone voltage, works as well but might damage the phone. Note that if the low signal is too short, the shutter won’t be triggered. We hold the signal low for around 50 ms, which proves to be long enough.
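The button-press emulation described above amounts to a three-step sequence. The Python sketch below models it with two stand-in callbacks, `set_mic_lead_volts` and `sleep_ms`. Both are hypothetical names for whatever drives the output pin and waits on the real hardware.

```python
IDLE_VOLTS = 2.5   # resting level on the microphone lead, from the text
LOW_MS = 50        # low time that proved long enough in the text


def press_shutter(set_mic_lead_volts, sleep_ms, low_ms=LOW_MS):
    """Emulate one button press: pull the mic lead low, wait, then release it."""
    set_mic_lead_volts(0.0)
    sleep_ms(low_ms)
    set_mic_lead_volts(IDLE_VOLTS)


# Usage with stand-in callbacks that only record what would happen.
log = []
press_shutter(lambda v: log.append(("volts", v)),
              lambda ms: log.append(("wait_ms", ms)))
print(log)
```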

Alternative Way to Control Camera: Bluetooth

A wireless link between the platform and the phone would keep the earbud wires from twisting around the axle when the platform turns. Bluetooth is an easy way to realize such short-distance communication.

Versions of the Bluetooth specification from v1.0 to v4.2 are backward compatible. Bluetooth low energy (Bluetooth LE), marketed as Bluetooth Smart, is a subset of Bluetooth v4.0 aimed at very low power applications such as smart home, health and fitness. Unlike the standard protocols introduced in Bluetooth v1.0 to v3.0, however, BLE has an entirely new protocol stack for rapid build-up of simple links, so it is not backward compatible with the earlier protocol, often called Classic Bluetooth. Still, the Bluetooth v4.0 specification permits devices to implement either or both of LE and Classic.

The Bluetooth serial module purchased for this project is the HM-10 (see the Appendix for the data sheet). This transceiver module is a Bluetooth LE device that provides transparent data transfer to another host Bluetooth device. If a phone can run iOS or Android version 4.3 or above, its Bluetooth hardware supports Bluetooth LE. During the design period, our group did not have a spare Android smartphone that could be updated to Android 4.3, and developing an iOS app requires a Mac, which was also not available to us. We therefore suspended this approach after achieving only transparent data transfer between the module and a borrowed phone.

Mechanics Design

The mechanical part of our system is built on a triangular piece of cardboard. The three microphones are mounted on the three vertices, 0.2 m apart from each other. The servo motor and cell phone mount are located at the center of the triangular platform.

As shown in figure .., the gear on the servo motor has 64 teeth with a pitch of 32, and the gear on the cellphone mount, which is connected to a shaft, has 32 teeth. The tooth ratio of the servo gear to the mount gear is therefore 2:1. Since our motor can only rotate between 0 and 180 degrees, this gear ratio lets the cellphone rotate through a full circle. A phone mount on top of the shaft holds the cell phone. Because the servo motor is very quiet, mounting it in the center of the three microphones does not interfere with determining the direction of the sound source.
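The effect of the gearing can be written out as a short calculation. The Python sketch below only restates the tooth counts given above, and the function names are illustrative.

```python
SERVO_GEAR_TEETH = 64
MOUNT_GEAR_TEETH = 32
SERVO_RANGE_DEG = 180.0


def mount_angle_deg(servo_angle):
    """Phone-mount rotation produced by a given servo rotation."""
    return servo_angle * SERVO_GEAR_TEETH / MOUNT_GEAR_TEETH


def servo_angle_deg(mount_angle):
    """Servo rotation needed for a desired phone-mount rotation."""
    return mount_angle * MOUNT_GEAR_TEETH / SERVO_GEAR_TEETH


print(mount_angle_deg(SERVO_RANGE_DEG))  # full servo travel covers a full circle at the mount
print(servo_angle_deg(90.0))             # a quarter turn of the mount needs half that at the servo
```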

Software Design

To determine the direction of the sound source, the system first needs to decide which section of the surrounding area (see Figure below) the impulse comes from. From the first 2 microphones that receive the impulse, the system can tell that the source lies in the area nearest to these 2 microphones. The angle of the direction can then be calculated from the time delay between the impulse arrivals at the 2 microphones. The first step, logging the necessary input information, is handled by the external interrupt service routines in cooperation with the timer; the second step, interpreting the input information and acting on it, is done by the main loop.

Logging and interpreting the input signals can be done with a state machine, as in another course final project, Acoustic Impulse Marker. In this software design, the code for this part is written differently: the ISRs are assigned the logging task, which requires very little execution time. The logic of each external ISR is identical and is best expressed using the following pseudocode. The three microphones are labeled “A”, “B” and “C”. The ISRs mark the 1st and 2nd impulses with the label of the microphone at which each impulse arrives. A timer is used to clear the logged impulses after a certain timeout.

The actual C code is not written in such an encapsulated style; it is less readable but much more efficient to execute in the interrupt service routines.
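The logging behaviour described in this section can be modelled on a host machine as follows. This Python sketch is not the authors' ISR code. The class name, the `timeout_ticks` parameter and passing the timer value in as `now_ticks` are all illustrative assumptions.

```python
class ImpulseLogger:
    """Host-side model of the three identical external-interrupt handlers."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks  # hypothetical timeout, in timer ticks
        self.reset()

    def reset(self):
        self.first = None        # label of the microphone that fired first
        self.second = None       # label of the microphone that fired second
        self.delay_ticks = None  # timer ticks between the two arrivals
        self.ready = False       # set when the main loop may compute the angle
        self._t0 = None

    def on_interrupt(self, mic, now_ticks):
        """Call with "A", "B" or "C" when that microphone's interrupt fires."""
        stale = (self.first is not None and not self.ready
                 and now_ticks - self._t0 > self.timeout_ticks)
        if stale:
            self.reset()                           # lone first impulse timed out
        if self.first is None:
            self.first, self._t0 = mic, now_ticks  # the real ISR zeros timer2 here
        elif self.second is None and mic != self.first:
            self.second = mic
            self.delay_ticks = now_ticks - self._t0
            self.ready = True
        # Anything after the second impulse is ignored until reset() is called.


logger = ImpulseLogger(timeout_ticks=1000)
for mic, t in [("B", 0), ("A", 120), ("C", 300)]:
    logger.on_interrupt(mic, t)
print(logger.first, logger.second, logger.delay_ticks, logger.ready)
```

In this model the main loop would read `first`, `second` and `delay_ticks` when `ready` is set and then call `reset()`.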

Determine the turning angle

As introduced in the last part, each time the first two interrupts occur, they also record which interrupt was logged first and which one came second. We set the microphone connected to interrupt 0 as zero degrees; all other angles are relative to this zero point. The angle relative to the first two microphones is calculated from

Theta = (180/pi) * arcsin((delay * Vs) / microphone distance )

Here zero degrees is the central line between the two microphones, and Theta is the angle relative to it. Once the first two microphones are known, Theta can be converted to the absolute angle on the platform. Considering the order of arrival, there are six possible combinations of receiving microphones, so the full circle is divided into six parts. Because Theta is relative to the central line between the two microphones, the conversion follows the process shown in Figure..
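Since the conversion figure is not reproduced here, the following Python sketch shows one way the six cases could be handled. It is not the authors' mapping. It assumes the microphones on interrupts 0, 1 and 2 sit at headings of 0, 120 and 240 degrees, that the source is far enough away for the arcsin formula to hold, and that the source lies on the side of the central line nearer the microphone that fired first.

```python
# Assumption: heading of each microphone as seen from the platform centre.
MIC_HEADING_DEG = {0: 0.0, 1: 120.0, 2: 240.0}


def absolute_angle_deg(first_mic, second_mic, theta_deg):
    """Heading of the source (0..360) from the first two microphones and Theta."""
    a_first = MIC_HEADING_DEG[first_mic]
    a_second = MIC_HEADING_DEG[second_mic]
    # Signed step from the second microphone to the first one: +120 or -120.
    step = (a_first - a_second + 180.0) % 360.0 - 180.0
    central_line = a_second + step / 2.0
    # The source lies on the side of the central line nearer the first microphone.
    toward_first = 1.0 if step > 0 else -1.0
    return (central_line + toward_first * theta_deg) % 360.0


print(absolute_angle_deg(0, 1, 30.0))  # mic 0 first, mic 1 second
print(absolute_angle_deg(1, 0, 30.0))  # same pair, opposite order
```

Each of the six ordered pairs covers one 60-degree part of the circle, with Theta running from 0 at the central line to 60 degrees at the first microphone's own heading.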

Servo motor control

The servo motor is controlled by a pulse signal with a width ranging from 1 ms to 2 ms. The interval between two pulses is strictly 20 ms. The turning angle of the servo motor depends on the pulse width: the minimum pulse width, 1 ms, corresponds to 0 degrees, while the maximum pulse width corresponds to 180 degrees. The servo position is absolute, which means that when a pulse of fixed width is sent to the servo, the servo holds that position even if the same signal is sent again. The constraint on the pulse width is very important: a pulse width out of range could destroy the inner structure of the servo motor.

To control the rotation angle accurately, the control signal needs many distinct values between 1 ms and 2 ms. Since TCNT0 is an 8-bit register, it overflows every 16 µs. This gives us enough precision to specify a pulse between 1 ms and 2 ms and divides 180 degrees into more sections. We therefore use the overflow interrupt as the time base and a counter to record the time, as shown in Figure..

The parameter “width” is the width of the pulse, and “interval” is the interval between two pulses. While the counter is less than width, the output is set high; once the counter reaches width, the output goes low. Width changes according to the turning angle. Because the pulse is generated purely by interrupts, any port can be used as the output, without the pin restrictions of hardware PWM.
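The counter logic described above can be simulated tick by tick. The Python sketch below stands in for the timer overflow ISR and is not the firmware itself. The function name is illustrative, and the tick counts in the usage lines assume the 16 µs overflow period mentioned earlier.

```python
def simulate_pulse_train(width, interval, ticks):
    """Return the output level (1 or 0) at each timer-overflow tick."""
    counter = 0
    levels = []
    for _ in range(ticks):
        levels.append(1 if counter < width else 0)  # high for the first `width` ticks
        counter += 1
        if counter >= interval:                     # start the next 20 ms frame
            counter = 0
    return levels


# Two frames of 1250 ticks each (20 ms at 16 us per tick), pulse of 94 ticks (about 1.5 ms).
levels = simulate_pulse_train(width=94, interval=1250, ticks=2500)
print(sum(levels))   # total number of ticks spent high across both frames
```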

Since the interval between interrupts is fixed, we can calculate how many interrupts occur during 20 ms, 1 ms and 2 ms, which determines the parameters interval, servo_min and servo_max. For each angle, the pulse width is then calculated with the following formula:

width = servo_min + angle*(servo_max - servo_min)/180

Since the tooth ratio between the servo gear and the cell phone mount gear is 2:1, the turning angle of the platform (Theta) is twice the servo turning angle. So given the angle of the cell phone mount, we compute width with

width = servo_min + Theta*(servo_max - servo_min)/360
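To make the formula concrete, the tick counts can be worked out from the 16 µs overflow period. The Python sketch below assumes nominal 1 ms and 2 ms limits and rounds down to whole ticks. The calibrated values used on the real servo may differ, and the names are illustrative.

```python
TICK_US = 16                      # timer0 overflow period stated in the text

INTERVAL = 20_000 // TICK_US      # ticks per 20 ms frame
SERVO_MIN = 1_000 // TICK_US      # ticks in a 1 ms pulse (rounded down)
SERVO_MAX = 2_000 // TICK_US      # ticks in a 2 ms pulse


def pulse_width_ticks(theta_deg):
    """Pulse width, in overflow ticks, for a platform angle of 0..360 degrees."""
    if not 0.0 <= theta_deg <= 360.0:
        raise ValueError("theta_deg must be within 0..360")
    return SERVO_MIN + int(theta_deg * (SERVO_MAX - SERVO_MIN) / 360.0)


degrees_per_tick = 360.0 / (SERVO_MAX - SERVO_MIN)
print(INTERVAL, SERVO_MIN, SERVO_MAX)
print(pulse_width_ticks(180.0), degrees_per_tick)
```

Under these assumptions one tick of pulse width corresponds to a little under six degrees of platform rotation, which is comparable to the six-degree tolerance used for the shutter decision.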

Alternative method to servo control: PWM

Our initial method used PWM to generate the control signal. PWM can generate a signal with a fixed interval without interrupts, and the servo control interval is fixed, so PWM is an easy way to control a servo. Compared with the frequency of fast PWM mode, the frequency of the control signal is relatively low, so we chose phase correct PWM mode. To generate a signal with a 20 ms interval, we set WGM2:0 = 5, with TOP defined by OCR0A, and set the value of OCR0B according to the turning angle. This method can control the servo, but it reduces the accuracy of the control signal. Since we set OCR0A to 156, the value of OCR0B must be less than 156. The usable range of OCR0B is 128 to 151, which gives only 21 values, so the turning position can only be divided into 21 parts of 8.5 degrees each. For the cellphone mount, the minimum turning angle is therefore 17 degrees, which limits how precisely the mount can be positioned.
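The choice of TOP = 156 and the resulting step size can be checked numerically. The Python sketch below assumes a 16 MHz system clock and a timer0 prescaler of 1024, neither of which is stated in the text. The 21 positions are taken from the text as given.

```python
F_CPU_HZ = 16_000_000   # assumption: 16 MHz system clock
PRESCALER = 1024        # assumption: timer0 prescaler
TOP = 156               # OCR0A value given in the text


def phase_correct_period_s(top, prescaler=PRESCALER, f_cpu_hz=F_CPU_HZ):
    """Period of phase correct PWM: the counter runs from 0 up to TOP and back down."""
    return 2 * top * prescaler / f_cpu_hz


def step_deg(range_deg, positions):
    """Angular size of one step when the range is split into `positions` parts."""
    return range_deg / positions


print(phase_correct_period_s(TOP))   # close to the 20 ms servo frame
print(step_deg(180.0, 21))           # servo step size in degrees
print(2 * step_deg(180.0, 21))       # step size at the phone mount after the 2:1 gearing
```

With these assumed clock settings the period comes out just under 20 ms, and 21 positions give steps of about 8.6 degrees at the servo and about 17 degrees at the mount, in line with the figures quoted above.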

Results

Speed of Execution

The system executes the required tasks without hesitation once it has correctly detected a valid clap. Since no operating system is needed in the software to schedule multiple concurrent tasks, the main loop updates the angle and takes action immediately after the ISR raises a flag indicating that two of the three mics have been triggered by an impulse. In this way, the time-critical task of time acquisition is done in the ISRs, while the time-consuming analysis and calculations are done in the main program.

The only factor that may diminish the interactiveness of the system is the notable amount of time needed to mechanically adjust the direction of the motor. However, for an application that is not urgent, it is fast enough: the platform needs at most 0.5 second to rotate a full 360 degrees and stop vibrating after deceleration. This does require the user to make the second clap, which triggers the shot, only after visually confirming that the phone has settled in the right direction; otherwise, the photo might be blurry. Additionally, roughly 50 ms is spent generating the low-level signal that fires the shutter. Adding the time the smartphone needs to save an image, about 10 photos can be taken in one second, which is fast enough for a person’s usual clapping frequency.

Accuracy

Correctly detect impulses

The platform could always detect impulse sounds such as claps or bursts of shouting. It is also effective at excluding smooth sounds such as music or human speech.

However, sound power decreases with distance, so a clap from relatively far away may not be loud enough at the microphone, which can lead to missed detections. In our system, the circuit can detect impulse sounds from about 1.5 m away.

Since our bandpass filter passes 150 to 800 Hz, sounds of higher or lower frequency do not affect the circuit. This helps eliminate the influence of environmental sound; on the other hand, a clap whose frequency falls outside this range will also be missed.

Safety Concern for Human Body

Considering the centrifugal force on the platform, the mount must hold the cellphone firmly, or the cellphone could be thrown out and hurt people. We use a phone mount that is stuck onto the platform, so even when the platform turns fast, this accident should not happen.

Safety Concern for Personal Property

For the iPhone models currently on the market, the platform is mechanically rigid enough that the phone won’t be hurled out of the cell phone mount when the servo motor stops with a large deceleration. The key mechanical components have been bonded and the loose connectors have been glued. However, if a bigger and heavier smartphone is put on the servo, the base of the platform needs to be fixed more firmly to a horizontal surface or needs additional weight. Before the system is put into real-life use, all the wires need to be neatly tied to the frame to eliminate the possibility of system components being torn apart.

Conclusions

Our expectations

Our design goal was an iPhone camera platform that turns to face wherever a nearby hand clap or similar sharp impulse comes from. When a person claps more than once from the same direction relative to the phone, the platform instructs the camera to take a picture for each clap detected. We achieved most of this goal: our current system can detect the sound source even in a noisy environment, and we can make the iPhone take a picture by clapping twice from the same direction. However, the accuracy of the detection is still not good enough. Sometimes the iPhone does not face the right direction after one clap and needs another clap to correct it. The system is also not very sensitive to the difference between the old angle and the new angle, so sometimes when we change the clapping direction, it fails to detect the change and fires a photo again.

Further Improvement

Our design still falls short in some areas. For example, the platform is made of cardboard, which is relatively light. When the cellphone turns, the centrifugal force can move the whole platform and change its position. The phone mount is also not very stable: its stability depends entirely on the glue between the mount and the shaft, so we need a different type of phone mount with a larger contact area on the shaft. Furthermore, the servo motor feeds pulse signals back into the circuit. To avoid this interference, a separate circuit for servo control should be built to isolate the servo motor from the microphone circuit.

Standards

Our project design is not based on any major standards from IEEE, ISO, or any others.

Intellectual property considerations

We wrote all the code ourselves, independently; we did not reuse code from anyone else or from the public domain. However, when we built the platform for the cellphone mount and designed the algorithm for detecting the sound, we referred to a previous final course project, Acoustic Impulse Marker, by Adam Wrobel and Michael Grisanti.

In our project, we did not sign a non-disclosure agreement to get a sample part. However, to make the microcontroller control an iPhone taking a picture, we “hacked” an iPhone headset. Since the iPhone provides a way to release the camera shutter using wired hardware on the earphones, we opened up the switch circuit of an iPhone headset and connected the two leads for microphone and ground to the microcontroller. So this part of our design is based on Apple’s technology. Other parts of the design are based on the assistance of the TAs and Professor Land in lab. As for patent or trademark issues, we think it is our responsibility to avoid them. When we reference material, we should mark it clearly, and if we want to apply others’ work in our own application, we should first get their permission. So we think it is necessary to point out what we use and to protect the details of their technology from being copied.

As for patent opportunities, we have not considered them. With so much work on acoustic detection in recent years, we are certain that there are many patents in this area related to our project. We could also find many papers and reports on the detection and recognition of impulsive sound signals whose methods and algorithms are more advanced than ours and more applicable in practice. So we think there is little publishing or patent opportunity for our project.

Ethical considerations

We believe we carried out our project consistently with the IEEE Code of Ethics. For example, we are “honest and realistic in stating claims” based on our real data. When we built the circuit, we recorded the parameters of each component carefully and used an oscilloscope to record the output of the circuit. We also honestly state the shortcomings of our project: for example, the platform sometimes reacts incorrectly, and we point this out and analyze the shortcoming based on our results. Secondly, we honestly state the reference material used in our project. Even though some of the outcome looks similar to others’ work, we developed and solved the problems by ourselves or with the aid of the TAs and Professor Bruce Land. Professor Bruce Land provided many good suggestions for our project, especially for the microphone circuit, which improved our project greatly. Finally, we tried to work out the problems we met during the project, which gave us a deeper understanding of the related technology.

Legal considerations

Since there is no communication system in our project, there is no transmitter. Our microphones are used only as sensors to detect sound, and the only signal transmitted is the sound of the clap, so we did not transmit or save any other signal. All of our components were obtained legally: some were purchased online, and others came from the lab or our own belongings.

Source: Sound-localizing Camera
