Detecting a visual target
The main hardware component that you need to detect a visual target is some form of camera. For the FRC 2013 contest you have several choices: the Axis IP camera, the Microsoft Kinect, or other webcams such as the Logitech HD and Microsoft models. In fact, any webcam can be used as long as it is within the purchase price restrictions placed on any COTS (Commercial Off The Shelf) part added to your robot.
If possible, we recommend using the Kinect in its IR mode to detect targets. The Kinect is typically used for depth perception, and while depth could be used to detect the targets, its depth range is limited. The Kinect perceives depth by projecting a dotted pattern and measuring the displacement of those dots. You cannot see these dots as they are in the IR (infrared) wavelength, which is not perceptible to the human eye. The dots are captured by the IR camera built into the Kinect and then processed to determine depth.
While calculating depth is useful, we are more interested in using the IR space to detect the retro-reflective tape and thus the visual targets. The IR space is typically less noisy than the visual spectrum, especially if you can eliminate any visible light from the IR camera (which the Kinect does). By using the IR space to detect the visual targets, you will get a much cleaner image with very few visible objects, which requires a lot less processing to determine the location of the target.
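To give a feel for how little processing such a clean image needs, here is a minimal Python/NumPy sketch. The frame is synthetic (a stand-in for real Kinect IR output, which this tutorial does not cover capturing), but it shows the core idea: when the tape is the only bright thing in the image, a single global threshold and a bounding box are essentially the whole detection step. The threshold value and patch location are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for one frame from the Kinect's IR camera:
# faint sensor noise everywhere, plus one very bright patch where
# the retro-reflective tape would appear.
rng = np.random.default_rng(0)
ir = rng.integers(0, 20, size=(480, 640)).astype(np.uint8)
ir[150:180, 200:260] = 255

# With visible light filtered out, the tape is far brighter than
# anything else, so a single global threshold isolates it.
mask = ir > 200

# The bounding box of the remaining pixels is the candidate target.
ys, xs = np.nonzero(mask)
x, y = int(xs.min()), int(ys.min())
w, h = int(xs.max()) - x + 1, int(ys.max()) - y + 1
print("candidate target at", (x, y), "size", (w, h))
# -> candidate target at (200, 150) size (60, 30)
```

With a real RGB image you would need color filtering, noise rejection, and shape tests before you could trust a blob this way; in the IR image the threshold alone does most of the work.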
Because the displacement of the dots is used for depth perception, the image captured from the Kinect's IR camera is already calibrated. This means that distortions caused by lens warping have already been dealt with before you even have access to the image. This is convenient, as it is yet another step that you don't need to do or worry about. This is not the case for the Axis camera and some webcams.
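To see what this pre-calibration saves you from, here is a small sketch of the standard radial lens-distortion model that calibration exists to undo. The coefficient values are invented for illustration; real values come from calibrating your specific camera.

```python
def radial_distort(x, y, k1, k2):
    """Map an ideal (undistorted) normalized image point to where a
    lens with radial coefficients k1 and k2 would actually image it.
    Camera calibration estimates k1 and k2 so this warp can be undone."""
    r2 = x * x + y * y
    scale = 1 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

# The image centre is unaffected by radial distortion ...
print(radial_distort(0.0, 0.0, -0.2, 0.05))  # (0.0, 0.0)

# ... but a point near the corner is pulled noticeably toward the
# centre (barrel distortion), which bends straight edges -- such as
# the sides of a rectangular vision target -- into curves.
print(radial_distort(0.8, 0.6, -0.2, 0.05))
```

If you use an uncalibrated Axis camera or webcam, targets near the image edge will appear warped like this, which throws off any distance or angle math that assumes straight lines.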
The disadvantage of using the Kinect is that you will need an onboard netbook, laptop, etc. on your robot to plug the Kinect into. Because the Kinect is not a typical webcam, using smaller embedded devices as an image relay (like the Raspberry Pi, BeagleBone, or RockChip) is possible but beyond the scope of this tutorial.
Axis IP Camera
The provided Axis camera can also be used to detect visual targets. Ideally, you would use this camera along with a high-powered LED ring light that will increase the intensity of the retro-reflective targets. Again, the goal is to isolate just the targets from the rest of the image so that identification is easier and quicker.
Note that while cameras are typically more sensitive to green light, the JPEG encoding performed within the Axis camera will cause precision problems: color is heavily compressed within JPEG and can produce inaccurate artifacts. The recommended light color is white, as that is best preserved by JPEG compression and will provide the best results.
The advantage of the Axis camera is that images can instead be processed off the robot at the driver station. While this can be easier than installing an onboard laptop, it has some serious disadvantages. See below for those concerns.
If you are unable to acquire a sufficient LED light for the camera, you can still use the RGB image as-is and process that for the target. This will work, but may be slower and more prone to false detections than an IR or LED-enhanced technique.
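With a white LED ring, the detection idea carries over from the IR case with one twist: a white reflection is bright in all three color channels at once, which helps reject colored light sources. The sketch below uses a synthetic RGB frame in place of real Axis output; the threshold and patch location are invented for illustration.

```python
import numpy as np

# Synthetic stand-in for an RGB frame from the Axis camera: a dim,
# colorful scene plus one bright white patch where the LED ring
# light reflects off the retro-reflective tape.
rng = np.random.default_rng(1)
frame = rng.integers(0, 200, size=(480, 640, 3)).astype(np.uint8)
frame[100:130, 300:360] = (250, 250, 250)

# A white reflection is bright in *all three* channels; something
# bright in only one channel (a red scoreboard light, green arena
# lighting) fails the min-over-channels test.
white = frame.min(axis=2) > 230

ys, xs = np.nonzero(white)
x, y = int(xs.min()), int(ys.min())
w, h = int(xs.max()) - x + 1, int(ys.max()) - y + 1
print("candidate target at", (x, y), "size", (w, h))
# -> candidate target at (300, 100) size (60, 30)
```

Without the LED light you would have to match the target by its own color and shape in a busy scene, which is where the extra processing time and false detections come from.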
Webcams
While the Kinect or the Axis camera provides the ability to capture images that can be used for detection, a webcam (especially an HD one) can provide some desirable features. For example, a webcam can be much smaller than the Kinect or the Axis, so mounting it in a strategic place (such as on the end of the launching mechanism) can make the math behind the targeting system a lot easier. While using a webcam does require some form of host PC, embedded systems can be used to relay the image back to the driver station.
But perhaps the best feature of a webcam is its ability to zoom in and make the target much larger for a given image size. For example, the Kinect can use a maximum image size of 1280x1024, but you still have to process that entire image. The Axis camera can only go up to 640x480, so you are limited to just that resolution. An HD webcam can exceed these resolutions, BUT you would want to keep the image size at 640x480 and use the digital zoom to access the same detail at a greater distance. This means that the image you capture at 30ft, zoomed in, can be as large and as detailed as what you capture at 15ft without zoom.
Take care not to assume that a digital zoom will always produce more detail. Many webcams that stop at 640x480 still let you "zoom" in, but this is simulated by simply scaling the image in software. In cameras that can produce very high resolutions, this simulated zoom only begins once you have exceeded the hardware's native resolution.
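The difference between the two kinds of zoom can be made concrete with a toy NumPy sketch. The frame sizes are illustrative; the point is that cropping a native-resolution sensor keeps real pixels, while software zoom only enlarges the pixels it already has.

```python
import numpy as np

# Stand-in for a 1280x720 HD sensor frame; every pixel value is
# unique so we can count how much real detail each "zoom" keeps.
hd = np.arange(720 * 1280, dtype=np.int32).reshape(720, 1280)

# Hardware (crop) zoom: take the central 480x640 window at native
# resolution -- every pixel is genuine sensor data.
crop = hd[120:600, 320:960]

# Software zoom: the camera only captured 360x640, and the "zoom"
# doubles each pixel -- the image is bigger but no new detail exists.
low = hd[::2, ::2]                       # a 360x640 capture
soft = low[60:300, 160:480]              # same field of view as above
soft = np.repeat(np.repeat(soft, 2, axis=0), 2, axis=1)

print(crop.shape, soft.shape)            # both (480, 640)
print(np.unique(crop).size)              # 307200 distinct pixels
print(np.unique(soft).size)              # only 76800 distinct pixels
```

Both results are the same 640x480 size and cover the same field of view, but the software-zoomed image carries a quarter of the information, which is exactly what you would see as blocky, smeared tape edges at distance.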
Combining this high resolution with a high powered LED light around the webcam can offer a very competitive solution to the Kinect device.
Where to process?
Sending images (even at 320x240) back to the driver station for processing is network dependent. While your tests on your isolated network may work just fine, keep in mind that during the actual contest you are not the only robot on the field streaming back video. This increased reliance on the network can cause hiccups or delays in images that may cause the robot to behave in unknown ways at critical points.
Let's suppose that the network at the contest is 10x faster than what everyone needs. With this you could be streaming at 30fps (frames per second), which is great. The problem is that each frame will be delayed by as much as a second from reality. This is caused by the time it takes to capture an image, compress it to some image format, send it over the network, and finally view it on your laptop. For slow-moving robots this should not be a huge concern, but if you plan to perform tracking while moving or want the best reaction time, processing on the driver station will create an unacceptable lag. This lag will typically manifest itself as an oscillation of the robot or targeting device.
Thinking through a specific moment while tracking a target gives a better understanding of why this is an issue. Say your camera has just taken a picture. The picture at that moment represents reality and would indicate that the robot needs to move more to the left. Compressing it takes about 100ms (milliseconds). Transmitting it to your driver station takes about 300ms. Processing it takes about 200ms (which would equate to 5fps), and sending back the commands based on what you processed takes about 100ms (the commands are just variables and much smaller than the image). Now your robot reacts to those commands ... but that means, in this example, that the robot is reacting to 700ms-old data. If the target has moved during that time and the robot should now move right instead, the robot will always be moving in the wrong direction. As the system eventually catches up to the correct movement, it will tend to oscillate or zigzag as it moves. Reducing this lag from reality will create a much smoother movement and also allow you to move quicker (alternatively, you can slow your robot down to reduce the zigzagging).
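The hypothetical stage delays above add up as follows (remember these numbers are for illustration only, not measured values):

```python
# Hypothetical per-stage delays from the example above, in ms.
stages = {
    "capture + compress": 100,
    "transmit to driver station": 300,
    "process on laptop": 200,
    "send commands back": 100,
}

total_ms = sum(stages.values())
print(f"robot reacts to {total_ms} ms old data")
# -> robot reacts to 700 ms old data
```

Note that moving the processing onboard removes the two network stages entirely, which in this made-up breakdown would cut the lag by more than half before any other optimization.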
Keep in mind that the times above are for example purposes only and do NOT reflect actual times that you may experience. The best way to experience this lag is to set up the camera on the robot streaming back to the driver station and, while looking at the driver station, open and close your fist in front of the camera and see how long the image takes to show the correct open or closed fist. This test allows you to 'feel' when your hand is open or closed and watch how long the video takes to update. See if you can create a frequency of opening and closing your fist that is completely out of sync with what you see from the camera ... and yet you may still be getting 30fps.
If possible, adding a netbook/laptop on the robot will provide the fastest reaction times and allow you to eliminate any network bandwidth issues regardless of the image size you plan to use. Images can still be fed back to the driver station for viewing purposes, but reaction-based processing would be done onboard.