Stereo Depth Mapping
Anonymous from United States  [28 posts]
12 year
I've been doing more and more research trying to find the best approach to stereo depth mapping. I've come across a couple of things and have just built a camera rig that houses two cameras 4" (101.6mm) apart.

I'm able to use the Mosaic, Stereo, and other modules to grab both images, but trying to do any type of matching between the two I'm at a loss.

I'd like to do a feature match between them, SURF or SIFT style, just to get the major points, and then I could use my algorithm to calculate the offset of the other pixels and apply an 8-bit grayscale to get the depth map.

I'm not trying to track a single object; determining the distance to one has been pretty simple by flipping between camera one and camera two and checking the COG_X value for that item. Now, however, I need to depth-map the entire scene, so I believe finding the feature points that match between the images, and then triangulating between those matches, will be the way to handle it...

I could write an external plugin for the Pipes or API using OpenCV, but I was hoping that someone had some insight for doing this straight within RoboRealm.

Steven, I noticed after reading a bit that the Stereo module is not supported. I couldn't get it to work, as I believe you set up your two cameras differently from my positioning (4"), and I believe I could get it to work better if I knew the offset I needed.

Currently I can use this to figure depth:

distance = tan((pi/2) - tan^-1(offset/factor)) * camera_separation
factor = offset / tan((pi/2) - tan^-1(distance/camera_separation))

I had to build a table of factors for multiple distances, and am working on an algorithm to process it from there. This really depends on one object though, and I would like to do the entire scene.
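Those two formulas actually collapse to a simple proportion: since tan(pi/2 - atan(a)) is just 1/a (the cotangent), distance = camera_separation * factor / offset. A quick sketch to check numerically (function names are mine):

```cpp
#include <cmath>

// distance via the trig form posted above
double distanceTrig(double offset, double factor, double camera_separation)
{
  const double pi = std::acos(-1.0);
  return std::tan(pi / 2.0 - std::atan(offset / factor)) * camera_separation;
}

// the same thing after simplifying: tan(pi/2 - atan(a)) == 1/a
double distanceSimple(double offset, double factor, double camera_separation)
{
  return camera_separation * factor / offset;
}
```

With camera_separation = 101.6 and factor = 393.7 (the focal-length value worked out later in the thread), an offset of 40 pixels comes out to roughly 1000 mm, which matches the table.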

My camera_separation = 101.6mm, and here is a list of my factors for distance, the value is the pixel offset between the object in the two frames:

200 mm - offset = 190
300 mm - offset = 135
400 mm - offset = 100
500 mm - offset = 80
600 mm - offset = 67
700 mm - offset = 57
800 mm - offset = 47
900 mm - offset = 42
1000 mm - offset = 40
1200 mm - offset = 32
1500 mm - offset = 25
2000 mm - offset = 15

These were very rough estimates, but I believe they'll be close enough to do what I need.

I think that by feature matching between the two images, I can run those pixels through my algorithm by checking the offset between the features. Doing this across the image with a given window size and search should yield pretty good results, I just need a way to run it through a feature scanner to match camera 1 to camera 2.

Any ideas are greatly appreciated!

I'm including a side by side of the images that I last captured. Again, the two cameras are parallel, and 4" (101.6mm) apart.


Anonymous from United States  [28 posts] 12 year
Did more calculating, and it seems that my table is a little off, as I had thought.

Going back to the Society of Robots site:

I've worked backwards to figure my focal length at a value of 393.7 (in pixels).

Since my cameras are separated by 101.6mm (the baseline b), I can derive Z_actual with the following formula:

Z_actual = (b * focal_length) / (x_camL - x_camR)

Working backwards from the table I have above, I was uncertain what my focal_length was, but I was able to find it with a tape measure against the offsets that I recorded at each distance. 393.7 was the number that kept popping up.

My final formula would be:

Z_actual = 40000 / (x_camL - x_camR)    (since 101.6 * 393.7 is approximately 40,000)

So as I'm doing my feature matching across the scene from the two images, I can calculate the offset in the X values and then figure out how many millimeters I am from that object. Working my way across the entire scene will give me depth for the whole scene.

However, I'm still lacking a way to check features from one camera to the next.

Hopefully what I've done so far will help, but maybe someone could give me insight into matching features between the two images.


Anonymous 12 year

While valuable for later use, the above does not help much with the actual determination of depth from an image. I've tried a couple of things with mixed success and need more information about the intended purpose in order to make the best recommendation. You mention that you need a full depth map. Is this to be used for obstacle avoidance? Indoors or outdoors?

The reason for my question is that your 4" separation is most likely too large for close-range work (note the tape is completely off in the second image, which is causing matching issues). Are you able to change the rig so that the cameras are as close as physically possible and take an image then? For close objects even 1 cm is sufficient, and left/right matching is much improved by that.

If not and your intent is for distant objects, perhaps you can try taking images in that situation ... the close proximity of the floor is causing a lot of errors.

Note that you probably did get the stereo to work (just keep increasing the offset and check the left/right switch to see which one is better), but the best result I can get looks like crap anyhow! If you don't know what to expect, it's hard to fine-tune the module to the best result. If you need to use the rig as is I can forward the best image I could come up with ... but again, in my opinion, it looks really bad and is not very usable. Closer images would improve it ...

Anonymous 12 year
You can see what I mean by the attached.


Anonymous from United States  [28 posts] 12 year

This is for an outside rig, and yeah, will need to do the entire image for object avoidance mostly.

I've been playing around with some of the other features trying to second-guess my attempts. I've redone your tutorial on object detection/avoidance, but I can't get it to work well with an edge detector outdoors with grass and sidewalks, as it thinks there is an object in the way.

I *can* move the cameras closer together. I was hoping for about 6" to around 84-96" of depth range off this rig; that's why I went with the 4" spacing.

The stereo image that you have is a VAST improvement from what I was looking at, LOL.
Anonymous from United States  [28 posts] 12 year
I will take the rig outside, get a snapshot from each camera, and get them up here as soon as I can.
Anonymous from United States  [28 posts] 12 year

Here are 4 pictures taken outside at two different scenes with a fair amount of 'objects' in them.

I had to snap them individually while holding the rig, so the Y may be slightly off from me moving a little bit, but the basic idea is there. Two are from the left camera, two are from the right camera.

I must say, I LOVE this webcam for outdoor use.

Anonymous from United States  [28 posts] 12 year
I've got the pipe function running over into C++ for now; at least the data is there and I can grab/modify the pixels.

Still playing with the routine to push that data into an OpenCV buffer to see if I can do any processing from there, and then back into RR format.

I'm using the 3D viewer to correct the alignment of my right image and setting it up as a side-by-side, then I'm piping the whole image (1280x360) over to C++ and piping it back. It's really fast and I'm impressed! Right now I'm just overwriting the pixels with a solid color to make sure I can at least grab the left and right frames correctly out of the solid buffer that RR ships over. Seems easy enough; flipping it into OpenCV will take me a few days, gotta read up on all of that... :)

Steven: Is there a reason you always start with the bottom left pixel instead of the top left pixel? Just curious...

I'd love to see a quick and easy way to merge the data between RR and OpenCV and back, so I plan to publish whatever I can get going to you guys.
Anonymous 12 year

Sounds great ... I would skip a couple steps right now and test your image directly in OpenCV. We already did that and didn't like the results due to the large disparity (sorry, back to the 4" spacing again). So be sure you get good results in OpenCV before doing this integration.

If you can't get the spacing to be smaller can you point the cameras towards each other slightly as in pivoting your eyes to focus on a closer object? This will help to align the images better.

What you will probably notice in OpenCV is that a lot of noise is being generated by the near ground areas so unless you plan to just ignore that entire area you will need images with closer alignment. Does not make much sense if you can detect a 10ft away object when you are about to hit a 1ft away object that you cannot see!

Anyhow, we are working to improve our current Stereo module so that the results will be better defined with less blurring and errors ... but that will take additional research/testing.

Anonymous from United States  [28 posts] 12 year

Sounds great. I can easily redesign the bracket for less spacing between the cameras. The speed that we're hoping to travel is the reason I was hoping to shoot for objects farther away, but I don't mind testing and testing and testing, it's what this is all about.

I think the next bracket I will put them as close together as I can and do more testing. I'll post a couple of images with that set up to give you something to play with as well.

I did manage to play with your module a bit more, but I had to decrease min disparity to -20 or so, and then I couldn't tell what was directly in front of me, it only picked up distant objects.

Really wish the Kinect would work for this application as it'd be a ton easier to implement :) Stupid sunlight....

If I can do anything to help you guys out, please let me know, completely willing to open up my rovers to your testing.
Anonymous from United States  [28 posts] 12 year
Just changed my rig, will try to get images from the cameras tonight.

Anonymous from United States  [28 posts] 12 year
Here's the latest indoor shot. The new rig I posted a pic of above has a 1.5" spacing between cameras.

Anonymous from United States  [28 posts] 12 year
Going off of the example C++ PIPE program for RoboRealm, I've made some changes to convert the data into a usable OpenCV structure; it's pretty simple.

This assumes you are sending a side by side image using the Mosaic to grab two cameras (or Stereo or 3dview and outputting side by side).

This will take the data and convert it to a left and right frame for OpenCV as an IplImage.

This also assumes all of your OpenCV structure is set up correctly for your compiler to find all of the links.


#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "opencv2/calib3d/calib3d.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv/cv.h"

// size of the pipe buffer used to communicate to/from RR
#define PIPE_BUFFER_SIZE 4096

// buffer size used when reading in variable data
#define DATA_BUFFER 1024

// the maximum length of a variable name to read
#define MAX_VARNAME_SIZE 64

// hold the image data received from RR
struct ImageDataType
{
  // hold the name used in RR to identify this module
  char name[16];
  // image processed count ... sent back to RR
  int count;
  // dimensions of the received image
  int width, height;
  // holds the image data
  unsigned char *pixels;
};

// returns an error message to RR. This message is displayed in the "messages" list within
// the RR Pipe Program interface.
void ReturnError(HANDLE hPipe, char *txt)
{
  unsigned long bytes;
  int len = 5;
  WriteFile(hPipe, &len, sizeof(int), &bytes, NULL);
  WriteFile(hPipe, "error", 5, &bytes, NULL);
  len = strlen(txt);
  WriteFile(hPipe, &len, sizeof(int), &bytes, NULL);
  WriteFile(hPipe, txt, len, &bytes, NULL);
}

// returns a byte string variable to RR. The returned variables can be used
// elsewhere in RR for continued processing.
int ReturnBytesVariable(HANDLE hPipe, char *name, unsigned char *data, int len)
{
  unsigned long bytes;
  int l = strlen(name);
  WriteFile(hPipe, &l, sizeof(int), &bytes, NULL);
  WriteFile(hPipe, name, l, &bytes, NULL);

  WriteFile(hPipe, &len, sizeof(int), &bytes, NULL);
  WriteFile(hPipe, data, len, &bytes, NULL);

  return bytes;
}

// returns an int variable to RR
int ReturnIntVariable(HANDLE hPipe, char *name, int num)
{
  char strTmp[32];
  sprintf(strTmp, "%d", num);

  unsigned long bytes;
  int l = strlen(name);
  WriteFile(hPipe, &l, sizeof(int), &bytes, NULL);
  WriteFile(hPipe, name, l, &bytes, NULL);

  int len = strlen(strTmp);
  WriteFile(hPipe, &len, sizeof(int), &bytes, NULL);
  WriteFile(hPipe, strTmp, len, &bytes, NULL);

  return bytes;
}

// Performs the image conversion/processing/etc that you want to perform
void ProcessImage(struct ImageDataType *imageData)
{
  unsigned char *pixels = imageData->pixels;

  int width = imageData->width;
  int height = imageData->height;

  // sanity check
  if ((pixels==NULL)||(width==0)||(height==0)) return;

  // note: flat indexing below assumes widthStep == (width/2)*3,
  // which holds for a 1280-wide side-by-side image (640*3 is a multiple of 4)
  IplImage *limage = cvCreateImage(cvSize(width/2, height), 8, 3); // left frame buffer
  IplImage *rimage = cvCreateImage(cvSize(width/2, height), 8, 3); // right frame buffer

  int total = width*height*3;
  int i;
  int x = 0;
  int rightx = 0;
  int leftx = 0;

  for (i=0; i<total; i+=3)
  {
    if (x >= width/2) // past center: this pixel belongs to the right frame
    {
      rimage->imageData[rightx+2] = pixels[i+2]; // copy data into the right buffer
      rimage->imageData[rightx+1] = pixels[i+1];
      rimage->imageData[rightx] = pixels[i];
      rightx += 3;

      pixels[i] = 0;   // overwrite the stream going back to RoboRealm for the right frame
      pixels[i+1] = 255;
      pixels[i+2] = 0;
    }
    else // left of center: this pixel belongs to the left frame
    {
      limage->imageData[leftx+2] = pixels[i+2]; // copy data into the left buffer
      limage->imageData[leftx+1] = pixels[i+1];
      limage->imageData[leftx] = pixels[i];
      leftx += 3;

      pixels[i] = 255; // overwrite the stream going back to RoboRealm for the left frame
      pixels[i+1] = 0;
      pixels[i+2] = 0;
    }

    if (++x == width) x = 0; // wrap to the next scanline
  }

  // RR stores images bottom-up, so flip before display
  cvFlip(limage, 0);
  cvFlip(rimage, 0);
  cvShowImage("LeftImage", limage);
  cvShowImage("RightImage", rimage);
  cvWaitKey(1); // give HighGUI a chance to draw the windows

  cvReleaseImage(&limage);
  cvReleaseImage(&rimage);

  // note that this is an inplace filter so you don't need an additional image array
  // but you've now corrupted the original image ...
}

// Parses the variables sent by RR into the appropriate structure. You can add
// your own processing routines here to handle other variables that may get sent.
void ProcessVariable(HANDLE hPipe, struct ImageDataType *imageData, char *name, char *data, int len)
{
  unsigned long bytes;

  // determine what we've got
  if (stricmp(name, "name")==0)
  {
    strncpy(imageData->name, data, 16);
  }
  else if (stricmp(name, "width")==0)
  {
    imageData->width = *(int *)data;
  }
  else if (stricmp(name, "height")==0)
  {
    imageData->height = *(int *)data;
  }
  else if (stricmp(name, "image")==0)
  {
    if ((imageData->width==0)||(imageData->height==0))
    {
      ReturnError(hPipe, "Error - missing image dimensions before image data!");
      return;
    }

    if (len!=(imageData->width*imageData->height*3))
    {
      char buffer[256];
      _snprintf(buffer, 256, "Error - length of data and dimensions of image\n        disagree! (width:%d height:%d len:%d)\n", imageData->width, imageData->height, len);
      ReturnError(hPipe, buffer);
      return;
    }

    // we only need to allocate once! The program will remain
    // active for as long as processing continues ...
    if (imageData->pixels==NULL)
    {
      if ((imageData->pixels=(unsigned char *)malloc(len))==NULL)
      {
        ReturnError(hPipe, "Error - out of memory.\n");
        return;
      }
    }

    // we did not read in the image data yet since it is always > 1024 ..
    ReadFile(hPipe, imageData->pixels, len, &bytes, NULL);
  }
  else
  {
    // unknown variable: only need to skip its bytes if we have not read it in yet
    if (len>DATA_BUFFER)
      SetFilePointer(hPipe, len, 0, FILE_CURRENT);
  }
}

// This is where the program first starts
int main(int argc, char *argv[])
{
  // the pipe name to use ... default is rrpipe but can be changed in RR interface
  LPTSTR lpszPipename = TEXT("\\\\.\\pipe\\rrpipe");
  // holds the variable name being sent by RR
  char varName[MAX_VARNAME_SIZE];
  // holds the received and perhaps processed image data
  char varData[DATA_BUFFER];
  // used to hold return value of number of bytes read from pipe
  unsigned long bytes;
  // variable's data length
  int varLen;

  struct ImageDataType imageData;

  memset(&imageData, 0, sizeof(imageData));

  HANDLE hPipe;

  cvNamedWindow("LeftImage", -1);
  cvNamedWindow("RightImage", -1);

  if (argc>1)
  {
    printf("Started with \"");
    int i;
    for (i=1;i<argc;i++)
    {
      if (i>1) printf(" ");
      printf("%s", argv[i]);
    }
    printf("\"\n");
  }

  // Create the pipe used to communicate with RR
  hPipe = CreateNamedPipe(
          lpszPipename,             // pipe name
          PIPE_ACCESS_DUPLEX,       // read/write access
          PIPE_TYPE_MESSAGE |       // message type pipe
          PIPE_READMODE_MESSAGE |   // message-read mode
          PIPE_WAIT,                // blocking mode
          PIPE_UNLIMITED_INSTANCES, // max. instances
          PIPE_BUFFER_SIZE,         // output buffer size
          PIPE_BUFFER_SIZE,         // input buffer size
          NMPWAIT_USE_DEFAULT_WAIT, // client time-out
          NULL);                    // default security attribute

  if (hPipe==INVALID_HANDLE_VALUE)
  {
    printf("CreatePipe failed");
    return -1;
  }

  while (true)
  {
    printf("Waiting ...\n");

    if (!(ConnectNamedPipe(hPipe, NULL) ? TRUE : (GetLastError() == ERROR_PIPE_CONNECTED)))
    {
      printf("ConnectNamedPipe failed");
      break;
    }

    int len=0;

    while (true)
    {
      printf("Processing %d\r", imageData.count);

      while (true)
      {
        // read in variable length
        ReadFile(hPipe, &len, sizeof(len), &bytes, NULL);
        // if length <=0 on the variable name then we're done
        if (len<=0) break;
        // read in variable name but if the name is longer than 64 characters
        // then grab the first 64 chars only
        if (len<MAX_VARNAME_SIZE)
        {
          ReadFile(hPipe, varName, len, &bytes, NULL);
          // don't forget to terminate this string with a zero
          varName[len] = 0;
        }
        else
        {
          // read in first 64 chars
          ReadFile(hPipe, varName, MAX_VARNAME_SIZE-1, &bytes, NULL);
          // skip over remaining chars
          SetFilePointer(hPipe, len-MAX_VARNAME_SIZE+1, 0, FILE_CURRENT);
          varName[MAX_VARNAME_SIZE-1] = 0;
        }

        // read in the variable's data length
        ReadFile(hPipe, &varLen, sizeof(varLen), &bytes, NULL);
        // if the data is less than 1024 read it in now ..
        if (varLen<sizeof(varData))
          ReadFile(hPipe, varData, varLen, &bytes, NULL);

        // handle this variable
        ProcessVariable(hPipe, &imageData, varName, varData, varLen);
      }

      // termination signal -1 on attribute length
      if (len==-1) break;

      // process image
      ProcessImage(&imageData);
      imageData.count++;

      // Write out the processed image back to RoboRealm over the pipe.
      // You can also write back any other variables to use in
      // other parts of the program.
      // The format is the same as the input.
      ReturnBytesVariable(hPipe, "image", imageData.pixels, imageData.width*imageData.height*3);

      // Send back the count as an example of how to feed back variables into RoboRealm
      ReturnIntVariable(hPipe, "count", imageData.count);

      // write out end of message
      len = 0;
      WriteFile(hPipe, &len, sizeof(int), &bytes, NULL);
      if (bytes==0) break;

      // continue by waiting for next image request
    }

    if (len==-1) break;
  }

  // free alloced array now that we're done
  if (imageData.pixels) free(imageData.pixels);

  DisconnectNamedPipe(hPipe);
  CloseHandle(hPipe);

  return 0;
}

Anonymous from United States  [28 posts] 12 year
First attempt

