Automatic image learning
Le Zhang from United States  [11 posts]
12 year
Hi guys,

I want to build a robot capable of recognizing objects in its surroundings. I see that both the object_recognition module and the AVM Navigator have this function, but which one has the better object recognition? My robot is fixed in place but can rotate on its base. It seems that AVM can automatically adjust the rotation to center the object? The object_recognition module seems to have more advanced features, however.

I need my robot to recognize a class of objects, i.e. not just one specific mouse, but mice in general.

In object_recognition, it is stated that a template is needed that contains only the object on a blank background. Does this mean that training requires me to edit every image? How does AVM train? I am trying to automate training so that when the robot sees something it doesn't recognize, it creates a random name and remembers the new object under that name. Is this possible?

Sorry for all the questions,

I'm new to RR and still deciding whether I should make the purchase.
EDV  [328 posts] 12 year
The AVM Navigator makes training on new images easy:

AVM Navigator also provides several navigation templates, such as:

*Follow me
*Walking by gates
*Navigation by map

There is also genuine "automatic image learning" in AVM Navigator in "Nova gate mode" and "Marker mode", where images are associated with the robot's current location.

For more information see:

So I think you should just work out what you need for your project ;)
Anonymous 12 year

The issue you are talking about is one of segmentation. Most object recognition (OR) techniques require that only the object to be trained on is in view. If the image still contains background, the system cannot tell what is relevant (i.e. which part of the image to train on) and will learn the entire image. Then, during recognition, a large part of that background would need to be visible again for anything to be recognized; once the object is moved away from its original background, it would no longer be recognized.

The way around this is to make the background plain and uniform (e.g. place the object against a blank wall) so that only the object to be trained on is in view.

Humans use a couple of tricks to get around this. One is movement: take two pictures, one with the object to be trained on and one without, and subtracting one from the other segments the object from the rest of the scene.
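A minimal sketch of the two-picture trick, assuming two grayscale frames (NumPy uint8 arrays of the same shape) taken from exactly the same camera pose, one without the object and one with it. `segment_by_difference` and the threshold of 30 are illustrative choices, not part of RR or AVM.

```python
import numpy as np


def segment_by_difference(without_obj, with_obj, threshold=30):
    """Return (mask, object_only) for two grayscale frames of the same view.

    Assumes the camera did not move between the two shots; `threshold`
    is an illustrative value that would need tuning for real images.
    """
    # Use a signed type so the subtraction cannot wrap around.
    diff = np.abs(with_obj.astype(np.int16) - without_obj.astype(np.int16))
    mask = diff > threshold  # True where the object changed the scene
    # Keep the object's pixels and blank out everything else.
    object_only = np.where(mask, with_obj, 0).astype(np.uint8)
    return mask, object_only


# Synthetic example: flat grey background plus a bright 3x3 "object".
background = np.full((8, 8), 100, dtype=np.uint8)
scene = background.copy()
scene[2:5, 3:6] = 220
mask, obj = segment_by_difference(background, scene)
print(int(mask.sum()))  # 9
```

The masked result is roughly the "object on a blank background" kind of template described earlier. With real images, shadows and lighting changes also show up in the difference, so some cleanup is usually needed.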

Depth is another: if you have a Kinect handy, you can use it to segment just the object (foreground) from more distant objects and then use that as a mask to create an image containing only the object to be recognized.
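A sketch of the depth-mask idea, assuming a depth image that is already registered (pixel-aligned) to the colour image, with depth in millimetres and 0 meaning "no reading", which is a common convention for Kinect-style sensors. The 1200 mm cut-off and the function name are hypothetical.

```python
import numpy as np


def foreground_from_depth(rgb, depth_mm, max_depth_mm=1200):
    """Keep only colour pixels whose depth is valid and nearer than the cut-off."""
    mask = (depth_mm > 0) & (depth_mm < max_depth_mm)
    out = np.zeros_like(rgb)
    out[mask] = rgb[mask]  # copy foreground pixels, leave background black
    return mask, out


# Synthetic example: a wall at 3 m with a 2x2 object at 0.8 m in front of it.
rgb = np.full((4, 4, 3), 50, dtype=np.uint8)
rgb[1:3, 1:3] = (200, 30, 30)
depth = np.full((4, 4), 3000, dtype=np.uint16)
depth[1:3, 1:3] = 800
mask, fg = foreground_from_depth(rgb, depth)
```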

Either way you somehow have to remove the background from the image and leave only the object to be trained on.

This assumes that you don't want to just train on the entire scene ...

Le Zhang from United States  [11 posts] 12 year

Thanks for clearing that up for me. I was under the assumption that a clean background wasn't needed for AVM. Since I cannot use the Kinect in my setup, I guess my only option is to train using the movement method. For my purposes I'm primarily concerned with actively training on moving objects. The movement method sounds problematic, however, since it requires the camera to stay fixed in one position at all times, which doesn't seem possible with an actively moving camera. Are there other, simpler alternatives to depth perception?
Anonymous 12 year

Just to clarify ... AVM doesn't require a clean background, provided you indicate the area you want it to learn from. This is done by selecting the square area of importance. Effectively this is the same as a clean background, since the selection ignores everything around it.

Some alternatives would be to use the color or texture of the moving object. Another possibility is to subtract the background and compensate for its movement, but that is really tricky and does not produce ideal results.
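As an illustration of the color option, the sketch below marks pixels close to a reference RGB color and returns their bounding box. It assumes the object has a fairly distinctive, uniform color and that lighting is stable; the tolerance of 40 and the function name are illustrative. A real setup would more likely threshold in HSV to cope with brightness changes.

```python
import numpy as np


def color_box(rgb, target, tol=40):
    """Mask of pixels within `tol` of `target` on every channel, plus its box.

    Returns (mask, box) where box is (x0, y0, x1, y1) inclusive, or None
    when no pixel matches.
    """
    diff = np.abs(rgb.astype(np.int16) - np.array(target, dtype=np.int16))
    mask = np.all(diff <= tol, axis=-1)  # every channel must be close
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return mask, None
    return mask, (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))


# Synthetic example: blue image with a red 3x4 patch.
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[:, :] = (20, 20, 200)
img[2:5, 4:8] = (210, 25, 25)
mask, box = color_box(img, target=(200, 0, 0))
```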

Perhaps you have a quick video/images of what you'd like to accomplish? That may be easier than guessing at what might work.

Le Zhang from United States  [11 posts] 12 year

I don't have any pictures of what I'm trying to do; most of this is still just written plans. What I plan to do is have a robot with a wireless camera that rotates on its base and can look around a room. The robot is stationary, but it can view all 360 degrees of its surroundings.

What I want to achieve is for this robot to train on new objects it sees by itself. I want to be involved in the training as little as possible. The robot doesn't even need to know what an object is. Say it sees a cup: it doesn't need to know that it's called a cup; instead it creates a random name for the cup and identifies any similar cups it sees later by that random name. The same should apply to moving objects, notably people.
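The "invent a name for anything unfamiliar" behaviour can be sketched as a nearest-neighbour lookup with a distance cut-off. This assumes each segmented object has already been turned into a fixed-length feature vector (for example a normalised color histogram); computing a descriptor that generalises from one cup to cups in general is the hard part and is not shown. `ObjectMemory`, the cut-off of 0.25 and the `obj_` naming scheme are all hypothetical.

```python
import uuid

import numpy as np


class ObjectMemory:
    """Hypothetical nearest-neighbour store that invents names for unknowns."""

    def __init__(self, max_distance=0.25):
        self.max_distance = max_distance
        self.descriptors = []  # one feature vector per remembered object
        self.names = []

    def recognize(self, descriptor):
        """Return a known name if close enough, otherwise remember a new one."""
        d = np.asarray(descriptor, dtype=float)
        if self.descriptors:
            dists = np.linalg.norm(np.array(self.descriptors) - d, axis=1)
            best = int(np.argmin(dists))
            if dists[best] <= self.max_distance:
                return self.names[best]
        name = "obj_" + uuid.uuid4().hex[:8]  # random name for a new object
        self.descriptors.append(d)
        self.names.append(name)
        return name


memory = ObjectMemory()
first = memory.recognize([0.1, 0.8, 0.1])     # new object, new random name
again = memory.recognize([0.12, 0.78, 0.1])   # close enough: same name
```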

Essentially anything is fair game for object recognition here. There is an infinite variety of backgrounds, since the camera can see the same things from different angles, and there is no guarantee that the background will remain stable.

I mentioned before that I was fine with just training moving objects, but if all objects can be trained that would be best.

I hope I was clear about what I want to do. Does this seem possible? Or what will I have to sacrifice to get part of it working?


Anonymous 12 year

Assuming your scene stays relatively the same (i.e. the camera rotates but does not move), you can "scan" the entire 360-degree rotation and treat what is currently in view as the background to eliminate, so that new objects stand out when they are added to the scene. Since you would have a full 360-degree scan, at any point you could use background subtraction to reveal new objects and focus training on them.

Note that this is a complex task, since minor movements will always happen, even to stationary objects. Plus, you'd have to take and store an image for every rotation step: if your stepper motor takes 360 steps per revolution, you'd need to save 360 images. This also assumes that you are only interested in new objects ...
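A sketch of the per-step background idea, assuming the stepper motor returns to repeatable positions and that frames are grayscale NumPy arrays. Frames are stored as uint8 to keep memory down: at an assumed 640x480 resolution, 360 grayscale frames take 640 x 480 x 360 = 110,592,000 bytes, roughly 110 MB. The class, its method names and the threshold are hypothetical.

```python
import numpy as np


class PanoramaBackground:
    """One stored background frame per rotation step (hypothetical helper)."""

    def __init__(self, steps=360):
        self.steps = steps
        self.frames = {}  # step index -> uint8 grayscale frame

    def record(self, step, frame):
        """Remember what the empty scene looks like at this step."""
        self.frames[step % self.steps] = np.array(frame, dtype=np.uint8)

    def new_object_mask(self, step, frame, threshold=30):
        """True where the current frame differs from the stored background."""
        background = self.frames.get(step % self.steps)
        if background is None:
            raise KeyError(f"no background recorded for step {step}")
        diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
        return diff > threshold
```

Because even stationary objects shift slightly between scans, the threshold alone will not remove every false detection; blurring both frames before differencing is one common mitigation.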

The problem is that no database exists with all the templates needed to understand a complete scene, especially a random one. There are competitions similar to your task where they use a very clean scene with textured objects (books, etc.) and use the web to look up what those objects might be. But the initial job is still segmenting the objects from the background ... which again is kept plain, as they use sheets, plain carpet, etc.

The only way to simplify this task is to make some basic assumptions (i.e. no moving background, some known objects in the scene, stable lighting, etc.) otherwise you are in for a long project.

Perhaps knowing why you need this kind of behavior may help to simplify the task? What is the purpose or desired result of this system?

Le Zhang from United States  [11 posts] 12 year

Thank you for the help. I would prefer not to be limited to the same area for training. Scanning the entire 360-degree rotation sounds pretty difficult as well; I'm not sure I can give it the stability and accuracy it needs.

I'm mainly doing this to experiment with robotic vision. I want a dynamic robot capable of observing other objects and their interactions. Autonomous training would let the robot remember any new object it observes for future reference.

So say I have a cup that's already in the AVM database, so it is recognized. Then next to the cup we see a hand gripping its handle. Assuming AVM has not been trained to recognize hands, it would not be able to determine what is on the handle. I need some way for the robot to recognize the hand as a separate object, distinct from the background, because of its interaction with the cup. The clenched hand would then be trained into AVM and recognized next time.

Of course, the roundabout way is simply to train AVM on all sorts of hand positions; then the robot would see the clenched hand and the cup, end of story. But this requires me to train on literally everything the robot sees, which isn't what I'm looking for.

However, since there don't seem to be many alternatives, it looks like I have little choice :/
Le Zhang from United States  [11 posts] 12 year
I've been doing some more searching, and it seems there are various algorithms out there for depth perception using a single camera. Unfortunately RR doesn't seem to have that capability. I did, however, come across an interesting point about motion parallax. If I can produce motion parallax with one camera, I should be able to tell, using edge detection, which edges are closer than others. Basically, I would have the bot rotate a bit and use edge detection to determine how far each edge has moved. The edge that has moved the most is the closest.

Thus I would focus on the edges that are closer, or if possible on edges at a specific depth. This would require all objects to be stationary in training mode, which I'm definitely willing to allow if it makes this possible.
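One way to test the idea is to measure how far each vertical strip of the image moves horizontally between the frame before and the frame after a small rotation, using an exhaustive sum-of-absolute-differences search (a simple block-matching stand-in for edge tracking). This is a sketch under assumptions: grayscale NumPy frames, mostly horizontal motion, and enough texture in each strip for the match to be unambiguous; the strip width and search range are illustrative. Keep in mind the caveat raised in the reply that follows: with a pure rotation about the lens centre, near and far strips shift by about the same amount.

```python
import numpy as np


def strip_shifts(before, after, strip_width=16, max_shift=8):
    """Horizontal shift (px) of each vertical strip between two frames.

    A positive value means the strip's content moved to the right.
    """
    before = before.astype(np.float32)
    after = after.astype(np.float32)
    _, w = before.shape
    shifts = []
    # Start and stop far enough from the borders that every shift fits.
    for x0 in range(max_shift, w - strip_width - max_shift + 1, strip_width):
        patch = before[:, x0:x0 + strip_width]
        errors = [
            np.abs(after[:, x0 + s:x0 + s + strip_width] - patch).sum()
            for s in range(-max_shift, max_shift + 1)
        ]
        shifts.append(int(np.argmin(errors)) - max_shift)
    return shifts
```

Under the "closest edge moves most" reasoning, the strip with the largest absolute shift would be the nearest one, but only if the camera actually translates during the turn.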

Does this seem remotely possible? Or practical? Are there any modules in RR that could assist me?


Anonymous 12 year

Yes, this is possible to some extent. The issue is that rotation will not give you the same effect as motion parallax, which occurs with forward/backward or side-to-side movement. Tracking edges is possible but has its own issues. You may want to look into optical flow, which may be closer to what you are looking for.
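Optical flow comes in several flavours; the sketch below shows the classic Lucas-Kanade least-squares step, which estimates a single (u, v) displacement for one window from image gradients. It is a textbook illustration in NumPy, not how any RR module is implemented, and it assumes small motion (about a pixel or less) and a window with texture in both directions. Real trackers run this per small window, often on an image pyramid.

```python
import numpy as np


def lucas_kanade_window(prev, curr):
    """Estimate one (u, v) displacement for the whole window.

    Solves [Ix Iy] @ [u, v] = -It in the least-squares sense. Gradients are
    averaged over both frames, and border pixels are dropped because
    np.gradient uses one-sided differences there.
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    gy0, gx0 = np.gradient(prev)  # derivatives along rows (y), columns (x)
    gy1, gx1 = np.gradient(curr)
    ix = ((gx0 + gx1) / 2)[1:-1, 1:-1].ravel()
    iy = ((gy0 + gy1) / 2)[1:-1, 1:-1].ravel()
    it = (curr - prev)[1:-1, 1:-1].ravel()
    a = np.stack([ix, iy], axis=1)
    (u, v), *_ = np.linalg.lstsq(a, -it, rcond=None)
    return u, v


# Synthetic check: a smooth bowl-shaped image moved by (+0.7, -0.4) px.
y, x = np.mgrid[0:32, 0:32].astype(np.float64)
frame0 = (x - 16) ** 2 + (y - 16) ** 2
frame1 = (x - 16.7) ** 2 + (y - 15.6) ** 2
u, v = lucas_kanade_window(frame0, frame1)
```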

Alternatively, two cameras can be used for stereo ...

But you can try it out. Generate two images based on your idea above and we can see if something can be done. Note that if the camera is mounted off-center from the rotation axis, you will get some motion parallax. Worth a try ...

Anonymous 12 year

About rotational parallax versus motion parallax: wouldn't both achieve the same result? I should still be able to tell which object has been displaced less in the same way. Some animals bob their heads up and down to perceive depth, essentially rotating vertically, so wouldn't a horizontal rotation achieve the same thing? Or are there simply no functions that deal with this kind of parallax?

Which optical flow technique are you referring to specifically? Does RR have the functions to support my needs? If I understand you correctly, I should use optical flow as an alternative to edge tracking (as opposed to optical flow replacing rotational parallax plus edge tracking)?

Anonymous 12 year
You have to be careful not to confuse rotation with translation. Head bobbing is a translation, not a rotation.

Give it a try ... again, it also depends on how far from the rotation point the camera is. The further away it is, the more translation you will get along with the rotation.

Play around with


and note that when translating, things further away produce shorter lines than things that are closer.
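The geometry behind this can be put into numbers. If the camera's optical centre sits a distance r from the rotation axis and the base turns by an angle θ, the centre moves along a chord of length 2·r·sin(θ/2); that translation T, not the rotation itself, is what makes near objects shift more than far ones, by roughly f·T/Z pixels for a focal length f (in pixels) and depth Z. The sketch assumes the camera faces outward so the translation is mostly sideways; the 5 cm offset and 600 px focal length are made-up example values.

```python
import math


def camera_translation(offset_m, angle_deg):
    """Chord length (m) travelled by a camera mounted offset_m from the axis."""
    return 2.0 * offset_m * math.sin(math.radians(angle_deg) / 2.0)


def parallax_px(focal_px, translation_m, depth_m):
    """Approximate image shift (px) caused by a sideways translation."""
    return focal_px * translation_m / depth_m


t = camera_translation(0.05, 10)   # about 0.0087 m
near = parallax_px(600, t, 0.5)    # about 10.5 px
far = parallax_px(600, t, 3.0)     # about 1.7 px
```

With zero offset the translation, and with it the depth cue, vanishes, which matches the point that rotation on its own does not give motion parallax.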

Anonymous 12 year
I was going over this thread again and realized that I hadn't fully grasped what you meant about boxing off the object during training. Now I realize that this is essentially all I need to do; depth is a moot point, though definitely a plus. If I want to train in a dynamic environment, as long as I can box off the object, I'm fine.

However, the question remains: how do I do this efficiently in RR? I imagine this is a feature detection problem. Should I do edge detection and simply draw a box around individual bundles of edges?
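Boxing individual bundles of edges amounts to connected-component labelling on a binary edge map. The sketch below assumes the edge map (from whatever edge detector is used) is already available as a boolean NumPy array; it groups touching pixels with 8-connectivity and drops tiny blobs as noise. `blob_boxes` and `min_pixels=5` are hypothetical. On a cluttered scene this yields many boxes, and touching objects merge into one, which is why the motion cue suggested in the reply that follows helps.

```python
from collections import deque

import numpy as np


def blob_boxes(mask, min_pixels=5):
    """Inclusive boxes (x0, y0, x1, y1) of 8-connected True regions."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    seen = np.zeros_like(mask)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            # Breadth-first flood fill of one connected bundle.
            queue = deque([(y, x)])
            seen[y, x] = True
            ys, xs = [], []
            while queue:
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            if len(ys) >= min_pixels:  # ignore specks of noise
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```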
Anonymous 12 year
Yes, that may work, assuming the object you want to train on is, relatively speaking, the only thing moving. That's a trick we use: pay attention to the moving object ... a very useful trait.

You may need to threshold things a bit, but it should be possible to get a bounding box around just the moving object and then feed that box into an OR module.
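A sketch of that pipeline for two consecutive grayscale frames from a stationary camera: threshold the absolute difference, take the bounding box of the changed pixels with a little padding, and crop it. The thresholds, padding and function name are illustrative assumptions, and handing the crop to AVM or another OR module is left out because that interface is not described here. Frame differencing marks both where the object is and where it was, so a fast-moving object yields a box covering both positions.

```python
import numpy as np


def moving_object_patch(prev, curr, threshold=25, min_pixels=20, pad=4):
    """Crop the region that changed between two grayscale frames.

    Returns (box, patch) with box = (x0, y0, x1, y1), where x1 and y1 are
    exclusive, or (None, None) if too few pixels changed.
    """
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    ys, xs = np.nonzero(diff > threshold)
    if ys.size < min_pixels:
        return None, None  # nothing is moving enough to be worth training on
    h, w = curr.shape[:2]
    x0 = max(int(xs.min()) - pad, 0)
    y0 = max(int(ys.min()) - pad, 0)
    x1 = min(int(xs.max()) + 1 + pad, w)
    y1 = min(int(ys.max()) + 1 + pad, h)
    return (x0, y0, x1, y1), curr[y0:y1, x0:x1]
```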

See if you can create a video with that movement and post it here.

