Why do you want to average the images? I'm not sure how that would improve the wall and object avoidance, unless the images are really noisy. But you seem to be using a simulator?
Yes, depth estimation is possible with two images, but it will not really work in the simulation environment because there is not nearly enough texture to match points between the two views. What you could do instead is track the green floor, similar to our obstacle avoidance tutorial at