Are Elephants Bigger than Butterflies? Reasoning about Sizes of Objects

Abstract

Human vision benefits greatly from information about the sizes of objects. The role of size in several visual reasoning tasks has been thoroughly explored in human perception and cognition. However, the impact of size information is yet to be determined in AI. We postulate that this is mainly due to the lack of a comprehensive repository of size information. In this paper, we introduce a method to automatically infer object sizes, leveraging visual and textual information from the web. By maximizing the joint likelihood of textual and visual observations, our method learns reliable relative size estimates with no explicit human supervision. We introduce the relative size dataset and show that our method outperforms competitive textual and visual baselines in reasoning about size comparisons.
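The joint-likelihood idea can be illustrated with a deliberately simplified model. The sketch below is not the paper's method. It makes several assumptions: each object has a single log-size, textual observations are noisy absolute log-sizes, visual observations are noisy log-ratios between co-occurring objects, and all noise is Gaussian with fixed, hypothetical variances. Under those assumptions, maximizing the joint log-likelihood reduces to weighted least squares. The function name `estimate_log_sizes`, the observation formats, and the numbers are illustrative only.

```python
import math

import numpy as np


def estimate_log_sizes(n_objects, text_obs, visual_obs, sigma_text=1.0, sigma_visual=0.5):
    """Hypothetical sketch: maximum-likelihood log-sizes under Gaussian noise.

    text_obs:   list of (i, log_size), a noisy absolute log-size for object i
    visual_obs: list of (i, j, log_ratio), a noisy log(size_i / size_j)
    """
    rows, targets = [], []
    for i, log_size in text_obs:
        row = np.zeros(n_objects)
        row[i] = 1.0
        # Dividing each residual by its standard deviation turns the
        # Gaussian log-likelihood into an ordinary least-squares objective.
        rows.append(row / sigma_text)
        targets.append(log_size / sigma_text)
    for i, j, log_ratio in visual_obs:
        row = np.zeros(n_objects)
        row[i], row[j] = 1.0, -1.0
        rows.append(row / sigma_visual)
        targets.append(log_ratio / sigma_visual)
    A = np.vstack(rows)
    b = np.array(targets)
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu


# Objects: 0 = elephant, 1 = cat, 2 = butterfly (made-up observations, in metres).
text = [(0, math.log(3.0)), (2, math.log(0.05))]
visual = [(1, 2, math.log(10.0)), (0, 1, math.log(7.0))]
mu = estimate_log_sizes(3, text, visual)
print([round(math.exp(m), 2) for m in mu])
```

Visual observations alone only constrain differences of log-sizes, so at least one textual observation is needed to fix the absolute scale. With none, `lstsq` returns the minimum-norm solution and only the relative ordering is meaningful.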

Paper and Citation

Are Elephants Bigger than Butterflies? Reasoning about Sizes of Objects
In AAAI 2016 (oral)

Dataset and Code


The dataset contains 486 object pairs among 41 physical objects. Size comparisons are not available for all pairs of objects (e.g., bird and watermelon), because for some pairs humans cannot determine which object is bigger.

The dataset contains only object pairs for which people consistently agreed on which object is bigger. On average, each object appears in 24 pairs; window has the fewest comparisons (13 pairs) and eye has the most (35 pairs).
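Per-object statistics like these are easy to recompute once the pairs are loaded. The sketch below assumes a hypothetical in-memory format, a list of `(bigger, smaller)` name tuples, since the layout of the downloadable files is not described here. The toy pairs and size estimates are made up. It also shows one natural way to score estimated sizes against such pairs: the fraction of annotated pairs whose order the estimates reproduce.

```python
from collections import Counter

# Hypothetical format: each annotated pair is (bigger_object, smaller_object).
toy_pairs = [
    ("elephant", "butterfly"),
    ("elephant", "cat"),
    ("cat", "butterfly"),
    ("window", "eye"),
]


def comparisons_per_object(pairs):
    """Count how many annotated pairs each object appears in."""
    counts = Counter()
    for bigger, smaller in pairs:
        counts[bigger] += 1
        counts[smaller] += 1
    return counts


def pairwise_accuracy(estimated_sizes, pairs):
    """Fraction of pairs whose bigger object also has the larger estimated size."""
    correct = sum(estimated_sizes[b] > estimated_sizes[s] for b, s in pairs)
    return correct / len(pairs)


counts = comparisons_per_object(toy_pairs)
print(counts["elephant"], counts["eye"])  # 2 1

# Hypothetical estimates (in metres); cat vs butterfly is deliberately wrong.
estimates = {"elephant": 3.0, "cat": 0.03, "butterfly": 0.05, "window": 1.0, "eye": 0.03}
print(pairwise_accuracy(estimates, toy_pairs))  # 0.75

# Every pair involves two objects, so the mean count is 2 * pairs / objects.
print(round(2 * 486 / 41, 1))  # 23.7
```

The last line checks the reported average: 486 pairs touch 972 object slots, and 972 / 41 is about 23.7, which rounds to the 24 pairs per object quoted above.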

More Results

Examples of visual observations for the edges of the size graph. Co-occurrence of objects in images provides a visual signal for estimating the relative sizes of objects. Examples of relative size comparisons are shown in this figure. Erroneous detections (e.g., tree and butterfly in the bottom row) result in wrong relative size estimates.
The size graph: the thickness of each edge represents the number of images in which both objects are detected successfully (the more images, the bolder the edge). The topology of the size graph reveals interesting properties about the transitivity of size information. For example, the size of chairs is mainly affected by the estimated size of cats, while the size of trees is affected by several other objects.
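Transitivity in the size graph can be illustrated by composing edge ratios along a path. If chairs are about twice the size of cats and cats about ten times the size of butterflies, a chair is about twenty times the size of a butterfly, even without an image containing both. The sketch below assumes a hypothetical graph where each edge stores log(size_a / size_b) and follows a single breadth-first path. It ignores the image counts that set edge thickness, and the ratios are invented for illustration.

```python
import math
from collections import deque

# Hypothetical size graph: edge (a, b) -> log(size_a / size_b).
edges = {
    ("chair", "cat"): math.log(2.0),
    ("cat", "butterfly"): math.log(10.0),
    ("tree", "cat"): math.log(20.0),
    ("tree", "butterfly"): math.log(150.0),
}


def build_adjacency(edges):
    """Store each edge in both directions; the reverse direction negates the log-ratio."""
    adj = {}
    for (a, b), log_ratio in edges.items():
        adj.setdefault(a, []).append((b, log_ratio))
        adj.setdefault(b, []).append((a, -log_ratio))
    return adj


def relative_log_size(adj, source, target):
    """Return log(size_source / size_target) summed along one BFS path, or None if unreachable."""
    # acc holds log(size_source / size_node) for the node being expanded.
    frontier = deque([(source, 0.0)])
    seen = {source}
    while frontier:
        node, acc = frontier.popleft()
        if node == target:
            return acc
        for nbr, log_ratio in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, acc + log_ratio))
    return None


adj = build_adjacency(edges)
print(round(math.exp(relative_log_size(adj, "chair", "butterfly")), 6))  # 20.0
print(round(math.exp(relative_log_size(adj, "tree", "butterfly")), 6))  # 150.0
```

The answer depends on which path is followed. Here tree/butterfly is 150 on the direct edge but 20 × 10 = 200 via cat. This is why objects with several neighbours, like trees, benefit from combining all their edges jointly rather than trusting any single path.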

Acknowledgement

This work was in part supported by ONR N00014-13-1-0720, NSF IIS-1218683, NSF IIS-1338054, and an Allen Distinguished Investigator Award.