Hierarchical Attentive Regions for Automatic Browsing of Large Images on Small Screens
How can we improve the experience of browsing a large image on a small screen? Microsoft Research Asia has proposed a neat solution to this problem (ACM MM03). It uses an attention model to find a set of attentive regions, orders the regions according to the information they carry, and generates an automatic browsing path through them. One of my supervisor's students implemented this browsing application in his final-year project.
Now I am trying to think about how to improve this solution. I see two open questions. The first is how to decide the size of each region, which determines the zoom-in scale at that location. The MSR method uses a "fuzzy growing" extraction algorithm for this. But growing a region from the saliency map does not seem reasonable to me: pixels that belong to the same region do not necessarily have similar contrast, and contrast is what the saliency map measures. Also, for a large object the MSR attention model only detects the boundaries, which makes it unsuitable for region growing. The second question is that the region sequence is linear, which differs from human vision (we assume that humans view objects hierarchically, from coarse to fine).
So one possible idea is to generate a hierarchical structure over a series of attentive regions and apply it to coarse-to-fine image browsing. Is it reasonable? And interesting?
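To make the idea a bit more concrete, here is a minimal sketch of what such a hierarchy could look like. It is only my own illustration, not the MSR method: I assume the attentive regions are already given as rectangles with an attention score, that a region becomes the child of the smallest region that fully contains it, and that browsing is a depth-first walk that visits higher-scoring children first. The names `RegionNode`, `build_hierarchy` and `browsing_path` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box format: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class RegionNode:
    box: Box
    score: float  # assumed attention value of the region
    children: List["RegionNode"] = field(default_factory=list)


def area(box: Box) -> int:
    return box[2] * box[3]


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies completely inside `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def build_hierarchy(image_box: Box, regions: List[Tuple[Box, float]]) -> RegionNode:
    """Nest regions by containment; the whole image is the root."""
    root = RegionNode(image_box, 0.0)
    # Insert large regions first so that parents exist before their children.
    for box, score in sorted(regions, key=lambda r: area(r[0]), reverse=True):
        parent = root
        descended = True
        while descended:
            descended = False
            for child in parent.children:
                if contains(child.box, box):
                    parent = child
                    descended = True
                    break
        parent.children.append(RegionNode(box, score))
    return root


def browsing_path(node: RegionNode) -> List[Box]:
    """Coarse-to-fine path: show a region, then zoom into its children."""
    path = [node.box]
    for child in sorted(node.children, key=lambda c: c.score, reverse=True):
        path.extend(browsing_path(child))
    return path


image = (0, 0, 1000, 800)
regions = [
    ((100, 100, 400, 300), 0.9),  # large object
    ((150, 150, 100, 100), 0.5),  # detail inside the large object
    ((300, 200, 100, 100), 0.8),  # another detail inside it
    ((600, 400, 200, 200), 0.7),  # separate object
]
path = browsing_path(build_hierarchy(image, regions))
for box in path:
    print(box)
```

With these made-up regions the path starts at the whole image, zooms into the large object, visits its two details in score order, and only then moves to the separate object. The containment rule is the simplest choice I could think of; partially overlapping regions would need a different nesting criterion.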