Re-Search in Vision, Pattern Recognition, Data Mining, Machine Learning

Incremental Focus of Attention (IFA)
Mistracking in visual tracking is widely recognized as unavoidable. Instead of improving the tracking algorithm itself, this idea uses an attention mechanism to improve the robustness of the tracking system. Losing the target is tolerated in this kind of system: IFA offers automatic initialization and reinitialization when environmental conditions momentarily deteriorate or the target moves in a temporarily unexpected way. The Computational Interaction and Robotics Lab of Johns Hopkins University has proposed a framework for IFA tracking (http://www.cs.jhu.edu/CIPS/ifa/). The system includes two kinds of module: selectors (which search for possible configurations of the target) and trackers (which track the location of the target). When the target is lost, the attention mechanism is invoked to reinitialize the target. Multiple tracking algorithms with different precision are applied: if the current tracking algorithm fails, a less precise algorithm takes over. The whole system is organized as a stack of layers, and processing occurs in only one layer at a time.

However, without reading the paper, I don't know what kind of attention mechanism is designed in IFA. (Can a new incremental attention model be designed?) The most important difference from attention models for still images is the incremental aspect: how to make attention incremental and more efficient. IFA is somewhat related to top-down attention, because at the beginning the system tracks the target and has a model of it, so attention in IFA is driven by this prior knowledge rather than by bottom-up, data-driven cues. How to design a new IFA for tracking?
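To make the layered fallback concrete, here is a minimal sketch of how I picture it. It is not the JHU implementation: the `Layer` type, the three toy layers, the 1-D "position" state and the thresholds are all hypothetical, chosen only to show that one layer runs per frame, that a failure hands control to a less precise layer, and that a success moves control back up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical toy setting: the target state is a single 1-D position.
State = Optional[float]


@dataclass
class Layer:
    name: str
    # Returns the new state, or None when this layer cannot hold the target.
    step: Callable[[float, State], State]


def selector(obs: float, state: State) -> State:
    # Least precise layer: searches everywhere, so it always (re)initializes.
    return obs


def coarse_tracker(obs: float, state: State) -> State:
    # Accepts the measurement only if it is near the last known position.
    return obs if state is not None and abs(obs - state) < 10.0 else None


def fine_tracker(obs: float, state: State) -> State:
    # Most precise layer: tolerates only small frame-to-frame motion.
    return obs if state is not None and abs(obs - state) < 2.0 else None


def ifa_step(layers: List[Layer], active: int, obs: float,
             state: State) -> Tuple[int, State]:
    """Run exactly one layer for this frame and pick the layer for the next."""
    result = layers[active].step(obs, state)
    if result is None:
        # Failure: fall back to a less precise layer, keep the old state.
        return max(active - 1, 0), state
    # Success: try a more precise layer on the next frame.
    return min(active + 1, len(layers) - 1), result


def run(layers: List[Layer], observations: List[float]) -> Tuple[List[str], State]:
    active, state = 0, None
    ran = []  # which layer did the work in each frame
    for obs in observations:
        ran.append(layers[active].name)
        active, state = ifa_step(layers, active, obs, state)
    return ran, state


LAYERS = [Layer("selector", selector),
          Layer("coarse", coarse_tracker),
          Layer("fine", fine_tracker)]
```

With a sudden jump in the measurements (for example `run(LAYERS, [5.0, 5.5, 6.0, 30.0, 30.5, 31.0])`), the fine tracker fails, control drops through the coarse tracker down to the selector, and the selector reinitializes the target at the new position.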
String Matching vs Point Set Matching vs Graph Matching vs Pattern Matching
My current problem is: given two data sets (each of which can be represented as a string, tree, graph, or any other pattern representation), how can their approximate, inexact homology (isomorphism) be measured efficiently? I want techniques that can find a matching with the following characteristics:
Only the matching score and scale information are required. Can we design an efficient algorithm to obtain this information? It is related to string, tree, point set, and graph matching! This whole family of problems is known as structural pattern recognition. It is also related to combinatorial optimization. But so far I haven't found good self-study material on this area.
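As a reminder of what an inexact matching score looks like in the simplest of these settings, here is a small sketch for strings only, using the classical Levenshtein edit distance. The normalization into a score in [0, 1] and the function names are my own choices for illustration. It says nothing about scale, and the tree and graph cases are much harder.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def match_score(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, 0.0 when
    every position has to be edited."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

The table costs O(len(a) * len(b)) time, which is fine for strings but is exactly the kind of cost that does not carry over to general graph matching.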
Motion Consistency without Correspondence
Measuring the motion consistency of two video segments is the core of many computer vision applications. One example is view-invariant human action recognition. The basic idea for view-invariant action recognition is to exploit the relationship between the optical flows (space-time gradients) in two views: there is a linear relationship between the pair of optical flows of the two views. The interesting thing is that the fundamental matrix does not need to be calculated explicitly. The motion consistency measure can be estimated from a rank constraint on the observation matrix built from corresponding optical flows. However, to the best of our knowledge, state-of-the-art research invariably assumes that the correspondence has already been established. This is not realistic, because visual data is dynamic and the correspondence is difficult to estimate. Even worse, in a multiple-view environment, occlusion often makes the correspondence complex:
The problem can be defined mathematically as follows:
We are given two data sets. Ideally, if each datum in one set corresponds to exactly one datum in the other set, there is a linear relationship between them. However, here only part of the two data sets have correspondences and the associated linear relationship. Moreover, for the data that do have a correspondence in the other set, the matching is only approximate, and several data points in one set can correspond to one data point in the other. This is related to the problem of matching two point sets, with the added difficulty that the homology includes 1-M (one-to-many) and M-1 (many-to-one) matches. Maximum matching theory on bipartite graphs (related to spectral clustering) can be used to solve this problem.
Keywords: graph matching, spectral clustering, bipartite graph matching, pattern matching for point sets

Main Journals I Focus
I mainly focus on IEEE Transactions in computer vision, multimedia, learning, and knowledge discovery.
Vision, Pattern Recognition, Multimedia
Learning, Data Mining and Knowledge Discovery
Main Conferences I Focus
Many conferences are held annually. My topic spans multiple areas, so I will read papers from the top-ranked conferences in several of them: computer vision, pattern recognition, multimedia applications, machine learning, and knowledge discovery and data mining.
Computer Vision:
Multimedia
Machine Learning
Knowledge Discovery and Data Mining
Video Attention Model (1): Rough Idea
As the second part of my Ph.D. work, I would like to do my attention research on video. My plan is: (1) survey and review the existing work; (2) find and extend some assumptions about video attention; (3) come up with new ideas for video attention from PR, information theory and statistical learning; (4) applications.
First rough idea: based on the rarity assumption of attention, temporal complexity (entropy) can act as the measure of video attention.

Failed Idea from Locality Preserving Projection (LPP)
Recently, because of an assignment for the course "Computational Intelligence", I investigated Locality Preserving Projection (LPP) in more detail. I think I found some defects of LPP. It preserves only the local geometric structure of the original data and loses the global geometric relationships. Although the authors of LPP argue that local geometric structure is more useful than global structure, especially in information retrieval applications (k-nearest-neighbor retrieval), the fact is that the global geometric structure reflects clustering information and can be used to discriminate each class from the others. So LPP is useful for revealing structure with small differences, such as human pose or expression, but it will mix data from multiple classes. Compared with a nonlinear method like ISOMap, its objective function only considers large weights (local structure), and the adjacency graph has non-zero weights only between local neighbors. LPP has less discriminative power than ISOMap, which tries to preserve all distances nonlinearly. My first thought was to find a linear projection that preserves all distances, approximating Multidimensional Scaling (MDS), given a graph constructed as in ISOMap to represent the geometric structure of the original space. Soon I got lost in this direction. There are two questions I need to ask before I can be sure it is worth going deeper into this idea.
So I FAIL and STOP on this idea. Another way to extend it would be to combine the idea of LPP with other subspace methods such as GPCA.
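For reference, here is a minimal NumPy sketch of LPP as I understand it, to make the "only local structure enters the objective" point visible in code. The assumptions are mine: a symmetrized k-nearest-neighbor graph with heat-kernel weights, samples stored as rows of X, and a small ridge term `reg` for numerical stability. The function name and parameters are hypothetical, not taken from the authors' code.

```python
import numpy as np


def lpp(X, n_components=2, k=5, t=1.0, reg=1e-9):
    """Sketch of Locality Preserving Projection.

    X: (n_samples, n_features). Returns (P, eigenvalues), where the
    low-dimensional embedding is X @ P.
    """
    n, d = X.shape
    # Pairwise squared Euclidean distances.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # k nearest neighbors of each point (column 0 is the point itself,
    # assuming no duplicate points).
    idx = np.argsort(sq, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    # Heat-kernel weights on neighbor pairs only; every other pair stays 0,
    # so distant points never appear in the objective.
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-sq[rows, cols] / t)
    W = np.maximum(W, W.T)  # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    # Generalized eigenproblem: (X^T L X) a = lambda (X^T D X) a.
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(d)
    # Reduce to a standard symmetric problem through the Cholesky factor of B.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0  # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    P = C_inv.T @ vecs[:, :n_components]  # smallest eigenvalues
    return P, vals[:n_components]
```

The quantity being minimized is sum_ij W_ij (y_i - y_j)^2 with y = X @ P, and W_ij is zero for every non-neighbor pair. Nothing in the objective pushes distant clusters apart, which is the loss of global structure complained about above.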
View Invariant Action Recognition (1)
It's very interesting!!!
Can we recognize similar human actions captured from different view angles? First, let's check whether this kind of action recognition across multiple views is necessary and useful. In a multiple-camera system we may be given visual data from multiple views, so it is necessary to equip a computer vision system with the ability to recognize actions invariantly to view. Also, a query-by-action application should find not only similar action segments from the same view angle but also those from different view angles. Now, what are the open issues of this problem?
1. How to design a similarity measure on space-time segments that is invariant to view? I already have a rough idea extended from the CVPR 2005 paper ("Space-time Behavior Based Correlation").
2. How to make the similarity measure robust in a noisy environment (errors in the space-time gradient, errors from occlusion ...)? Generalize the binary measure to a continuous one!!!
3. How to decide the recognition granularity? Given a long action segment, the global action may differ from the local actions. How to find whether the global action and/or local actions are consistent?

Solving Vision Problems by Inference Using Graphical Generative Models
Many difficult vision problems cannot be easily solved using traditional methods. A research group at the University of Toronto led by Professor Brendan J. Frey, together with Nebojsa Jojic (Microsoft Research), has proposed a set of theories and models based on probabilistic inference for these hard problems, from 1999 to date. This topic is known as "Inference and Learning in Graphical Generative Model". The basic knowledge of this framework is as follows:
Interesting things, need further study!!!!