Re-Search

in Vision PatternRecognition DataMining MachineLearning

Test backlink



Interesting post


Incremental Focus of Attention (IFA)



Mistracking in visual tracking has been recognized as unavoidable. Instead of improving the tracking algorithm itself, this approach uses an attention mechanism to improve the robustness of the tracking system, so losing the target is tolerated in this kind of system: IFA offers automatic initialization and reinitialization when environmental conditions momentarily deteriorate or target motion is temporarily unexpected. The Computational Interaction and Robotics Lab of Johns Hopkins University has proposed an IFA tracking framework (http://www.cs.jhu.edu/CIPS/ifa/). The system includes two modules: a selector (which searches for possible configurations of the target) and a tracker (which tracks the target's location). When the target is lost, the attention mechanism is invoked to reinitialize it. Multiple tracking algorithms with different precision are used: if the current tracking algorithm fails, a less precise algorithm takes over. The whole system is organized into layers, and processing occurs in only one layer at a time.
However, without reading the paper, I don't know what kind of attention mechanism IFA uses. (Could a new incremental attention model be designed?) The most important factor that distinguishes this from attention models for still images is incrementality: how to make attention incremental and more efficient. IFA is somewhat related to top-down attention because, once the system is tracking the target, it has a model of the target, and attention in this situation is driven by this prior knowledge rather than by bottom-up, data-driven cues. How could a new IFA for tracking be designed?
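Since I have not read the paper, the following is only my own minimal sketch of the layered fallback idea described above, not the JHU implementation. The layer interface (a function taking a frame and the previous state, returning a new state or None on failure) and the toy layers `fine` and `search` are hypothetical. Layer 0 is the most precise tracker; the last layer plays the role of the selector/attention search that (re)initializes the target.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical layer interface (not from the IFA paper):
# (frame, previous_state) -> new_state, or None when the layer fails.
Layer = Callable[[Any, Optional[Any]], Optional[Any]]


def layered_track(frames: Sequence[Any],
                  layers: Sequence[Layer]) -> List[Tuple[int, Optional[Any]]]:
    """Run a layered tracker over `frames`.

    layers[0] is the most precise tracker; layers[-1] is the coarsest,
    whole-frame search that (re)initializes the target. Only one layer
    runs at a time: on failure control falls to the next coarser layer,
    and after a success it tries one finer layer on the next frame.
    Returns (layer_used, state) per frame; state is None if every layer failed.
    """
    coarsest = len(layers) - 1
    level, state = coarsest, None          # start by searching (initialization)
    history: List[Tuple[int, Optional[Any]]] = []
    for frame in frames:
        while True:
            result = layers[level](frame, state)
            if result is not None:
                state = result
                history.append((level, state))
                level = max(level - 1, 0)  # success: climb toward precision
                break
            if level == coarsest:          # even the search failed: target lost
                state = None
                history.append((level, None))
                break
            level += 1                     # failure: fall back one layer
    return history


# Toy example: a frame is the target's 1-D position, or None when occluded.
def fine(frame, state):      # precise tracker: needs a prior and small motion
    if frame is None or state is None or abs(frame - state) > 1:
        return None
    return frame


def search(frame, state):    # coarse "attention" search over the whole frame
    return frame             # None when the target is invisible


print(layered_track([5, 6, 20, None, 21, 22], [fine, search]))
# → [(1, 5), (0, 6), (1, 20), (1, None), (1, 21), (0, 22)]
```

The jump to 20 makes the precise layer fail, so the search layer reinitializes the target; the occluded frame is recorded as lost, and tracking resumes without any manual restart.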


String Matching vs Point Set Matching vs Graph Matching vs Pattern Matching



My current problem: given two data sets (which can be represented as strings, trees, graphs, or any other pattern representation), how can I efficiently measure their approximate, inexact homology (isomorphism)?
I want techniques that can find a matching with the following characteristics:
  • It works between data sets of different sizes; some points have no correspondence;
  • It allows approximate, inexact matches; some points have only an approximate, not an exact, correspondence;
  • It allows many-to-many correspondences instead of an injective matching.
Only the matching score and scale information are required; can we design an efficient algorithm to obtain them? It is related to string, tree, point set, and graph matching! The whole area covering these problems is called structural pattern recognition. It is also related to combinatorial optimization. But so far, I haven't found good self-study material on this area.
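As a simple baseline for point sets (a swapped-in technique, not structural matching, and not addressing the scale information), a truncated, symmetric Chamfer distance already has the three properties listed above: the sets may differ in size, every point takes its nearest neighbor in the other set (so several points may share one partner, giving many-to-many correspondences overall), and the cap `tau` (a hypothetical threshold) keeps points with no real correspondence from dominating the score. This sketch assumes both sets are non-empty.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def truncated_chamfer(A: Sequence[Point], B: Sequence[Point], tau: float) -> float:
    """Symmetric, truncated Chamfer distance between two non-empty point sets."""
    def one_way(P: Sequence[Point], Q: Sequence[Point]) -> float:
        # each p takes its nearest q (many p may share one q), capped at tau
        return sum(min(min(math.dist(p, q) for q in Q), tau) for p in P) / len(P)

    return 0.5 * (one_way(A, B) + one_way(B, A))


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0), (1.1, 0.0), (10.0, 10.0)]  # different size, one outlier
print(truncated_chamfer(A, B, tau=1.0))  # ≈ 0.1375
```

The outlier at (10, 10) contributes only `tau`, and (1.1, 0) is matched approximately to (1, 0), which is also the partner of (1, 0). The score ignores any relational structure (edges, ordering), which is exactly what string, tree, and graph matching add.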


Motion consistency without Correspondence



Measuring the motion consistency of two video segments is the core part of many computer vision applications. One example is view-invariant human action recognition. The basic idea of view-invariant action recognition is to use the relationship between the optical flows (space-time gradients) in two views: there is a linear relationship between the pair of optical flows in the two views. Interestingly, the fundamental matrix does not need to be calculated explicitly; the motion consistency measure can be estimated from the rank constraint on the observation matrix built from corresponding optical flows. However, to the best of our knowledge, state-of-the-art research inevitably assumes that correspondences have already been established. This assumption does not hold in practice, because visual data are dynamic and correspondences are difficult to estimate. Even worse, in a multiple-view environment, occlusion often makes correspondence complex:
  1. Only part of the data can find correspondences in the other view, and selecting the data that do have correspondences is non-trivial;
  2. Because of template scale variance, a data point may not correspond exactly to one point in the other set; instead, several data points may approximately match one point in the other set;
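To make the rank constraint concrete, here is a toy sketch under a loudly simplified assumption: the flow in view 2 is a fixed linear map of the flow in view 1, `flow2 = flow1 @ M.T` with a hypothetical 2×2 matrix `M`. Stacking the flows of N corresponding points gives an N×4 observation matrix of rank at most 2, so the energy outside the top two singular values acts as a consistency score. The function name and the model are illustrative, not the formulation of any specific paper. Note that the rows of both arrays must already be aligned point-to-point, which is precisely the correspondence assumption criticized above.

```python
import numpy as np


def rank_consistency(flow1: np.ndarray, flow2: np.ndarray, rank: int = 2) -> float:
    """Residual energy of the stacked flow matrix beyond its top `rank` singular values.

    flow1, flow2: (N, 2) optical-flow vectors at N points that are assumed to
    correspond. Under the toy model flow2 = flow1 @ M.T, the (N, 4) matrix
    [flow1 | flow2] has rank <= 2, so the score is ~0; unrelated motions give
    a clearly larger score.
    """
    W = np.hstack([flow1, flow2])             # (N, 4) observation matrix
    s = np.linalg.svd(W, compute_uv=False)    # singular values, descending
    energy = s ** 2
    return float(energy[rank:].sum() / energy.sum())


rng = np.random.default_rng(0)
f1 = rng.normal(size=(50, 2))
M = np.array([[1.2, 0.3], [-0.4, 0.9]])      # hypothetical view-to-view map
print(rank_consistency(f1, f1 @ M.T))          # ~0: consistent motion
print(rank_consistency(f1, rng.normal(size=(50, 2))))  # clearly > 0
```

Shuffling the rows of `flow2` alone would destroy the low-rank structure even for consistent motion, which illustrates why the measure depends on correspondences.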
The problem can be defined mathematically as follows.
Given two data sets: ideally, if each datum in one set corresponds to exactly one datum in the other set, there is a linear relationship between them. In reality, only part of the two data sets have correspondences (and the associated linear relationship), and even for the data that do have correspondences, the matching is approximate: several data points in one set can correspond to one data point in the other. It is related to point-set matching, with the added difficulty of 1-M (one-to-many) and M-1 (many-to-one) matching. Maximum bipartite graph matching (related to spectral clustering) can be used to solve this problem.
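One standard way to get M-1 behavior out of ordinary maximum bipartite matching is to give each point in the second set a capacity `k` by copying it `k` times, then run one-to-one augmenting-path (Kuhn) matching. The sketch below is my own illustration: edges exist only within a hypothetical distance threshold `tau`, so points with no approximate partner stay unmatched. It handles many-to-one in one direction only; allowing 1-M and M-1 simultaneously would need b-matching (for example via min-cost flow), and it ignores match quality beyond the threshold.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def capacitated_matching(X: Sequence[Point], Y: Sequence[Point],
                         tau: float, k: int) -> Dict[int, int]:
    """Maximum-cardinality many-to-one matching from X into Y.

    An edge (i, j) exists only if dist(X[i], Y[j]) <= tau (approximate match),
    so points with no nearby partner simply stay unmatched. Each Y[j] may
    absorb up to k points of X: every Y[j] is copied into k 'slots' and an
    ordinary one-to-one augmenting-path (Kuhn) matching is run on the slots.
    Returns {index in X: index in Y}.
    """
    n_slots = len(Y) * k                       # slot s belongs to Y[s // k]
    adj: List[List[int]] = [
        [s for s in range(n_slots) if math.dist(x, Y[s // k]) <= tau] for x in X
    ]
    owner = [-1] * n_slots                     # owner[s] = X index using slot s

    def augment(i: int, seen: List[bool]) -> bool:
        # try to give X[i] a slot, re-routing earlier owners if needed
        for s in adj[i]:
            if not seen[s]:
                seen[s] = True
                if owner[s] == -1 or augment(owner[s], seen):
                    owner[s] = i
                    return True
        return False

    for i in range(len(X)):
        augment(i, [False] * n_slots)
    return {owner[s]: s // k for s in range(n_slots) if owner[s] != -1}


X = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0)]
Y = [(0.0, 0.0), (3.0, 3.0)]
print(capacitated_matching(X, Y, tau=0.5, k=2))
# {1: 0, 0: 0}: Y[0] absorbs two points, X[2] loses out, X[3] has no partner
```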

Keywords: graph matching, spectral clustering, bipartite graph matching, pattern matching for point sets


Main Journals I Focus On



I mainly follow IEEE Transactions and other journals in computer vision, multimedia, learning, and knowledge discovery.

Vision, Pattern Recognition, Multimedia
  • PAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Image Processing: IEEE Transactions on Image Processing
  • CSVT: IEEE Transactions on Circuits and Systems for Video Technology
  • Multimedia: IEEE Transactions on Multimedia
  • IJCV: International Journal of Computer Vision

Learning, Data Mining and Knowledge Discovery

  • TKDE: IEEE Transactions on Knowledge and Data Engineering
  • Data Mining and Knowledge Discovery (DAMI)
  • Journal of Intelligent Information Systems (JIIS)
  • IEEE Transactions on Information Theory


Main Conferences I Focus On



There are many conferences held annually. My topic spans multiple areas, so I read papers from the top-ranked conferences in several fields: computer vision, pattern recognition, multimedia applications, machine learning, and knowledge discovery and data mining.

Computer Vision:
  • ICCV
  • CVPR
  • ECCV (ACCV)

Multimedia

  • ACM Multimedia

Machine Learning

  • ICML
  • NIPS (Neural Information Processing Systems)
  • American Association for Artificial Intelligence (AAAI)

Knowledge Discovery and Data Mining

  • ACM SIGMOD
  • KDD
  • WWW
  • ICDM
  • CIKM


Video Attention Model (1): Rough idea



As the second part of my Ph.D. work, I would like to extend my attention research to video. My plan is: (1) survey and review existing work; (2) identify and extend assumptions about video attention; (3) develop new ideas for video attention from pattern recognition, information theory, and statistical learning; (4) applications.

First rough idea:
Based on the rarity assumption of attention, temporal complexity (entropy) can serve as a measure of video attention.
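One possible reading of this rough idea, sketched under my own assumptions: treat each pixel's quantized grey-level history over a temporal window as a random variable and use its Shannon entropy as a per-pixel temporal-complexity (attention) score. The frame format (equal-sized 2-D lists of grey levels 0..255), the function name, and the number of bins (assumed to divide 256) are all hypothetical choices, not a settled model.

```python
import math
from collections import Counter
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grey levels 0..255; all frames the same size


def temporal_entropy_map(frames: Sequence[Frame], bins: int = 16) -> List[List[float]]:
    """Shannon entropy (bits) of each pixel's quantized grey-level history."""
    h, w = len(frames[0]), len(frames[0][0])
    width = 256 // bins                # grey levels per histogram bin
    n = len(frames)
    out: List[List[float]] = []
    for r in range(h):
        row = []
        for c in range(w):
            counts = Counter(f[r][c] // width for f in frames)
            probs = [m / n for m in counts.values()]
            row.append(sum(-p * math.log2(p) for p in probs))
        out.append(row)
    return out


# A static pixel versus a flickering one over four frames.
frames = [[[10, 0]], [[10, 255]], [[10, 0]], [[10, 255]]]
print(temporal_entropy_map(frames))  # [[0.0, 1.0]]
```

Under the rarity assumption, the flickering pixel (1 bit) would attract more attention than the static one (0 bits). A real model would likely need spatial pooling and motion compensation, since camera motion alone inflates this score.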


Failure Idea from Locality Preserving Projection (LPP)



Recently, for an assignment in the "Computational Intelligence" course, I investigated Locality Preserving Projection (LPP) in more detail, and I think I found some defects. LPP preserves only the local geometric structure of the original data and loses the global geometric relationships. Although the authors of LPP argue that local geometric structure is more useful than global structure, especially in information retrieval (k-nearest-neighbor retrieval), global geometric structure reflects clustering information and can be used to discriminate each class from the others. So LPP is useful for revealing structure with small variations, such as human pose or expression, but it tends to mix data from multiple classes. Compared with a nonlinear method like ISOMap, its objective function considers only large weights (local structure), and its adjacency graph has non-zero weights only between local neighbors. LPP therefore has less discriminant power than ISOMap, which tries to preserve all distances nonlinearly.
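To see where the locality comes from, here is a minimal LPP sketch under a few assumptions: rows of `X` are samples, the graph is a symmetrized k-nearest-neighbor graph with heat-kernel weights (the parameters `n_neighbors` and `t` are illustrative), the PCA preprocessing step used in the original LPP formulation is omitted, and `X.T @ D @ X` is assumed positive definite. The projection solves the generalized eigenproblem X^T L X a = λ X^T D X a and keeps the smallest eigenvalues. Because `W` is zero outside each neighborhood, distant pairs never enter the objective, which is the loss of global structure discussed above.

```python
import numpy as np


def lpp(X: np.ndarray, n_neighbors: int = 5, t: float = 1.0, dim: int = 2):
    """Locality Preserving Projection (minimal sketch).

    X: (n_samples, n_features), rows are samples. Returns (A, eigvals) where
    A is (n_features, dim) and the embedding is X @ A.
    """
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:n_neighbors + 1]            # k nearest, skipping i
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)                  # heat-kernel weights
    W = np.maximum(W, W.T)                 # symmetric graph; zero outside neighborhoods
    D = np.diag(W.sum(axis=1))
    L = D - W                              # graph Laplacian
    A_mat = X.T @ L @ X
    B_mat = X.T @ D @ X                    # assumed positive definite
    # Generalized problem A a = lambda B a, reduced via Cholesky B = C C^T.
    C_inv = np.linalg.inv(np.linalg.cholesky(B_mat))
    eigvals, Y = np.linalg.eigh(C_inv @ A_mat @ C_inv.T)       # ascending order
    return C_inv.T @ Y[:, :dim], eigvals[:dim]                 # smallest eigenvalues


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
A, vals = lpp(X, n_neighbors=5, t=10.0, dim=2)
embedding = X @ A                          # (30, 2) low-dimensional coordinates
```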
My first thought was: given a graph constructed as in ISOMap to represent the geometric structure of the original space, can we find a linear projection that preserves all distances and thus approximates Multidimensional Scaling (MDS)? I soon got lost in this direction. There are two questions I need to answer before I can be sure this idea is worth pursuing:
  • Is it possible to find a linear projection that preserves all distances and approximates MDS?
  • If so, why would we need this linear projection when MDS already exists?
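For reference, classical (Torgerson) MDS, the method the two questions above compare against, fits in a few lines: double-center the squared distance matrix and keep the top eigenvectors scaled by the square roots of their eigenvalues. In ISOMap, `D` would hold geodesic (shortest-path) distances on the neighborhood graph; the sketch below uses plain Euclidean distances so that the recovery can be checked. Note that MDS returns coordinates for the given points only, not a projection matrix.

```python
import numpy as np


def classical_mds(D: np.ndarray, dim: int = 2) -> np.ndarray:
    """Classical MDS: embed n points in `dim` dimensions so that Euclidean
    distances approximate the given (n, n) distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)             # ascending order
    idx = np.argsort(evals)[::-1][:dim]          # keep the largest eigenvalues
    scale = np.sqrt(np.clip(evals[idx], 0, None))
    return evecs[:, idx] * scale


rng = np.random.default_rng(0)
P = rng.normal(size=(10, 2))
D = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))
Y = classical_mds(D, dim=2)
D_hat = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
print(np.allclose(D, D_hat))  # True: distances recovered up to rotation/reflection
```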

So I FAILED and STOPPED on this idea. Another way to extend it would be to combine the idea of LPP with other subspace methods such as GPCA.



View Invariant Action Recognition (1)



It's very interesting!!!
Can we recognize similar human actions captured from different view angles? First, let's check whether action recognition across multiple views is necessary and useful. In a multi-camera system, we may be given visual data from multiple views, so it is necessary to equip a computer vision system with the ability to recognize actions independently of the view. Also, a query-by-action application should find not only similar action segments from the same view angle but also those from different view angles.

Now, what are the open issues in this problem?!
1. How can we design a similarity measure on space-time segments that is invariant to view?
I already have a rough idea extended from the CVPR 2005 paper ("Space-time Behavior Based Correlation").
2. How can we make the similarity measure robust in a noisy environment (errors in space-time gradients, errors caused by occlusion, ...)? Generalize the binary measure to a continuous one!!!
3. How do we decide the recognition granularity? Given a long action segment, the global action may differ from the local actions. How can we determine whether the global action and/or the local actions are consistent?


Solving Vision Problems by Inference Using Graphical Generative Models



Many difficult vision problems cannot easily be solved using traditional methods. A research group at the University of Toronto led by Professor Brendan J. Frey, together with Nebojsa Jojic (Microsoft Research), has proposed a set of theories and models based on probabilistic inference for these hard problems, from 1999 to date.
This topic is known as "Inference and Learning in Graphical Generative Models". The basic recipe of this framework is as follows:
  1. Model the target problem probabilistically and build a generative model (e.g., a Bayesian network). In detail, modeling begins by analyzing the dependencies among the (hidden) variables and the observations; the probabilities of the independent variables are then modeled under some assumption (such as a Gaussian). Next, the conditional probabilities of the dependent variables are modeled, also under some assumptions. Finally, the joint distribution of the observations and all variables is modeled.
  2. Depending on the target problem, the posterior probabilities of the hidden variables of interest given the observations are inferred and written out as formulas;
  3. The problem is thereby transformed into parameter estimation. Once the parameters appearing in the posterior formulas are estimated, the values of the hidden variables can be inferred; this can be done using, for example, the EM algorithm.
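As a textbook instance of the three steps above (my own illustration, not code from Frey and Jojic's work), EM for a 1-D Gaussian mixture follows the same recipe: the generative model is z ~ Categorical(pi), x | z=k ~ N(mu[k], var[k]); the E-step computes the posterior p(z | x) of the hidden component; and the M-step re-estimates the parameters. The initial values and the variance floor are arbitrary choices, and the sketch assumes each point has non-negligible likelihood under at least one component.

```python
import math
from typing import List, Sequence, Tuple


def em_gmm_1d(x: Sequence[float], mu: List[float], var: List[float],
              pi: List[float], iters: int = 50) -> Tuple[List[float], List[float], List[float]]:
    """EM for a 1-D Gaussian mixture; returns (means, variances, weights)."""
    mu, var, pi = list(mu), list(var), list(pi)    # don't mutate the caller's lists
    K, n = len(mu), len(x)
    for _ in range(iters):
        # E-step: responsibilities r[i][k] = p(z_i = k | x_i) via Bayes' rule
        r = []
        for xi in x:
            p = [pi[k] * math.exp(-(xi - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(K)]
            s = sum(p)
            r.append([pk / s for pk in p])
        # M-step: weighted maximum-likelihood re-estimation of the parameters
        for k in range(K):
            nk = sum(ri[k] for ri in r)
            pi[k] = nk / n
            mu[k] = sum(ri[k] * xi for ri, xi in zip(r, x)) / nk
            var[k] = max(sum(ri[k] * (xi - mu[k]) ** 2
                             for ri, xi in zip(r, x)) / nk, 1e-6)  # variance floor
    return mu, var, pi


data = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8]
mu, var, pi = em_gmm_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(mu)  # means converge near -2.0 and 3.0
```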

Interesting stuff that needs further study!!!!

References

PSI Group, University of Toronto

Brendan J. Frey's Publications

Nebojsa Jojic's Homepage



Copyright ©2007 Practical Web. All rights reserved.