Mining Weakly Labeled Web Facial Images for Search-Based Face Annotation
This paper investigates a framework of search-based face annotation (SBFA) by mining weakly labeled facial images that are freely available on the World Wide Web (WWW). One challenging problem for a search-based face annotation scheme is how to effectively perform annotation by exploiting the list of most similar facial images and their weak labels, which are often noisy and incomplete. To tackle this problem, we propose an effective unsupervised label refinement (ULR) approach for refining the labels of web facial images using machine learning techniques. We formulate the learning problem as a convex optimization and develop effective optimization algorithms to solve the large-scale learning task efficiently. To further speed up the proposed scheme, we also propose a clustering-based approximation algorithm that improves scalability considerably. We have conducted an extensive set of empirical studies on a large-scale web facial image testbed, in which encouraging results showed that the proposed ULR algorithms can significantly boost the performance of the promising SBFA scheme.
A large portion of photos shared by users on the Internet are human facial images. Some of these facial images are tagged with names, but many of them are not tagged properly. Instead of training explicit classification models as regular model-based face annotation approaches do, the search-based face annotation (SBFA) paradigm tackles the automated face annotation task by exploiting content-based image retrieval (CBIR) techniques to mine massive weakly labeled facial images on the web. The SBFA framework is data-driven and model-free, and is to some extent inspired by the search-based image annotation techniques for generic image annotation. The main objective of SBFA is to assign correct name labels to a given query facial image. In particular, given a novel facial image for annotation, we first retrieve a short list of the top K most similar facial images from a weakly labeled facial image database, and then annotate the facial image by voting on the labels associated with those top K images.
DISADVANTAGES OF EXISTING SYSTEM:
1. Some facial images are tagged with names, but many of them are not tagged properly.
2. Classical face annotation approaches are often treated as an extended face recognition problem.
3. Existing approaches do not effectively exploit the short list of candidate facial images and their weak labels for the face name annotation task.
We propose a novel unsupervised label refinement (ULR) scheme that uses machine learning techniques to enhance the labels purely from the weakly labeled data, without manual human effort. We also propose a clustering-based approximation (CBA) algorithm to improve efficiency and scalability. In summary, the main contributions of this paper include the following:
1. We investigate and implement a promising search-based face annotation scheme by mining a large number of weakly labeled facial images freely available on the WWW.
2. We propose a novel ULR scheme for enhancing label quality via a graph-based and low-rank learning approach.
3. We propose an efficient clustering-based approximation algorithm for the large-scale label refinement problem.
4. We conducted an extensive set of experiments, in which encouraging results were obtained.
ADVANTAGES OF PROPOSED SYSTEM:
1. Its machine learning techniques enhance the labels purely from the weakly labeled data.
2. It improves efficiency and scalability.
- Data Collection
- Face Detection and Feature Extraction
- Feature Indexing for High Dimensional Data
- Weakly Labeled Data Learning
- Similar Face Retrieval and Face Annotation
- Data Collection:
In this module, the first step is the collection of facial images: we crawl facial images from the WWW using an existing web search engine (i.e., Google) according to a name list that contains the names of the persons to be collected. The output of this crawling process is a collection of facial images, each associated with some human names. Given the nature of web images, these facial images are often noisy and do not always correspond to the right human name. We therefore call such web facial images with noisy names weakly labeled facial image data.
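As a rough illustration of what the crawling output might look like in memory, the sketch below turns (query name, image) pairs into a binary weak-label matrix. The function name `build_weak_labels`, the pair format, and the matrix layout are assumptions made for this example; the source does not specify a storage format.

```python
from collections import defaultdict


def build_weak_labels(crawl_results, name_list):
    """Build a weak-label matrix from crawl output (illustrative only).

    crawl_results: iterable of (query_name, image_id) pairs, one per image
                   returned by the search engine for that name.
    name_list:     list of person names used as search queries.
    Returns (image_ids, Y), where Y[i][j] == 1 if image i was returned
    for name j. These labels are noisy, which is why they are "weak".
    """
    name_index = {name: j for j, name in enumerate(name_list)}
    names_per_image = defaultdict(set)
    for query_name, image_id in crawl_results:
        if query_name in name_index:  # ignore names outside the list
            names_per_image[image_id].add(query_name)

    image_ids = sorted(names_per_image)
    Y = [
        [1 if name in names_per_image[image_id] else 0 for name in name_list]
        for image_id in image_ids
    ]
    return image_ids, Y
```

An image returned for more than one query name simply gets several 1 entries in its row, which reflects the ambiguity of the weak labels.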
- Face Detection and Feature Extraction:
The second step is to preprocess web facial images to extract face-related information, including face detection and alignment, facial region extraction, and facial feature representation. For face detection and alignment, we adopt an unsupervised face alignment technique. For facial feature representation, we extract GIST texture features to represent the extracted faces. As a result, each face is represented by a d-dimensional feature vector.
- Feature Indexing for High Dimensional Data:
The third step is to index the extracted features of the faces by applying some efficient high-dimensional indexing technique to facilitate the task of similar face retrieval in the subsequent step. In our approach, we adopt the locality-sensitive hashing (LSH), a very popular and effective high-dimensional indexing technique.
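To make the indexing idea concrete, here is a minimal sketch of one common LSH variant, random-hyperplane hashing, using a single hash table. The source only says that LSH is used; the hashing family, the class name `RandomHyperplaneLSH`, and the parameters `num_bits` and `seed` are assumptions for illustration.

```python
import random
from collections import defaultdict


class RandomHyperplaneLSH:
    """Single-table LSH using the signs of random projections (a sketch)."""

    def __init__(self, dim, num_bits=8, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is a random Gaussian vector of length `dim`.
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)
        ]
        self.buckets = defaultdict(list)

    def _hash(self, vec):
        # One bit per hyperplane: which side of the plane the vector is on.
        return tuple(
            1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in self.planes
        )

    def add(self, item_id, vec):
        self.buckets[self._hash(vec)].append(item_id)

    def candidates(self, vec):
        # Only items in the same bucket are returned for exact re-ranking.
        return list(self.buckets.get(self._hash(vec), []))
```

Vectors that point in similar directions tend to share a bucket, so a query only needs to be compared against the candidates in its bucket. A practical index would typically use several tables to reduce missed neighbors.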
- Weakly Labeled Data Learning:
Besides the indexing step, another key step of the framework is an unsupervised learning scheme that enhances the label quality of the weakly labeled facial images. This process is very important to the entire search-based annotation framework, since label quality is a critical factor in the final annotation performance.
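The intuition behind graph-based label refinement can be sketched with a simple label-propagation-style smoothing: faces that look alike should carry similar labels, while staying reasonably close to their initial weak labels. This is not the ULR convex optimization described in the paper; it is a simplified stand-in, and the function name `refine_labels` and the parameters `alpha` and `num_iters` are assumptions for this example.

```python
def refine_labels(Y, W, alpha=0.8, num_iters=50):
    """Smooth noisy labels over a similarity graph (simplified sketch).

    Y: n x m list of initial weak labels (0/1), one row per face.
    W: n x n list of non-negative similarity weights between faces.
    Iterates F <- (1 - alpha) * Y + alpha * P * F, where P is the
    row-normalized W. Larger alpha trusts the neighbors more.
    """
    n, m = len(Y), len(Y[0])

    # Row-normalize the similarity matrix so each row sums to 1.
    P = []
    for row in W:
        total = sum(row)
        P.append([w / total if total > 0 else 0.0 for w in row])

    F = [[float(v) for v in row] for row in Y]
    for _ in range(num_iters):
        new_F = []
        for i in range(n):
            smoothed = [
                sum(P[i][k] * F[k][j] for k in range(n)) for j in range(m)
            ]
            new_F.append(
                [(1 - alpha) * Y[i][j] + alpha * smoothed[j] for j in range(m)]
            )
        F = new_F
    return F
```

For instance, if three mutually similar faces are labeled with name 0, name 0, and (wrongly) name 1, the refined scores of the third face end up favoring name 0 once `alpha` is large enough for the neighbors to outweigh the initial label.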
- Similar Face Retrieval and Face Annotation:
In this module, we describe the process of face annotation during the test phase. In particular, given a query facial image for annotation, we first conduct a similar face retrieval process to search for a subset of most similar faces (typically top K similar face examples) from the previously indexed facial database. With the set of top K similar face examples retrieved from the database, the next step is to annotate the facial image with a label (or a subset of labels) by employing a majority voting approach that combines the set of labels associated with these top K similar face examples.
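The retrieval-and-voting step described above can be sketched as follows. For clarity the example ranks the whole database by Euclidean distance instead of querying an index, and the function name `annotate` and the `(feature_vector, label)` database layout are assumptions for illustration.

```python
import math
from collections import Counter


def annotate(query_vec, database, k=3):
    """Annotate a query face by majority vote over its top-K neighbors.

    database: non-empty list of (feature_vector, label) pairs.
    Returns the most frequent label among the K nearest faces; on a tie,
    the label whose first occurrence is nearest to the query wins.
    """
    ranked = sorted(database, key=lambda item: math.dist(query_vec, item[0]))
    top_k = ranked[:k]
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]
```

In a full system, the brute-force `sorted` call would be replaced by a lookup in the high-dimensional index, and the labels would be the refined ones rather than the raw weak labels.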
- System : Pentium IV 2.4 GHz.
- Hard Disk : 40 GB.
- Floppy Drive : 1.44 MB.
- Monitor : 15 VGA Colour.
- Mouse : Logitech
- RAM : 512 MB.
- Operating system : Windows XP/7.
- Coding Language : net, C#.net
- Tool : Visual Studio 2010
- Database : SQL SERVER 2008.