I’m always coming up with ideas, some of which I’m working towards commercializing at the moment. For instance, I’m working on an initiative which will see the gamification of public transport through the use of smartphones, location-based services, QR codes, social media and other cool stuff. I’m also working towards establishing a new payroll software system. One of the ideas that I had, yet ultimately decided not to pursue, is a face recognition app. The main reason I didn’t take it any further is that it has already been done. Check these videos out, very impressive stuff!

I initially looked at a product concept which entailed designing, and subsequently bringing to market, a functioning facial recognition product for smartphones. The application, which would be 100% opt-in, would use Augmented Reality to scan an individual’s face, retrieve that person’s identity and return links associated with that individual. Since computers do not possess an innate ability to recognize and distinguish between faces, the company would have to employ various techniques to best match an inputted image to a specific individual’s profile. Techniques such as the Gabor wavelet transform, combined with various algorithms, facilitate the mapping of face images based on the key features of that face. It is such precise measurements which would allow for accurate retrieval of profile information. I would have intended to develop a fresh dataset for face recognition, to assess the algorithm’s recognition rate and refine it accordingly.

I began thinking about the idea after attending a seminar, the Search Marketing Summit in Bangalore, India. At this seminar I had taken handwritten notes on each speaker’s presentation, including the speaker’s professional background, contact details, Twitter account, blog and company website. I stored all of this information in one centralized location: a handwritten notepad. After losing the notepad, I began sifting through my wallet to try and find business cards and also began searching through the event’s Twitter hashtag. With a keen interest in Augmented Reality, I began to contemplate ways of easily accessing individuals’ information. The concept, then, is based around a face recognition platform which offers a snapshot into a subject’s online presence along with the opportunity to easily connect via, for example, email, social networks or other websites. A quick Google search told me that this was actually already being developed. Even still, it intrigued me and I wanted to investigate…

The overall objective of the system, then, is to design a functioning facial recognition product for use on smartphones. The system would also facilitate a call to action via various “click to follow” links. Such activities would be conducted in real time and, unlike various “life-logging” techniques, the system does not aim for data recall. As such it would be used not as a remembrance agent but rather as an information retrieval tool, based on live encounters with individuals. Anyway, I had conducted some research, found it really interesting and, rather than letting it go to waste, I thought I’d put it all together and post a blog. The following is thus based on research into image-based human and machine recognition of faces.

The proposed process relies on the ability to detect a face (face detection) and then measure its various features (face recognition). Face detection software is well developed and already built into most cameras and indeed smartphones.

The product would need to deliver a high level of facial recognition accuracy. In-built face detection seeks to detect a face and then extract only that face from the image.
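To make the detect-then-recognize split concrete, here is a minimal sketch of the detection step, assuming Python with OpenCV; the cascade file, crop size and function name are purely illustrative, not part of the actual product design.

```python
# A minimal sketch of the detect-then-recognize pipeline described above,
# using OpenCV's bundled Haar cascade for the detection step.
import cv2

def detect_and_crop_faces(image_path, crop_size=(112, 112)):
    """Detect faces in an image and return each one cropped and resized."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # OpenCV ships a pre-trained frontal-face Haar cascade.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    crops = []
    for (x, y, w, h) in faces:
        face = gray[y:y + h, x:x + w]              # extract only the face region
        crops.append(cv2.resize(face, crop_size))  # normalise size for recognition
    return crops
```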

Humans have an innate ability to recognize and distinguish between faces, yet computers and smartphones do not possess such capabilities. As such, various challenges and constraints exist.

The existence of a False Accept Rate (“FAR” – the system falsely identifies an individual) and a False Reject Rate (“FRR” – an enrolled individual is not found) can reduce the accuracy of results. The aim, then, lies in designing the system to offer a highly accurate means of identity verification.
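For illustration, FAR and FRR can be estimated from similarity scores on known genuine (same-person) and impostor (different-person) comparison pairs. A rough sketch, where the score lists and threshold are placeholders rather than measured values:

```python
# Estimate FAR and FRR for a candidate similarity threshold.
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for a given similarity threshold."""
    # FAR: fraction of impostor pairs wrongly accepted as a match.
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    far = false_accepts / len(impostor_scores)

    # FRR: fraction of genuine pairs wrongly rejected.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    frr = false_rejects / len(genuine_scores)
    return far, frr

# Example usage: sweep thresholds to find an acceptable operating point.
# far, frr = far_frr(genuine, impostor, threshold=0.8)
```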

Identification problems may be encountered where the scanned face (the input) is unknown (it does not correlate to any stored face) yet the system reports back a false identity from its database. This could occur where a user scans the face of an individual who has not signed up to the opt-in service.

Verification problems may arise where the system must confirm or reject the claimed identity of a user’s face. It should seek to eliminate any avoidable fraudulent behaviour and as such may perform random validity tests. For example, a linked website may be asked to perform a command, like momentarily adding a line of code to a page, uploading an image to a social network or simply responding to a validation message sent via email.
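As a sketch of one such random validity test, the system could issue a one-off token and check that it briefly appears on the claimed website. The use of the `requests` library and these function names are my own assumptions, not a specified design:

```python
# Sketch of a "place this token on your site" validity check.
import secrets
import requests

def issue_challenge():
    """Generate a short random token the user must place on their linked site."""
    return secrets.token_hex(8)

def verify_challenge(url, token):
    """Confirm the token is momentarily present in the page source."""
    try:
        page = requests.get(url, timeout=10)
        return token in page.text
    except requests.RequestException:
        return False
```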

Limitations may be encountered in the form of functional constraints, e.g. inadequate sampling or variances in face composition. Challenges would also arise around the level of tolerance for specific variables, for example: facial expression and pose, accessories such as glasses or jewellery, and the level of controlled lighting (illumination).

With the diffusion of any new product, especially a technical one, various adoption hurdles may be encountered. The lack of a critical mass of users may itself hinder adoption. The publication “Technology Analysis and Strategic Management” highlights four variables critical for gaining market acceptance: 1) technological competence, 2) strategic use of technological competence, 3) market resistance to adoption and 4) perceived value of the innovation.

My proposed system would have operated on an Augmented Reality platform, e.g. Layar, and would require both individuals, “the Searched” and “the Searcher”, to have an opt-in account.

To best present a clear functional description, the process is described in terms of the inputs and outputs required by both parties: the Searched and the Searcher.

The Searched

Smartphone and Layar users can log onto the website, where the new visitor selects “Create New Profile.” The user would be asked to provide five different face shots. Using a device with an in-built camera or a plug-in webcam, a target space would open in which to place the face for the photos. The user would take and submit one face-on expressionless photo and one face-on photo with a smile. Next the user would upload left and right profile shots.

From here, a series of algorithms would not only crop and align the images but also extract the facial region, rendering it ready for further analysis. The algorithms then map each of approximately 80 nodal points of the face, for example the proportional distance between the eyes, and the length and definition of the jaw line. This data would then be stored as a unique numerical code. Finally the user would be asked to take another face shot, this time to act solely as a profile photo. (Sample photographs would be provided at each stage.)
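As a purely illustrative sketch of how nodal-point measurements might be reduced to a numerical code, the snippet below normalises a few landmark distances by the inter-eye distance so the code is independent of camera distance. The landmark names and the choice of measurements are assumptions, not the actual 80-point scheme:

```python
# Turn landmark ("nodal point") coordinates into a compact numerical code.
import math

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def face_signature(landmarks):
    """landmarks: dict of point name -> (x, y) pixel coordinates."""
    eye_gap = distance(landmarks["left_eye"], landmarks["right_eye"])

    # Normalise every measurement by the inter-eye distance so the code
    # does not depend on how far the subject stood from the camera.
    return [
        distance(landmarks["nose_tip"], landmarks["chin"]) / eye_gap,
        distance(landmarks["left_jaw"], landmarks["right_jaw"]) / eye_gap,
        distance(landmarks["left_eye"], landmarks["nose_tip"]) / eye_gap,
    ]
```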

Research by Hesher et al. [2003] explored the merits of using various images as discussed above. The back end system would analyze images of different sizes and with different facial expressions. The existence of additional images resulted in an improved recognition rate as it offered the probe image more chances to make the correct match.

The aforementioned photographs would represent the individual’s “biometric signature” (D.M. Blackburn, 2001) and would be stored in a database.

An approach named Elastic Bunch Graph Matching (EBGM) would then be utilised. Gordon et al. [1992] discuss using algorithms to conduct a curvature-based segmentation of a face. The EBGM concept works on the understanding that each face has many non-linear characteristics, some of which were mentioned earlier as challenges (e.g. expression, pose, illumination). To target such challenges, the system would use a technique named the Gabor wavelet transform. Taking the aforementioned nodal information from each of the four photographs, the algorithms would project the face onto an elastic grid. As such it would use depth and an axis of measurement after extracting properties which offer curvature and metric size information. Such data would then be stored in the system’s database. This technique would result in proportionately lower false reject rates and, crucially for smartphone users, would also facilitate non-frontal face recognition.
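The Gabor wavelet idea at the heart of EBGM can be sketched as a bank of filters at several scales and orientations whose responses are sampled at each nodal point (so-called “jets”). The filter-bank parameters below are illustrative only, not those a production system would use:

```python
# Sketch of Gabor "jets": responses of a filter bank sampled at a nodal point.
import cv2
import numpy as np

def gabor_bank(scales=(4, 8, 16), orientations=8):
    kernels = []
    for lambd in scales:                      # wavelength of the sinusoid
        for k in range(orientations):
            theta = k * np.pi / orientations  # filter orientation
            kernels.append(cv2.getGaborKernel(
                ksize=(31, 31), sigma=lambd / 2.0, theta=theta,
                lambd=lambd, gamma=0.5))
    return kernels

def gabor_jet(gray_face, point, kernels):
    """Magnitude of each filter response at a single nodal point (x, y)."""
    x, y = point
    responses = [cv2.filter2D(gray_face.astype(np.float32), -1, k) for k in kernels]
    return np.array([abs(r[y, x]) for r in responses])
```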

Source: National Science and Technology Council (2006) Face Recognition

The algorithm used can be continuously fine-tuned for the best accuracy of performance. For example, the above technique would use distinctive features found where rigid tissue is most apparent, as seen in the nose, chin or eye sockets. Such features are not only unique to an individual but are also unlikely to change over extended periods of time.

Next, the user would provide links for associated websites, e.g. a personal website, Twitter account or blog. The user would then be given 160 characters to insert a biographical description. This profile would not affect the user’s privacy settings for each linked site; it is up to the user to decide what level of content is made available. The option to pause an account would also be included; this may be utilised where the user would prefer to remain incognito. Once the setup is complete the user would be asked to download the application and can then also use the service for searching others.

The Searcher

The “Searcher” would scan an individual’s face with their smartphone camera. The application would detect the face and extract it from the background. After recognizing and analyzing various features, the algorithm would “normalize” the image so it represents the same format (e.g. size/resolution) as the images held on the database. This data would then be sent to the database as a search query. A graph matching algorithm incorporating relational constraints would be utilized to find the corresponding profile for the probe image.
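A highly simplified sketch of the Searcher-side query follows: normalise the probe signature and compare it against enrolled profiles, rejecting weak matches so unknown faces return no result rather than a false identity. This plain nearest-neighbour comparison merely stands in for the graph-matching step described above; the threshold and data structures are assumptions:

```python
# Simplified Searcher-side lookup against enrolled profile signatures.
import numpy as np

ENROLLED = {}  # profile_id -> stored signature vector (assumed populated)

def normalise(signature):
    """Scale a signature vector to unit length so comparisons are consistent."""
    v = np.asarray(signature, dtype=float)
    return v / np.linalg.norm(v)

def find_profile(probe_signature, threshold=0.9):
    probe = normalise(probe_signature)
    best_id, best_score = None, -1.0
    for profile_id, stored in ENROLLED.items():
        score = float(np.dot(probe, normalise(stored)))  # cosine similarity
        if score > best_score:
            best_id, best_score = profile_id, score
    # Reject weak matches so unknown faces return "no result", not a false identity.
    return best_id if best_score >= threshold else None
```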

The main advantages of the system lie in its accuracy and efficiency of use. Previous face recognition systems have faltered because they relied on comparing one 2D image against another 2D image held in the database. As discussed, a system which incorporates graph matching allows for greater variances in image conditions. This is especially pertinent for smartphone users, who would use the application in different and non-controlled environmental settings yet would still be assured of effective and accurate recognition.

Additional functionality would allow users to pause and resume their account upon a click of a mouse or a touch of a screen.

Incremental improvement could be realised in terms of information retrieval. Collateral information such as demographics (age, gender, race) as well as speech may be incorporated to refine search results. Furthermore, as both parties are smartphone users, the search algorithm could call upon GPS/location-based data to better match queries.
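One hedged sketch of how location data could be folded into the match score: boost candidates whose last known position is near the searcher. The radius and weighting are invented purely for illustration:

```python
# Combine a face-match score with a location-proximity bonus.
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(h))

def combined_score(face_score, searcher_location, searched_location,
                   nearby_km=0.5, location_weight=0.2):
    """Boost candidates whose last known location is close to the searcher."""
    nearby = haversine_km(searcher_location, searched_location) <= nearby_km
    bonus = location_weight if nearby else 0.0
    return (1 - location_weight) * face_score + bonus
```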

Some disadvantages may include a time lag for enquiries and the incremental expansion of the database. The “elastic grid” process required for 3D face recognition demands more computational effort than standard 2D-to-2D recognition systems. Although this could mean more accurate results, the trade-off may be processing time. A key selling point would be real-time information retrieval, so load time may pose a challenge.

Also, the lack of an extensive database may hinder the system’s functionality. The database could alternatively be expanded by altering the algorithms to scan images, e.g. across social networks, again on an opt-in basis.

Furthermore, for privacy reasons, this system requires both parties to have signed up to the opt-in service. Therefore both parties must be users of the Augmented Reality platform, Layar.

I will now evaluate the concept. Facial recognition has been studied for years, and its potential applications are widespread. That said, many functional and operational constraints exist and such technologies have generally failed to reach a critical mass (Zhao et al. 2005). The discipline has nevertheless become one of the most prominent areas in computer vision, leading to the development of numerous face recognition algorithms.

There is a range of databases which can be used for the evaluation of such face recognition algorithms. Many of these databases, however, offer a wide range of images per subject but have a limited number of subjects. On a positive note, it does appear that the use of various algorithms can offer a high-quality match rating. (In a study carried out by the US National Institute of Standards and Technology, facial recognition systems using EBGM showed an accuracy of between 87% and 90%.)

One means of assessing the efficacy of a face recognition algorithm, then, would be experimentation with pre-defined and publicly available face datasets. These include JAFFE, ORL and FERET.
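Evaluation against such datasets typically boils down to a rank-1 recognition rate: the fraction of probe images whose top match in the gallery is the correct subject. A minimal sketch, with `match()` standing in for whichever recognition algorithm is under test:

```python
# Rank-1 recognition rate over a probe set and a gallery.
def rank1_recognition_rate(probes, gallery, match):
    """probes: list of (subject_id, image); match(image, gallery) -> best gallery id."""
    correct = sum(1 for subject_id, image in probes
                  if match(image, gallery) == subject_id)
    return correct / len(probes)
```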

The FERET program, for example, was first introduced to the face recognition community as far back as 1993 as an evaluation tool for algorithms. This dataset includes 2,413 still facial images representing 856 individuals.

There exists, however, a deficit of appropriate datasets for 3D face recognition. I would propose approaching academic partners (e.g. Clarity, DCU) to help in this regard. Work would then begin on putting together a fresh dataset for 3D face recognition. Indeed, the student population could offer an excellent test-bed for such research.

When designing such a dataset I would seek the following (see the sketch after this list):

  • An extensive number of subjects with wide demographic variance
  • A minimal amount of sensor-specific artefacts
  • Excellent spatial resolution, e.g. a depth resolution of 1 mm or better
  • Images of a subject with various facial expressions
  • Images of a subject recorded over extended intervals of time
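A minimal sketch of how each capture in such a dataset might be recorded so that these requirements stay attached to every scan; the field names are assumptions, not a finalised schema:

```python
# One possible per-capture record for the proposed 3D face dataset.
from dataclasses import dataclass
from datetime import date

@dataclass
class FaceScan:
    subject_id: str
    age: int
    gender: str
    expression: str           # e.g. "neutral", "smile"
    capture_date: date        # supports re-capture over extended intervals
    sensor: str               # helps track sensor-specific artefacts
    depth_resolution_mm: float
    mesh_path: str            # path to the stored 3D scan
```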

It is clear that extensive research is ongoing in this particular field. However, the published research comes from disparate sources and often does not put forward the mathematical algorithms it discusses.

For this reason I would have considered publishing the dataset, making it available to the wider research community. A collaborative approach could facilitate assessment of the state of the art in face recognition. It could also have helped me understand and better interpret the statistical significance of others’ research, allowing a better understanding of the different performance measures of various algorithms and refinement of the process accordingly.

It was mentioned earlier that the website would ask for four subject images and an additional profile image. One such image was to capture the smile of an individual. This is due to the observation that non-convex facial regions (mostly the bottom half of the face) are more likely to change shape with variances in facial expression. The implication is that it is more difficult to achieve a match where the subject is smiling. The action taken, then, was to record data for that individual’s face whilst smiling.

However, research by Chua et al. [2000] suggests an alternative method of approaching expression change. This would see only the more rigid elements of the face being analyzed (from below the nose up through the forehead). This would form a key hypothesis whilst conducting an evaluation and assessment of the process.
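A sketch of that hypothesis in code terms: restrict matching to the rigid upper face by discarding rows below the nose. The cut-off fraction is an assumption for illustration, not a value from the cited research:

```python
# Keep only the expression-stable upper face before feature extraction.
def rigid_region(gray_face, nose_row_fraction=0.55):
    """gray_face: 2D image array; keep rows from the forehead down to just below the nose."""
    rows = gray_face.shape[0]
    cutoff = int(rows * nose_row_fraction)
    return gray_face[:cutoff, :]   # discard the expression-prone lower face
```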

I will now discuss the business case for such a system. Recent times have witnessed surging interest in mobile augmented reality. The ability to retrieve information and display it as virtual content overlaid on top of an image of the real world is a natural extension of a mobile device equipped with a camera and wireless connectivity.

The primary aim of this project would be to design a functioning facial recognition product for use with a smartphone. The target market would therefore consist of any individual who wants to appear more accessible online, or those who want to find and communicate with such individuals through Augmented Reality.

Ultimately, this offering could be used with people who are not yet close contacts, eliminating the need for business card exchanges, or with close contacts, who might use it for faster communication; consider, for example, sending an email without having to input the address.

There are currently approximately 1.3 million operational smartphones in Ireland. As of Q1 2010, the highest rates of smartphone use are found in the 18-35 year old bracket. This group is said to boast the highest share of ownership of Android and iOS devices. This is significant as Android and iOS are said to be the two most prominent operating systems used to run Augmented Reality platforms (Kellogg, D. 2010).

A secondary aim would be profit generation. Despite the widespread proliferation of smartphones and the gradual adoption of Augmented Reality, I would imagine any company designing such a system would experience a period of unprofitability, perhaps of two years or more.

If I did go ahead with the concept, I would ideally like to launch it (in three months) during its beta stage and offer it as a free download. This would not only incite the active building of a critical mass but would also facilitate continuous user feedback. This strategy would be applied for twelve months. After one year, the improved product would be sold at a nominal once-off download price of €10. The vendor would likely take approximately 10% of this fee, equating to €1 per download. After a further nine months, all commercialisation, research and development costs incurred would have been covered and the company would begin to operate as a profitable enterprise.
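For what it’s worth, the arithmetic behind that timeline is simple: €10 per download minus the vendor’s cut of roughly 10% leaves about €9 per sale. A back-of-envelope sketch, where the total cost figure is entirely hypothetical:

```python
# Break-even downloads given the €10 price and ~10% vendor cut described above.
import math

PRICE_EUR = 10.00
VENDOR_CUT = 0.10                                  # vendor keeps ~10%, i.e. €1 per download
NET_PER_DOWNLOAD = PRICE_EUR * (1 - VENDOR_CUT)    # ~€9 retained per download

def breakeven_downloads(total_costs_eur):
    """Paid downloads needed to cover commercialisation and R&D costs."""
    return math.ceil(total_costs_eur / NET_PER_DOWNLOAD)

# e.g. breakeven_downloads(90_000) -> 10000 paid downloads (hypothetical cost figure)
```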

There would be many opportunities for future collaboration via strategic partnerships or joint venture initiatives. Such initiatives could see this process integrated with offerings such as Foursquare or other social networks which operate via mobile.

Many high-potential start-ups become the subject of high-profile acquisitions, and should an offer come along, it may be considered. Indeed, as mentioned, a company named Polar Rose was recently acquired by Apple. Polar Rose’s main operations concerned a face-tagging plug-in application for social networks; however, they have recently been reported as working on a similar facial recognition system.

Also, with an envisaged increase in Augmented Reality users, the potential for revenue generation via advertising could emerge. However, advertising could add undesirable clutter to the user interface and negatively affect the user experience.

Further research, as outlined previously, would be needed in order to ascertain the feasibility of such a concept. Like most innovations, there are many risks, which would be outlined in later research. One threat could be seen where other commercial entities, e.g. Polar Rose, use a first-mover advantage to gain a highly dominant market share. Such competitive forces would act as a challenge and not necessarily a restrictive force. It’s a really interesting space; I can’t wait to see how this one plays out…
