I recently connected with David Rostcheck, an NLP expert from Chile who is taking our Coursera Course. David told me some interesting things about personality testing using text inputs that I wanted to share here, as I was not aware of this field.
Here is a guest blog post from David:
When we think of hot Data Science tools, we usually think of Machine Learning and Data Visualization. Another area that does not get quite as much attention - but can produce amazing results - is Natural Language Processing (NLP). NLP algorithms can give powerful insight into a writer’s education, power dynamics, tone, and other personal traits. I recently completed an engagement in which I did quite a bit of this work and got to use some cutting-edge APIs. In this article, I will explain how NLP works and discuss using IBM Watson and Receptiviti to analyze personality from writing.
Natural Language Processing uses statistical techniques to extract insight from text. Consider assessing the grade level of a piece of text. When we say that something is “written at a 10th grade level,” we are speaking about how it scores on a standard metric. There are several, but one of the most common is called the “Flesch-Kincaid” score [1]. It was developed in the 1970s through a statistical study of language. Flesch-Kincaid is a simple formula using two variables: the average number of words per sentence and the average number of syllables per word.
Although it seems simple [2], this heuristic can quickly and accurately assess the grade level of a written sample. Flesch-Kincaid, like all Natural Language Processing algorithms, is specific to a language (in this case, English). Assessing writing in another language, such as Spanish, requires a different formula.
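To make the two-variable formula concrete, here is a minimal Python sketch of the Flesch-Kincaid grade-level calculation. The sentence and word splitting is deliberately naive, and the syllable counter is a rough vowel-group heuristic rather than the pronunciation-dictionary lookup a production tool would use.

```python
import re

def flesch_kincaid_grade(text):
    """Approximate U.S. grade level of an English text (Flesch-Kincaid grade formula)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def count_syllables(word):
        # Rough heuristic: count groups of consecutive vowels.
        # Real implementations use a pronunciation dictionary instead.
        vowel_groups = re.findall(r"[aeiouy]+", word.lower())
        return max(1, len(vowel_groups))

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Published Flesch-Kincaid grade-level formula (English only).
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

sample = "Natural Language Processing uses statistical techniques to extract insight from text."
# With this rough syllable heuristic, the sample scores at roughly a college reading level.
print(round(flesch_kincaid_grade(sample), 1))
```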
In today’s world of Deep Learning algorithms that extract features and relationships, NLP’s statistics-based approach can seem unsatisfying. The algorithm knows nothing about what the words mean. Feeding meaningless long sentences of multisyllabic words into the Flesch-Kincaid formula will produce a high grade-level score. But the methods work because the statistical relationships hold, given enough data - which, because text is information-dense, can be surprisingly little.
If Natural Language Processing is “just a bag of statistical tricks,” then why do we care about it? Advanced techniques, such as personality analysis, can give powerful, immediate insight into deep traits of the writer - or at least into the persona exposed by the sample. Is the author a Type-A person? How hostile is the message? Is it brooding? Impulsive? Emotionally distant, or accessible? Is the author in a high- or low-status power position relative to the recipient? Does the writing show signs of mental instability? We can assess all of those parameters. Furthermore, because these heuristics execute quickly, we can push high volumes of data through them.
We can directly implement simple algorithms, like the one above. More complex analysis requires a set of correlation coefficients established by painstaking academic research. Vendors generally acquire the research and wrap the algorithm in a web service API. For extracting insight about each subject, I used IBM Watson’s Personality Insights API and Receptiviti.
IBM’s service consumes text and returns scores along the axes (“traits”) of three different psychological models: Big 5, Needs, and Values. For example, using the Big 5 model, it will evaluate the openness, conscientiousness, extraversion, agreeableness, and emotional range of the input.
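As a rough illustration of how such a call might look, here is a minimal sketch using the Python requests library against the Watson Personality Insights v3 REST endpoint. The URL, version date, authentication style, and response field names reflect the v3 service as I understand it and are best treated as assumptions; check IBM’s current documentation before relying on them.

```python
import requests

# Placeholder endpoint for the Personality Insights v3 service; verify against IBM's docs.
WATSON_URL = "https://gateway.watsonplatform.net/personality-insights/api/v3/profile"

def get_big5_percentiles(text, username, password, version="2016-10-20"):
    """Send raw text to the service and return {trait name: percentile} for the Big 5 model."""
    response = requests.post(
        WATSON_URL,
        params={"version": version},
        headers={"Content-Type": "text/plain", "Accept": "application/json"},
        auth=(username, password),  # older plans used basic auth; newer ones use an IAM API key
        data=text.encode("utf-8"),
    )
    response.raise_for_status()
    profile = response.json()
    # The response groups traits by model; the "personality" list holds the Big 5 scores.
    return {trait["name"]: trait["percentile"] for trait in profile["personality"]}
```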
Do you obtain real objective traits from the output? IBM carefully cautions that to give valid absolute scores, Watson needs very long samples. It is better to think of the response as an assessment of persona - the voice in which the piece is written - rather than absolute personality. In test experiments I found, though, that for a given persona I could cut the sample down considerably and still get reliably consistent results. Within a defined problem domain - such as help desk ticket messages - comparison proved valid even with much shorter pieces than required to assess absolute personality. And the assessment held up to examination - when it reported a message as having high or low emotional range, a human reader concluded the same.
Receptiviti productizes research from James Pennebaker, a major academic figure in linguistic personality analysis. It produces Big 5 model traits too, but also gives more directly usable outputs such as the emotional warmth, impulsiveness, and depression of the persona. And with direct access to Dr. Pennebaker’s research, Receptiviti continuously adds new types of analyses.
It may seem incredible that one can obtain accurate psychological markers from writing - even more so because these tools work by comparing words against special-purpose dictionaries established through psychological research. Watson will return the same results if you sort the text alphabetically, losing all sentence structure. The statistical relationships are word-based, not sentence-based - but they do hold.
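A toy sketch makes the word-order point obvious. The categories and word lists below are invented for illustration - real tools use large, psychometrically validated dictionaries - but the mechanism is the same: count how often words fall into each category, regardless of where they appear.

```python
import re
from collections import Counter

# Toy category dictionaries, invented for illustration only.
CATEGORIES = {
    "positive_emotion": {"happy", "love", "great", "good"},
    "anger": {"hate", "angry", "furious", "annoyed"},
}

def category_scores(text):
    """Return the fraction of words in the text that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocabulary in CATEGORIES.items():
            if word in vocabulary:
                counts[category] += 1
    return {category: counts[category] / len(words) for category in CATEGORIES}

message = "I love this great product but I hate the delivery"
print(category_scores(message))
# Sorting the words alphabetically destroys sentence structure but leaves the scores unchanged:
print(category_scores(" ".join(sorted(message.split()))))
```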
What do companies use NLP-based personality analysis for? Early adopters span a wide range of industries. Telecommunications company Telefónica and human resources firm Adecco have both begun using Watson (embedded via SocialBro’s social media marketing tool) to segment their customers by personality traits for Twitter marketing campaigns. I worked with educational software vendor Learning Machine to surface added insight in university applications. Design studio Chaotic Moon has explored ways to improve user interaction by shaping application behavior to the user’s personality, and fantasy football research firm Edge Up Sports uses the technology in its sports analysis. Receptiviti has put out a series of blog posts analyzing the personalities surfaced by candidates in the U.S. election debates, so it seems likely political consultants may begin using these tools as well.
You do need to bring a true scientific approach to the use of these tools - they are easily misused. For example, Watson’s three models are tuned to specific lengths and styles of writing: blog posts, Facebook messages, and tweets. Testing with a variety of samples from known sources revealed that it gave consistent results with the model that best fit the data, and inconsistent results with the others. It took carefully thought-out experimental tests to qualify the limits and precision of the tools. But they work, and they can give surprisingly deep insight.
The explosion of blogging and social media has opened new opportunities for linguistic researchers. NLP may have a lower profile than other skills, but it can produce solid insights and holds a place in the Data Science toolbox.
[1] There are actually two Flesch-Kincaid scores, one for grade level and another for readability, with different formulas.
[2] It’s always simple after someone does all the analysis to extract the relationship, now, isn’t it?