Interpreting and Visualizing Neural Networks for Text Processing

Introduction

Neural networks have become the go-to approach for text processing tasks, but they are also notorious for their opacity. We recently applied a neural network to the task of predicting numerical ratings for text-based consumer reviews, training the model to learn the ratings directly from the words in each review. The model achieved decent prediction accuracy, but accuracy wasn't our objective, especially since we already knew the ratings for these reviews. Instead, we wanted to understand why those particular ratings were assigned. Interpreting what a neural network has learned from data is a separate endeavor from applying it to make predictions. In this post, we'll explore some strategies for bringing the inside of a neural network to light, using our ratings prediction model to demonstrate.

For this task we used a simple feed-forward neural network called a multilayer perceptron. The image below shows its basic architecture. It has an input layer (in this case, the review text), one or more hidden layers, and an output layer that makes predictions (here, ratings). Weights connect each of these layers, and their values are updated during training as the model observes reviews and tries to correctly predict their ratings. Mathematically, the network is represented as $y = g(x^T W_1)\,W_2$, where the input $x$ is, for each review, a vector of word features, the output $y$ is the predicted rating, and $g$ is the nonlinear sigmoid function. What distinguishes a neural network from many other machine learning models is that the hidden layers allow it to learn complex interactions between variables (in our case, words) that can more precisely predict the output.
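As a concrete reference, here is a minimal TF1-style sketch of this architecture (the sizes and variable names are illustrative rather than the exact ones from our implementation):

import tensorflow as tf  # TensorFlow 1.x graph-style API

vocab_size, num_units = 10000, 5  # illustrative sizes
x = tf.placeholder(tf.float32, [None, vocab_size])  # bag-of-words reviews
W1 = tf.Variable(tf.random_normal([vocab_size, num_units], stddev=0.1))
W2 = tf.Variable(tf.random_normal([num_units, 1], stddev=0.1))

h = tf.sigmoid(tf.matmul(x, W1))  # hidden layer: g(x W1)
y = tf.matmul(h, W2)              # predicted ratings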

[Figure: a multilayer perceptron, with an input layer (review text), a hidden layer, and an output layer (predicted rating), connected by weights]

Data

For this work, we examined two different reviews datasets, both publicly available. The first consists of 50,000 movie reviews that appeared on imdb.com between 1998 and 2009. Ratings for these reviews range from 1-10, with 5 as the average rating. The second dataset is composed of approximately 15,000 reviews of Scotch whisky posted in the /r/Scotch forum on reddit.com. The reviews were curated by this subreddit community into a Google Spreadsheet that contains the item names, ratings, and links to the review text. We then scraped the review text from the links. These reviews are rated from 1-100, and the average rating is 84. In both datasets we found that many reviews mentioned ratings within the text itself, so we replaced all numeric characters with a placeholder (0) to prevent the model from "cheating" using these direct mentions. The trade-off was that a few meaningful numbers were replaced (e.g. "100 proof" scotch became "000 proof").
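A single regular expression suffices for this masking (a minimal sketch; the function name is ours):

import re

def mask_digits(text):
    # Replace every digit with '0' so ratings quoted in the text can't leak.
    return re.sub(r"\d", "0", text)

mask_digits("Bold at 100 proof; I'd give it 92/100.")
# -> "Bold at 000 proof; I'd give it 00/000."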

Model

For each set of reviews, we trained a multilayer perceptron with one hidden layer to predict ratings based on review text. We recently open sourced our implementation of this model, which was developed in Python with TensorFlow. Each review was tokenized into individual words using the segtok Python library, and all words were converted to lowercase to simplify the vocabulary. Words occurring at least 10 times across all reviews were added to the model's vocabulary. We also added phrases alongside words by using the gensim library to identify frequent collocations like "top shelf" and "thumbs up", as well as proper full names like "eva longoria" and "johnnie walker red label". These phrases were subsequently represented as individual tokens (e.g. "top shelf" became "top_shelf"). We then encoded each review as a "bag-of-words", meaning the unordered set of all its words weighted by how often each occurs. Ultimately each review is represented in the model as a 1-D tensor (i.e., vector) with a dimension for each word in the vocabulary, whose values are the number of times each word appears in the given review.
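Here is a rough sketch of that preprocessing pipeline, assuming reviews is a list of raw review strings (phrase-detection thresholds are left at gensim's defaults here, so details differ from our released implementation):

from collections import Counter

import numpy as np
from gensim.models.phrases import Phrases
from segtok.tokenizer import word_tokenizer

# Tokenize each review into lowercased words.
tokenized = [[w.lower() for w in word_tokenizer(text)] for text in reviews]

# Join frequent collocations into single tokens, e.g. "top shelf" -> "top_shelf".
phrases = Phrases(tokenized)
tokenized = [phrases[words] for words in tokenized]

# Keep words occurring at least 10 times across all reviews.
counts = Counter(w for words in tokenized for w in words)
vocab = sorted(w for w, c in counts.items() if c >= 10)
index = {w: i for i, w in enumerate(vocab)}

def bag_of_words(words):
    # Encode one review as a vector of word counts over the vocabulary.
    vec = np.zeros(len(vocab), dtype=np.float32)
    for w in words:
        if w in index:
            vec[index[w]] += 1
    return vec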

Given this structure of the data, we can interpret trained models by quantifying how each word influences predicted ratings. We will demonstrate some strategies for visualizing influence scores. Three dimensions characterize each of the examples we'll show:

  1. word influence for an individual review versus an aggregated set of reviews,
  2. word influence that is context-sensitive versus context-independent, and
  3. word influence at the output (prediction) layer versus the hidden layer.

It'll become clearer what these dimensions mean as we go through each example.

Which words predict ratings?

We'll first focus on word influence at the output layer, meaning how words affect the ratings predicted by the model. In a model without a hidden layer, like logistic regression, each word in the model's vocabulary has a weight connecting it from the input layer directly to the prediction layer, so the influence of a word on the ratings is simply its weight value. In a neural network, on the other hand, the input weights for each word connect to the hidden layer instead of the output. A different set of weights connects the hidden layer to the output layer, and these weights apply to the nonlinear values of the hidden units rather than to the words themselves. Looking at these weights alone, we can't determine the overall influence of each word on the model's predictions.

It turns out that a bit of differential calculus can solve this problem. In particular, we can compute the derivative of any layer in the network with respect to any layer below it. This is similar to the technique used to train the network, known as backpropagation. You can think of it as retracing a path from the output back to the input to get clues about the original route. The path can traverse any number of hidden layers; here it's just one. The derivative of the output layer with respect to the input layer tells us how sensitive the predicted rating is to each word in the review. Words with positive derivatives push the rating higher, words with negative derivatives push it lower, and the magnitude indicates how strongly they do so.
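For our one-hidden-layer network, this derivative has a simple closed form. Writing $a = x^T W_1$ for the hidden pre-activations, the chain rule gives

$$\frac{\partial y}{\partial x} = W_1 \, \mathrm{diag}\big(g'(a)\big) \, W_2, \qquad g'(a) = g(a)\big(1 - g(a)\big),$$

where the second expression is the familiar derivative of the sigmoid. Note that the result depends on $x$ through $a$: word influence is not a fixed property of the weights, which is precisely what makes context-sensitive analysis possible.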

Which words are important overall?

Let's say we want to do an aggregated analysis to find out which words have the greatest influence on ratings overall in the model. We're looking for a context-independent score for each word: rather than asking the model what it knows about particular reviews, we're asking what it knows in general. To find this, we calculate the derivative of the network's output $y$ with respect to the representation of a review $x$: $\frac{\partial}{\partial x} y$. In this case, $x$ is a vector with only one nonzero value, for a particular word, meaning that we're taking the derivative of a review containing only that word. TensorFlow can calculate this in one call: tf.gradients(y, x). The result is a vector whose value for that particular word signifies the word's independent effect on predicted ratings. We repeat this for every word in the model's vocabulary.
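Concretely, with the tensors from the earlier sketch and a tf.Session named sess holding the trained weights, the whole computation looks something like this (for a large vocabulary you would feed the identity matrix in batches):

# One "review" per vocabulary word, each containing only that word once.
one_word_reviews = np.eye(len(vocab), dtype=np.float32)

grads = tf.gradients(y, x)[0]  # same shape as x: [batch, vocab_size]
g = sess.run(grads, feed_dict={x: one_word_reviews})

# The diagonal entry g[i, i] is word i's derivative in its own one-word review.
scores = {vocab[i]: g[i, i] for i in range(len(vocab))}
most_negative = sorted(scores, key=scores.get)[:20]
most_positive = sorted(scores, key=scores.get, reverse=True)[:20]

Here is the resulting list of the 20 words that most negatively (red) and positively (blue) predict scotch ratings in our dataset.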

In [6]:
HTML(vis_wordlist)
Out[6]:
awful (-0.446)
worst (-0.404)
buy again no (-0.373)
mixer (-0.368)
gross (-0.366)
short length (-0.364)
urine (-0.353)
spoiled (-0.348)
cheap (-0.347)
e000a (-0.336)
cf (-0.335)
nail polish remover (-0.332)
thin mouthfeel (-0.311)
drain (-0.309)
flat (-0.305)
coke (-0.302)
unbalanced (-0.301)
bad (-0.299)
unpleasant (-0.298)
terrible (-0.293)
000.0 proof (0.354)
fantastic (0.317)
birthday (0.287)
glenmorangie astar (0.28)
nicely balanced (0.28)
delicious (0.262)
final notes (0.261)
highly recommend (0.255)
justice (0.254)
warms (0.246)
unable (0.246)
excellent (0.244)
my first foray into (0.237)
ardbeg uigeadail (0.233)
$ 000 (0.233)
my favourite (0.232)
soil (0.231)
sweetly (0.231)
stagg (0.23)
my favorites (0.23)

For many words, it's probably intuitive why they appear on one side or the other of this list. Some generically express negative or positive sentiment ("awful", "fantastic"). Others have positive or negative meaning to people familiar with scotch, or with alcoholic beverages in general ("spoiled", "flat", "nicely balanced"). For those who aren't scotch experts, a few words may need additional context to understand their negative or positive influence ("stagg" and "coke" are two that stumped us, for example). This list is not sensitive to word frequency, so some words may have strong influence despite showing up in only a few reviews. For instance, only 86 of the 11,832 reviews used to train the model mention "nail polish remover", but those few reviews do have low ratings. Apparently this is enough for the model to learn that scotch resembling "nail polish remover" is bad scotch.

Which words are important in a single review?

The word list shows which words are most influential in the model at large, but it is also helpful to visualize word influence for an individual review. For instance, we can highlight influential words in a review and vary their color intensity according to their context-independent influence values.

In [11]:
HTML(review_vis)
Out[11]:

picked this up at finewineandgoodspirits.com - 000.0 proof , i had originally had it at doc crow’s in louisville with dinner , so i thought i would try a bottle and see what’s what . sampled in a glencairn price 00.00 plus tax and shipping ( plcb color almost cordovan - about the color of light maple syrup nose raisins , cherries , spicy with some definite heat , maybe a little vanilla and cayenne . got buffalo trace all over it taste definitely hot , more so than the nose . raisins , cinnamon , peppery . a bit of cherry . complex . after a bit i poured it into a tumbler and it opened up quite a bit , much bolder than in the glen cairn . sort of like turning the volume up on the stereo i guess finish very long , lasts all the way down . warm , stays like a good peppery étouffée . not as sweet as the nose score a solid 00 . this is a nice bottle to have on the shelf when you want something complex and with a little oomph ! this bottle will last me awhile but i would definitely buy again overall this is something to sit and cherish with friends - i can easily see us hanging out around the fire pit with a glass of this to ward off the chill in the fall . it’s definitely not something you’re going want to slam down at a tailgate or drown with coke . i have a bottle of stagg in the bunker , i’ll have to get some friends together and do a barrel proof flight with stagg , stagg jr and some others . (review link, actual rating: 90/100, predicted rating: 94/100)

We now get to see the context surrounding words in the word list, making it easier to understand their influence. This review suggests that "stagg" is a favorable brand of scotch, and that "coke" is a mixer for less favorable scotch. Even though we're visualizing these words in context, the highlighting still reflects their context-independent scores from the word list. Alternatively, we can compute the context-sensitive influence of words on the rating for this particular review. To do this, we again take the derivative of the output $y$ with respect to the review $x$, $\frac{\partial}{\partial x} y$, where now $x$ is the vector of word counts for this review.
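This is the same tf.gradients call as before, only fed the actual review instead of synthetic one-word reviews (a sketch reusing earlier names; review_words is the tokenized review):

review_vec = bag_of_words(review_words)[np.newaxis, :]  # shape: [1, vocab_size]
g = sess.run(grads, feed_dict={x: review_vec})[0]

# Only words that appear in the review get context-sensitive scores.
influence = {vocab[i]: g[i] for i in np.nonzero(review_vec[0])[0]}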

In [13]:
HTML(review_vis)
Out[13]:

picked this up at finewineandgoodspirits.com - 000.0 proof , i had originally had it at doc crow’s in louisville with dinner , so i thought i would try a bottle and see what’s what . sampled in a glencairn price 00.00 plus tax and shipping ( plcb color almost cordovan - about the color of light maple syrup nose raisins , cherries , spicy with some definite heat , maybe a little vanilla and cayenne . got buffalo trace all over it taste definitely hot , more so than the nose . raisins , cinnamon , peppery . a bit of cherry . complex . after a bit i poured it into a tumbler and it opened up quite a bit , much bolder than in the glen cairn . sort of like turning the volume up on the stereo i guess finish very long , lasts all the way down . warm , stays like a good peppery étouffée . not as sweet as the nose score a solid 00 . this is a nice bottle to have on the shelf when you want something complex and with a little oomph ! this bottle will last me awhile but i would definitely buy again overall this is something to sit and cherish with friends - i can easily see us hanging out around the fire pit with a glass of this to ward off the chill in the fall . it’s definitely not something you’re going want to slam down at a tailgate or drown with coke . i have a bottle of stagg in the bunker , i’ll have to get some friends together and do a barrel proof flight with stagg , stagg jr and some others .

Here the focus shifts to the words that most influence the prediction for this specific review. The word "complex", for instance, has high context-independent influence, but it is not particularly important for assigning a rating to this review: if it were omitted, the rating would not change very much. In this analysis, the highlighted words are those whose presence most affects the predicted rating for this particular review. This is a little counterintuitive, because it seems like words with strong overall influence should have proportionate influence within a given review. Look at what happens when we truncate the review to its first few sentences.

In [15]:
HTML(review_vis)
Out[15]:

picked this up at finewineandgoodspirits.com - 000.0 proof , i had originally had it at doc crow’s in louisville with dinner , so i thought i would try a bottle and see what’s what . sampled in a glencairn price 00.00 plus tax and shipping ( plcb color almost cordovan - about the color of light maple syrup nose raisins , cherries , spicy with some definite heat , maybe a little vanilla and cayenne . got buffalo trace all over it taste definitely hot , more so than the nose . raisins , cinnamon , peppery . a bit of cherry . complex .

Now the word "complex" becomes important, and the words "shipping" and "cayenne" increase their influence as well. The reason lies in the network's nonlinearity: in the original review, these words contributed only moderately to the positive rating because several other positive words were already pushing the sigmoid hidden units toward saturation, where each additional word has a diminishing marginal effect. In this shortened review, there are fewer words with positive influence, so the remaining ones become even more influential.
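The saturation effect is easy to see with the sigmoid alone (a toy calculation; the weight value is made up):

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

w = 1.0  # illustrative weight from a positive word into a hidden unit
for n in (1, 3, 6):
    # Marginal effect on the unit of one additional positive word:
    print(n, round(sigmoid((n + 1) * w) - sigmoid(n * w), 4))
# 1 0.1497
# 3 0.0294
# 6 0.0016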

Which words characterize reviews?

One way to think about the hidden layer in a neural network is that each hidden unit corresponds to a topic dimension of the reviews. For instance, a movie review model could have a hidden unit for each genre (e.g. action, suspense, comedy), although this doesn't seem to be the case in our model. Instead of looking at how words affect the output, we'll now focus on the hidden layer to find the words that characterize the topic dimensions of these reviews.

Which words best describe review topics?

In the previous analysis we took the derivative of the predictions with respect to the input to find influential words. We can do the same with any layer of a model by taking the derivative of that layer with respect to the input layer. When there's only one hidden layer, its derivative is proportional to the weights between the input words and each hidden unit, so we don't actually need to compute the derivative: the weights of each unit already tell us which words are most influential to that unit.
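Extracting these amounts to sorting the columns of the trained input-to-hidden weight matrix (a sketch; W1 is the [vocab_size, num_units] weight tensor from the earlier snippets):

W1_trained = sess.run(W1)  # shape: [vocab_size, num_units]
for unit in range(W1_trained.shape[1]):
    top = np.argsort(-W1_trained[:, unit])[:20]  # largest positive weights first
    print(unit, [vocab[i] for i in top])

Turning to the movie reviews dataset, this aggregated, context-independent analysis yields the top 20 words with the highest weights for each of the five hidden units in the model: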

In [18]:
HTML(vis_wordlist)
Out[18]:
Unit 1 (purple):
sadly (0.227)
mediocre (0.223)
disappointing (0.215)
certainly (0.205)
fails (0.199)
relate (0.194)
bland (0.191)
problem (0.183)
adequate (0.177)
collective (0.171)
archie (0.171)
hype (0.168)
revenge (0.167)
lacking (0.166)
fans (0.166)
christopher lee (0.166)
curiosity (0.165)
strictly (0.165)
below average (0.164)
rogue (0.16)
Unit 2 (yellow):
worst (0.652)
awful (0.561)
waste (0.534)
horrible (0.456)
avoid (0.426)
terrible (0.422)
crappy (0.37)
bad (0.35)
insult (0.347)
sucks (0.344)
turkey (0.333)
pointless (0.331)
warn (0.321)
poorly (0.319)
worse (0.309)
piece (0.301)
complete waste (0.299)
junk (0.298)
stinks (0.297)
crap (0.294)
Unit 3 (blue):
johnny depp (0.279)
high hopes (0.254)
downhill (0.244)
roughly (0.236)
mystery science theater 0000 (0.23)
replaced (0.226)
flop (0.219)
depressing (0.215)
morbid curiosity (0.215)
get rid (0.207)
trusted (0.205)
ice-t (0.199)
trite (0.199)
patriotic (0.199)
dopey (0.198)
high-school (0.197)
lacked (0.197)
vast (0.196)
stephen (0.194)
gardener (0.194)
Unit 4 (red):
complain (0.296)
ricky (0.295)
delicious (0.275)
likewise (0.272)
hilariously (0.264)
bruce campbell (0.261)
caddyshack (0.257)
noticed (0.25)
biting (0.245)
widescreen (0.241)
brilliantly (0.24)
idiocracy (0.234)
cloth (0.228)
good.the (0.227)
mavens (0.227)
factory (0.224)
referred (0.223)
fantastically (0.218)
illusions (0.217)
eisenstein (0.216)
Unit 5 (brown):
planned (0.323)
damn (0.285)
alan rickman (0.28)
pornographic (0.274)
good.the (0.271)
bravo (0.256)
flaw (0.245)
awe (0.244)
adrian (0.243)
afraid (0.242)
sites (0.238)
reached (0.237)
unexpected (0.236)
desi (0.235)
canceled (0.235)
clueless (0.232)
remakes (0.227)
killer snowman (0.227)
scariest (0.225)
memorable quotes (0.224)

Neural networks usually have far more than 5 hidden units, but for the purpose of visualizing the units as topic dimensions in this post, we kept the model very small. Note that for the hidden layer, it only makes sense to show words with positive influence, since negative weights indicate the word does not activate that topic. Examining this list, we can start to see a few patterns among the words in each dimension. The yellow unit is almost exclusively composed of strongly negative sentiment words. The purple words seem to express a more moderate dissatisfaction. The red unit has more positive sentiment words, and the brown unit contains words suggesting surprise. Still, it seems difficult to assign an abstract topic label that summarizes the words in each dimension. Just as with the output analysis, we can try to disambiguate this list by viewing it from the perspective of specific reviews.

Which words describe topics in a single review?

Interestingly, "johnny depp" is the most influential word in the blue topic, so let's look at its influence within an individual review.

In [21]:
HTML(review_vis)
Out[21]:

this film had a lot of potential - it's a great story and has the potential to be very creepy . but of course tim burton doesn't really do creepy films , he does wacky cartoonish films. and i usually like tim burton's stuff . but i thought this film was really weak . the best thing about the film ( and it is actually worth seeing just for this ) was the art direction - the film has an amazing intangible quality to it. the script was not good . it was boring in parts and confusing in other parts , and there was no building of characters . i never really cared that people were having their heads lopped off by a headless being . i thought johnny depp had a good thing going with his approach to the character , but given that the script was weak he couldn't go too far with it - and i was very irritated by the attempts at a slight accent on his and christina ricci's parts.anyway , it is sadly not a great film and not worth seeing unless you are interested in the art direction . (review link, actual rating: 3/10, predicted rating: 4/10)

Each word is colored by the unit where it has the highest context-independent influence. From this perspective, it looks like all the topics are relevant to some degree in this review. Alternatively, to visualize the context-sensitive influence, we can take the derivative of the hidden layer's output $h$ with respect to this specific review $x$: $\frac{\partial}{\partial x} h$. The derivative is a 2-D tensor (i.e., matrix) with rows for input elements (i.e. word counts) and columns for hidden units, and we can use it to find the most strongly associated words for each hidden dimension.
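One practical wrinkle: tf.gradients sums over the elements of its first argument, so to recover a separate column per hidden unit we differentiate one unit at a time (a sketch reusing earlier names; h is the model's hidden-layer tensor and num_units its width):

jacobian = np.stack(
    [sess.run(tf.gradients(h[:, unit], x)[0], feed_dict={x: review_vec})[0]
     for unit in range(num_units)],
    axis=1)  # shape: [vocab_size, num_units]

# Color each word in the review by its most strongly activated unit.
word_ids = np.nonzero(review_vec[0])[0]
word_to_unit = {vocab[i]: int(np.argmax(jacobian[i])) for i in word_ids}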

In [23]:
HTML(review_vis1)
Out[23]:

this film had a lot of potential - it's a great story and has the potential to be very creepy . but of course tim burton doesn't really do creepy films , he does wacky cartoonish films. and i usually like tim burton's stuff . but i thought this film was really weak . the best thing about the film ( and it is actually worth seeing just for this ) was the art direction - the film has an amazing intangible quality to it. the script was not good . it was boring in parts and confusing in other parts , and there was no building of characters . i never really cared that people were having their heads lopped off by a headless being . i thought johnny depp had a good thing going with his approach to the character , but given that the script was weak he couldn't go too far with it - and i was very irritated by the attempts at a slight accent on his and christina ricci's parts.anyway , it is sadly not a great film and not worth seeing unless you are interested in the art direction .

Now each word is colored by the unit with the highest derivative for that word, so we only see the dimensions that are "turned on" (i.e. have high values) for this review. Even though this review contains words with red and yellow context-independent influence, only the blue and purple units are activated. It's still a bit mysterious what these colors represent. We can possibly get some further clues by comparing the colors of different reviews side by side. Here's the context-sensitive analysis for a different review of the same movie (Sleepy Hollow):

In [25]:
HTML(review_vis2)
Out[25]:

surely this film was hacked up by the studio ? perhaps not but i feel there were serious flaws in the storytelling that if not attributed to the editing process could only be caused by grievously bad , criminal indeed , writing and directing.i understand the effect burton wished to achieve with the stylised acting similar to the gothic fairytale atmosphere of edward scissorhands , but here unfortunately it falls flat and achieves no mythical depth of tropes but only the offensive tripe of affectation . ie bad acting and shallow characterisation even for a fairytale.finally not that scary , indeed only mildly amusing in its attempts . the use of dialogue as a vehicle for plot background was clumsy and unnecessary . the mystery of who is the headless horseman would suffice , no need for the myth about a german mercenary , although christopher walken did cut a dashing figure but not that menacing - seeing the horsemans head makes him seem far friendlier that a decapitated inhuman nine foot tall spirit as in the original legend.no real rhythm or universal tone was ever established and not a classic in burtons oevure . stilted and clipped as my parting shot ... (review link, actual rating: 1/10, predicted rating: 1/10)

And here's a comparison of their hidden activation scores, which confirms that the second review has more yellow activation and less blue than the first, as suggested by the context-sensitive highlighting.
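Under the hood, this comparison just evaluates the hidden layer for both reviews (a sketch; review1_vec and review2_vec are the two reviews' count vectors, and comparison_graph and graph_params are plotting helpers in our notebook):

both = np.vstack([review1_vec, review2_vec])
activations = sess.run(h, feed_dict={x: both})  # shape: [2, num_units]
# activations[k, j] is review k's activation of hidden unit j.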

In [27]:
comparison_graph(*graph_params)

While we can't draw any hard conclusions from this comparison, we can hypothesize. The similar purple activation of the two reviews may reflect their shared disappointment with the movie. Moreover, the model seems to have detected that the second review expresses more outrage, given its higher yellow activation. And perhaps the higher blue activation in the first review comes from its initial optimism about promising features of the movie ("johnny depp"?).

Conclusion

Neural networks are powerful models for text processing, but they don't naturally explain their decision-making to people. We demonstrated some straightforward ways to visualize how a neural network perceives words in the context of a prediction task. Future work will look for strategies to better explain how words influence the latent representation of text inside the model.

Notes

  • The examples shown here were taken primarily from the data held out during training for each model, with the exception of Review 1 in the hidden-layer analysis, which was taken from the training set because it was an interesting example of the mysterious influence of "johnny depp".
  • The methods explored here can be adapted to other types of neural networks. For example, our model ignores word order in the reviews, but an order-sensitive model like a recurrent neural network (RNN) could better capture context-sensitive influence. Consider a review that says "This scotch was far from the worst I've tasted" as opposed to "This scotch was the worst I've tasted". An RNN processes words sequentially, so the influence of "worst" in the first review is sensitive to the words "far from the" that precede it; the model is therefore more likely to learn that "worst" has much less negative influence in the first review than in the second. Our existing model processes words as a set, so it is not explicitly aware of such sequential effects.

Related Work