Here I use the code on pages 122-123 of the book 'Python for Informatics' by Charles Severance (Dr. Chuck) as an example to explain how to sort the key-value pairs in a Python dictionary by value.
In this example, we want to find the 10 most common words in a file named 'romeo.txt', assuming that the words on each line are separated by whitespace.
This is a good example of sorting a dictionary's key-value pairs by value, because we want the 10 most common words (sorted by word count, the value in the dictionary), not the first 10 words in alphabetical order (sorted by word, the key in the dictionary).
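Here is a small illustration of the difference. It uses made-up counts rather than the real contents of romeo.txt. Sorting the pairs returned by items() orders them by word. Sorting swapped (count, word) pairs in reverse order puts the most frequent words first.

```python
# Hypothetical word counts, made up for illustration (not taken from romeo.txt)
counts = {'the': 3, 'and': 3, 'sun': 2, 'arise': 1, 'envious': 1}

# Sorting the (key, value) pairs compares the keys: alphabetical order of the words
first_alphabetically = sorted(counts.items())[:2]
print(first_alphabetically)   # [('and', 3), ('arise', 1)]

# Sorting (value, key) pairs in descending order puts the largest counts first
most_common = sorted([(val, key) for key, val in counts.items()], reverse=True)[:2]
print(most_common)            # [(3, 'the'), (3, 'and')]
```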
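In the first chunk of the code, we count how many times each word appears and store the results in a dictionary named 'counts'. Each word is a key, and its count is the value. The call counts.get(word, 0) returns the word's current count, or 0 if the word has not been seen yet. This means each word can be incremented without first checking whether it is already in the dictionary.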
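To see the counting step on its own, here is a small sketch that replaces the file with two hypothetical lines of text, so it runs without romeo.txt. It also shows that split() with no argument splits on any run of whitespace, including tabs and repeated spaces, and drops the trailing newline.

```python
# Two hypothetical lines standing in for romeo.txt (note the double space and the tab)
lines = [
    "But soft  what light\tthrough yonder window breaks\n",
    "It is the east and Juliet is the sun\n",
]

print(lines[0].split())
# ['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks']

counts = dict()
for line in lines:
    for word in line.split():
        # get() returns 0 for a word not seen yet, so no KeyError is raised
        counts[word] = counts.get(word, 0) + 1

print(counts['is'], counts['the'], counts['sun'])   # 2 2 1
```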
In the second chunk of the code, we build a list, 'lst', of tuples made from the dictionary's key-value pairs so that we can find the 10 most common words. We need a list because a dictionary has no sort() method and does not keep its items in sorted order. When list.sort() compares two tuples, it compares their first elements first and only looks at the second elements if the first ones are equal. A list of (key, value) tuples would therefore be sorted by key, i.e., by word. By swapping the order and building (value, key) tuples instead, we make list.sort() sort by value, i.e., by word count. Finally, passing reverse=True tells sort() to put the list in descending order, so the largest counts come first.
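The sketch below shows the tuple comparison rule and the second chunk's loop on a small, hypothetical dictionary:

```python
# Tuples compare element by element: the first elements decide unless they are equal
print((2, 'zebra') < (3, 'apple'))   # True: 2 < 3, the words are never compared
print((3, 'and') < (3, 'the'))       # True: the counts tie, so 'and' < 'the' decides

# Hypothetical counts dictionary, made up for illustration
counts = {'the': 3, 'and': 3, 'sun': 2, 'is': 1}

# Build (value, key) tuples, as the second chunk does
lst = list()
for key, val in counts.items():
    lst.append((val, key))

lst.sort(reverse=True)
print(lst)          # [(3, 'the'), (3, 'and'), (2, 'sun'), (1, 'is')]

# Optional variant: largest count first, but tied words in alphabetical order
alpha_ties = sorted(lst, key=lambda t: (-t[0], t[1]))
print(alpha_ties)   # [(3, 'and'), (3, 'the'), (2, 'sun'), (1, 'is')]
```

Note that 'the' comes before 'and' in the sorted list. Because reverse=True reverses the whole comparison, words with the same count end up in reverse alphabetical order. The last two lines show one way, using a negated count as the sort key, to keep ties alphabetical if that matters.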
In the last chunk of code, the final for loop, the slice lst[:10] creates a new list containing only the first 10 elements (tuples) of the list we sorted in the second chunk. Because that list is sorted by word count in descending order, these are the 10 (word count, word) tuples with the largest counts. We then iterate over this new list and print each of the top 10 words (the key) followed by its count (the value).
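Here is a quick way to see what the slice does. It uses a short hypothetical list and [:3] instead of [:10] to keep the data small:

```python
# Hypothetical (count, word) tuples, already sorted in descending order
lst = [(5, 'the'), (4, 'is'), (3, 'and'), (2, 'sun'), (1, 'east')]

top3 = lst[:3]            # a new list with only the first three tuples
print(top3)               # [(5, 'the'), (4, 'is'), (3, 'and')]
print(len(lst))           # 5: slicing does not change the original list

# Asking for more elements than exist is not an error
print(lst[:10] == lst)    # True

for val, key in top3:
    print(key, val)       # word first, then its count
```

The last point matters in practice. If a file contains fewer than 10 distinct words, lst[:10] simply returns all of them.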
Let me know whether the explanation is clear!
# open the file named romeo.txt
fhand = open('romeo.txt')
# create a dictionary
counts = dict()
# read the file line by line
for line in fhand:
    # split the line by whitespace characters (space, tab, newline, return, formfeed)
    words = line.split()
    # read the individual words
    for word in words:
        # increase the word count by 1 for that specific 'word'
        counts[word] = counts.get(word, 0) + 1

# create our list
lst = list()
# iterate over each (key, value) pair returned by counts.items()
for key, val in counts.items():
    # create a new tuple (value, key) and append it to our list, so that sort() will sort the list by the first element, the value.
    # if we wanted to sort by key instead, we could append (key, value), with the key as the first element.
    lst.append((val, key))
# sort our list in reverse (descending) order
lst.sort(reverse=True)

# iterate over the first 10 (value, key) tuples in our sorted list
for val, key in lst[:10]:
    # display the key and value (Python 3 print function; the original used Python 2's print key, val)
    print(key, val)
# we are done
fhand.close()
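For comparison, here is a sketch of two more compact ways to get the same ranking in Python 3. One uses the built-in sorted() with a key function, and the other uses collections.Counter from the standard library. Neither appears in the original code, and the sample text is made up.

```python
from collections import Counter

# Hypothetical sample text standing in for the file contents
text = "the sun the moon the sun and stars"

# Count the words the same way as the first chunk
counts = dict()
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

# Alternative 1: sort the (key, value) pairs directly, using the value as the sort key
top = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:2]
print(top)          # [('the', 3), ('sun', 2)]

# Alternative 2: let collections.Counter do both the counting and the ranking
top_counter = Counter(text.split()).most_common(2)
print(top_counter)  # [('the', 3), ('sun', 2)]
```

Both alternatives return (word, count) pairs rather than (count, word) tuples. They also order ties differently from the tuple-swapping approach. sorted() is stable even with reverse=True, so tied words keep the dictionary's insertion order, and most_common() lists equal counts in the order they were first encountered. Sorting (count, word) tuples with reverse=True, by contrast, puts tied words in reverse alphabetical order.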