A way to sort a Python dictionary by value, instead of key

Here I use the code on pages 122-123 of the book 'Python for Informatics' by Charles Severance (Dr. Chuck) as an example to explain how to sort the key-value pairs in a Python dictionary by value.

  
# open the file named romeo.txt
fhand = open('romeo.txt')

# create a dictionary
counts = dict()

# read the file line by line
for line in fhand:
    # split the line by whitespace characters (space, tab, newline, return, formfeed)
    words = line.split()
    
    # read the individual word
    for word in words:
        # increase the word count by 1 for that specific 'word'
        counts[word] = counts.get(word, 0 ) + 1

# create our list
lst = list()
# iterate over each tuple (key value pair) in the list generated by counts.items()
for key, val in counts.items():
    # create a new tuple (value, key) and append to our list, so that sort() will sort the list by the first element, value.
    # if we want to sort the list (or you can say dictionary) by key, we can create new tuple (key, value), with the first element being key , and append to our list.
    lst.append( (val, key) )

# sort our list in reverse order
lst.sort(reverse=True)

# iterate over the first 10 (value, key) tuples in our sorted list
for val, key in lst[:10]:
    # display the key and value (print() is a function in Python 3)
    print(key, val)


# we are done
fhand.close()


In this example, we want to find the 10 most common words in a file named 'romeo.txt', knowing that the words in each line are separated by whitespace.

This is a good example of how to sort the key-value pairs in a Python dictionary by value because we want to find the 10 most common words (sort by word count, the value in the dictionary), not the first 10 words in alphabetical order (sort by word, the key in the dictionary).

In the first chunk of the code, we accumulated the count for each word in a dictionary named 'counts.'

In the second chunk of the code, in order to find the 10 most common words, we created a list, 'lst', of (value, key) tuples built from the key-value pairs in the dictionary. We did that because 1) we wanted to rank the words by count and 2) a Python dictionary does not keep its entries sorted by value, whereas a list can be sorted. By default, list.sort() compares tuples by their first element first, and in the (key, value) tuples produced by counts.items() that first element is the key. So we generated a list in which each tuple has the value as the first element and the key as the second. By switching the order of key and value in the new tuples, we enabled list.sort() to sort by value rather than by key (tuples with equal counts are then ordered by the word). Moreover, by specifying 'reverse=True', we instructed the sort() method to sort the list by value in descending order.
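The tuple comparison behind this trick can be sketched on a tiny dictionary. The counts below are made up for illustration, and the second half shows a different technique from the book's: passing a key function to sorted() instead of swapping the tuples.

```python
# Made-up word counts, used only for illustration
counts = {"romeo": 3, "juliet": 5, "and": 5, "the": 9}

# Tuples compare element by element: the first elements decide,
# and the second elements only break ties.
assert (5, "and") < (5, "juliet") < (9, "the")

# The approach from the post: swap to (value, key), then sort descending.
swapped = sorted(((val, key) for key, val in counts.items()), reverse=True)
print(swapped)   # [(9, 'the'), (5, 'juliet'), (5, 'and'), (3, 'romeo')]

# Alternative (not the book's code): keep (key, value) pairs and
# tell sorted() to compare only the value.
by_value = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
print(by_value)  # [('the', 9), ('juliet', 5), ('and', 5), ('romeo', 3)]
```

The two approaches treat ties differently: the swapped tuples fall back to comparing the words, while the key function leaves tied words in the order they appear in the dictionary.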

In the last chunk of code, the final for loop, the [:10] slice creates a new list that consists of only the first 10 elements, i.e., tuples, of the 'lst' we sorted in the second chunk. Because the list is sorted by word count in descending order, this new list holds the 10 (word count, word) tuples with the largest word counts. We then iterated over the new list and printed each of the 10 most common words (key) with its word count (value).
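Here is a small sketch of how the slice behaves, using a made-up four-element list in place of the real word counts. At the end I also show collections.Counter from the standard library, which is an alternative to the book's code rather than part of it.

```python
from collections import Counter

# Made-up (count, word) tuples, already sorted in descending order
lst = [(9, "the"), (5, "juliet"), (5, "and"), (3, "romeo")]

top_two = lst[:2]
print(top_two)        # [(9, 'the'), (5, 'juliet')]

# Slicing past the end is safe: with only 4 tuples, [:10] returns all 4
print(len(lst[:10]))  # 4

# The slice is a new list, so the original is unchanged
print(len(lst))       # 4

# Alternative (not the book's code): Counter does the counting and the ranking
words = "the the the and and romeo".split()
print(Counter(words).most_common(2))  # [('the', 3), ('and', 2)]
```

Because a slice never raises an error for a short list, the same loop works on a file with fewer than 10 distinct words.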

Let me know whether the explanation is clear!
