A way to sort a Python dictionary by value, instead of key

Here I use the code on pages 122-123 of the book 'Python for Informatics' by Charles Severance (Dr. Chuck) as an example to explain how to sort the key-value pairs in a Python dictionary by value.

  
# open the file named romeo.txt
fhand = open('romeo.txt')

# create a dictionary
counts = dict()

# read the file line by line
for line in fhand:
    # split the line by whitespace characters (space, tab, newline, return, formfeed)
    words = line.split()
    
    # read the individual word
    for word in words:
        # increase the word count by 1 for that specific 'word'
        counts[word] = counts.get(word, 0 ) + 1

# create our list
lst = list()
# iterate over each tuple (key value pair) in the list generated by counts.items()
for key, val in counts.items():
    # create a new tuple (value, key) and append to our list, so that sort() will sort the list by the first element, value.
    # if we want to sort the list (or you can say dictionary) by key, we can create new tuple (key, value), with the first element being key , and append to our list.
    lst.append( (val, key) )

# sort our list in reverse order
lst.sort(reverse=True)

# iterate over each tuple (value key pair) in the list generated by extracting the first 10 elements in our list
for val, key in lst[:10] :
    # display the key and value
    print(key, val)


# we are done
fhand.close()


In this example, we want to find the 10 most common words in a file named 'romeo.txt', knowing that the words on each line are separated by whitespace.

This is a good example of how to sort the key-value pairs in a Python dictionary by value because we want the 10 most common words (sorted by word count, the value in the dictionary), not the first 10 words in alphabetical order (sorted by word, the key in the dictionary).

In the first chunk of the code, we accumulated the count for each word in a dictionary named 'counts'.
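As a minimal sketch of that counting step (written for Python 3, and using a short hypothetical list of lines in place of the contents of 'romeo.txt'), the accumulation with counts.get() could look like this:

```python
# Hypothetical sample lines standing in for the contents of romeo.txt
sample_lines = [
    "the sun is the east",
    "the moon is pale",
]

counts = dict()
for line in sample_lines:
    # split() with no arguments splits on any run of whitespace
    for word in line.split():
        # get(word, 0) returns 0 the first time a word is seen,
        # so the same statement handles new and existing words
        counts[word] = counts.get(word, 0) + 1

print(counts)
```

The default value passed to get() is what lets us avoid checking whether the word is already in the dictionary.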

In the second chunk of the code, in order to find the top 10 common words, we created a list, 'lst', using the value-key pairs in the dictionary. We did that because 1) we wanted to find the 10 most common words and 2) a Python dictionary has no sort() method and does not keep its entries ordered by value. When list.sort() compares tuples, it compares their first elements first and looks at the second elements only to break ties. In the (key, value) tuples generated by counts.items(), the first element is the key, so sorting them directly would sort by key. We therefore generated a list in which each tuple has the value as its first element and the key as its second. By switching the order of key and value in a new tuple, we enabled list.sort() to sort by value rather than key. Moreover, by specifying 'reverse=True', we instructed the sort() method to sort the list by value in descending order.
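The tuple-swapping idea can be sketched on a small hypothetical dictionary (Python 3). The second half shows an alternative that is not the approach in the book: the built-in sorted() with a key function, which sorts the (key, value) pairs by value without swapping them.

```python
# Hypothetical word counts, used only for illustration
counts = {"the": 3, "is": 2, "sun": 1, "moon": 1}

# Approach from the post: build (value, key) tuples, then sort.
# Tuples are compared element by element, so the count decides the
# order and the word is only used to break ties between equal counts.
pairs = [(val, key) for key, val in counts.items()]
pairs.sort(reverse=True)
print(pairs)  # [(3, 'the'), (2, 'is'), (1, 'sun'), (1, 'moon')]

# Alternative (not from the book): keep (key, value) tuples and tell
# sorted() to compare on the second element, the value.
by_value = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(by_value)  # [('the', 3), ('is', 2), ('sun', 1), ('moon', 1)]
```

The two approaches can order words with equal counts differently: the swapped tuples fall back to comparing the words, while sorted() with a key function leaves tied entries in their original order.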

In the last chunk of the code, the last for loop, the [:10] expression allowed us to create a new list that consists of only the first 10 elements, i.e., tuples, of the 'lst' we sorted in the second chunk. The new list consists of the 10 (word count, word) tuples with the largest word counts, because we sorted the list by word count in descending order. We then iterated over the new list and printed each of the top 10 common words (key) followed by its word count (value).
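A short sketch of the slicing step, again in Python 3 with hypothetical data. It also shows that a slice such as [:10] is safe when the list has fewer than 10 elements. The last part uses collections.Counter from the standard library as a shortcut; that is an alternative I am adding here, not the method used in the book.

```python
from collections import Counter

# Hypothetical list, already sorted by count in descending order
lst = [(3, "the"), (2, "is"), (1, "sun"), (1, "moon")]

top_two = lst[:2]    # a new list holding the first two tuples
everything = lst[:10]  # fewer than 10 elements: no error, we get all 4

for val, key in top_two:
    print(key, val)

# Alternative: Counter.most_common(n) returns the n (word, count)
# pairs with the highest counts, so no manual sorting is needed.
word_counts = Counter({"the": 3, "is": 2, "sun": 1, "moon": 1})
print(word_counts.most_common(2))  # [('the', 3), ('is', 2)]
```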

Let me know whether the explanation is clear!
