A way to sort a Python dictionary by value, instead of key

Here I use the code on pages 122-123 of the book 'Python for Informatics' by Charles Severance (Dr. Chuck) as an example to explain how to sort the key-value pairs in a Python dictionary by value.

  
# open the file named romeo.txt
fhand = open('romeo.txt')

# create a dictionary
counts = dict()

# read the file line by line
for line in fhand:
    # split the line by whitespace characters (space, tab, newline, return, formfeed)
    words = line.split()
    
    # read the individual word
    for word in words:
        # increase the word count by 1 for that specific 'word'
        counts[word] = counts.get(word, 0) + 1

# create our list
lst = list()
# iterate over each tuple (key value pair) in the list generated by counts.items()
for key, val in counts.items():
    # create a new tuple (value, key) and append to our list, so that sort() will sort the list by the first element, value.
    # if we want to sort the list (or you can say dictionary) by key, we can create new tuple (key, value), with the first element being key , and append to our list.
    lst.append( (val, key) )

# sort our list in reverse order
lst.sort(reverse=True)

# iterate over each tuple (value key pair) in the list generated by extracting the first 10 elements in our list
for val, key in lst[:10]:
    # display the key and value
    print(key, val)


# we are done
fhand.close()


In this example, we want to find the 10 most common words in a file named 'romeo.txt', knowing that the words on each line are separated by whitespace.

This is a good example of how to sort the key-value pairs in a Python dictionary by value because we want the 10 most common words (sorted by word count, the value in the dictionary), not the first 10 words in alphabetical order (sorted by word, the key in the dictionary).

In the first chunk of the code, we accumulated the count for each word in a dictionary named 'counts'.
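The counting step can be tried without a file. Here is a minimal sketch, assuming Python 3; the function name count_words and the two sample lines are made up for illustration and are not from the book.

```python
def count_words(lines):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = dict()
    for line in lines:
        for word in line.split():
            # get() returns 0 the first time a word is seen
            counts[word] = counts.get(word, 0) + 1
    return counts


# made-up sample lines standing in for the contents of a file
sample_lines = ["but soft what light", "it is the east and juliet is the sun"]
sample_counts = count_words(sample_lines)

print(sample_counts["is"])   # 2
print(sample_counts["but"])  # 1
```

Any iterable of strings works here, so an open file handle such as fhand could be passed in place of the sample list.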

In the second chunk of the code, in order to find the top 10 common words, we created a list, 'lst', of (value, key) tuples from the dictionary. We did that because 1) we wanted to find the 10 most common words and 2) a Python dictionary has no sort() method and does not keep its entries in sorted order. list.sort() compares tuples by their first element first and uses the second element only to break ties. In the (key, value) tuples produced by counts.items(), the first element is the key, so sorting them directly would sort by word. By switching the order of key and value in a new tuple, we enabled list.sort() to sort by value rather than by key. Moreover, by specifying 'reverse=True', we instructed the sort() method to sort the list by value in descending order.
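To see why swapping the tuple order matters, here is a small sketch, assuming Python 3 and a made-up 'counts' dictionary. It also shows one alternative that is not in the book's code: the built-in sorted() with a key function, which sorts the (key, value) pairs by value without building swapped tuples.

```python
# made-up word counts for illustration
counts = {"the": 3, "and": 2, "sun": 2, "east": 1}

# Tuples are compared element by element: the first element decides
# unless it ties, in which case the second element is compared.
print((3, "the") > (2, "and"))  # True
print((2, "sun") > (2, "and"))  # True: the counts tie, so the words decide

# The approach from the post: build (value, key) tuples, then sort.
swapped = [(val, key) for key, val in counts.items()]
swapped.sort(reverse=True)
print(swapped)  # [(3, 'the'), (2, 'sun'), (2, 'and'), (1, 'east')]

# An alternative: keep (key, value) order and sort on the value only.
by_value = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(by_value)  # [('the', 3), ('and', 2), ('sun', 2), ('east', 1)]
```

The two results differ for tied counts. The swapped tuples fall back to comparing the words, while the key function looks only at the value, so tied words keep the order in which they appear in the dictionary.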

In the last chunk of code, the final for loop, the [:10] slice created a new list that consists of only the first 10 elements, i.e., tuples, of the 'lst' we sorted in the second chunk. Because we sorted the list by word count in descending order, this new list holds the 10 (word count, word) tuples with the largest word counts. We then iterated over the new list and printed each word (key) with its word count (value), which gave us the top 10 most common words.
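The slicing step can be sketched on a short list, assuming Python 3 and made-up data. The sketch also shows collections.Counter from the standard library, which is not used in the book's code but offers most_common() as a shortcut for the same "top N" result.

```python
from collections import Counter

# a made-up list, already sorted by count in descending order
pairs = [(4, "the"), (3, "is"), (2, "and"), (1, "east")]

# A slice builds a new list and leaves the original untouched.
top_two = pairs[:2]
print(top_two)     # [(4, 'the'), (3, 'is')]
print(len(pairs))  # 4

# Asking for more elements than exist is not an error;
# the slice simply returns everything that is there.
print(len(pairs[:10]))  # 4

# Counter.most_common(n) returns the n (word, count) pairs
# with the highest counts.
word_counter = Counter({"the": 4, "is": 3, "and": 2, "east": 1})
print(word_counter.most_common(2))  # [('the', 4), ('is', 3)]
```

Because a slice never raises an error for a short list, the loop over lst[:10] also works when the file contains fewer than 10 distinct words.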

Let me know whether the explanation is clear!
