
Tutorial: Using str.find() and string slicing with [n:m] notation to extract data in Python

When you need to extract data from a string in Python, you can use the built-in str.find() method together with the [n:m] slice notation to pull out the substring you need.

string.find(sub) returns the starting position of the first occurrence of sub in string. For example, 'abcdecd'.find('cd') returns 2, because the first 'cd' starts at position 2 (not 3, since positions are counted from 0). If sub does not occur in the string, the method returns -1.
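A quick way to check these return values is to try them out. The short sketch below uses the sample string from the paragraph above; the variable names are only illustrative.

```python
# Illustrative names only; the sample string comes from the text above.
text = 'abcdecd'

first_cd = text.find('cd')   # position of the first 'cd'
missing = text.find('xyz')   # 'xyz' does not occur in text

print(first_cd)  # 2
print(missing)   # -1
```

Because -1 is also a valid index in Python (it refers to the last character), it is worth checking for -1 before using the result as a position.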

my_str[n:m] extracts the substring that starts at position n and ends at position m-1; the character at position m is not included. If m is left out, the slice runs to the end of the string. For example, if the content of my_str is "I don't know", my_str[2:] returns "don't know" and my_str[2:7] returns "don't".
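The slicing rules can be tried out the same way. This is a small sketch using the sample sentence; the names tail, middle, and head are illustrative.

```python
my_str = "I don't know"

tail = my_str[2:]     # from position 2 to the end of the string
middle = my_str[2:7]  # positions 2 through 6; position 7 is excluded
head = my_str[:1]     # from the start up to, but not including, position 1

print(tail)    # don't know
print(middle)  # don't
print(head)    # I
```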

By using string.find() and [n:m] notation, we can write a script to automatically extract the data we need from a string.
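As a minimal sketch of how the two fit together, the snippet below splits one record at a marker character. The record 'temperature=21.5' and the variable names are hypothetical and only serve as an illustration.

```python
# Hypothetical sample record: a name and a value separated by '='.
record = 'temperature=21.5'

# Locate the marker that separates the name from the value.
marker_position = record.find('=')

# Everything before the marker is the name;
# everything after it is the value.
name = record[:marker_position]
value = float(record[marker_position + 1:])

print(name)   # temperature
print(value)  # 21.5
```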

For example, suppose we have a file test_score.txt that records the final exam scores of all the students enrolled in a class. Each line holds a student's first name and his or her score, separated by a single space, as follows.

Peter 100
Anna 99
Henry 98
Jerry 97



To calculate the average score of the final, we can use find() and the [n:m] notation.


#average_sample.py

# used to accumulate the scores
total_score = 0.0

# used to count how many scores (students) we have
number_of_score = 0

# open the file for 'read'; the with block closes it automatically
with open('test_score.txt', 'r') as file_h:
    # read the file line by line
    for line in file_h:
        # find the marker
        white_space_position = line.find(' ')

        # calculate the position of the score
        # in relation to position of the marker
        score_position = white_space_position + 1

        # extract the substring and
        # convert the substring into a floating point number
        # (float() ignores the trailing newline)
        score = float(line[score_position:])

        # print the score
        print('Score: {}'.format(score))

        # accumulate the scores
        total_score = total_score + score

        # accumulate how many scores we got
        number_of_score = number_of_score + 1

# calculate and display the average
print('Average: {}'.format(total_score / number_of_score))
 
You will get the following output on the terminal.

Score: 100.0
Score: 99.0
Score: 98.0
Score: 97.0
Average: 98.5
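
The script assumes that every line contains a space. For a line without one, such as a trailing blank line, find() returns -1 and float() then raises a ValueError. Below is a minimal sketch of a more defensive variant. The function name average_score and the in-memory sample_lines list are hypothetical, used here so the example runs without a file.

```python
def average_score(lines):
    """Return the average score, skipping lines that have no space marker."""
    total_score = 0.0
    number_of_scores = 0

    for line in lines:
        white_space_position = line.find(' ')

        # find() returns -1 when the marker is missing; skip such lines
        if white_space_position == -1:
            continue

        total_score += float(line[white_space_position + 1:])
        number_of_scores += 1

    # avoid dividing by zero when no score was found
    if number_of_scores == 0:
        return None
    return total_score / number_of_scores


# Hypothetical sample data, including one blank line.
sample_lines = ['Peter 100\n', 'Anna 99\n', '\n', 'Henry 98\n', 'Jerry 97\n']
print('Average: {}'.format(average_score(sample_lines)))  # Average: 98.5
```

Since a file object can be iterated line by line, the same function should also accept an open file in place of the list.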
