Tutorial: Using String.find() and String slicing with [n:m] notation to extract data in Python

When you need to extract data from a string in Python, you can use the built-in string.find() method together with the string[n:m] slice notation to pull out the sub-string you need.

string.find(str) returns the starting position of the first occurrence of str in string. For example, 'abcdecd'.find('cd') returns 2, because the first 'cd' starts at position 2 (not 3, since positions are counted from 0). If str cannot be found in string, the method returns -1.
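As a quick check of the behaviour described above, here is a minimal sketch. The variable names are only illustrative, and the optional second argument to find() (the position to start searching from) is something the text above does not cover.

```python
text = 'abcdecd'

# first occurrence of 'cd' starts at position 2 (positions count from 0)
first = text.find('cd')

# 'xyz' does not occur in text, so find() returns -1
missing = text.find('xyz')

# optional second argument: start searching after the first match
second = text.find('cd', first + 1)

print(first, missing, second)
```

Because -1 is also a valid (negative) index in Python, it is usually worth testing the result of find() before using it in a slice.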

string[n:m] extracts the sub-string that starts at position n and ends at position m-1; the character at position m is not included. If m is left out, the slice runs to the end of the string. For example, if the content of my_str is "I don't know", my_str[2:] returns "don't know".
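The slice rules can be tried directly on the same sentence. This is a small sketch; the names middle, tail, and head are made up for the example.

```python
my_str = "I don't know"

# positions 2 through 6; the character at position 7 is not included
middle = my_str[2:7]

# m omitted: from position 2 to the end of the string
tail = my_str[2:]

# n omitted: from the start up to, but not including, position 1
head = my_str[:1]

print(middle)
print(tail)
print(head)
```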

By using string.find() and [n:m] notation, we can write a script to automatically extract the data we need from a string.
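Before working with a whole file, the two pieces can be combined on a single line of text. The sketch below assumes a record of the form 'Name Score' with exactly one space as the marker; the variable names are illustrative.

```python
line = 'Peter 100'

# locate the marker that separates the name from the score
space_position = line.find(' ')

# everything before the marker is the name
name = line[:space_position]

# everything after the marker is the score, converted to a number
score = float(line[space_position + 1:])

print(name)
print(score)
```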

For example, suppose we have a file, test_score.txt, that records the final exam scores of all the students enrolled in a class. We know that each line records a student's first name and his or her final score, separated by a single space, as follows.

Peter 100
Anna 99
Henry 98
Jerry 97
…

To calculate the average score of the final, we can use find() and [n:m] notation.


# average_sample.py

# used to accumulate the scores
total_score = 0.0

# used to count how many scores (students) we have
number_of_score = 0

# open the file for 'read'; the with block closes it when we are done
with open('test_score.txt', 'r') as file_h:
    # read the file line by line
    for line in file_h:
        # find the marker (the space between the name and the score)
        white_space_position = line.find(' ')

        # calculate the position of the score
        # in relation to the position of the marker
        score_position = white_space_position + 1

        # extract the substring and
        # convert the substring into a floating point number
        # (float() ignores the newline at the end of the line)
        score = float(line[score_position:])

        # print the score
        print('Score: {}'.format(score))

        # accumulate the scores
        total_score = total_score + score

        # accumulate how many scores we got
        number_of_score = number_of_score + 1

# calculate and display the average
print('Average: {}'.format(total_score / number_of_score))

You will get the following output on the terminal.

Score: 100.0
Score: 99.0
Score: 98.0
Score: 97.0
Average: 98.5
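The script assumes that every line contains the marker. If a line has no space, find() returns -1, the slice then starts at position 0, and float() either fails or quietly treats the whole line as a score. One possible way to guard against that is sketched below. The helper name extract_score and the in-memory sample_lines list (standing in for the file) are assumptions made for this example, not part of the original script.

```python
def extract_score(line):
    """Return the score from a 'Name Score' line, or None if it cannot be read."""
    marker = line.find(' ')
    if marker == -1:
        # no marker on this line, so there is nothing to extract
        return None
    try:
        return float(line[marker + 1:])
    except ValueError:
        # the text after the marker is not a number
        return None


# hypothetical sample data, including a blank line and a malformed line
sample_lines = ['Peter 100\n', 'Anna 99\n', '\n', 'Henry 98\n', 'Jerry 97\n', '42\n']

scores = []
for line in sample_lines:
    score = extract_score(line)
    if score is not None:
        scores.append(score)

average = sum(scores) / len(scores)
print('Average: {}'.format(average))
```

Checking for -1 explicitly matters here: without it, the line '42' would be counted as a score even though it has no name in front of it.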

Aventurine Yao
