
Tutorial: Look for strings that start with a specific sub-string using regular expressions in Python

In this example, we use a regular expression to implement the behavior of Python's str.startswith() method. The program asks the user for a sub-string to look for at the beginning of each line. It then opens a file and prints the lines that start with the specified sub-string.

 
# Implementing Python's str.startswith() using a regular expression

# import the regular expression module
import re

# allow users to specify what string to look for at the start of a line
start_str = input("Enter the string that starts a line: ")

# specify the name of the file to search within
file_name = 'my_file.txt'

# define the pattern
# ^ : start of the string
# {} : placeholder for start_str
# ^{} -> ^start_str
# -> pattern: a string that starts with the content stored in start_str
# re.escape() makes characters such as '.', '*' or '(' in start_str
# match literally instead of being treated as regex syntax
# if you want to match a word followed by a space at the start of a string,
# leave a space after the placeholder: '^{} '
pattern_str = '^{}'.format(re.escape(start_str))

# open the file for reading; the with block closes it automatically
with open(file_name, 'r') as file_h:
    # read the file line by line
    for line in file_h:
        # optional: strip leading/trailing whitespace, including the newline
        line = line.strip()

        # search for a sub-string that matches the pattern in this string, the 'line'
        # this is equivalent to line.startswith(start_str)
        if re.search(pattern_str, line):
            # found a sub-string that matches the pattern: print the entire line
            print(line)
        # otherwise the pattern was not found in this line, so do nothing

# we are done



The variable pattern_str stores the pattern we specified using a regular expression. The '^' character specifies that the pattern must occur at the beginning of a string. The expression '^{}'.format(start_str) builds a string consisting of '^' followed by the content of the start_str variable; for example, when start_str is 'Hello', it produces the string '^Hello'. This way, the pattern changes dynamically based on the user's input. Note that characters such as '.', '*' and '(' have special meanings in regular expressions; to make them match literally, as str.startswith() does, the input should be passed through re.escape() before it is inserted into the pattern.
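A short sketch of how the pattern is assembled, using hard-coded strings instead of user input (the values 'v1.0' and 'v1x0 release notes' are made up for illustration). It also shows why escaping matters: without re.escape(), a '.' in the input matches any character, so the regex check can disagree with str.startswith().

```python
import re

# Build the pattern from a (hard-coded) start string
start_str = 'Hello'
pattern_str = '^{}'.format(start_str)
print(pattern_str)  # ^Hello

# A made-up input containing a regex metacharacter ('.')
tricky = 'v1.0'
unescaped = '^{}'.format(tricky)           # '.' matches any character
escaped = '^{}'.format(re.escape(tricky))  # '.' matches only a literal dot

line = 'v1x0 release notes'
print(bool(re.search(unescaped, line)))  # True, although the line does not start with 'v1.0'
print(bool(re.search(escaped, line)))    # False, same answer as str.startswith()
print(line.startswith(tricky))           # False
```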

The call re.search(pattern_str, some_str) searches some_str for a sub-string that matches the pattern in pattern_str. If a match is found, search() returns a match object; otherwise, it returns None.
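A minimal sketch of the two possible return values of re.search(), using made-up sample strings. It also shows re.match(), which only tries to match at the beginning of the string, so for this task it gives the same answer as re.search() with a leading '^' (as long as the re.MULTILINE flag is not used).

```python
import re

pattern_str = '^Hello'

# A match at the start of the string returns a match object
match = re.search(pattern_str, 'Hello, world')
print(match is not None)  # True
print(match.group())      # Hello

# 'Hello' appears in this string, but not at the start, so the result is None
no_match = re.search(pattern_str, 'Say Hello')
print(no_match)           # None

# re.match() is anchored at the beginning of the string, so '^' is not needed
print(re.match('Hello', 'Hello, world') is not None)  # True
print(re.match('Hello', 'Say Hello'))                 # None
```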

As a result, we print a line only when re.search() reports that it has found the pattern in the string being checked, which in this case is the value of the variable line. Because the pattern is anchored with '^', this happens only when the line starts with the sub-string.
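Putting the pieces together, the loop can be wrapped in a small function that returns the matching lines instead of printing them. The name lines_starting_with is hypothetical, and io.StringIO stands in for the real my_file.txt so the sketch runs on its own; with a real file you would pass the object returned by open() instead.

```python
import io
import re


def lines_starting_with(file_obj, start_str):
    """Return the stripped lines of file_obj that start with start_str.

    Hypothetical helper: the pattern is anchored with '^', and re.escape()
    makes the user's text match literally, mirroring str.startswith().
    """
    pattern = re.compile('^{}'.format(re.escape(start_str)))
    result = []
    for line in file_obj:
        line = line.strip()
        if pattern.search(line):
            result.append(line)
    return result


# io.StringIO stands in for open('my_file.txt') so the example is self-contained
data = 'Hello there\nSay Hello\nHello again\n'
found = lines_starting_with(io.StringIO(data), 'Hello')
print(found)  # ['Hello there', 'Hello again']

# Cross-check against the built-in method on the same data
expected = [text.strip() for text in io.StringIO(data)
            if text.strip().startswith('Hello')]
print(found == expected)  # True
```

Compiling the pattern once with re.compile() is a readability choice here; the re module also caches recently used patterns, so calling re.search() with the pattern string inside the loop works too.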
