
Tutorial: Look for strings that start with a specific sub-string using a regular expression in Python

In this example, we use a regular expression to reproduce the behavior of Python's str.startswith() method. The program asks the user for a sub-string to look for at the beginning of each line. It then opens a file and prints the lines that start with the specified sub-string.

# Implementing Python's str.startswith() using a regular expression (Python 3)

# import the regular expression module
import re

# ask the user what string to look for at the start of each line
start_str = input("Enter the string that starts a line: ")

# specify the name of the file to search within
file_name = 'my_file.txt'

# define the pattern
# ^  : start of the string
# {} : placeholder for start_str
# ^{} -> ^start_str
#     -> pattern: a string that starts with the content stored in start_str
# re.escape() makes characters such as '.', '*' or '(' in the user's input
# match literally instead of being treated as regex syntax
# if you want to match a word followed by a space at the start of a string,
# leave a space after the placeholder: '^{} '
pattern_str = '^{}'.format(re.escape(start_str))

# open the file for reading; 'with' closes the file automatically when done
with open(file_name, 'r') as file_h:
    # read the file line by line
    for line in file_h:
        # optional: strip leading/trailing whitespace, including the newline
        line = line.strip()

        # search for a sub-string that matches the pattern in this string, the 'line'
        # this is equivalent to line.startswith(start_str)
        if re.search(pattern_str, line):
            # found a match at the beginning of 'line': print the entire line
            print(line)
        # otherwise the pattern was not found in this line, so do nothing

# we are done



The variable pattern_str stores the pattern we specified using a regular expression. The '^' character anchors the pattern to the beginning of a string. The expression '^{}'.format(start_str) builds a string that consists of '^' followed by the content of the start_str variable. For example, when the value of start_str is 'Hello', '^{}'.format(start_str) produces the string '^Hello'. Building the pattern this way lets us change it dynamically based on the user's input. Note that characters such as '.', '*', '?', '(' and '[' have special meanings in regular expressions, so if the user's input may contain them, it should be passed through re.escape() first; otherwise the pattern will not behave like line.startswith(start_str).
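To make the pattern construction concrete, here is a small sketch that builds the pattern with and without escaping. The helper name build_pattern is hypothetical and only used for illustration; it shows why an unescaped '.' in the user's input can match lines that do not actually start with the literal text.

```python
import re

def build_pattern(start_str):
    """Hypothetical helper: build a '^...' pattern from user input."""
    # re.escape() backslash-escapes regex metacharacters so they match literally
    return '^{}'.format(re.escape(start_str))

print('^{}'.format('Hello'))   # ^Hello
print(build_pattern('Hello'))  # ^Hello

# Without escaping, '.' matches any character, so 'a.c' also matches 'abc'
print(bool(re.search('^{}'.format('a.c'), 'abc')))            # True
print(bool(re.search(build_pattern('a.c'), 'abc')))           # False
print(bool(re.search(build_pattern('a.c'), 'a.c and more')))  # True
```

For plain letters such as 'Hello', escaping changes nothing, so the escaped and unescaped patterns are identical.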

The call re.search(pattern_str, some_str) scans some_str for a sub-string that matches the pattern in pattern_str. If a match is found, search() returns a match object; otherwise, it returns None.
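The short sketch below illustrates the two possible return values of re.search() with the '^Hello' pattern from the example above. It relies on the fact that a match object is always truthy while None is falsy, which is why the result can be tested directly in an if statement.

```python
import re

pattern_str = '^Hello'

# The string starts with 'Hello', so search() returns a match object
match = re.search(pattern_str, 'Hello world')
print(match.group())  # Hello
print(match.span())   # (0, 5)

# 'Hello' appears, but not at the start, so the '^' anchor makes search() return None
print(re.search(pattern_str, 'Say Hello'))  # None

# A match object is truthy and None is falsy, so the result works in an if statement
if re.search(pattern_str, 'Hello again'):
    print('starts with Hello')
```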

As a result, we print a line only when re.search() reports that it has found the pattern in the string provided, which in this case is the value of the variable 'line'.
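The same logic can be wrapped in a reusable function and checked against str.startswith(). This is a sketch under a few assumptions: the function name lines_starting_with is hypothetical, and io.StringIO stands in for the opened my_file.txt so the example runs without a real file or user input.

```python
import io
import re

def lines_starting_with(file_h, start_str):
    """Hypothetical helper: yield stripped lines from file_h that start with start_str."""
    pattern_str = '^{}'.format(re.escape(start_str))
    for line in file_h:
        line = line.strip()
        if re.search(pattern_str, line):
            yield line

# io.StringIO behaves like an open text file, so it can replace my_file.txt here
sample = io.StringIO("Hello world\nSay Hello\nHello again\n(note) Hello\n")
result = list(lines_starting_with(sample, 'Hello'))
print(result)  # ['Hello world', 'Hello again']

# Cross-check against str.startswith() on the same data
sample.seek(0)
expected = [line.strip() for line in sample if line.strip().startswith('Hello')]
print(result == expected)  # True
```

Because the function accepts any iterable of lines, it can be passed a real file handle from open() just as easily as the in-memory sample.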
