Python: More about Strings and re

Getting help: for more details, you can always call the built-in dir function, which returns a list of all the attributes available for a given object.
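A quick sketch of dir in action. The double-underscore names are filtered out here only to keep the list short; the variable names s and public_methods are just for illustration.

```python
# List the public (non-dunder) attributes of a string object.
s = 'hello'
public_methods = [name for name in dir(s) if not name.startswith('_')]

print('isidentifier' in public_methods)  # True
print('replace' in public_methods)       # True
print(public_methods[:3])                # the first few method names
```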

To see what a specific method does, pass it to help (here s is any string):

>>> help(s.isidentifier)

Help on built-in function isidentifier:

isidentifier(...) method of builtins.str instance
    S.isidentifier() -> bool

    Return True if S is a valid identifier according
    to the language definition.

    Use keyword.iskeyword() to test for reserved identifiers
    such as "def" and "class".
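The help text above suggests combining isidentifier() with keyword.iskeyword(). A minimal sketch of that idea follows; the helper name is_usable_name is made up for this example.

```python
import keyword

def is_usable_name(name):
    """Return True if name is a valid identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_usable_name('my_var'))  # True
print(is_usable_name('2fast'))   # False: starts with a digit
print(is_usable_name('class'))   # False: valid identifier, but a reserved keyword
```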

Find + Replace

>>> line = 'abcdefg abcdefg abcdefg'
>>> line.find('abc')
0
>>> line.replace('abc','xyz')
'xyzdefg xyzdefg xyzdefg'
>>> line
'abcdefg abcdefg abcdefg'
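A few more details worth knowing, shown as a small sketch on the same line: find() returns -1 when nothing is found, it accepts a start index, and replace() takes an optional count. The variable names are only for illustration.

```python
line = 'abcdefg abcdefg abcdefg'

# find() returns -1 (not an exception) when the substring is absent.
missing = line.find('xyz')
print(missing)        # -1

# An optional second argument sets where the search starts.
second = line.find('abc', 1)
print(second)         # 8

# replace() accepts an optional count that limits the number of replacements.
once = line.replace('abc', 'xyz', 1)
print(once)           # xyzdefg abcdefg abcdefg
```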

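str.isalpha() – True if all characters are letters (e.g. a-z, A-Z) and the string is not empty
str.isalnum() – True if all characters are letters or digits (e.g. a-z, A-Z, 0-9)
str.isdigit() – True if all characters are digits (0-9)
str.islower(), str.isupper() – True if all cased characters are lowercase / uppercase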

>>> 'abc'.isalpha()
True
>>> 'abc'.isalnum()
True
>>> 'abc123'.isalpha()
False
>>> 'abc123'.isalnum()
True
>>> 'abc123'.isdigit()
False
>>> '123'.isdigit()
True
>>> 'abc123'.islower()
True
>>>  
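Some edge cases of these checks are easy to get wrong, so here is a short sketch of them:

```python
# An empty string never passes these checks.
print(''.isalpha())          # False

# A single non-matching character (here, the space) makes the result False.
print('abc def'.isalpha())   # False

# islower()/isupper() only look at cased characters; digits are ignored.
print('ABC123'.isupper())    # True

# With no cased characters at all, both return False.
print('123'.islower())       # False
```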

any(iterable)
# Return True if any element of the iterable is true. If the iterable is empty, return False

# How to check whether a string contains any (not all) characters of a given kind:
>>> s = 'abcABC123!@#'
>>> print(any(each.isalnum() for each in s))
True
>>> print(any(each.isalpha() for each in s))
True
>>> print(any(each.isdigit() for each in s))
True
>>> print(any(each.islower() for each in s))
True
>>> print(any(each.isupper() for each in s))
True
>>> s = 'abcABC'
>>> print(any(each.isalnum() for each in s))
True
>>> print(any(each.isalpha() for each in s))
True
>>> print(any(each.isdigit() for each in s))
False
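The same pattern can be combined into a single check, for example a simple "has a letter, a digit and an uppercase character" test. This is only a sketch; the function name is made up, and all() is shown for contrast.

```python
def has_letter_digit_and_upper(s):
    """True if s contains at least one letter, one digit and one uppercase character."""
    return (any(ch.isalpha() for ch in s)
            and any(ch.isdigit() for ch in s)
            and any(ch.isupper() for ch in s))

print(has_letter_digit_and_upper('abcABC123!@#'))  # True
print(has_letter_digit_and_upper('abcABC'))        # False: no digit

# all() requires every element to be true, so the punctuation makes it fail.
print(all(ch.isalnum() for ch in 'abcABC123!@#'))  # False

# any() over an empty iterable is False.
print(any(ch.isdigit() for ch in ''))              # False
```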

Textwrap module

wrap() – split the string into lines of at most the given width and return them as a list
fill() – same as wrap(), but return a single string with the lines joined by \n

>>> import textwrap
>>> string = 'long long'
>>> print(textwrap.wrap(string, 2))
['lo', 'ng', 'lo', 'ng']
>>> print(textwrap.fill(string, 2))
lo
ng
lo
ng
>>>
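With a more realistic width, wrap() breaks on whitespace rather than in the middle of words. A small sketch (the sample sentence is arbitrary):

```python
import textwrap

text = 'The quick brown fox jumps over the lazy dog'

lines = textwrap.wrap(text, width=15)
print(lines)
# ['The quick brown', 'fox jumps over', 'the lazy dog']

# fill() gives the same lines joined with newlines.
print(textwrap.fill(text, width=15))
```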

String Formatting

For Python 3:

>>> name = 'Dmitry'
>>> print('Hello, {}'.format(name))
Hello, Dmitry

Starting with Python 3.6, f-strings offer a shorter and more readable syntax:

>>> name = 'Dmitry'
>>> print(f'Hello, {name}')
Hello, Dmitry
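f-strings also accept a format specification after a colon. A few common cases, as a sketch (the price value is made up):

```python
name = 'Dmitry'
price = 1234.5

aligned = f'{name:>10}|'   # right-align in a 10-character field
money = f'{price:,.2f}'    # thousands separator, two decimals
hexed = f'{255:#x}'        # hexadecimal with the 0x prefix
quoted = f'{name!r}'       # repr() of the value

print(aligned)   #     Dmitry|
print(money)     # 1,234.50
print(hexed)     # 0xff
print(quoted)    # 'Dmitry'
```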

Template Strings – good if your format strings are user-supplied (security)

>>> from string import Template
>>> t = Template('Hello, $name!')
>>> t.substitute(name=name)
'Hello, Dmitry!'
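When a placeholder may be missing, safe_substitute() can be used instead of substitute(). A short sketch of the difference (the $count placeholder is only for illustration):

```python
from string import Template

t = Template('Hello, $name! You have $count new messages.')

# substitute() raises KeyError when a placeholder has no value...
try:
    t.substitute(name='Dmitry')
except KeyError as exc:
    print('missing placeholder:', exc)   # missing placeholder: 'count'

# ...while safe_substitute() leaves the unknown placeholder in place.
partial = t.safe_substitute(name='Dmitry')
print(partial)   # Hello, Dmitry! You have $count new messages.
```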

RE – RegEx – Regular Expression

Very useful!

  • findall() – Returns a list containing all matches (in the order they were found)
  • search() – Returns a Match object if there is a match anywhere in the string
  • split() – Returns a list where the string has been split at each match
  • sub() – Replaces one or many matches with a string
>>> import re
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> msg = 'Call me at 555-555-5555 or 777-777-7777 tomorrow'
>>> print(phoneNumRegex.findall(msg))
['555-555-5555', '777-777-7777']

#or

>>> print(re.findall(r'\d\d\d-\d\d\d-\d\d\d\d',msg))
['555-555-5555', '777-777-7777']
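split() and sub() from the list above work with the same pattern. A sketch on the same message, using the shorter \d{3} form of the pattern; the '<hidden>' replacement text is arbitrary.

```python
import re

msg = 'Call me at 555-555-5555 or 777-777-7777 tomorrow'
phone = r'\d{3}-\d{3}-\d{4}'

# split(): cut the string at every match
parts = re.split(phone, msg)
print(parts)        # ['Call me at ', ' or ', ' tomorrow']

# sub(): replace every match
masked = re.sub(phone, '<hidden>', msg)
print(masked)       # Call me at <hidden> or <hidden> tomorrow

# count=1 limits the replacement to the first match
masked_once = re.sub(phone, '<hidden>', msg, count=1)
print(masked_once)  # Call me at <hidden> or 777-777-7777 tomorrow
```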

search() – returns a Match object for the first match found anywhere in the string, or None if there is no match. The Match object provides:

  • group() – Return the string matched by the RE
  • start() – Return the starting position of the match
  • end() – Return the ending position of the match
  • span() – Return a tuple containing the (start, end) positions of the match
>>> x = phoneNumRegex.search(msg)
>>> x
<_sre.SRE_Match object; span=(11, 23), match='555-555-5555'>
>>> x = phoneNumRegex.search(msg)
>>> x.group()
'555-555-5555'
>>> x.start(), x.end()
(11, 23)
>>> x.span()
(11, 23)
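Since search() returns None when nothing matches, it is safer to test the result before calling group(). To get a Match object for every occurrence, not just the first, finditer() can be used. A sketch (the name phone_re is only for illustration):

```python
import re

phone_re = re.compile(r'\d{3}-\d{3}-\d{4}')
msg = 'Call me at 555-555-5555 or 777-777-7777 tomorrow'

# search() returns None when there is no match, so check before using it.
miss = phone_re.search('no phone number here')
print(miss)   # None
if miss is not None:
    print(miss.group())   # not reached in this example

# finditer() yields one Match object per occurrence.
found = [(m.group(), m.span()) for m in phone_re.finditer(msg)]
print(found)
# [('555-555-5555', (11, 23)), ('777-777-7777', (27, 39))]
```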
  • . – Any character (except newline character)
  • ^ – Starts with
  • $ – Ends with
  • * – Zero or more occurrences
  • + – One or more occurrences
  • {} – Exactly the specified number of occurrences, e.g. ".{6}"
>>> string = 'This is the string I used to play with "re" module!'
>>> print(re.findall('^This',string))
['This']
>>> print(re.findall('^This.',string))
['This ']
>>> print(re.findall('^This.*!$',string))
['This is the string I used to play with "re" module!']
>>> print(re.findall('^This.*',string))
['This is the string I used to play with "re" module!']
>>> print(re.findall('!$',string))
['!']
>>> print(re.findall('.!$',string))
['e!']
>>> print(re.findall('.{6}!$',string))
['module!']
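The difference between *, + and {} is easiest to see side by side. A small sketch on a made-up sample string:

```python
import re

sample = 'a ab abb'

star = re.findall(r'ab*', sample)     # 'a' followed by zero or more 'b'
plus = re.findall(r'ab+', sample)     # 'a' followed by one or more 'b'
exact = re.findall(r'ab{2}', sample)  # 'a' followed by exactly two 'b'

print(star)    # ['a', 'ab', 'abb']
print(plus)    # ['ab', 'abb']
print(exact)   # ['abb']
```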
# \A - Returns a match if the specified characters are at the beginning of the string
>>> string = 'This is the string I used to play with "re" module!'
>>> print(re.findall(r'\AThis ',string))
['This ']
>>> print(re.findall(r'\Aused',string))
[]

# \b - Returns a match where the specified characters are
# at the beginning '\b<something>' or at the end '<something>\b' of a word

# \B - Returns a match where the specified characters are present, 
# but NOT at the beginning (or at the end) of a word
>>> string
'This is the string I used to play with "re" module!'
>>> print(re.findall(r"e\b",string))
['e', 'e', 'e']
>>>
>>> print(re.findall(r"\bpl",string))
['pl']
>>> print(re.findall(r"pl\b",string))
[]
>>> print(re.findall(r"pl\B",string))
['pl']
>>> print(re.findall(r"\Bpl",string))
[]
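A common use of \b is matching a whole word only. A sketch on the same string (the variable names are only for illustration):

```python
import re

text = 'This is the string I used to play with "re" module!'

# Without boundaries, 'is' also matches inside 'This'.
anywhere = re.findall(r'is', text)
print(anywhere)     # ['is', 'is']

# \b on both sides keeps only the standalone word.
whole_word = re.findall(r'\bis\b', text)
print(whole_word)   # ['is']

# \B at the start keeps only the 'is' that is NOT at the beginning of a word.
inside = re.findall(r'\Bis', text)
print(inside)       # ['is']
```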


# \d - digit (numbers from 0-9)
# \D - not a digit
>>> string = "This is the string with digits 123!"
>>> print(re.findall(r"\d",string))
['1', '2', '3']

>>> string = "This is the string with no digits!"
>>> print(re.findall(r"\D",string))
['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 't', 'h', 'e', ' ', 
's', 't', 'r', 'i', 'n', 'g', ' ', 'w', 'i', 't', 'h', ' ', 
'n', 'o', ' ', 'd', 'i', 'g', 'i', 't', 's', '!']
>>>

# \s - whitespace
# \S - not a whitespace
# \w - any word character (a-z, A-Z, 0-9, and the underscore _ character)
# \W - not a word character
# \Z - specified characters are at the end of the string
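These classes are not demonstrated above, so here is a short sketch on a made-up log-like line:

```python
import re

text = 'user_1 logged in\tat 10:45'

words = re.findall(r'\w+', text)     # runs of word characters
print(words)       # ['user_1', 'logged', 'in', 'at', '10', '45']

non_words = re.findall(r'\W', text)  # everything that is not a word character
print(non_words)   # [' ', ' ', '\t', ' ', ':']

fields = re.split(r'\s+', text)      # split on any run of whitespace
print(fields)      # ['user_1', 'logged', 'in', 'at', '10:45']

tail = re.findall(r'\d+\Z', text)    # digits at the very end of the string
print(tail)        # ['45']
```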

Grouping:

>>> image
'System image file is "unix:/opt/unetlab/addons/iol/bin/i86bi-linux-l3-adventerprisek9-15.4"'
>>> version = re.findall('bin/(.*)"',image)
>>> version
['i86bi-linux-l3-adventerprisek9-15.4']

# group 0 is always the entire match
>>> version = re.match(r'\w+\s(\w+)\s.*bin/(.*)"',image)
>>> version[0]
'System image file is "unix:/opt/unetlab/addons/iol/bin/i86bi-linux-l3-adventerprisek9-15.4"'
>>> version[1]
'image'
>>> version[2]
'i86bi-linux-l3-adventerprisek9-15.4'

Named group: (?P<name>…)

>>> version = re.match(r'\w+\s(?P<first>\w+)\s.*bin/(?P<last>.*)"',image)
>>> version.groupdict()
{'first': 'image', 'last': 'i86bi-linux-l3-adventerprisek9-15.4'}
>>> version.group('first')
'image'
>>> version.group('last')
'i86bi-linux-l3-adventerprisek9-15.4'
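Named groups make it easy to pull several pieces out of one line. The sketch below separates the file name from the trailing version number; the pattern and the group names filename and version are assumptions made for this example, not part of the original code.

```python
import re

image = 'System image file is "unix:/opt/unetlab/addons/iol/bin/i86bi-linux-l3-adventerprisek9-15.4"'

# Hypothetical pattern: everything after "bin/", with the trailing
# version (digits.digits) captured in its own group.
pattern = re.compile(r'bin/(?P<filename>.+)-(?P<version>\d+\.\d+)"')

m = pattern.search(image)
if m:   # search() returns None when the line does not match
    print(m.group('filename'))   # i86bi-linux-l3-adventerprisek9
    print(m.group('version'))    # 15.4
    print(m.groupdict())
```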

