Python: Processes and Threads

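top shows all running processes. The script below prints its own PID every 5 seconds, so it is easy to spot in the process list: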

 import os
 import time

 pid = os.getpid()

 while True:
     print(pid, time.time())
     time.sleep(5)

Find the process by its PID (1248 in this run): ps aux | head -1; ps aux | grep 1248

6KB

How to create a process in Python

os.fork() – creates a copy of the parent process, so we end up with 2 processes. In the parent, fork returns the child's PID; in the child, it returns 0:

 import os
 import time

 pid = os.fork()

 if pid == 0:
     # this is the child

     while True:
         print("child: {}".format(os.getpid()))
         time.sleep(5)
 else:
     # this is parent

     print("parent: {}".format(os.getpid()))
     os.wait()  # wait for the child process to finish

Fork copies everything (memory, variables, open file descriptors, etc.), so each process works on its own copy: if you change a variable in the child, the same variable in the parent process is not changed.
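The isolation described above can be checked with a small sketch. It assumes a Unix-like system where os.fork is available (it does not exist on Windows); the helper name fork_and_modify and the variable counter are made up for illustration.

```python
import os


def fork_and_modify():
    """Fork, change a variable in the child, and return what the parent sees."""
    counter = 0  # exists in the parent before the fork

    pid = os.fork()
    if pid == 0:
        # Child: this changes only the child's own copy of counter.
        counter += 100
        os._exit(0)  # leave the child right away, skipping cleanup handlers

    # Parent: wait for this child, then look at our own copy.
    os.waitpid(pid, 0)
    return counter


if __name__ == "__main__":
    print("parent sees counter =", fork_and_modify())
```

The parent still sees 0, because the child's increment happened in a separate copy of the memory.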

A better way – Multiprocessing:

from multiprocessing import Process

def main(name):
    print("hello", name)

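if __name__ == "__main__":
    # the guard keeps child processes from re-running this block
    # on platforms that start children with spawn (Windows, macOS)
    p = Process(target=main, args=("Bob",))
    p.start()
    p.join()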
  • target – the function to execute in the child process
  • args – a tuple of arguments for that function
  • p.start() – starts the child process; when the start method is fork (historically the default on Linux), Python calls fork under the hood
  • p.join() – waits for this child process (p) to finish
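With more than one child, each Process object is started and joined separately. A minimal sketch, assuming a Unix-like system where the fork start method can be requested explicitly; worker and run_workers are illustrative names, not part of multiprocessing.

```python
import multiprocessing as mp
import os


def worker(name):
    print("hello", name, "from pid", os.getpid())


def run_workers(names):
    """Start one process per name and wait for each of them in turn."""
    # Assumption: the "fork" start method is available (Linux and other
    # Unix-like systems); it is requested explicitly here.
    ctx = mp.get_context("fork")
    procs = [ctx.Process(target=worker, args=(name,)) for name in names]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # waits for this one process only
    # exitcode is 0 for a process that finished normally
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    print(run_workers(["Bob", "Alice"]))
```

The order of the "hello" lines can differ between runs, since the children run independently of each other.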

Threading

Many scripts related to network/data I/O spend the majority of their time waiting for data from a remote source. Threading suits I/O operations such as web scraping because the processor would otherwise sit idle while waiting for data.

The Thread API is pretty much the same as Process, but threads share memory and other resources inside a single process.

from threading import Thread

def main(name):
    print("hello", name)

th = Thread(target=main, args=("Bob",))
th.start()
th.join()
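Because threads live in the same process, they can all write to the same object. A small sketch of that sharing; collect and run_threads are illustrative names.

```python
from threading import Thread


def collect(results, name):
    # Every thread appends to the same list object.
    results.append("hello " + name)


def run_threads(names):
    results = []  # one list, shared by all threads
    threads = [Thread(target=collect, args=(results, n)) for n in names]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


if __name__ == "__main__":
    # sorted() because the order in which threads append is not guaranteed
    print(sorted(run_threads(["Bob", "Alice", "Eve"])))
```

Unlike the fork case, the main thread sees every item the worker threads added.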

Share data between threads: Queue and Blocking

from queue import Queue
from threading import Thread

def func(q, n):
    while True:
        item = q.get()
        if item is None:
            break
        print("process data {} {}".format(n, item))

q = Queue(5)

th1 = Thread(target=func, args=(q, 1))
th2 = Thread(target=func, args=(q, 2))

th1.start()
th2.start()

for i in range(50):
    q.put(i)

# Stop the threads: after the 50 items, put one None per thread.
# Each None breaks the while True loop in one worker.
q.put(None)
q.put(None)
th1.join()
th2.join()

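Use put to add an item to the queue and get to take one out. In this example the queue holds at most 5 items, so put blocks while the queue is full until a worker takes an item out. Likewise, get blocks while the queue is empty.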
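The blocking behaviour of a full queue can be made visible with a timeout. A minimal sketch with nobody consuming from the queue; fill_until_full is an illustrative name.

```python
from queue import Queue, Full


def fill_until_full(maxsize):
    """Put items into a bounded queue until it is full; return how many fit."""
    q = Queue(maxsize)
    count = 0
    while True:
        try:
            # A plain q.put(count) would block forever once the queue is
            # full, because nothing is consuming. The timeout turns that
            # wait into a queue.Full exception instead.
            q.put(count, timeout=0.1)
        except Full:
            return count
        count += 1


if __name__ == "__main__":
    print("items accepted:", fill_until_full(5))
```

With a maximum size of 5, exactly 5 items are accepted before put has to wait.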

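  • Parallel run with threads – good for I/O tasks, because the threads wait for data at the same time
  • Series run – just as good for CPU-only tasks, because in CPython the GIL lets only one thread execute Python code at a time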
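The difference for I/O tasks can be measured with a rough sketch. Here time.sleep stands in for a network wait, which is an assumption made for the sake of a self-contained example; fake_io, run_in_series and run_in_threads are illustrative names.

```python
import time
from threading import Thread


def fake_io(delay):
    # Stand-in for a network or disk wait; a sleeping thread lets others run.
    time.sleep(delay)


def run_in_series(n, delay):
    """Run n fake I/O calls one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_in_threads(n, delay):
    """Run n fake I/O calls in n threads; return elapsed seconds."""
    start = time.perf_counter()
    threads = [Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("series:  {:.2f}s".format(run_in_series(4, 0.2)))
    print("threads: {:.2f}s".format(run_in_threads(4, 0.2)))
```

In series the waits add up (about n times the delay), while with threads they overlap, so the total stays close to a single delay.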

Summary

  • If code has a lot of I/O or Network usage – Multithreading (low overhead)
  • If code is CPU bound – Multiprocessing (if the machine has multiple cores)