Load in packages


#Packages
#Web scraping packages
from bs4 import BeautifulSoup
import requests
#Pandas/numpy for data manipulation
import pandas as pd
import numpy as np

Load URLs we want to scrape into an array

#load URLs we want to scrape into an array
BASE_URL = [
    'http://www.reuters.com/finance/stocks/company-officers/GOOG.O',
    'http://www.reuters.com/finance/stocks/company-officers/AMZN',
    'http://www.reuters.com/finance/stocks/company-officers/AAPL'
]
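
The loop in the next step calls `requests.get(b).text` directly, so a hung connection or an HTTP error page would pass through unnoticed. A minimal sketch of a more defensive fetch might look like this; `fetch_html` and its `session` and `timeout` parameters are hypothetical names that are not part of the original code, and the 10-second timeout is an arbitrary assumption.

```python
import requests

def fetch_html(url, session=None, timeout=10):
    """Return the body of `url` as text, raising on HTTP error statuses.

    Hypothetical helper: `session` may be a requests.Session (or any object
    with a compatible .get method); by default the requests module is used.
    """
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    return response.text
```

Inside the loop, `html = fetch_html(b)` could then stand in for `html = requests.get(b).text`.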

Loop through our URLs, scrape table, pass information to array

#loading empty array for board members
board_members = []
#Loop through our URLs we loaded above
for b in BASE_URL:
    html = requests.get(b).text
    soup = BeautifulSoup(html, "html.parser")
    #identify table we want to scrape
    officer_table = soup.find('table', {"class" : "dataTable"})

    #skip any companies with missing/empty board member tables
    if officer_table is None:
        continue

    #loop through table, grab each of the 4 columns shown (try one of the links yourself to see the layout)
    for row in officer_table.find_all('tr'):
        cols = row.find_all('td')
        if len(cols) == 4:
            board_members.append((b, cols[0].text.strip(), cols[1].text.strip(), cols[2].text.strip(), cols[3].text.strip()))
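
The row-filtering logic above can be tried without hitting the network. The sketch below runs the same `find`/`find_all` calls against an inline HTML string; `SAMPLE_HTML`, its header labels, the sample people, and the `parse_officer_rows` helper are all hypothetical stand-ins, and the real Reuters markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; not the real Reuters markup
SAMPLE_HTML = """
<table class="dataTable">
  <tr><th>Name</th><th>Age</th><th>Since</th><th>Current Position</th></tr>
  <tr><td> Jane Doe </td><td>50</td><td>2015</td><td>Director</td></tr>
  <tr><td>John Roe</td><td>61</td><td>2010</td><td>Chairman</td></tr>
</table>
"""

def parse_officer_rows(html, source):
    """Return one tuple per four-cell data row, prefixed with its source label."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "dataTable"})
    if table is None:
        # No matching table on the page: nothing to collect
        return []
    rows = []
    for row in table.find_all("tr"):
        cols = row.find_all("td")
        # The header row uses <th> cells, so it has zero <td> cells and is skipped
        if len(cols) == 4:
            rows.append((source, *(c.text.strip() for c in cols)))
    return rows

sample_rows = parse_officer_rows(SAMPLE_HTML, "sample")
empty_rows = parse_officer_rows("<p>no table here</p>", "sample")
```

Here `sample_rows` holds two tuples with the surrounding whitespace stripped, and `empty_rows` is an empty list because the second snippet contains no `dataTable`.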

Create new array, check length to ensure things pulled in correctly

#convert output to new array, check length
board_array = np.asarray(board_members)
len(board_array)

49
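
A single total such as 49 does not show whether every company contributed rows. One way to check, sketched here with the standard library's `collections.Counter`, is to count rows per URL; the `rows_per_url` helper and the sample tuples are made up for illustration.

```python
from collections import Counter

def rows_per_url(rows):
    """Count scraped rows per source URL (the first element of each tuple)."""
    return Counter(row[0] for row in rows)

# Made-up rows in the same (URL, Name, Age, Year_Joined, Title) shape
sample_members = [
    ("url_a", "Jane Doe", "50", "2015", "Director"),
    ("url_a", "John Roe", "61", "2010", "Chairman"),
    ("url_b", "Ann Poe", "47", "2012", "Director"),
]
counts = rows_per_url(sample_members)
# Any URL from the input list that is absent from the counts returned no rows
missing = [u for u in ("url_a", "url_b", "url_c") if u not in counts]
```

With the real data, `rows_per_url(board_members)` and `BASE_URL` would take the place of the sample values.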

Convert new array to dataframe

#convert new array to dataframe
df = pd.DataFrame(board_array)

Rename columns, preview output

#rename columns, check output
df.columns = ['URL', 'Name', 'Age','Year_Joined', 'Title']
df.head(10)

	URL	Name	Age	Year_Joined	Title
0	http://www.reuters.com/finance/stocks/company-...	Eric Schmidt	61	2015	Executive Chairman of the Board of Director
1	http://www.reuters.com/finance/stocks/company-...	Sergey Brin	43	2015	President, Director
2	http://www.reuters.com/finance/stocks/company-...	Lawrence Page	44	2015	Chief Executive Officer, Director
3	http://www.reuters.com/finance/stocks/company-...	Ruth Porat	59	2015	Chief Financial Officer, Senior Vice President
4	http://www.reuters.com/finance/stocks/company-...	Sundar Pichai	45	2017	Director, Chief Executive Officer, Google Inc.
5	http://www.reuters.com/finance/stocks/company-...	David Drummond	54	2015	Senior Vice President - Corporate Development,...
6	http://www.reuters.com/finance/stocks/company-...	John Hennessy	64	2007	Lead Independent Director
7	http://www.reuters.com/finance/stocks/company-...	Diane Greene	61	2015	Director
8	http://www.reuters.com/finance/stocks/company-...	L. John Doerr	65	2016	Independent Director
9	http://www.reuters.com/finance/stocks/company-...	Roger Ferguson	65	2016	Independent Director
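
Because `np.asarray` stores every field as a string, `Age` and `Year_Joined` in the frame above are text, and each company is identifiable only by its full URL. A hedged sketch of a cleanup step follows: it builds the frame straight from a list of tuples, converts the two numeric columns with `pd.to_numeric`, and derives a `Ticker` column from the last path segment of the URL. The `Ticker` column, the `build_frame` helper, and the people in the sample rows are assumptions added for illustration.

```python
import pandas as pd

COLUMNS = ['URL', 'Name', 'Age', 'Year_Joined', 'Title']

def build_frame(rows):
    """Build a typed DataFrame from (URL, Name, Age, Year_Joined, Title) tuples."""
    frame = pd.DataFrame(rows, columns=COLUMNS)
    for col in ('Age', 'Year_Joined'):
        # errors='coerce' turns blanks or placeholders into NaN instead of raising
        frame[col] = pd.to_numeric(frame[col], errors='coerce')
    # Last path segment of the URL, e.g. '.../company-officers/AMZN' -> 'AMZN'
    frame['Ticker'] = frame['URL'].str.rsplit('/', n=1).str[-1]
    return frame

# Made-up people attached to two of the URLs from BASE_URL
sample = build_frame([
    ('http://www.reuters.com/finance/stocks/company-officers/GOOG.O', 'Jane Doe', '50', '2015', 'Director'),
    ('http://www.reuters.com/finance/stocks/company-officers/AMZN', 'John Roe', '', '2010', 'Chairman'),
])
```

With the scraped data, `build_frame(board_members)` would produce the same five columns plus `Ticker`.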

Export data to CSV

#export data
df.to_csv('/Users/yourname/desktop/board_members.csv')
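
By default `to_csv` also writes the row index as an unnamed first column. If that column is not wanted, `index=False` drops it. The sketch below writes a tiny made-up frame to a temporary directory (an assumption, so the example runs anywhere) and reads it back to confirm the columns.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up one-row frame standing in for the scraped data
demo = pd.DataFrame({'Name': ['Jane Doe'], 'Title': ['Director']})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / 'board_members.csv'
    # index=False omits the 0, 1, 2, ... row labels from the file
    demo.to_csv(out_path, index=False)
    round_trip = pd.read_csv(out_path)
```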

That's it! If you're interested in seeing how I used this data, check out my visualization on the interconnectedness of companies through shared board members here.

Web scraping example using Python and Beautiful Soup
