Setting up the environment for web scraping - Part 2
Installing PostgreSQL & pgadmin4
In the previous post, I installed webdrivers and selenium. These are for getting text from websites. After getting the text, what do I need do with it? It needs to be stored in a dataset.
That's why I installed PostgreSQL and pgAdmin4.
What is PostgreSQL?
PostgreSQL, also known as Postgres, is a open-source relational database management system. It is used as the data warehouse in many applications and support a large part of the SQL standard with many modern features. It can be considered as a program running on the server side.
What is pgadmin4?
pgAdmin is open source management tool for Postgres. A common misconception is PostgreSQL means pgAdmin, but as the name suggests, pgAdmin is administration tool, not a database. It provides a powerful graphical interface that simplifies the creation, maintenance and use of database objects. This can be considered as client side.
How to install
The process might differ depending on the environment, but in this post, I am used the environment listed below.
- Linux(Ubuntu)
- Miniconda3 (docs.conda.io/en/latest/miniconda.html)
- Visual Studio Code (code.visualstudio.com/download)
Step 1. Postgresql
I used apt to install the package along with a -contrib package that adds some additional utilities and functionality.
$ sudo apt update
$ sudo apt install postgresql postgresql-contrib
When PostgreSQL is installed, the default account postgres is given without password. After installing the package, I used the command below to switch over to this default account.
$ sudo -i -u postgres
$ psql
With this command, I logged to PostgreSQL prompt, and now I could set up the password for this default account with a command below.
$ \password postgres
I need to remember this password as I need it every time I access the database.
Step 2. pgAdmin4
This is more straight-forward as there is a instruction from the official website here.
I installed desktop mode following the instruction on the site.
sudo curl https://www.pgadmin.org/static/packages_pgadmin_org.pub | sudo apt-key add
sudo sh -c 'echo "deb https://ftp.postgresql.org/pub/pgadmin/pgadmin4/apt/$(lsb_release -cs) pgadmin4 main" > /etc/apt/sources.list.d/pgadmin4.list && apt update'
sudo apt install pgadmin4-desktop
Step 3. Check if they are installed
I created a basic python file and ran it. If there is no error appears, they are successfully installed!
from sqlalchemy import create_engine
# setting up the db
user = 'postgres' (default)
password = 'postgres' (use the password setup above)
host = 'localhost'
port = '5432'
db_name = 'postgres' (default)
db_string = f"postgresql://{user}:{password}@{host}:{port}/{db_name}"
# create engine
db = create_engine(db_string)
# create and execute query
query = "CREATE TABLE roles(
role_id serial PRIMARY KEY,
role_name VARCHAR (255) UNIQUE NOT NULL
)"
db.execute(query)
Trouble shooting
When encountering a problem with installing it, then t is best to uninstall the package and start from the scratch. If you want to uninstall, please follow the steps below.
# List All Postgres related packages
$ dpkg -l | grep postgres
# Remove related packages
$ sudo apt-get --purge remove postgresql\*
# Remove folders
$ sudo rm -rf /var/lib/postgresql/
$ sudo rm -rf /var/log/postgresql/
$ sudo rm -rf /etc/postgresql/
# Remove the postgres user
$ sudo deluser postgres