Designing a Chatbot Interview App Database – Part 8: Handling Location and Job Role Data with APIs

by DL Keeshin


November 13, 2024


kDS Discovery ERD
See Current ERD
View SQL script for this post

Previously, I described how I created a process for sending answers back to the database. Today I am going to address (pun intended) location data using the Google Maps Geocoding API, as well as use OpenAI's ChatGPT API to assign a job role derived from a job title and job description.

Managing Location Data

The design approach for managing location data in the kDS Interview App Database uses a surrogate key—a unique hash derived from latitude and longitude coordinates—to uniquely identify each location. This hash-based identifier ensures that each geographic point is stored only once, reducing redundancy and improving lookup efficiency.
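Because the hash is computed from coordinates formatted to five decimal places (roughly one meter of precision), the same address always resolves to the same 32-character identifier, which is why location_hash_id is a varchar(32). Here is a minimal illustration in PostgreSQL, using illustrative coordinates near the example address that appears later in this post:

-- MD5 of "latitude,longitude" (five decimal places) is deterministic:
-- the same coordinates always yield the same 32-character hash
SELECT md5('41.79170,-87.58310') AS location_hash_id;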

Storing Data in the reference.location Table

Location data is stored in the reference.location table. Its structure is designed to store not only latitude and longitude but also details like address, city, and country. The location_hash_id column, generated from the latitude and longitude values, serves as the primary key and carries a unique constraint, preventing duplicate entries for the same geographic point.


CREATE TABLE IF NOT EXISTS reference.location (
    location_hash_id character varying(32),    -- Unique identifier based on lat/lon hash
    latitude_longitude point NOT NULL,         -- Stores coordinates as a point (x=longitude, y=latitude)
    description_ character varying(64),        -- Descriptive label for the location
    address_1 character varying(72),           -- First line of address
    address_2 character varying(72),           -- Second line of address, if applicable
    city_ character varying(72),               -- City of the location
    province_state character varying(32),      -- State or province
    postal_code character varying(10),         -- ZIP or postal code
    country_code character varying(4),         -- ISO country code
    source_ character varying(72),             -- Source or origin of location data
    effective_start_date date,                 -- Date when this location entry became valid
    effective_end_date date,                   -- Date when this location entry ended, if applicable
    CONSTRAINT location_pkey PRIMARY KEY (location_hash_id),       -- Primary key
    CONSTRAINT location_hash_id_unique UNIQUE (location_hash_id)   -- Enforces unique constraint
);
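A quick aside on the point column: PostgreSQL stores a point as (x, y), so longitude comes first. Individual coordinates can be read back with subscript notation, as in this sketch:

-- [0] is x (longitude), [1] is y (latitude)
SELECT description_,
       latitude_longitude[1] AS latitude,
       latitude_longitude[0] AS longitude
FROM reference.location;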

Fetching and Storing Coordinates

The Python code below integrates with the Google Maps Geocoding API to retrieve latitude and longitude for an address, generate a unique hash from those values, and store the location in PostgreSQL.

  • get_coordinates: Sends a request to the Google Maps API, receives JSON data, and extracts the latitude and longitude. If the API request fails, an error message is displayed.
  • generate_location_hash: Hashes the latitude and longitude into a unique MD5 identifier, ensuring that the same coordinates always generate the same hash.
  • insert_location: Uses the hash and coordinates to insert location data into the reference.location table in PostgreSQL. The ON CONFLICT clause allows the code to skip duplicate inserts if a location with the same location_hash_id already exists.

Note: To run the following code, you’ll need an API key with access to the Geocoding API. Sign up on the Google Cloud Platform and enable billing to obtain one. Google Maps charges $0.005 per geocoding request after the first 40,000 free requests each month.


import os
import json
import requests
import hashlib
import psycopg2
from datetime import date
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Google Maps Geocoding API key
api_key = os.environ.get('GEOCODEAPIKEY')

# Connect to PostgreSQL database
def connect_db():
    return psycopg2.connect(
        dbname=os.getenv("DBSURVEY"),
        user=os.getenv("DBUSER"),
        password=os.getenv("DBPASSWORD"),
        host=os.getenv("DBHOST"),
        port=os.getenv("DBPORT")
    )

# Function to get coordinates from the Google Maps API
def get_coordinates(address, api_key):
    base_url = "https://maps.googleapis.com/maps/api/geocode/json"
    params = {"address": address, "key": api_key}
    response = requests.get(base_url, params=params)
    data = response.json()

    if data["status"] == "OK":
        location = data["results"][0]["geometry"]["location"]
        latitude = location["lat"]
        longitude = location["lng"]
        return latitude, longitude
    else:
        print("Error:", data["status"])
        return None

# Function to hash latitude and longitude into a unique ID
def generate_location_hash(latitude, longitude):
    location_string = f"{latitude:.5f},{longitude:.5f}"
    return hashlib.md5(location_string.encode()).hexdigest()

# Function to insert location into the PostgreSQL table
def insert_location(db_conn, latitude, longitude, address, description, city, province_state, postal_code, country_code, source):
    # Generate location hash and formatted data
    location_hash_id = generate_location_hash(latitude, longitude)
    point_value = f"({longitude}, {latitude})"  # PostgreSQL expects (x, y) format for point type
    effective_start_date = date.today()

    # SQL Insert statement
    query = """
    INSERT INTO reference.location (
        location_hash_id, latitude_longitude, description_, address_1, city_, province_state, postal_code, 
        country_code, source_, effective_start_date
    )
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
    ON CONFLICT (location_hash_id) DO NOTHING;
    """
    data = (
        location_hash_id, point_value, description, address, city, province_state, postal_code,
        country_code, source, effective_start_date
    )

    # Insert into the table
    with db_conn.cursor() as cursor:
        cursor.execute(query, data)
        db_conn.commit()
    print(f"Inserted location: {description} at {latitude}, {longitude}")

# Main function to obtain coordinates and store them
def store_address_location(address, description, city, province_state, postal_code, country_code, source):
    # Step 1: Retrieve coordinates
    coordinates = get_coordinates(address, api_key)
    if coordinates:
        latitude, longitude = coordinates

        # Step 2: Insert data into the database
        db_conn = connect_db()
        insert_location(db_conn, latitude, longitude, address, description, city, province_state, postal_code, country_code, source)
        db_conn.close()

# Example usage
address = "5700 S DuSable Lake Shore Drive"  # Museum of Science and Industry - Chicago
description = "Museum of Science and Industry - Chicago"
city = "Chicago"
province_state = "Illinois"
postal_code = "60637"
country_code = "US"
source = "Google Maps API"

store_address_location(address, description, city, province_state, postal_code, country_code, source)
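
Because the insert uses ON CONFLICT (location_hash_id) DO NOTHING, re-running store_address_location for the same address is harmless. A quick sanity check, which should always return zero rows:

-- Duplicate hashes should never accumulate
SELECT location_hash_id, count(*)
FROM reference.location
GROUP BY location_hash_id
HAVING count(*) > 1;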

Generating Job Roles with ChatGPT

A job role is simply a clear label that captures what a person does in a position. In the kDS Discovery App, this role is part of a unique key, made up of industry, business unit, and job role, which the app uses to tailor interview questions to specific roles. OpenAI's ChatGPT API helps by taking job title and description data and deriving a concise job role from it.

To see how this works, we first need to create a new table for storing role data:


CREATE TABLE stage.role (
    email_ varchar(96) NOT NULL,           -- Email of the user associated with the role
    job_title varchar(255) NOT NULL,       -- Original job title
    job_role varchar(128) NULL,            -- Generated role name, standardized
    job_role_detail JSONB NOT NULL,        -- Detailed role description in JSONB format
    create_date date NOT NULL,             -- Creation date of the record
    created_by varchar(96) NOT NULL,       -- Creator or source, e.g., email
    source_ varchar(96) NOT NULL,          -- Source of data generation, e.g., "ChatGPT"
    CONSTRAINT PK_role PRIMARY KEY (email_)
);

Next, the following Python code uses OpenAI's GPT-4 model to generate a standardized job role derived from a job title and job description.

  • generate_role_prompt: Constructs a prompt for ChatGPT, providing context for the job title and description.
  • get_role_code_from_gpt: Sends the prompt to OpenAI's API and parses the JSON response to retrieve structured role data.
  • Database Insertion: Inserts the structured role data into stage.role, using a stored procedure (sketched after this list) to handle the JSONB data.
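
The stage.up_insert_role procedure itself is defined in the SQL script linked at the top of this post. As a minimal sketch, assuming it simply upserts on the email_ primary key, it might look something like this:

CREATE OR REPLACE PROCEDURE stage.up_insert_role(
    p_email varchar, p_job_title varchar, p_job_role varchar,
    p_job_role_detail jsonb, p_create_date date,
    p_created_by varchar, p_source varchar
)
LANGUAGE sql
AS $$
    -- One role row per email; repeat runs refresh the stored role
    INSERT INTO stage.role (email_, job_title, job_role, job_role_detail,
                            create_date, created_by, source_)
    VALUES (p_email, p_job_title, p_job_role, p_job_role_detail,
            p_create_date, p_created_by, p_source)
    ON CONFLICT (email_) DO UPDATE
        SET job_title       = EXCLUDED.job_title,
            job_role        = EXCLUDED.job_role,
            job_role_detail = EXCLUDED.job_role_detail,
            create_date     = EXCLUDED.create_date,
            created_by      = EXCLUDED.created_by,
            source_         = EXCLUDED.source_;
$$;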

Note: To run the following code, you'll need an OpenAI API key. To get one, sign up or log in at OpenAI's website, navigate to API keys in the Account section, and generate a new key. OpenAI charges based on tokens (input and output text). GPT-4 models typically cost fractions of a cent per request, with longer prompts and responses increasing token usage and cost. To date, OpenAI charges for unit testing this Python script have totaled 30 cents ($0.30).


import os
import openai
import json
import psycopg2
from dotenv import load_dotenv
from datetime import date
import re

# Load environment variables from .env file
load_dotenv()
openai.api_key = os.environ.get('OPENAPIKEY')

# Database connection details from environment variables
db_params = {
    "dbname": os.getenv("DBSURVEY"),
    "user": os.getenv("DBUSER"),
    "password": os.getenv("DBPASSWORD"),
    "host": os.getenv("DBHOST"),
    "port": os.getenv("DBPORT")
}


# Function to generate prompt for job role
def generate_role_prompt(job_title, job_description):
    prompt = (
        f"Based on the following job title and description, please assign a more concise job title and provide a two-sentence role description.\n\n"
        f"Job Title: {job_title}\n"
        f"Description: {job_description}\n\n"
        f"Respond in JSON format as follows:\n"
        f"{{\n"
        f"  \"role\": \"\",\n"
        f"  \"description\": \"\",\n"
        f"  \"rationale\": \"\"\n"
        f"}}"
    )
    return prompt

# Function to interact with GPT-4 and parse the JSON response
def get_role_code_from_gpt(job_title, job_description):
    prompt = generate_role_prompt(job_title, job_description)
    
    # Note: this uses the pre-1.0 openai SDK interface; newer versions of the
    # openai package expose the equivalent call as client.chat.completions.create
    response = openai.ChatCompletion.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    
    # Clean up and parse JSON response
    role_response = response['choices'][0]['message']['content'].strip()
    role_response = re.sub(r"```json\n|```", "", role_response).strip()
    
    try:
        role_data = json.loads(role_response)
    except json.JSONDecodeError as e:
        print("Error parsing JSON:", e)
        print("Response content was:", role_response)  # Debugging line
        raise ValueError("Failed to decode JSON from response.")
    
    return role_data

# Connect to PostgreSQL database
conn = psycopg2.connect(**db_params)
cur = conn.cursor()

# Example input (the job title and description below come from the test results
# shown later in this post; the email is a placeholder for illustration)
email = "user@example.com"
job_title = "Director, Enterprise Architecture & Applications Group, Information Systems & Technology Services"
job_description = (
    "Oversees the development and implementation of enterprise IT architecture "
    "and applications. Ensures alignment between IT strategies and business objectives."
)

# Generate role data from GPT-4
try:
    role_data = get_role_code_from_gpt(job_title, job_description)
except ValueError as e:
    print("Error generating role data:", e)
    conn.close()
    exit()

# Define metadata for insertion

source_ = "ChatGPT"
job_role_detail = json.dumps(role_data)  # Convert role_data to JSON for JSONB column

# Insert into PostgreSQL table using a stored procedure
try:
    cur.execute(
        """
        CALL stage.up_insert_role(%s, %s, %s, %s, %s, %s, %s);
        """,
        (
            email,
            job_title,
            role_data.get("role"),
            job_role_detail,  # JSONB job role detail
            date.today(),
            email,  # created_by
            source_
        )
    )
    
    # Commit changes
    conn.commit()
    print("Database populated with new concise job role and rationale.")
    
except Exception as e:
    print("Error inserting data:", e)
    conn.rollback()
finally:
    cur.close()
    conn.close()
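
Once rows land in stage.role, the generated roles and rationales can be read back out of the JSONB column with PostgreSQL's ->> operator, as in this sketch:

-- Extract the generated role and the model's rationale as text
SELECT email_,
       job_role,
       job_role_detail->>'rationale' AS rationale
FROM stage.role;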

Here are test results from the script above, showing the role_ that the OpenAI API/ChatGPT generates from each user-provided job title and job description:


job_title: Director, Enterprise Architecture & Applications Group, Information Systems & Technology Services	
job_description: Oversees the development and implementation of enterprise IT architecture and applications. Ensures alignment between IT strategies and business objectives.
role_: Director of IT Architecture
rationale: Simplified title for clarity and emphasis on key responsibilities

job_title: Principal Group Engineering Manager, Cloud and Enterprise Security
job_description: Oversees the development and maintenance of security infrastructure for cloud and enterprise systems, making strategic architectural decisions and managing diverse engineering teams focused on security.
role_: Chief Security Engineering Manager
rationale: Simplifies original title while emphasizing leadership and scope of responsibility in security engineering.

job_title: Senior Vice President, Platform Product Management and Developer Relations
job_description: Responsible for guiding the development and strategy of the company's software platforms, while managing relationships and interactions with external developers. Ensures effective adoption and usage of platform tools by the developer community.
role_: Head of Platform Strategy & Developer Relations
rationale: The revised title concisely reflects leadership over both product strategy and community engagement.

Final Thoughts

This post explored how to handle location data and job roles within the kDS Interview App Database using external APIs. By leveraging Google Maps for precise geolocation and OpenAI’s ChatGPT API for standardizing job titles, we can efficiently store unique identifiers for locations and clarify job roles. This setup ensures a scalable, consistent data model, helping tailor interview questions to users' unique job roles and locations, ultimately enhancing data discovery and personalization in the app.

For more information about running and installing the scripts described in this post, please see the section called "Database & APP Installation Notes" in the GitHub README file here.

Thanks for stopping by.
