What would loading the 'price' table from an Azure Data Lake CSV file look like?

1. Using Python:


# Example Python code for loading the "price" table
import io

import pandas as pd
import pymssql
from azure.storage.blob import BlobServiceClient

# Connect to ADLS and download the CSV file
blob_service_client = BlobServiceClient.from_connection_string("your_connection_string")
blob_client = blob_service_client.get_blob_client("your_container_name", "your_file_name.csv")
data = blob_client.download_blob().content_as_text()

# Read the CSV data into a pandas DataFrame
df = pd.read_csv(io.StringIO(data))

# Connect to Synapse SQL and create a staging table
connection = pymssql.connect(
    server="your_server_name",
    user="your_username",
    password="your_password",
    database="your_database_name",
)
cursor = connection.cursor()

cursor.execute("""
CREATE TABLE staging_price (
  symbol VARCHAR(10) NOT NULL,
  date DATE NOT NULL,
  [open] DECIMAL(10, 2) NOT NULL,  -- bracketed: OPEN and CLOSE are reserved keywords
  high DECIMAL(10, 2) NOT NULL,
  low DECIMAL(10, 2) NOT NULL,
  [close] DECIMAL(10, 2) NOT NULL,
  adj_close DECIMAL(10, 2) NOT NULL,
  volume BIGINT NOT NULL
);
""")

# Bulk insert the DataFrame rows into the staging table (pymssql expects tuples)
rows = [tuple(r) for r in df.values.tolist()]
cursor.executemany("INSERT INTO staging_price VALUES (%s, %s, %s, %s, %s, %s, %s, %s)", rows)

# Insert data from staging table into "price" table
cursor.execute("""
INSERT INTO price (symbol, date, [open], high, low, [close], adj_close, volume)
SELECT symbol, date, [open], high, low, [close], adj_close, volume
FROM staging_price;
""")

# Drop the staging table
cursor.execute("DROP TABLE staging_price;")

# Commit changes and close connection
connection.commit()
cursor.close()
connection.close()

2. Using T-SQL:


-- T-SQL code for loading the "price" table (PolyBase, Azure Synapse dedicated SQL pool)
-- The external data source and file format below use placeholder names; a DATABASE
-- SCOPED CREDENTIAL may also be required, depending on how the storage account is secured.
CREATE EXTERNAL DATA SOURCE adls_source
WITH (
  TYPE = HADOOP,
  LOCATION = 'abfss://your_container_name@your_storage_account.dfs.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT csv_file_format
WITH (
  FORMAT_TYPE = DELIMITEDTEXT,
  FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2)
);

CREATE EXTERNAL TABLE price_external (
  symbol VARCHAR(10) NOT NULL,
  date DATE NOT NULL,
  [open] DECIMAL(10, 2) NOT NULL,
  high DECIMAL(10, 2) NOT NULL,
  low DECIMAL(10, 2) NOT NULL,
  [close] DECIMAL(10, 2) NOT NULL,
  adj_close DECIMAL(10, 2) NOT NULL,
  volume BIGINT NOT NULL
)
WITH (
  LOCATION = '/your_file_name.csv',
  DATA_SOURCE = adls_source,
  FILE_FORMAT = csv_file_format
);

INSERT INTO price (symbol, date, [open], high, low, [close], adj_close, volume)
SELECT symbol, date, [open], high, low, [close], adj_close, volume
FROM price_external;

DROP EXTERNAL TABLE price_external;

Both methods load the "price" table from the CSV file in ADLS. Choose the approach that best suits your workflow:

  • Python: Offers more flexibility for data manipulation and transformation before loading (see the sketch after this list).
  • T-SQL: Simpler and more direct; well suited to straightforward loads that need no intermediate transformation.
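
For illustration, a cleanup step could sit between pd.read_csv and the executemany call in the Python example above. This is a minimal sketch that reuses the column names from the staging schema; the specific rules (date normalization, null handling, the volume filter) are assumptions for the example, not requirements of the load itself:


# Illustrative transformations applied to df before the bulk insert.
# The rules below are examples only; adapt them to the actual data.
df["date"] = pd.to_datetime(df["date"]).dt.date        # normalize to DATE-compatible values
df = df.dropna(subset=["symbol", "close"])             # drop rows missing key fields
df = df[df["volume"] > 0]                              # discard zero-volume rows (example rule)
df["symbol"] = df["symbol"].str.strip().str.upper()    # standardize ticker symbols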