Designing a Data Source Discovery App - Part 23: Our Solution

by DL Keeshin


August 6, 2025


kDS Discovery ERD

It has been over a year since I began blogging about the design and development of the kDS Data Source Discovery App. Today I want to describe the current state of all this hard work and why my company, Keeshin Database Services, LLC (kDS), believes we have a well-designed app with a great underlying data structure that's ready to transform how organizations approach data source discovery.

What started as a concept to address the persistent challenge of undocumented data sources has evolved into a comprehensive, AI-powered platform that combines systematic data collection with enterprise-grade security. Let me walk you through what we've built and why it represents a significant advancement in data governance tooling.

The Problem We Set Out to Solve

Organizations today struggle with "data dark matter" – critical data sources, transformations, and flows that exist in various states of documentation. Some are well-documented, others exist only as tribal knowledge. When key personnel leave, institutional knowledge disappears, creating compliance risks and operational blind spots. Traditional data discovery tools focus on technical metadata but miss the business context that makes data truly useful.

In developing our solution, we deliberately chose stable, open-source tools that provide a solid foundation for enterprise deployment. Our technology stack builds on PostgreSQL's proven relational database capabilities and integrates modern AI technologies alongside them. This reflects our core design philosophy: combine traditional relational database strengths (data integrity, ACID compliance, and mature tooling) with AI capabilities for intelligent automation. Because the kDS Data Source Discovery App is built on established, open-source technologies rather than proprietary platforms, organizations can deploy and maintain it without being locked into vendor-specific ecosystems or experimental technologies that may not stand the test of time.

Our Solution: AI-Enhanced Knowledge Discovery

The kDS Data Source Discovery App transforms the traditionally manual process of data source discovery into an intelligent, automated system. Here's what makes it unique:

Intelligent Organization Mapping

We built comprehensive organizational hierarchy management, from parent companies down to individual business roles. The system uses GPT-4 to automatically classify organizations by standard industry codes and to generate role descriptions from job titles and job descriptions. This foundation ensures that all subsequent data collection is properly contextualized.

AI-Powered Interview Generation

Perhaps our most innovative feature is dynamic interview question generation. Leveraging OpenAI's GPT-4, the system creates contextual questions tailored to specific roles, business functions, and industry contexts. The number of questions is configurable (6 to 24 per interview), and each question can use one of several response formats, including text, dropdown, and checkbox options.

The questions aren't generic – they're intelligently crafted based on the respondent's role and industry. A database administrator in healthcare gets different questions than a data scientist in financial services.
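As a rough illustration of how role and industry context could drive question generation, the sketch below assembles a prompt and clamps the requested question count to the 6 to 24 range. The function name `build_question_prompt`, the prompt wording, and the default of 12 questions are assumptions made for this example, not the app's actual prompt templates, and the call to the model itself is left out.

```python
MIN_QUESTIONS = 6   # lower bound stated in the post
MAX_QUESTIONS = 24  # upper bound stated in the post


def build_question_prompt(role: str, business_function: str,
                          industry: str, question_count: int = 12) -> str:
    """Build a hypothetical prompt asking a model for interview questions."""
    # Keep the requested count inside the configurable range.
    count = max(MIN_QUESTIONS, min(MAX_QUESTIONS, question_count))
    return (
        f"You are preparing a data source discovery interview.\n"
        f"Respondent role: {role}\n"
        f"Business function: {business_function}\n"
        f"Industry: {industry}\n"
        f"Write {count} questions about the data sources, transformations, "
        f"and data flows this person works with. For each question, give a "
        f"response format: text, dropdown, or checkbox."
    )


# A request for 40 questions is clamped to the maximum of 24.
prompt = build_question_prompt(
    "Database Administrator", "IT Operations", "Healthcare", question_count=40
)
print(prompt)
```

Changing the role or industry arguments changes the context the model sees, which is what would make a healthcare DBA's interview differ from a financial services data scientist's.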

Sophisticated Data Collection

Our data collection system supports multi-modal responses with custom options, staged answer collection with progress tracking, and intelligent question redirection to subject matter experts. For organizations that maintain up-to-date documentation, the app can track and incorporate existing documentation, creating a comprehensive view that combines formal documentation with discovered tribal knowledge.

The Analysis Engine: Turning Conversations into Intelligence

Raw interview responses are just the beginning. Our three-stage AI-powered analysis pipeline transforms unstructured conversations into actionable business intelligence:

Stage 1: Topic-Level Analysis

The first stage processes raw interview answers to generate structured summaries by topic. Using GPT-4, we extract:

  • Structured summaries of responses
  • Data flow mappings (source → destination)
  • Solutions and systems in use
  • Follow-up questions for deeper investigation

Stage 2: Executive Summary Generation

The second stage synthesizes topic-level summaries into comprehensive executive reports, generating high-level insights including key themes across topics, major data flows, critical solutions in use, and strategic recommendations.

Stage 3: Data Extraction & Normalization

The final stage parses the JSON summaries and extracts structured data into relational tables. It accepts both the legacy single-field format and the newer array-based format, so summaries produced by earlier versions remain usable.

Example analysis output structure:
{
  "analysis": [{
    "control_id": "uuid-here",
    "summary": "Comprehensive analysis of data flows...",
    "major_data_flows": [
      {"source": "CRM System", "destination": "Data Warehouse"},
      {"source": "ERP System", "destination": "Analytics Platform"}
    ],
    "key_solutions": ["Salesforce", "SAP", "Tableau"],
    "strategic_recommendations": [
      "Implement unified data catalog",
      "Standardize ETL processes"
    ]
  }]
}
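To make Stage 3 more concrete, here is a minimal sketch of flattening the analysis JSON into rows suitable for a relational table. The function name `extract_data_flows` is hypothetical, and the legacy shape (a single flow object under a `data_flow` key) is purely an assumption for illustration; the post does not show what the legacy format actually looks like.

```python
import json


def extract_data_flows(summary_json: str) -> list[tuple[str, str, str]]:
    """Flatten analysis JSON into (control_id, source, destination) rows."""
    rows = []
    for item in json.loads(summary_json).get("analysis", []):
        control_id = item.get("control_id", "")
        # Array-based format, as in the example output structure.
        flows = item.get("major_data_flows")
        if flows is None:
            # ASSUMED legacy format: one flow object under "data_flow".
            legacy = item.get("data_flow")
            flows = [legacy] if legacy else []
        for flow in flows:
            rows.append((control_id, flow["source"], flow["destination"]))
    return rows


new_style = json.dumps({"analysis": [{
    "control_id": "uuid-here",
    "major_data_flows": [
        {"source": "CRM System", "destination": "Data Warehouse"},
        {"source": "ERP System", "destination": "Analytics Platform"},
    ],
}]})
legacy_style = json.dumps({"analysis": [{
    "control_id": "uuid-here",
    "data_flow": {"source": "CRM System", "destination": "Data Warehouse"},
}]})

print(extract_data_flows(new_style))
print(extract_data_flows(legacy_style))
```

Each returned tuple maps onto one row of a data flow table, so both formats end up in the same relational shape.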

Enterprise-Grade Security Architecture

Security isn't an afterthought – it's built into every layer of the application. We implement a comprehensive defense-in-depth approach:

Token-Based Authentication

We avoid password-related vulnerabilities by using cryptographically secure UUID tokens with automatic expiration instead of passwords. Separate authentication flows set an appropriate token lifetime for each type of user: 7-14 days for respondents and 8-12 hours for administrators.
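A minimal sketch of the expiring-token idea follows. The names `AccessToken`, `issue_token`, and `TOKEN_LIFETIMES` are hypothetical, and the specific lifetimes (7 days and 8 hours) are simply picked from the ranges above; the app's real implementation may differ.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Lifetimes picked from the ranges in the post; exact values are assumptions.
TOKEN_LIFETIMES = {
    "respondent": timedelta(days=7),
    "admin": timedelta(hours=8),
}


@dataclass(frozen=True)
class AccessToken:
    value: str
    role: str
    expires_at: datetime

    def is_valid(self, now: Optional[datetime] = None) -> bool:
        """A token is valid until its expiry time passes."""
        now = now or datetime.now(timezone.utc)
        return now < self.expires_at


def issue_token(role: str, now: Optional[datetime] = None) -> AccessToken:
    """Issue a random UUID token with a role-specific expiry."""
    now = now or datetime.now(timezone.utc)
    # uuid4() is random; CPython draws its bytes from os.urandom.
    return AccessToken(str(uuid.uuid4()), role, now + TOKEN_LIFETIMES[role])


issued = datetime(2025, 8, 6, tzinfo=timezone.utc)
token = issue_token("respondent", now=issued)
print(token.is_valid(now=issued + timedelta(days=6)))  # still inside 7 days
print(token.is_valid(now=issued + timedelta(days=8)))  # past expiry
```

Passing `now` explicitly keeps the expiry logic easy to test; in production the default of the current UTC time would be used.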

Multi-Layered Database Security

Our database security includes SSL/TLS encryption for all connections, certificate-based authentication, row-level security for data isolation, and private network architecture. We implement role-based access control with three tiers: read-only, read-write, and administrative access.

-- Security roles implementation
CREATE ROLE kdsd_read;        -- For reporting, analytics
CREATE ROLE kdsd_readwrite;   -- For main app operations
CREATE ROLE kdsd_admin;       -- For migrations, admin tools

-- Row-level security example
ALTER TABLE interview.respondent ENABLE ROW LEVEL SECURITY;
CREATE POLICY customer_isolation ON interview.respondent
FOR ALL
USING (customer_id = current_setting('app.current_customer_id')::uuid);
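The policy above only isolates customers if the application sets `app.current_customer_id` before querying. One way that might look from Python is sketched below. The helper name `set_current_customer` is hypothetical, the cursor is assumed to be a DB-API cursor with `%s` placeholders (psycopg, for example), and `RecordingCursor` is a stand-in so the example runs without a database. `set_config` is PostgreSQL's built-in function; its third argument makes the setting local to the current transaction.

```python
import uuid


def set_current_customer(cursor, customer_id: str) -> None:
    """Scope the current transaction to one customer for the RLS policy."""
    # Validate first so a malformed value never reaches the ::uuid cast.
    validated = str(uuid.UUID(customer_id))
    # is_local = true: the setting lasts only until the transaction ends,
    # which avoids leaking it across pooled connections.
    cursor.execute(
        "SELECT set_config('app.current_customer_id', %s, true)",
        (validated,),
    )


class RecordingCursor:
    """Stand-in cursor that records statements instead of running them."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))


cur = RecordingCursor()
set_current_customer(cur, "123e4567-e89b-12d3-a456-426614174000")
print(cur.calls)
```

Note that PostgreSQL table owners and superusers bypass row-level security by default, so the application would need to connect as a non-owner role for the policy to take effect.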

Why This Architecture Works

Our design decisions weren't arbitrary – they solve real enterprise challenges:

Modular Architecture: Separation of concerns allows independent scaling and maintenance of different components. The Flask-based framework with distinct modules for core functionality, interview management, and data analysis ensures maintainability.

PostgreSQL Foundation: Our multi-schema database design provides namespace isolation while maintaining relational integrity. JSONB fields offer flexibility for dynamic content while preserving the benefits of structured data.

AI Integration Strategy: Robust error handling with retry logic, rate limiting, and database-stored prompt templates ensure reliable AI operations at enterprise scale.

Audit Trail Everything: Complete tracking of all operations, from question generation to data analysis, provides the accountability required in enterprise environments.
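The retry logic mentioned under AI Integration Strategy can be sketched as a small wrapper with exponential backoff. This is an illustration under assumptions, not the app's code: `call_with_retry` and `flaky_model_call` are hypothetical names, and a real version would catch only the transient error types raised by the AI client rather than every exception.

```python
import time


def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on any exception."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))


attempts = {"count": 0}


def flaky_model_call():
    """Stand-in for an AI API call that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "summary text"


# Record the delays instead of actually sleeping, to keep the demo instant.
delays = []
result = call_with_retry(flaky_model_call, sleep=delays.append)
print(result, delays)
```

Injecting the `sleep` function keeps the backoff schedule observable in tests without slowing them down.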

Real-World Impact

The business value is clear and measurable:

  • Accelerated Discovery: Discovery work that takes months when done manually can be completed in weeks
  • Preserved Knowledge: Systematic capture of tribal knowledge before it leaves
  • Compliance Ready: Comprehensive documentation supports GDPR, CCPA, and industry regulations
  • Strategic Insights: Executive summaries enable data-driven governance decisions

Looking Forward

This isn't the end of our journey – it's a milestone. We've built a solid foundation that can scale to enterprise needs while maintaining the flexibility to adapt to evolving requirements. The modular architecture, comprehensive security, and AI-powered analysis create a platform that grows with organizations' data governance maturity.

We're currently in beta testing with select enterprise customers, and the feedback has been overwhelmingly positive. Organizations are seeing immediate value in the structured approach to data discovery and the actionable insights generated by our analysis engine.

Join Our Beta Program

As we continue refining the kDS Data Source Discovery App based on real-world usage, we're actively seeking additional organizations to participate in our beta program. If your organization struggles with undocumented data assets and needs a systematic approach to data discovery, we'd love to collaborate with you.

Interested? Reach out directly at talk2us@keeshinds.com or leave a comment below.

Thank you for following along on this journey. What started as a blog series about designing an app has become a real product that's ready to transform how organizations approach data discovery.
