
Why I Archive My Enterprise Custom GPTs (And Why You Probably Should Too)

February 17, 2026

Over the last year, I’ve watched enterprises pour serious effort into building custom GPTs — internal copilots, domain-specific assistants, workflow automators, knowledge distillers. These aren’t toys. They represent real intellectual property and real operational leverage.

And yet there’s a quiet gap that most teams don’t notice until it’s too late:

There’s no simple “export workspace” or “archive all GPTs” button in OpenAI Enterprise. If your custom GPTs are mission-critical, you should not rely on the UI as your only source of truth.

Let me explain why — and how I handle it.

The Problem: Your GPTs Live Only in the Platform

When you build Enterprise custom GPTs, you’re encoding:

  • Carefully engineered system prompts
  • Guardrails and instructions
  • Workflow logic
  • Embedded knowledge
  • Operational patterns

That’s real institutional knowledge – but today, OpenAI does not provide a native way to export or archive an entire workspace configuration.

If something changes — accidentally deleted GPTs, admin turnover, workspace restructuring, or future product changes — you don’t have a clean offline snapshot of what existed.

For experimental GPTs, that’s fine. For mission-critical ones? That’s uncomfortable.

Why I Believe Offline Backups Matter

1. It’s Basic Operational Hygiene

In cybersecurity and enterprise software, we don’t deploy anything mission-critical without backup and recovery plans. Why should AI infrastructure be any different?

If your GPT supports:

  • Customer support workflows
  • Threat analysis
  • Executive reporting
  • Engineering automation

…then it deserves the same resiliency treatment as any other production system.

2. It Protects Institutional Knowledge

Custom GPTs often encode tacit knowledge:

  • The way your SOC triages alerts
  • The way your product team summarizes requirements
  • The way your analysts structure investigations

Losing that configuration means losing institutional memory. Having a local archive ensures that even if your platform configuration changes, your intellectual property does not disappear with it.

3. It Reduces Vendor Lock-In Risk

This one matters strategically.

AI infrastructure is evolving fast. Five years from now, your primary AI provider might not be the same one you use today.

If you have:

  • Archived system prompts
  • Configuration metadata
  • Historical GPT definitions
  • Structured logs of behavior

…you can migrate far more easily.

Without that archive, you’re effectively rebuilding from scratch. An offline repository of GPT configurations gives you negotiating leverage, architectural flexibility, and long-term optionality.

That’s just good platform strategy.

So How Do I Archive My Enterprise GPTs?

While there’s no built-in workspace export feature, OpenAI provides something powerful for enterprise customers: the Compliance API.

Here’s the documentation:

Compliance API Overview
https://help.openai.com/en/articles/9261474-compliance-api-for-enterprise-customers

Compliance API Reference (Logs & Platform)
https://chatgpt.com/admin/api-reference#tag/Compliance-API-Logs-Platform

The Compliance API is designed for audit and logging use cases, but it also gives you access to structured platform events. With the right scripting, you can extract the data necessary to reconstruct and archive GPT configurations and changes over time.

That’s the key.

To use the Compliance API, you need to ask OpenAI to specifically enable it for one of your API keys; the instructions are noted in the API reference linked above, and you can't do this without their support.
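
Once OpenAI has enabled the key, I like to run a quick sanity check before wiring up the full backup: a single request against the GPT list endpoint. Here's a minimal sketch that mirrors the base URL and endpoint path used by the full script at the end of this post; the key and workspace ID are placeholders you'll need to fill in.

import json
from urllib.request import Request, urlopen

# Assumption: this mirrors the base URL and endpoint used by the full backup
# script below. Substitute your own workspace ID and a Compliance-enabled key.
API_KEY = "YOUR_COMPLIANCE_API_KEY"
WORKSPACE_ID = "YOUR_WORKSPACE_ID"

url = f"https://api.chatgpt.com/v1/compliance/workspaces/{WORKSPACE_ID}/gpts?limit=1"
req = Request(url, headers={"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"})
with urlopen(req) as resp:
    page = json.loads(resp.read().decode("utf-8"))

# A 200 response with a "data" list means the key is enabled and scoped correctly.
print(f"Compliance API reachable; first page returned {len(page.get('data', []))} GPT record(s).")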

What My Backup Process Looks Like

At a high level, I:

  1. Call the Compliance API on a scheduled cadence
  2. Pull relevant GPT-related platform events
  3. Normalize and store the data locally
  4. Version it
  5. Secure it

The result is a structured, offline archive of my GPT environment. It’s not just logs. It’s a historical record of how my AI infrastructure evolves.

(The full script that performs this backup is attached below.)
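
For the cadence and versioning steps, something as small as a wrapper that runs the backup and commits the snapshot to a local git repository goes a long way. This is just a rough sketch of that idea, assuming the script below is saved as backup_gpts.py and that the archive directory is already an initialized git repository:

import subprocess
from datetime import date
from pathlib import Path

# Assumptions: backup_gpts.py is the script at the end of this post, and
# ~/gpt-archive is an existing git repository used to version the snapshots.
ARCHIVE_DIR = Path.home() / "gpt-archive"

# Steps 1-3: run the backup itself (schedule this wrapper with cron or similar).
subprocess.run(
    [
        "python3", "backup_gpts.py", "--run",
        "--workspace-id", "YOUR_WORKSPACE_ID",
        "--api-key", "YOUR_COMPLIANCE_API_KEY",
        "--out", str(ARCHIVE_DIR),
    ],
    check=True,
)

# Step 4: version the snapshot. git commit exits non-zero when nothing changed,
# so that case is deliberately not treated as fatal.
subprocess.run(["git", "-C", str(ARCHIVE_DIR), "add", "-A"], check=True)
subprocess.run(
    ["git", "-C", str(ARCHIVE_DIR), "commit", "-m", f"GPT archive snapshot {date.today()}"],
    check=False,
)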

What This Enables

When you have a proper archive:

  • You can recreate a GPT definition if it’s lost
  • You can prove historical configurations for audit
  • You can freeze a known-good version before major changes
  • You can migrate to another AI provider more easily

That last point is important.

Strategically, you never want your AI capabilities to be hostage to a single console configuration. Offline archives restore architectural control.
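
To make the first bullet concrete, here's a rough sketch of pulling a GPT's most recent archived configuration back out of the directory tree the script below produces. The layout (index.json plus a configs.json per GPT) comes from the script itself; the "instructions" field name is my assumption about the Compliance API's config schema, so check your own configs.json for the exact fields.

import json
from pathlib import Path

# Hypothetical archive directory (the script defaults to "gpt-backups <date>").
archive = Path("gpt-backups 2026-02-17")
index = json.loads((archive / "index.json").read_text())

for entry in index["gpts"]:
    configs_path = Path(entry["path"]) / "configs.json"
    if not configs_path.exists():
        continue
    # The backup script stores the most recent config first in the list.
    latest = json.loads(configs_path.read_text())[0]
    print(f"== {entry['name']} ==")
    # "instructions" is an assumed field name; adjust to your actual config schema.
    print((latest.get("instructions") or "<no instructions field>")[:200])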

Final Thoughts

Enterprise custom GPTs are quickly becoming operational assets — not experiments. If something is mission-critical, it deserves:

  • Backup
  • Versioning
  • Auditability
  • Recovery procedures

Today, OpenAI doesn’t provide native workspace archiving. But using the Enterprise Compliance API, you can build your own structured backup process.

In my view, that’s not optional — it’s responsible AI infrastructure management. If you’re serious about AI at scale, treat your GPTs like production systems.

Because they are.

The Script

Don't forget to add your workspace ID and Compliance API key: either edit the placeholder defaults in the script or pass --workspace-id and --api-key on the command line.

#!/usr/bin/env python3
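"""Back up Enterprise custom GPT definitions, configs, and attached knowledge
files via the OpenAI Compliance API into a local, versionable directory tree."""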
import argparse
import json
import os
import re
import sys
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode, urljoin, urlparse
from urllib.request import Request, urlopen


BASE_URL = "https://api.chatgpt.com/v1"


def _now_utc_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


def _slugify(text: str) -> str:
    text = text.strip().lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)
    text = re.sub(r"-{2,}", "-", text).strip("-")
    return text or "gpt"


def _request(
    method: str,
    path: str,
    api_key: str,
    params: Optional[Dict[str, Any]] = None,
    retries: int = 5,
    backoff_seconds: float = 1.5,
    accept_json: bool = True,
) -> Any:
    if path.startswith("/"):
        url = f"{BASE_URL}{path}"
    else:
        url = f"{BASE_URL}/{path}"
    if params:
        url = f"{url}?{urlencode(params)}"
    headers = {
        "Authorization": f"Bearer {api_key}",
    }
    if accept_json:
        headers["Accept"] = "application/json"

    attempt = 0
    while True:
        attempt += 1
        try:
            req = Request(url, method=method, headers=headers)
            with urlopen(req) as resp:
                content_type = resp.headers.get("Content-Type", "")
                body = resp.read()
                if accept_json and "application/json" in content_type:
                    return json.loads(body.decode("utf-8"))
                return body
        except HTTPError as e:
            if e.code == 429 and attempt <= retries:
                time.sleep(backoff_seconds * attempt)
                continue
            if 500 <= e.code < 600 and attempt <= retries:
                time.sleep(backoff_seconds * attempt)
                continue
            # Include response body for easier debugging.
            try:
                body = e.read().decode("utf-8", errors="replace")
            except Exception:
                body = ""
            details = f"HTTP {e.code} for {method} {url}"
            if body:
                details = f"{details}\n{body}"
            raise HTTPError(e.url, e.code, details, e.headers, e.fp)
        except URLError:
            if attempt <= retries:
                time.sleep(backoff_seconds * attempt)
                continue
            raise


def _iter_list(endpoint: str, api_key: str, limit: int) -> Iterable[Dict[str, Any]]:
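    # Cursor-style pagination: fetch pages of `limit` items and follow the
    # `last_id` cursor until the API stops reporting has_more.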
    after = None
    while True:
        params: Dict[str, Any] = {"limit": limit}
        if after:
            params["after"] = after
        page = _request("GET", endpoint, api_key, params=params)
        data = page.get("data", []) if isinstance(page, dict) else []
        for item in data:
            yield item
        if not page or not page.get("has_more"):
            break
        after = page.get("last_id")
        if not after:
            break


def _write_json(path: Path, payload: Any) -> None:
    path.write_text(json.dumps(payload, indent=2, sort_keys=True))


class DownloadError(Exception):
    pass


def _is_openai_api_url(url: str) -> bool:
    return urlparse(url).netloc == urlparse(BASE_URL).netloc


def _download_file(download_url: str, dest_path: Path, api_key: str) -> Dict[str, Any]:
    headers: Dict[str, str] = {
        # Safe browser-like defaults; avoids some overly strict edge filters.
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/122.0.0.0 Safari/537.36"
        ),
        "Accept": "*/*",
        "Accept-Language": "en-US,en;q=0.9",
    }
    if _is_openai_api_url(download_url):
        headers["Authorization"] = f"Bearer {api_key}"

    req = Request(download_url, method="GET", headers=headers)
    with urlopen(req) as resp:
        body = resp.read()
        content_type = resp.headers.get("Content-Type", "").lower()

        if not body:
            raise DownloadError("empty response body")
        if "text/html" in content_type:
            raise DownloadError("received HTML payload instead of file bytes")

        dest_path.write_bytes(body)
        return {
            "bytes_downloaded": len(body),
            "content_type": content_type,
            "source_url": resp.geturl(),
        }


def _get_file_download_url(workspace_id: str, file_id: str, api_key: str) -> Optional[str]:
    # This endpoint issues a 307 redirect with Location header to a signed URL.
    url = f"{BASE_URL}/compliance/workspaces/{workspace_id}/gpt_files/{file_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    req = Request(url, method="GET", headers=headers)
    try:
        with urlopen(req) as resp:
            # Some clients may follow redirects automatically; if so, this will
            # already be file bytes. We avoid that by returning final URL when possible.
            return resp.geturl()
    except HTTPError as e:
        if e.code in (301, 302, 303, 307, 308):
            return e.headers.get("Location")
        if e.code in (404, 410):
            return None
        raise


def _extract_files(config: Dict[str, Any]) -> List[Dict[str, Any]]:
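    # Attached knowledge files live under the config's `files.data` list;
    # collect every entry that has an id so it can be downloaded later.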
    files = []
    files_obj = config.get("files")
    if isinstance(files_obj, dict):
        data = files_obj.get("data", [])
        if isinstance(data, list):
            for item in data:
                if isinstance(item, dict) and "id" in item:
                    files.append(item)
    return files


def backup_gpts(
    api_key: str,
    workspace_id: str,
    out_dir: Path,
    limit: int,
    fetch_all_configs: bool = True,
    max_gpts: Optional[int] = None,
    name_prefix: Optional[str] = None,
) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    index: Dict[str, Any] = {
        "workspace_id": workspace_id,
        "exported_at": _now_utc_iso(),
        "gpts": [],
    }

    processed = 0
    normalized_prefix = name_prefix.casefold() if name_prefix else None
    for gpt in _iter_list(f"/compliance/workspaces/{workspace_id}/gpts", api_key, limit):
        if max_gpts is not None and processed >= max_gpts:
            break
        gpt_id = gpt.get("id", "unknown")
        gpt_name = None
        latest_config = gpt.get("latest_config")
        if isinstance(latest_config, dict):
            latest_data = latest_config.get("data")
            if isinstance(latest_data, list) and latest_data and isinstance(latest_data[0], dict):
                gpt_name = latest_data[0].get("name")
        if not gpt_name:
            gpt_name = gpt.get("builder_name") or gpt_id
        if normalized_prefix and not gpt_name.casefold().startswith(normalized_prefix):
            continue
        print(f"Processing GPT: {gpt_name} ({gpt_id})", file=sys.stderr)
        folder_name = f"{_slugify(gpt_name)}__{gpt_id}"
        gpt_dir = out_dir / folder_name
        gpt_dir.mkdir(parents=True, exist_ok=True)

        _write_json(gpt_dir / "gpt.json", gpt)

        # Fetch configs
        configs: List[Dict[str, Any]] = []
        if fetch_all_configs:
            for cfg in _iter_list(
                f"/compliance/workspaces/{workspace_id}/gpts/{gpt_id}/configs", api_key, limit
            ):
                configs.append(cfg)
        else:
            if isinstance(gpt.get("latest_config"), dict):
                configs = gpt["latest_config"].get("data", [])

        if configs:
            _write_json(gpt_dir / "configs.json", configs)

        # Use most recent config (first in list) to resolve files
        config_for_files = configs[0] if configs else None
        files = _extract_files(config_for_files or {})
        files_dir = gpt_dir / "files"
        files_dir.mkdir(parents=True, exist_ok=True)

        files_manifest: List[Dict[str, Any]] = []
        for f in files:
            file_id = f.get("id")
            if not file_id:
                continue
            file_name = f.get("name") or file_id
            safe_name = re.sub(r'[\\/:*?"<>|]+', "_", file_name)
            target_path = files_dir / safe_name
            expected_size = f.get("size_bytes") or f.get("bytes") or f.get("size")

            download_url = f.get("download_url")
            if not download_url:
                download_url = _get_file_download_url(workspace_id, file_id, api_key)

            status = "skipped"
            bytes_downloaded = None
            content_type = None
            if download_url:
                try:
                    download_result = _download_file(download_url, target_path, api_key)
                    bytes_downloaded = download_result.get("bytes_downloaded")
                    content_type = download_result.get("content_type")
                    if isinstance(expected_size, int) and bytes_downloaded != expected_size:
                        status = f"error: size_mismatch expected={expected_size} got={bytes_downloaded}"
                        try:
                            target_path.unlink(missing_ok=True)
                        except Exception:
                            pass
                    else:
                        status = "downloaded"
                except DownloadError as e:
                    status = f"error: {e}"
                    try:
                        target_path.unlink(missing_ok=True)
                    except Exception:
                        pass
                except HTTPError as e:
                    if e.code == 403:
                        print(
                            (
                                "DEBUG 403 download denied "
                                f"file_id={file_id} "
                                f"url={download_url} "
                                f"host={urlparse(download_url).netloc} "
                                f"openai_host={_is_openai_api_url(download_url)}"
                            ),
                            file=sys.stderr,
                        )
                    status = f"error: HTTP {e.code}"
                    try:
                        target_path.unlink(missing_ok=True)
                    except Exception:
                        pass
                except URLError as e:
                    status = f"error: {e.reason}"
                    try:
                        target_path.unlink(missing_ok=True)
                    except Exception:
                        pass
                except Exception as e:
                    status = f"error: {e.__class__.__name__}"
                    try:
                        target_path.unlink(missing_ok=True)
                    except Exception:
                        pass
            else:
                status = "unavailable"

            files_manifest.append(
                {
                    "id": file_id,
                    "name": file_name,
                    "download_url": download_url,
                    "status": status,
                    "path": str(target_path),
                    "expected_size": expected_size,
                    "bytes_downloaded": bytes_downloaded,
                    "content_type": content_type,
                }
            )

        if files_manifest:
            _write_json(files_dir / "files.json", files_manifest)

        index["gpts"].append(
            {
                "id": gpt_id,
                "name": gpt_name,
                "path": str(gpt_dir),
                "files_count": len(files_manifest),
                "configs_count": len(configs),
            }
        )
        processed += 1

    _write_json(out_dir / "index.json", index)


def main() -> None:
    generated_date = datetime.now().strftime("%Y-%m-%d")
    parser = argparse.ArgumentParser(
        description=(
            "Backup OpenAPI GPT definitions, configs, and attached files from a Compliance API workspace.\nRequires an API key enabled for Compliance API Access."
        ),
        epilog=(
            "Author: Simon Hunt\n"
            "Role: CPO Securonix\n"
            "Contact: hunt.simon@gmail.com\n"
            "Version: 1.2\n"
            f"Date: {generated_date}\n\n"
        ),
        formatter_class=argparse.RawTextHelpFormatter,
    )
    default_out_dir = f"gpt-backups {datetime.now().strftime('%Y-%m-%d')}"
    parser.add_argument(
        "--run",
        action="store_true",
        help="Execute backup. Without this flag, the script only prints help. Default: disabled.",
    )
    parser.add_argument(
        "--workspace-id",
        default="*add your workspace id here*",
        help="Compliance workspace ID.",
    )
    parser.add_argument(
        "--out",
        default=default_out_dir,
        help=f"Output directory. Default: {default_out_dir}.",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=100,
        help="Pagination page size. Default: 100.",
    )
    parser.add_argument(
        "--api-key",
        default="*add your compliance api key here*",
        help="API key. Can also be set via OPENAI_API_KEY.",
    )
    parser.add_argument(
        "--latest-only",
        action="store_true",
        help="Only use latest config from GPT list (skip configs endpoint). Default: disabled.",
    )
    parser.add_argument(
        "--max-gpts",
        type=int,
        default=None,
        help="Maximum number of GPTs to process. Default: all.",
    )
    parser.add_argument(
        "--name-prefix",
        default=None,
        help="Only process GPTs whose names begin with this prefix (case-insensitive). Default: no filter.",
    )
    if len(sys.argv) == 1:
        parser.print_help()
        sys.exit(0)
    args = parser.parse_args()
    if not args.run:
        parser.print_help()
        sys.exit(0)

    api_key = args.api_key
    if not api_key or api_key.startswith("*add"):
        # The placeholder default doesn't count as a real key; fall back to the env var.
        api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        print(
            "API key required. Pass --api-key, set OPENAI_API_KEY, or edit the default in the script.",
            file=sys.stderr,
        )
        sys.exit(1)

    workspace_id = args.workspace_id
    if not workspace_id or workspace_id.startswith("*add"):
        print(
            "Workspace ID required. Pass --workspace-id or edit the default in the script.",
            file=sys.stderr,
        )
        sys.exit(1)

    out_dir = Path(args.out).expanduser().resolve()
    backup_gpts(
        api_key=api_key,
        workspace_id=workspace_id,
        out_dir=out_dir,
        limit=args.limit,
        fetch_all_configs=not args.latest_only,
        max_gpts=args.max_gpts,
        name_prefix=args.name_prefix,
    )
    print(f"Backup complete: {out_dir}")


if __name__ == "__main__":
    main()