The Arcology Site Engine

Arcology Python Prototype


I learned a lot in building The Arcology Project the first time around, and now that I have Migrated to org-roam v2 I need to evaluate the project, fix it, and get it running again.

Over the last few months, I have been playing with a Python package called FastAPI and loving the "batteries included" approach with modern Python 3 and a Flask- or Express-like router model, rather than the full MVC framework I was working with in Elixir.

The Arcology is a FastAPI Web App

What we have here is a real simple FastAPI application.

Run it: shell:uvicorn arcology.server:app --reload --host &

from fastapi import FastAPI, Request
from sqlmodel import Session

from arcology.arroyo import Page, engine
import arcology.html as html

from arcology.parse import parse_sexp

app = FastAPI()

import uvicorn

#@click.command(help="start the DRP status servlet")
#@click.option("--host", "-h", help="the host IP to listen on, defaults to all IPs/interfaces", default="")
#@click.option("--port", "-p", help="port to listen on", default=8000)
def start(host="", port=8000):
    # hand the import string to uvicorn so it can load the app itself
    uvicorn.run("arcology.server:app", host=host, port=port)

Arcology's FastAPI Instrumentation and Observability

It's instrumented with prometheus-fastapi-instrumentator. There's not much to observe; I guess I'll want to include things about the pandoc generator and cache size, etc.

from prometheus_fastapi_instrumentator import Instrumentator

prometheus_instrumentor = Instrumentator()

Request counts broken down by Arcology Sites as http_request_by_site_total in Grafana

This instrument loads the site and arcology.key.ArcologyKey to emit counter metrics for each page and each site.

from typing import Callable
from prometheus_fastapi_instrumentator.metrics import Info
from prometheus_client import Counter
from arcology.sites import host_to_site
from arcology.key import ArcologyKey

import logging
logger = logging.getLogger(__name__)

def http_request_sites_total() -> Callable[[Info], None]:
    METRIC = Counter(
        "http_request_by_site_total",
        "Number of times a site or page has been requested.",
        labelnames=("site", "key", "method", "status", "ua_type")
    )

    def instrumentation(info: Info) -> None:
        key = ArcologyKey.from_request(info.request)

        user_agent = info.request.headers.get("User-Agent")
        agent_type = get_agent_type(user_agent)


        if agent_type == "unknown":
            # %s is interpolated by the logging module; a "{agent}" placeholder would be logged literally
            logger.info("Detected unknown user agent: %s", user_agent)

        METRIC.labels(, key.key, info.method, info.modified_status, agent_type).inc()

    return instrumentation
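
The factory-plus-closure shape used here can be tried out without Prometheus installed. The following is only a sketch: it swaps prometheus_client's Counter for the standard library's collections.Counter, and make_site_counter, record and the "example-site" labels are hypothetical names used for illustration, not part of the Arcology code.

```python
from collections import Counter
from typing import Callable, Tuple


def make_site_counter() -> Tuple[Callable[[str, str, str, str, str], None], Counter]:
    """Return a recording function plus the Counter it closes over."""
    # Stand-in for a labelled Prometheus counter: label tuple -> count.
    counts: Counter = Counter()

    def record(site: str, key: str, method: str, status: str, ua_type: str) -> None:
        # Same label order as the labelnames tuple of the real metric.
        counts[(site, key, method, status, ua_type)] += 1

    return record, counts


record, counts = make_site_counter()
record("example-site", "example-site/index", "GET", "2xx", "browser")
record("example-site", "example-site/index", "GET", "2xx", "browser")
record("example-site", "example-site/feed.xml", "GET", "2xx", "feed")
```

Each distinct combination of labels gets its own count, which is what lets a dashboard slice requests by site, page or agent type.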


get_agent_type tries to make some smart guesses to bucket the callers in to human/feed/fedi/crawler/etc buckets:

def get_agent_type(user_agent: str) -> str:

    # headers.get() returns None when the header is absent, so test for falsiness
    if not user_agent:
        return "no-ua"

    if "Synapse" in user_agent:
        return "matrix"
    if "Element" in user_agent:
        return "matrix"

    if "SubwayTooter" in user_agent:
        return "app"
    if "Dalvik" in user_agent:
        return "app"
    if "Nextcloud-android" in user_agent:
        return "app"

    if "prometheus" in user_agent:
        return "internal"
    if "feediverse" in user_agent:
        return "internal"

    if "Pleroma" in user_agent:
        return "fedi"
    if "Mastodon/" in user_agent:
        return "fedi"
    if "Akkoma" in user_agent:
        return "fedi"
    if "Friendica" in user_agent:
        return "fedi"
    if "FoundKey" in user_agent:
        return "fedi"
    if "MissKey" in user_agent:
        return "fedi"
    if "CalcKey" in user_agent:
        return "fedi"
    if "gotosocial" in user_agent:
        return "fedi"
    if "Epicyon" in user_agent:
        return "fedi"

    if "feedparser" in user_agent:
        return "feed"
    if "granary" in user_agent:
        return "feed"
    if "Tiny Tiny RSS" in user_agent:
        return "feed"
    if "Go-NEB" in user_agent:
        return "feed"
    if "Gwene" in user_agent:
        return "feed"
    if "Feedbin" in user_agent:
        return "feed"
    if "SimplePie" in user_agent:
        return "feed"
    if "Elfeed" in user_agent:
        return "feed"
    if "inoreader" in user_agent:
        return "feed"
    if "Reeder" in user_agent:
        return "feed"
    if "Miniflux" in user_agent:
        return "feed"

    if "Bot" in user_agent:
        return "bot"
    if "bot" in user_agent:
        return "bot"
    if "Poduptime" in user_agent:
        return "bot"

    if "Chrome/" in user_agent:
        return "browser"
    if "Firefox/" in user_agent:
        return "browser"
    if "DuckDuckGo/" in user_agent:
        return "browser"
    if "Safari/" in user_agent:
        return "browser"

    return "unknown"
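
The long chain of if statements could also be written as an ordered lookup table. This is a minimal sketch of that idea, assuming first-match-wins ordering like the function above; AGENT_BUCKETS and classify_agent are hypothetical names, and only a handful of the markers are repeated to keep it short.

```python
from typing import Optional, Sequence, Tuple

# (bucket, substrings) pairs, checked top to bottom; the first hit wins.
# Only a few markers from the full function are repeated here for brevity.
AGENT_BUCKETS: Sequence[Tuple[str, Tuple[str, ...]]] = (
    ("matrix", ("Synapse", "Element")),
    ("fedi", ("Pleroma", "Mastodon/", "Akkoma")),
    ("feed", ("feedparser", "Tiny Tiny RSS", "Miniflux")),
    ("bot", ("Bot", "bot")),
    ("browser", ("Chrome/", "Firefox/", "Safari/")),
)


def classify_agent(user_agent: Optional[str]) -> str:
    """Hypothetical table-driven variant of get_agent_type."""
    if not user_agent:  # covers both a missing header (None) and ""
        return "no-ua"
    for bucket, markers in AGENT_BUCKETS:
        if any(marker in user_agent for marker in markers):
            return bucket
    return "unknown"
```

Because "bot" is listed before "browser", a crawler that also advertises Chrome/ in its user agent still lands in the bot bucket, matching the order of the checks in the original.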

Some of these URLs shouldn't be loaded, and this bit of code ensures those requests aren't recorded by the per-site counter. Note that the paths aren't actually verified as existing in the database – the status will be a 4xx if "normal" pages aren't loaded, but static assets and the favicon produce some "chatter" in the logs, which I simply short-circuit out here.

if info.request.url.path.startswith("/metrics"):
    return
if info.request.url.path.startswith("/static"):
    return
if info.request.url.path.startswith("/favicon.ico"):
    return
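
The three prefix checks can be folded into one, since str.startswith accepts a tuple of prefixes. A small sketch, where SKIPPED_PREFIXES and is_uncounted_path are hypothetical names introduced for illustration:

```python
# Path prefixes the per-site counter should ignore, taken from the checks above.
SKIPPED_PREFIXES = ("/metrics", "/static", "/favicon.ico")


def is_uncounted_path(path: str) -> bool:
    """Hypothetical helper: True when the request should not be counted."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return path.startswith(SKIPPED_PREFIXES)
```

Inside the instrumentation function this would presumably be called with info.request.url.path, returning early when it yields True.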

Arcology Static Files and appearance

I can't be fucked to care about asset pipelines right now/these days. There's not a complex enough set of assets in this context – there is the problem of Arcology Media Store and exposing attachment files. This is just enough to make it look naisu, and to give each site a bit of flavor through the Arcology Sites customization module.

from fastapi.staticfiles import StaticFiles
import os

static_directory = os.environ.get('STATIC_FILE_DIR', "arcology/static")

app.mount("/static", StaticFiles(directory=static_directory), name="static")

Base HTML Template

    <meta name="author" content="Ryan Rix"/>
    <meta name="generator" content="Arcology Site Engine"/>
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="stylesheet" href="/static/css/app.css"/>
    <link rel="stylesheet" href="/static/css/vulf.css"/>
    {% if site and site.css_file %}
    <link rel="stylesheet" href="/static/css/default-colors.css"/>
    <link rel="stylesheet" href="{{ site.css_file }}"/>
    {% else %}
    <link rel="stylesheet" href="/static/css/default-colors.css"/>
    {% endif %}
    {% block head %}
      <title>{{ site.title }}</title>
    {% endblock %}
      {% block h1 %}
      <h1><a href='/'>{{ site.title }}</a></h1>
      <h2>{{ page.get_title() }}</h2>
      {% endblock %}
        &bull; <a class="internal" href="">Life</a>
        &bull; <a class="internal" href="">Tech</a>
        &bull; <a class="internal" href="">Emacs</a>
        &bull; <a class="internal" href="">Topics</a>
        &bull; <a class="internal" href="">Arcology</a>

    {% block body %}
    {% endblock %}

      &copy; 02022 Ryan Rix &lt;<a href=""></a>&gt;


        Care has been taken to publish accurate information to
        long-lived URLs, but context and content as well as URLs may
        change without notice.

        This site collects no personal information from visitors, nor
        stores any identifying tokens besides a CSRF token which is
        rotated with every web request and not logged. If you or your
        personal information ended up in public notes please email me
        for correction or removal.

        Email me with questions, comments, insights, kind criticism.
        blow horn, good luck.

        <a href="/sitemap/">View the Site Map</a>

        <a class="internal" href="">&larr;</a>
        <a class="internal" href="">Fediring</a>
        <a class="internal" href="">&rarr;</a>

Page HTML Templates

{% extends "base.html.j2" %}

{% block h1 %}
  <h1><a href='/'>{{ site.title }}</a></h1>
  <h2>{{ page.get_title() }}</h2>
{% endblock %}

{% block head %}
  <title>{{ page.get_title() }} - {{ site.title }}</title>
  {% for feed in feeds %}
    <link rel="alternate" type="application/atom+xml" href="{{ feed[0] }}" title="{{ feed[1] }}" />
  {% endfor %}
  {% if page.allow_crawl is none or page.allow_crawl=='"nil"' %}
    <meta name="robots" content="noarchive noimageindex noindex nofollow"/>
  {% else %}
    <meta name="robots" content="noarchive noimageindex "/>
  {% endif %}
{% endblock %}

{% block body %}
    <section class="body">
      {{ document | safe}}

  <section class="backlinks">
    {% if page.references %}
        {% for ref in page.references %}
          [<a href="{{ref.url()}}">ref</a>]&nbsp;
        {% endfor %}
    {% endif %}

    {% if backlink %}
      <h2>Pages which Link Here</h2>

      {{ backlink | safe}}
    {% endif %}
{% endblock %}
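
The robots branch in the head block is easier to reason about as a plain function. The sketch below mirrors the template's condition; robots_content is a hypothetical helper, and the comparison against the quoted text '"nil"' assumes, as the template does, that allow_crawl arrives as a raw s-expression string rather than a Python boolean.

```python
from typing import Optional


def robots_content(allow_crawl: Optional[str]) -> str:
    """Mirror the template's robots branch for a raw allow_crawl value."""
    # A disabled flag reads as the quoted text '"nil"', not as a Python False,
    # and a missing flag is None; both cases forbid indexing.
    if allow_crawl is None or allow_crawl == '"nil"':
        return "noarchive noimageindex noindex nofollow"
    return "noarchive noimageindex"
```

Pages are therefore hidden from crawlers by default and only become indexable when the flag is explicitly set to something other than nil.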

Arcology Site CSS

Look, there's not a lot of "there there". The default color variables are "nice to have".

:root {
  --alert: #CC6960;
  --primary: #707231;
  --secondary: #ebbe7b;
  --success: #67b4f8;
  --warning: #7e5c41;

  --white: #fcf6ed;
  --light-gray: #f6e5cb;
  --medium-gray: #BAAD9B;
  --dark-gray: #82796C;
  --black: #211F1C;
}

Dead links will be annotated by the HTML Rewriter and Hydrater with this class if they're internal links to pages which are not marked.

.dead-link::after {
    content: '🔗⚠';
}
.dead-link {
    color: var(--alert) !important;
}

Experimental: Mark external and internal URLs with an emoji.

/* a.internal::after {
     content: '';
} */
a[href*=""]::before {
    content: '🌱 ';
}

a[href*=""]::before {
    content: '🐲 ';
}

a[href*=""]::before {
    content: '🧑‍🔧 ';
}

a[href*=""]::before {
    content: '♾️ ';
}

a[href*=""]::before {
    content: '✒️️ ';
}

a[href*="localhost"]::before {
    content: '📚️️ ';
}

a[href*="//"]:not(.internal)::before {
    content: '🌏 ';
}

Color these things. The defaults are specified above, Sites can override these (and add other rules entirely of course!).

a {
    color: var(--primary);
    font-weight: 500;
}

a:visited {
    color: var(--warning);
}

pre, code {
    background-color: var(--light-gray);
}

.tags .tag {
    background-color: var(--success);
    color: var(--light-gray);
}

Configure the body, the headers, the footers, the whole dang lot! Note that I use the Vulf Mono font – make sure to bring your own!

The <body> is the "root" of the rendered elements.

body {
    font-family: "Vulf Mono", monospace;
    font-style: italic;
    font-size: 14px;
    background-color: var(--white);
    color: var(--black);
}

All headings are italic; the headings inside <header> are displayed inline with each other rather than as blocks.

h1,h2,h3,h4,h5,h6 {
    font-style: italic;
}

h1,h2,h3,h4,h5,h6 > code.verbatim {
    font-style: regular;
}

h1 code.verbatim {
    font-style: normal;
}
h2 code.verbatim {
    font-style: normal;
}
h3 code.verbatim {
    font-style: normal;
}

header > h1, header > h2 {
  display: inline;
}

header > h1:after {
  content: " —";
}

It's important that things have room to breathe.

header {
    padding: 0.5em;
    border-radius: 1em;
    background-color: var(--light-gray);
    margin-bottom: 2em;
}

main, section.backlinks {
    padding: 0.5em;
    border-radius: 1em;
    background-color: var(--light-gray);
    border: 1px var(--medium-gray) solid;
    font-weight: 300;
}

main, header :first-child {
    margin-top: 0 !important;
}

Margins must be set. This centers the major text sections on the page and lets them stretch to 80 characters. This is the holy and correct number for text to be displayed at, I guess, lol.

footer, section.backlinks, main {
    margin: 1em auto;
    max-width: 80em;
}

footer {
    text-align: center;
}

footer a {
    font-weight: 500;
}

Experimental: when hovering over code blocks, it will try to show you what file it's writing to.

pre.sourceCode {
    padding-left: 2em;
    padding-bottom: 1em;
    overflow: scroll;
}

.sourceCode[data-noweb]::before {
  content: "noweb setting " attr(data-noweb);
}

.sourceCode[data-noweb-ref]::before {
  content: "noweb interpolated as " attr(data-noweb-ref);
}

.sourceCode[data-tangle]::before {
  content: "write file to " attr(data-tangle);
}

.sourceCode, code {
    font-style: normal;
}

Various tweaks for SRS and friends. I should do this in Rewriting and Hydrating the Pandoc HTML.

.tag .smallcaps {
    float: right;
    font-variant-caps: small-caps;
    padding: 0.25em;
}

.REVIEW_DATA.drawer {
    display: none;
}

.fc-cloze {
    font-style: normal;
    text-decoration: underline;
}

Sitemap should have a height:

#sitemap-container {
  height: 100%;
}
  1. Generating @font-face rules for a bunch of fonts

    Vulfpeck Fonts are pulled in with this code-gen because writing @font-face rules does not bring joy and I don't have the right to redistribute these files, so I won't check it in at all.

    VulfSans Regular 500
    VulfMono Regular 500
    VulfSans Bold 800
    VulfMono Bold 800
    VulfSans Italic 500 italic
    VulfMono Italic 500 italic
    VulfSans BoldItalic 800 italic
    VulfMono BoldItalic 800 italic
    VulfSans Light 300
    VulfMono Light 300
    VulfSans LightItalic 500 italic
    VulfMono LightItalic 500 italic
      (-map (pcase-lambda (`(,first ,second ,weight ,style))
               (s-join "\n" (list
                             "@font-face {"
                             "font-family: "  (if (equal first "VulfMono")
                                                  "\"Vulf Mono\""
                                                "\"Vulf Sans\"")
                             "; src:"
                             (concat "url('/static/fonts/" first "-" second ".woff') format('woff'),")
                             (concat "url('/static/fonts/" first "-" second ".woff2') format('woff2'),")
                             (concat "url('/static/fonts/" first "-" second ".ttf') format('truetype');")
                             "font-weight: " (number-to-string weight) ";"
                             (unless (equal style "")
                               (concat "font-style: " style ";"))
      (write-file "~/org/arcology-fastapi/arcology/static/css/vulf.css"))
  2. NEXT [#C] tufte sidenotes for the backlinks -> HTML should inject sidenotes in during rewritehtml?

  3. NEXT [#C] page template for a backlink buffer like Topic Index
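
The @font-face code-gen from item 1 can be sketched in Python as well. This is not the generator the project uses (that one is the Emacs Lisp in item 1); it is a rough equivalent under the same naming scheme, with FONTS holding only a few of the table rows and font_face being a hypothetical helper name.

```python
from typing import List, Tuple

# (file prefix, variant, weight, style) rows, a subset of the table in item 1.
FONTS: List[Tuple[str, str, int, str]] = [
    ("VulfSans", "Regular", 500, ""),
    ("VulfMono", "Regular", 500, ""),
    ("VulfSans", "Bold", 800, ""),
    ("VulfMono", "BoldItalic", 800, "italic"),
]


def font_face(first: str, second: str, weight: int, style: str) -> str:
    """Render one @font-face rule for /static/fonts/<first>-<second>.*"""
    family = "Vulf Mono" if first == "VulfMono" else "Vulf Sans"
    base = f"/static/fonts/{first}-{second}"
    lines = [
        "@font-face {",
        f'  font-family: "{family}";',
        f"  src: url('{base}.woff') format('woff'),",
        f"       url('{base}.woff2') format('woff2'),",
        f"       url('{base}.ttf') format('truetype');",
        f"  font-weight: {weight};",
    ]
    if style:  # upright faces carry no font-style declaration
        lines.append(f"  font-style: {style};")
    lines.append("}")
    return "\n".join(lines)


# One stylesheet string; writing it to vulf.css is left to the caller.
css = "\n\n".join(font_face(*row) for row in FONTS)
```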

Wiring up Arcology Routing Logic

The Arcology Routing Logic needs to be wired up to the server, after the static asset routes are defined.

import as domains
app = domains.decorate_app(app)

NEXT [#B] Org pre-processing

Arcology BaseSettings Configuration Class

Ref FastAPI Settings and Pydantic Settings management.

This is mostly used to coordinate the Arcology Batch Commands but will eventually contain all configurable elements of the web server and inotify worker.

from pydantic import BaseSettings
from enum import Enum
from functools import lru_cache
from pathlib import Path

class Environment(str, Enum):
    prod = "prod"
    dev  = "dev"

class Settings(BaseSettings):
    arcology_directory: Path = Path("~/org")

    arcology_src: Path = Path("~/org/arcology-fastapi")
    arroyo_src: Path = Path("~/org/arroyo")
    arroyo_emacs: Path = Path("emacs")

    arcology_db: Path = Path("~/org/arcology-fastapi/arcology.db")
    org_roam_db: Path = Path("~/org/arcology-fastapi/org-roam.db")

    db_generation_debounce: int = 15
    db_generation_cooldown: int = 300

    arcology_env: Environment =

# cache the Settings object so it is only constructed once per process
@lru_cache()
def get_settings():
    return Settings()
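
Two details of this pattern are worth seeing in isolation: a cached accessor hands back the same object on every call, and a default like "~/org" stays a literal tilde until something expands it. The sketch below uses only the standard library rather than pydantic; SketchSettings, get_sketch_settings and the ARCOLOGY_DB / DB_GENERATION_DEBOUNCE variable names are assumptions for illustration, chosen to mirror the upper-cased field names.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from pathlib import Path


def _path_from_env(name: str, default: str) -> Path:
    # os.path.expanduser resolves "~"; Path("~/org") alone leaves it literal.
    return Path(os.path.expanduser(os.environ.get(name, default)))


@dataclass(frozen=True)
class SketchSettings:
    """Hypothetical stand-in for Settings, limited to two fields."""
    arcology_db: Path
    db_generation_debounce: int


@lru_cache(maxsize=None)
def get_sketch_settings() -> SketchSettings:
    # Cached, so the environment is read once per process.
    return SketchSettings(
        arcology_db=_path_from_env("ARCOLOGY_DB", "~/org/arcology-fastapi/arcology.db"),
        db_generation_debounce=int(os.environ.get("DB_GENERATION_DEBOUNCE", "15")),
    )
```

Whether the real Settings paths get expanded somewhere downstream is not shown here, so treat the expanduser call as something to verify rather than a known gap.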

Translate in/out of s-expression forms with sexpdata

Use sexpdata to decode some of the keys which come out of the org-roam EmacSQL. At some point I could do some hackery-pokery to monkeypatch this in to some points to magically unwrap fields. For now it'll be great in __str__ and some property access methods.

import sexpdata as sexp

def parse_sexp(in_sexp: str):
    return sexp.loads(in_sexp)

def print_sexp(in_obj) -> str:
    return sexp.dumps(in_obj)