feediverse will read RSS/Atom feeds and send the messages as Mastodon posts. It's meant to add a little bit of spice to your timeline from other places. Please use it responsibly.
I was not convinced that feed2toot was the right way to go about this, and in trying to extend it I found myself frustrated. Well, here's a simpler single-file solution. I extended The Arcology Project to expose a JSON list of feeds and their metadata, and with my modified version of feediverse I can post all of my sites' toots with one command.
feediverse.py
This is a lightly modified version of the feediverse.py referenced above, with my modifications distributed under the Hey Smell This license.
This thing is, basically, simple to operate. It's driven by a YAML configuration file:
tokens:
  lionsrear: *lionsrear-creds
  garden: *garden-creds
  cce: *garden-creds
  arcology: *garden-creds
feeds_index: https://thelionsrear.com/feeds.json
post_template: >-
  NEW by @rrix@notes.whatthefuck.computer: {summary}
  {url} {hashtags}
updated: '2023-01-25T06:13:50.343361+00:00'
url: https://notes.whatthefuck.computer
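The *lionsrear-creds and *garden-creds values are YAML aliases pointing at anchors defined elsewhere in the real file, which I've redacted from this listing. Given the keys the script reads, each anchor is just a mapping of OAuth credentials; a hypothetical definition would look like:

# Hypothetical anchor definitions; the real values are the OAuth
# credentials written by the setup() routine shown later in this post.
creds:
  lionsrear: &lionsrear-creds
    client_id: REDACTED
    client_secret: REDACTED
    access_token: REDACTED
  garden: &garden-creds
    client_id: REDACTED
    client_secret: REDACTED
    access_token: REDACTED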
This file will be created the first time you run this command; in my case it's generated locally and then copied to my Wobserver in the NixOS declarations below.
def save_config(config, config_file):
    copy = dict(config)
    with open(config_file, 'w') as fh:
        fh.write(yaml.dump(copy, default_flow_style=False))

def read_config(config_file):
    # Default 'updated' to the beginning of time so that a fresh config
    # considers every feed entry new.
    config = {
        'updated': datetime(MINYEAR, 1, 1, 0, 0, 0, 0, timezone.utc)
    }
    with open(config_file) as fh:
        cfg = yaml.load(fh, yaml.SafeLoader)
    if 'updated' in cfg:
        cfg['updated'] = dateutil.parser.parse(cfg['updated'])
    config.update(cfg)
    return config
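Note the MINYEAR sentinel: a config file with no updated key means every entry ever published is considered new, which is why setup() below writes a current timestamp unless you ask for old posts. A sketch of the round-trip, with an illustrative path:

config = read_config('/srv/feediverse/feediverse.yml')
# config['updated'] is a timezone-aware datetime here: either parsed from
# the file's ISO 8601 string, or datetime(MINYEAR, ...) on a fresh config.
config['updated'] = datetime.now(tz=timezone.utc).isoformat()
save_config(config, '/srv/feediverse/feediverse.yml')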
So the /feeds.json endpoint in the Arcology Router returns a list of feed objects; here it's re-keyed into a per-site dictionary and returned:
def fetch_dynamic_feeds(feeds_url):
    feeds = requests.get(feeds_url,
                         headers={"User-Agent": "feediverse 0.0.1"}).json()

    feeds_by_site = dict()
    for feed in feeds:
        feeds_by_site[feed['site']] = feeds_by_site.get(feed['site'], []) + [feed]
    return feeds_by_site
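For reference, this is roughly the shape of document the feeds index serves. I'm only showing the three keys the script actually touches (site, url, visibility); the real index carries more metadata, and these sample values are made up:

# Hypothetical /feeds.json payload, expressed as the parsed Python value.
feeds = [
    {'site': 'lionsrear', 'url': 'https://thelionsrear.com/feed.xml', 'visibility': 'public'},
    {'site': 'garden', 'url': 'https://notes.whatthefuck.computer/feed.xml', 'visibility': 'unlisted'},
]
# fetch_dynamic_feeds re-keys it per site:
# {'lionsrear': [{...}], 'garden': [{...}]}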
With that loaded, it's possible to just loop over the sites, and then loop over each feed in the site to post new entries from them:
newest_post = config['updated']
per_site_feeds = fetch_dynamic_feeds(config['feeds_index'])

for site, feeds in per_site_feeds.items():
    masto = Mastodon(
        api_base_url=config['url'],
        feature_set='pleroma',
        client_id=config['tokens'][site]['client_id'],
        client_secret=config['tokens'][site]['client_secret'],
        access_token=config['tokens'][site]['access_token']
    )
    for feed in feeds:
        if args.verbose:
            print(f"fetching {feed['url']} entries since {config['updated']}")
        for entry in get_feed(feed['url'], config['updated']):
            newest_post = max(newest_post, entry['updated'])
            if args.verbose:
                print(entry)
            if args.dry_run:
                print("trial run, not tooting ", entry["title"][:50])
                continue
            masto.status_post(config['post_template'].format(**entry),
                              content_type='text/html',
                              visibility=feed['visibility'])

if not args.dry_run:
    config['updated'] = newest_post.isoformat()
    save_config(config, config_file)
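To make the template concrete, here's roughly what a rendered status looks like; the entry values are invented for illustration:

# A hypothetical entry, showing only the fields the template consumes.
sample_entry = {
    'summary': 'Some thoughts on static sites.',
    'url': 'https://thelionsrear.com/notes/a-new-post',
    'hashtags': '#emacs #nixos',
}
# The folded scalar (>-) in the YAML above collapses to a single line:
template = 'NEW by @rrix@notes.whatthefuck.computer: {summary} {url} {hashtags}'
print(template.format(**sample_entry))
# NEW by @rrix@notes.whatthefuck.computer: Some thoughts on static sites. https://thelionsrear.com/notes/a-new-post #emacs #nixos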
All the feed-parsing stuff is more or less lifted directly from the original feediverse, but modified to just post the HTML directly to Akkoma/Pleroma.
def get_feed(feed_url, last_update):
    feed = feedparser.parse(feed_url)
    if last_update:
        entries = [e for e in feed.entries
                   if dateutil.parser.parse(e['updated']) > last_update]
    else:
        entries = feed.entries
    entries.sort(key=lambda e: e.updated_parsed)
    for entry in entries:
        yield get_entry(entry)

def get_entry(entry):
    hashtags = []
    for tag in entry.get('tags', []):
        t = tag['term'].replace(' ', '_').replace('.', '').replace('-', '')
        hashtags.append('#{}'.format(t))
    summary = entry.get('summary', '')
    content = entry.get('content', '') or ''
    url = entry.id
    return {
        'url': url,
        'link': entry.link,
        'title': cleanup(entry.title),
        'summary': cleanup(summary, strip_html=False),
        'content': content,
        'hashtags': ' '.join(hashtags),
        'updated': dateutil.parser.parse(entry['updated'])
    }

def cleanup(text, strip_html=True):
    if strip_html:
        html = BeautifulSoup(text, 'html.parser')
        text = html.get_text()
    text = re.sub(r'\xa0+', ' ', text)
    text = re.sub(r' +', ' ', text)
    text = re.sub(r' +\n', '\n', text)
    text = re.sub(r'(\w)\n(\w)', r'\1 \2', text)
    text = re.sub(r'\n\n\n+', '\n\n', text, flags=re.M)
    return text.strip()
Setting up the config file is a bit different than the upstream stuff because my version supports setting up multiple accounts on a single instance. I made the design decision to only support one fedi instance per feediverse instance; if you want to run this against multiple fedi servers, you'll need to run more than one config file. Or just don't.
def yes_no(question):
    res = input(question + ' [y/n] ')
    return res.lower() in "y1"

def setup(config_file):
    url = input('What is your Fediverse Instance URL? ')
    feeds_index = input("What is the arcology feed index URL? ")
    tokens = dict()
    for site in fetch_dynamic_feeds(feeds_index).keys():
        print(f"Configuring for {site}...")
        print("I'll need a few things in order to get your access token")
        name = input('app name (e.g. feediverse): ') or "feediverse"
        client_id, client_secret = Mastodon.create_app(
            api_base_url=url,
            client_name=name,
            # scopes=['read', 'write'],
            website='https://github.com/edsu/feediverse'
        )
        username = input('mastodon username (email): ')
        password = input('mastodon password (not stored): ')
        m = Mastodon(client_id=client_id, client_secret=client_secret, api_base_url=url)
        access_token = m.log_in(username, password)

        tokens[site] = {
            'client_id': client_id,
            'client_secret': client_secret,
            'access_token': access_token,
        }

    old_posts = yes_no('Shall already existing entries be tooted, too?')
    config = {
        'name': name,
        'url': url,
        'feeds_index': feeds_index,
        'tokens': tokens,
        'post_template': '{title} {summary} {url}'
    }
    if not old_posts:
        config['updated'] = datetime.now(tz=timezone.utc).isoformat()
    save_config(config, config_file)
    print("")
    print("Your feediverse configuration has been saved to {}".format(config_file))
    print("Add a line like this to your crontab to check every 15 minutes:")
    print("*/15 * * * * /usr/local/bin/feediverse")
    print("")
All of that is assembled into a single command which takes --dry-run, --verbose, and --config arguments:
# Make sure to edit this in cce/feediverse.org !!!
import os
import re
import sys
import yaml
import argparse
import dateutil.parser  # "import dateutil" alone doesn't pull in the parser
import feedparser
import requests

from bs4 import BeautifulSoup
from mastodon import Mastodon
from datetime import datetime, timezone, MINYEAR

DEFAULT_CONFIG_FILE = os.path.join("~", ".feediverse")

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-n", "--dry-run", action="store_true",
                        help=("perform a trial run with no changes made: "
                              "don't toot, don't save config"))
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="be verbose")
    parser.add_argument("-c", "--config",
                        help="config file to use",
                        default=os.path.expanduser(DEFAULT_CONFIG_FILE))

    args = parser.parse_args()
    config_file = args.config

    if args.verbose:
        print("using config file", config_file)

    if not os.path.isfile(config_file):
        setup(config_file)

    config = read_config(config_file)

    <<inner-loop>>

<<fetch-feeds>>
<<feed-parsing>>
<<config-load-save>>
<<setup-config>>

if __name__ == "__main__":
    main()
Packaging feediverse in rixpkgs
This is pretty easy to get running; if you do this yourself, you'll want to override src to point to https://code.rix.si/rrix/feediverse, but I don't like remembering to push my changes 😇
{ lib,
buildPythonPackage,
fetchPypi,
beautifulsoup4,
feedparser,
python-dateutil,
requests,
mastodon-py,
pyyaml,
python,
}:
buildPythonPackage rec {
  pname = "feediverse";
  version = "0.0.1";
  src = /home/rrix/Code/feediverse;

  propagatedBuildInputs = [
    beautifulsoup4
    feedparser
    python-dateutil
    requests
    pyyaml
    mastodon-py
  ];
meta = with lib; {
homepage = "https://code.rix.si/rrix/feediverse";
description = "feediverse will read RSS/Atom feeds and send the messages as Mastodon posts.";
license = licenses.mit;
maintainers = with maintainers; [ rrix ];
};
}
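This derivation gets pulled into rixpkgs with callPackage; the wiring looks something like the following overlay, though the exact file name and attribute path here are assumptions on my part:

# Hypothetical overlay; ./feediverse.nix is the derivation above.
final: prev: {
  feediverse = prev.python3Packages.callPackage ./feediverse.nix { };
}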
Running feediverse on The Wobserver
Okay, with the configuration file generated and then copied onto the server (since it's mutated by the script…), I shove it into the Arroyo Nix index and then set up an Arroyo NixOS module to create a service account and run the command with a systemd timer. This will be pretty straightforward if you've seen NixOS before.
{ pkgs, lib, config, ... }:
{
ids.uids.feediverse = 902;
ids.gids.bots = 902;
users.groups.bots = {
gid = config.ids.gids.bots;
};
users.users.feediverse = {
home = "/srv/feediverse";
group = "bots";
uid = config.ids.uids.feediverse;
isSystemUser = true;
};
systemd.services.feediverse = {
description = "Feeds to Toots";
after = ["pleroma.service"];
wantedBy = ["default.target"];
script =
''
${pkgs.feediverse}/bin/feediverse -c ${config.users.users.feediverse.home}/feediverse.yml
'';
serviceConfig = {
User = "feediverse";
WorkingDirectory = config.users.users.feediverse.home;
};
};
systemd.timers.feediverse = {
description = "Start feediverse on the quarter-hour";
timerConfig = {
OnUnitActiveSec = "15 minutes";
OnStartupSec = "15 minutes";
};
wantedBy = [ "default.target" ];
};
}
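Once this is deployed, systemctl list-timers feediverse.timer should show the next scheduled activation, and journalctl -u feediverse.service the output of the most recent run.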