florian marending

06 Jul 2025

Exploring LLMs

A skeptic reconsiders

Work in progress…

Goals and summary

[x] Learn technical details of how LLMs work

Excellent videos by 3Blue1Brown

[x] Explore modern local models

Not great if I want more than simple summarization or proofreading

[x] Try local model on my jour data set

Not working. Context too small or they just can’t keep track of anything.

[x] Explore modern LLM tools and agents

Tried Gemini and Claude. I’ll focus on Claude for now just to dive deep rather than wide.

[x] Explore Claude on proofreading

Works great.

[x] Explore Claude on spotting mistakes in crypto code

Excellent.

[x] Explore Claude on architecture discussions

Phenomenal.

[x] Explore Claude on search (Google replacement)

Pretty useful.

[x] Explore Claude on embedded project

Pretty useful.

[x] Explore Claude on web dev project

Mixed bag.

Explore local models, Claude and Gemini

Who is currently in the Swiss National Council?

Summarize my server setup blog post and explain the scripts

Local models conclusion

As much as I would like to use local models for privacy reasons, it doesn’t look like that’s going to happen. This is not to say that open-weight models in general are bad, just that the ones small enough to run on my laptop aren’t up to the task. So I guess I’ll have to go with one of the bigger hosted ones.

Hosted models

I’m going with Claude for this exploration for two reasons. First, it’s said to be the strongest model for coding, which is my primary use case. Second, Anthropic seems to me the most sympathetic of the AI companies out there.

I can’t get behind Google for privacy reasons: they can simply bring together too many threads of my digital persona. OpenAI is exceptionally unpleasant in their public conduct, mostly Sam Altman if I’m being honest. Grok also won’t cut it, for similar reasons.

Unfortunately there are no good options when it comes to privacy, but it seems Anthropic is the least bad.

Claude

Proofreading

Works very well. I can paste in a note in mdx format and just need to remind it to keep line breaks as they are; it then does a fine job of directly correcting typos and awkward sentences, ready to paste back into my website.

It even handles markdown tables and the like flawlessly. Much simpler than pasting the note into an online spell-checker and manually fixing all the findings. Since the note is checked into git, it’s easy to see the diff that Claude produced; I wouldn’t trust it otherwise.

Spot mistakes in crypto code

I paste in the encryption code used in Fieldnotes and ask Claude to assess its security and whether it follows best practices. Claude says the code is sound and accomplishes its goal. From follow-up questions it’s clear that Claude has a rather sophisticated understanding, e.g. “What could be the advantage of using per-note keys rather than encrypting notes directly with the main key?”

To check that Claude isn’t overly agreeable, I ask the same question in a new context with modified crypto code that includes some deliberate issues. In one instance I set the main key to be extractable, which it correctly flags as something that could be improved. And when I pass undefined as the initialization vector, it raises alarm bells, calling this a critical bug with catastrophic consequences.
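
For context, the scheme under discussion looks roughly like the sketch below. This is not the actual Fieldnotes code, just a minimal illustration of a per-note-key setup on the Web Crypto API, with comments marking the two spots where I seeded the issues; the function names and shapes are my own.

// Not the Fieldnotes code: a minimal sketch of a per-note-key scheme on top of
// the Web Crypto API, marking the two seeded issues discussed above.
async function generateMainKey(): Promise<CryptoKey> {
  return crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    false, // extractable: false (setting this to true was the first seeded issue)
    ["wrapKey", "unwrapKey"]
  );
}

async function encryptNote(mainKey: CryptoKey, plaintext: string) {
  // Fresh key per note, so the main key never encrypts note content directly.
  const noteKey = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    true, // must be extractable so it can be wrapped below
    ["encrypt", "decrypt"]
  );

  // A fresh random IV for every encryption; passing undefined here was the
  // second seeded issue, the one Claude called catastrophic.
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    noteKey,
    new TextEncoder().encode(plaintext)
  );

  // Store the note key only in wrapped (encrypted) form alongside the note.
  const wrapIv = crypto.getRandomValues(new Uint8Array(12));
  const wrappedNoteKey = await crypto.subtle.wrapKey("raw", noteKey, mainKey, {
    name: "AES-GCM",
    iv: wrapIv,
  });

  return { ciphertext, iv, wrappedNoteKey, wrapIv };
}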

This isn’t to say that an AI chatbot should be entrusted with security audits, but it is still reassuring that it doesn’t mess up on obvious issues.

Architecture discussion

I’m writing a short overview of requirements for the upcoming rewrite of my observability / metrics tool. There are a bunch of open questions in my head, and I haven’t managed to get a clear picture of what I want to do. Part of the reason is the absolute mess that is the OpenTelemetry documentation.

# Observatory: Observability and Time Series System

Observatory is a new system to store observability data as well as time series data.

## Store time series data from embedded devices

There are multiple embedded devices that periodically record sensor data. They should be able
to send that time series data to Observatory.

## Store observability data from backend services

There are multiple backend services implemented in Rust. They use the `tracing` crate to instrument them.
They should be able to export their tracing data to Observatory.

## Visualization

Observatory should include a web application that visualizes the data stored within.

## Questions

- What storage layer is advisable?
- Can a database be used directly to ingest observability data or is a backend application in front of
the storage layer needed?

After I paste in this document and prompt Claude to ask clarifying questions, it produces the updated requirements doc below. It contains a bunch of boilerplate that would satisfy an architect, but also some interesting stuff.

Resulting architecture document

Overview

Observatory will handle data ingestion from embedded devices and backend services, store the data efficiently for perpetual retention, and provide visualization through Grafana integration.

Scale and Performance Requirements

  • Device Scale: Support for less than 20 embedded devices
  • Data Volume: Handle at most hundreds of messages per second
  • Retention: Perpetual data retention (no automatic deletion)
  • Query Performance: Up to 5 seconds delay acceptable for visualization
  • Downsampling: No requirement for data downsampling or tiered storage

Data Sources

Store Time Series Data from Embedded Devices

There are multiple embedded devices that periodically record sensor data. They should be able to send that time series data to Observatory.

Device Specifications:

  • Connectivity: WiFi connection to Observatory
  • Data Types: Temperature, CO2, location, and other sensor data
  • Flexibility: API should allow sending arbitrary JSON data structures
  • Frequency: Data transmission every 30 seconds to every 10 minutes
  • Reliability: Devices should cache data locally if Observatory is unreachable
  • Connection: No intermittent connectivity issues expected

Store Observability Data from Backend Services

There are multiple backend services implemented in Rust. They use the tracing crate to instrument them. They should be able to export their tracing data to Observatory.

Observability Specifications:

  • Data Types: Both spans and events from Rust tracing
  • Format: Structured logging and metrics
  • Integration: OpenTelemetry standard for data export
  • Services: Multiple Rust backend services using tracing crate

Storage Architecture

Technology Stack

  • Database: ClickHouse (preferred for time-series analytics)
  • Backend: Rust application layer
  • Alternative: Open to other technologies if justified

Database Schema

Two specialized tables in ClickHouse:

-- Device sensor data
CREATE TABLE device_data (
    timestamp DateTime64(3),
    device_id String,
    sensor_type String,  -- 'temperature', 'co2', 'location', etc.
    data JSON,
    INDEX idx_device_id device_id TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_sensor_type sensor_type TYPE bloom_filter(0.01) GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (device_id, timestamp);

-- Observability data (spans + events)
CREATE TABLE observability_data (
    timestamp DateTime64(3),
    service_name String,
    trace_id String,
    span_id String,
    event_type Enum('span', 'event'),
    data JSON,
    INDEX idx_service service_name TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_trace trace_id TYPE bloom_filter(0.01) GRANULARITY 1
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (service_name, timestamp);

Backend Application Requirements

A Rust backend application is required in front of ClickHouse rather than direct database ingestion to provide:

  • Protocol Translation: Convert device JSON and OpenTelemetry data to ClickHouse schemas
  • Authentication: API key management and user authentication
  • Data Validation: Schema validation and data transformation
  • Performance Optimization: Batching and buffering for efficient ingestion
  • Flexible JSON Handling: Support for arbitrary sensor data structures

API Specifications

Device Data Ingestion

  • Protocol: HTTP REST interface
  • Endpoint: /api/v1/devices/{device_id}/data
  • Method: POST
  • Authentication: API key based authentication
  • Data Format: JSON payload with sensor readings

Observability Data Ingestion

  • Protocol: OpenTelemetry Protocol (OTLP)
  • Endpoint: /api/v1/otlp/v1/traces
  • Standard: Full OpenTelemetry compatibility
  • Integration: Compatible with Rust tracing crate via OpenTelemetry exporters

Visualization

Observatory should integrate with Grafana for data visualization rather than building a custom web application.

Visualization Requirements:

  • Platform: Grafana integration with ClickHouse data source
  • Users: Primary users are operators
  • Dashboards: Support for real-time dashboards, historical analysis, and alerting
  • Query Types: Complex aggregations across time windows and devices
  • Performance: Near real-time updates (up to 5 second delay acceptable)

Authentication and Security

API Authentication:

  • API key authentication for device data ingestion
  • API key authentication for observability data ingestion

User Authentication:

  • User authentication system for Grafana visualization access
  • No device-level permissions required (all operators can see all device data)

Authorization Scope:

  • Authentication required for both data ingestion and visualization
  • Single-tenant system (no multi-tenancy requirements)

Deployment and Scalability

Constraints:

  • No specific deployment constraints (cloud/on-premises flexibility)
  • No horizontal scalability requirements given the scale
  • No specific performance SLA requirements beyond the 5-second visualization delay

Integration:

  • No requirement to integrate with existing monitoring tools
  • No requirement to export data to other systems
  • Grafana satisfies all visualization requirements

Technical Implementation Details

Data Processing

  • Batching: Configurable batch insertion (2-3 second intervals) to meet 5-second delay target
  • Error Handling: Robust error handling for device connectivity issues
  • Data Transformation: Convert OpenTelemetry spans/events to unified storage format

Performance Optimization

  • Connection Pooling: ClickHouse connection pooling for efficient database access
  • Indexing: Optimized indexes for common query patterns
  • Partitioning: Monthly partitioning for efficient data management

Monitoring and Reliability

  • Caching: Local device caching when Observatory is unreachable
  • Buffering: Backend buffering to handle traffic spikes
  • Health Checks: System health monitoring capabilities

Success Criteria

  • Successfully ingest data from up to 20 embedded devices
  • Handle hundreds of messages per second without data loss
  • Provide real-time visualization through Grafana with sub-5-second delay
  • Support perpetual data retention without performance degradation
  • Authenticate and authorize all data access appropriately
  • Maintain system availability even when individual devices are offline

I also ask follow-up questions about OpenTelemetry that I hadn’t been able to answer after hours of scouring the web, and Claude gives competent answers. Overall, exceptionally helpful for this task.
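
To make the proposed device API a bit more concrete, here is a hypothetical sketch of what a sensor upload could look like. The endpoint path, method, and API-key authentication come from the requirements doc above; the header name and payload shape are placeholders of my own.

// Hypothetical client sketch for the proposed /api/v1/devices/{device_id}/data
// endpoint; the X-Api-Key header and payload shape are placeholders, not spec.
async function sendReading(baseUrl: string, deviceId: string, apiKey: string) {
  const payload = {
    timestamp: new Date().toISOString(),
    sensor_type: "temperature",
    data: { celsius: 21.4 },
  };

  const res = await fetch(`${baseUrl}/api/v1/devices/${deviceId}/data`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Api-Key": apiKey, // assumed header; the doc only says API key based authentication
    },
    body: JSON.stringify(payload),
  });

  if (!res.ok) {
    // Per the requirements, a real device would cache the reading locally
    // here and retry once Observatory is reachable again.
    throw new Error(`Observatory rejected reading: ${res.status}`);
  }
}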

Search replacement

Embedded development

Web development

Masonry grid

Async image processing


Notes on Deep Dive into LLMs like ChatGPT

Notes on How I use LLMs

Notes on Transformers, the tech behind LLMs

Notes on Attention in transformers, step-by-step

Notes on How might LLMs store facts