Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
1ca4677
smp server: messaging services (#1565)
epoberezkin Nov 7, 2025
3ccf854
servers: maintain xor-hash of all associated queue IDs in PostgreSQL …
epoberezkin Nov 25, 2025
5e9b164
agent: fail when per-connection transport isolation is used with serv…
epoberezkin Nov 25, 2025
38e8999
agent: service subscription events (#1671)
epoberezkin Nov 27, 2025
ff7bdbc
Merge branch 'master' into rcv-services
epoberezkin Dec 3, 2025
2ea9a9a
agent: finalize initial service subscriptions, remove associations on…
epoberezkin Dec 5, 2025
35fe5ac
Merge branch 'master' into rcv-services
epoberezkin Dec 5, 2025
8389407
Merge branch 'master' into rcv-services
epoberezkin Dec 13, 2025
f5eb735
servers: service stats and logging, allow services without option (re…
epoberezkin Dec 14, 2025
568500c
Merge branch 'master' into rcv-services
epoberezkin Dec 14, 2025
c8a7243
Merge branch 'master' into rcv-services
epoberezkin Dec 15, 2025
a1277bf
agent: remove service queue association when service ID changed, proc…
epoberezkin Dec 19, 2025
11ae20e
ntf server: use different client certs for each SMP server, remove su…
epoberezkin Dec 22, 2025
bafdbc1
smp protocol: fix encoding for SOKS/ENDS responses (#1683)
epoberezkin Dec 25, 2025
9e813c2
Merge branch 'master' into rcv-services
epoberezkin Dec 25, 2025
db4b27e
agent: create user with option to enable client service (#1684)
epoberezkin Dec 27, 2025
d908404
Merge branch 'master' into rcv-services
epoberezkin Jan 15, 2026
502d923
agent: minor fixes
epoberezkin Jan 17, 2026
ac825b0
Merge branch 'master' into rcv-services
epoberezkin Jan 24, 2026
84e8b72
docs: update protocol (#1705)
epoberezkin Jan 27, 2026
aebc01b
Merge branch 'master' into rcv-services
epoberezkin Mar 3, 2026
1d30579
Merge branch 'master' into rcv-services
epoberezkin Mar 4, 2026
8518f60
docs: agent threat model
evgeny-simplex Mar 7, 2026
c624a10
Merge branch 'master' into rcv-services
evgeny-simplex Mar 9, 2026
3a25561
Merge branch 'master' into rcv-services
epoberezkin Mar 9, 2026
3c57523
update protocol docs
evgeny-simplex Mar 9, 2026
583f4e0
update RFCs (#1730)
epoberezkin Mar 9, 2026
f745ce5
docs: fix minor issues in protocols
evgeny-simplex Mar 10, 2026
01785d5
docs: add e2e encrypted message wire encoding to PQDR spec
evgeny-simplex Mar 10, 2026
98351cf
docs: add missing encodings and other protocol corrections
evgeny-simplex Mar 10, 2026
b81670c
docs: move implemented rfcs
evgeny-simplex Mar 10, 2026
48eba59
Merge branch 'master' into rcv-services
epoberezkin Mar 12, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,16 +1,17 @@
# SMP server message storage

# SMP router message storage

## Problem

Currently SMP servers store all queues in server memory. As the traffic grows, so does the number of undelivered messages. What is worse, Haskell is not avoiding heap fragmentation when messages are allocated and then de-allocated - undelivered messages use ByteString and GC cannot move them around, as they use pinned memory.
Currently SMP routers store all queues in router memory. As the traffic grows, so does the number of undelivered messages. What is worse, Haskell is not avoiding heap fragmentation when messages are allocated and then de-allocated - undelivered messages use ByteString and GC cannot move them around, as they use pinned memory.

## Possible solutions

### Solution 1: solve only GC fragmentation problem

Move from ByteString to some other primitive to store messages in memory long term, e.g. ShortByteString, or manage allocation/de-allocation of stored messages manually in some other way.

Pros: the simplest solution that avoids substantial re-engineering of the server.
Pros: the simplest solution that avoids substantial re-engineering of the router.

Cons:
- not a long term solution, as memory growth still has limits.
Expand All @@ -22,12 +23,12 @@ Use files or RocksDB to store messages.

Pros:
- much lower memory usage.
- no message loss in case of abnormal server termination (important until clients have delivery redundancy).
- no message loss in case of abnormal router termination (important until clients have delivery redundancy).
- this is a long term solution, and at some point it might need to be done anyway.

Cons:
- substantial re-engineering costs and risks.
- metadata privacy. Currently we only save undelivered messages when server is restarted, with this approach all messages will be stored for some time. this argument is limited, as hosting providers of VMs can make memory snapshots too, on the other hand they are harder to analyze than files. On another hand, with this approach messages will be stored for a shorter time.
- metadata privacy. Currently we only save undelivered messages when router is restarted, with this approach all messages will be stored for some time. this argument is limited, as hosting providers of VMs can make memory snapshots too, on the other hand they are harder to analyze than files. On another hand, with this approach messages will be stored for a shorter time.

#### RocksDB and other key-value stores

Expand Down Expand Up @@ -67,7 +68,7 @@ queueLogLine =
%s"write_msg=" digits
```

When queue is first requested by the server:
When queue is first requested by the router:

```c
if queue folder exists:
Expand All @@ -87,7 +88,7 @@ nextReadMsg = read_msg
open write_file in AppendMode
```

When message is added to the queue (assumes that queue state is loaded to server memory, if not the previous section will be done first):
When message is added to the queue (assumes that queue state is loaded to router memory, if not the previous section will be done first):

```c
if write_msg > max_queue_messages:
Expand Down Expand Up @@ -128,7 +129,7 @@ else
nextReadByte = current position in file
```

When message delivery is acknowledged, the read queue needs to be advanced, and possibly switched to read from the current write_queue:
When message delivery is acknowledged, the read queue needs to be advanced, and possibly switched to read from the current write queue:

```c
if nextReadByte == read_byte:
Expand Down Expand Up @@ -162,9 +163,9 @@ Most Linux systems use EXT4 filesystem where the file lookup time scales linearl

So storing all queue folders in one folder won't scale.

To solve this problem we could use recipient queue ID in base64url format not as a folder name, but as a folder path, splitting it to path fragments of some length. The number of fragments can be configurable and migration to a different fragment size can be supported as the number of queues on a given server grows.
To solve this problem we could use recipient queue ID in base64url format not as a folder name, but as a folder path, splitting it to path fragments of some length. The number of fragments can be configurable and migration to a different fragment size can be supported as the number of queues on a given router grows.

Currently, queue ID is 24 bytes random number, thus allowing 2^192 possible queue IDs. If we assume that a server must hold 1b queues, it means that we have ~2^162 possible addresses for each existing queue. 24 bytes in base64 is 32 characters that can be split into say 8 fragments with 4 characters each, so that queue folder path for queue with ID `abcdefghijklmnopqrstuvwxyz012345` would be:
Currently, queue ID is 24 bytes random number, thus allowing 2^192 possible queue IDs. If we assume that a router must hold 1b queues, it means that we have ~2^162 possible addresses for each existing queue. 24 bytes in base64 is 32 characters that can be split into say 8 fragments with 4 characters each, so that queue folder path for queue with ID `abcdefghijklmnopqrstuvwxyz012345` would be:

`/var/opt/simplex/messages/abcd/efgh/ijkl/mnop/qrst/uvwx/yz01/2345`

Expand All @@ -174,6 +175,6 @@ So we could use an unequal split of path, two letters each and the last being lo

`/var/opt/simplex/messages/ab/cd/ef/ghijklmnopqrstuvwxyz012345`

The first three levels in this case can have 4096 subfolders each, and it gives 68b possible subfolders (64^2^3), so the last level will be sparse in case of 1b queues on the server. So we could make it 4 levels with 2 letters to never think about it, accounting for a large variance of the random numbers distribution:
The first three levels in this case can have 4096 subfolders each, and it gives 68b possible subfolders (64^2^3), so the last level will be sparse in case of 1b queues on the router. So we could make it 4 levels with 2 letters to never think about it, accounting for a large variance of the random numbers distribution:

`/var/opt/simplex/messages/ab/cd/ef/gh/ijklmnopqrstuvwxyz012345`
Original file line number Diff line number Diff line change
@@ -1,30 +1,31 @@

# Sharing protocol ports with HTTPS

Some networks block all ports other than web ports, including port 5223 used for SMP protocol by default. Running SMP servers on a common web port 443 would allow them to work on more networks. The servers would need to provide an HTTPS page for browsers (and probes).
Some networks block all ports other than web ports, including port 5223 used for SMP protocol by default. Running SMP routers on a common web port 443 would allow them to work on more networks. The routers would need to provide an HTTPS page for browsers (and probes).

## Problem

Browsers and tools rely on system CA bundles instead of certificate pinning.
The crypto parameters used by HTTPS are different from what the protocols use.
Public certificate providers like LetsEncrypt can only sign specific types of keys and Ed25519 isn't one of them.

This means a server should distinguish browser and protocol clients and adjust its behavior to match.
This means a router should distinguish browser and protocol clients and adjust its behavior to match.

## Solution

`tls` package has a server hook that allows producing a different set of `TLS.Credentials` according to a client-provided "Server Name Indication" extension.

Since LE certificates are only handed out to domain names, TLS client will be sending the SNI.
However client transports are constructed over connected sockets and the SNI wouldn't be present unless explicitly requested.
When a client sends SNI, then it's a browser and a web credentials should be used.
When a client sends SNI, then it's a browser and web credentials should be used.
Otherwise it's a protocol client to be offered the self-signed ca, cert and key.

When a transport colocated with a HTTPS, its ALPN list should be extended with `h2 http/1.1`.
The browsers will send it, and it should be checked before running transport client.
If HTTP ALPN is detected, then the client connection is served with HTTP `Application` instead (the same "server information" page).
If HTTP ALPN is detected, then the client connection is served with HTTP `Application` instead (the same "router information" page).

If some client connects to server IP, doesn't send SNI and doesn't send ALPN, it will look like a pre-handshake client.
In that case a server will send its handshake first.
If some client connects to router IP, doesn't send SNI and doesn't send ALPN, it will look like a pre-handshake client.
In that case a router will send its handshake first.
This can be mitigated by delaying its handshake and letting the probe to issue its HTTP request.

## Implementation plan
Expand All @@ -43,7 +44,7 @@ runServer (tcpPort, ATransport t) = do
else runClient serverSignKey t h `runReaderT` env -- performs serverHandshake etc as usual
```

The web app and server live outside, so `runHttp` has to be provided by the `runSMPServer` caller.
The web app and router live outside, so `runHttp` has to be provided by the `runSMPServer` caller.
Additonally, Warp is using its `InternalInfo` object that's scoped to `withII` bracket.

```haskell
Expand All @@ -65,11 +66,9 @@ The implementation relies on a few modification to upstream code:
- `warp`: Only the re-export of `serveConnection` is needed.
Unfortunately the most recent `warp` version can't be used right away due to dependency cascade around `http-5` and `auto-update-2`.
So a fork containing the backported re-export has to be used until the dependencies are refreshed.


### TLS.ServerParams

When a server has port sharing enabled, a new set of TLS params is loaded and combined with transport params:
When a router has port sharing enabled, a new set of TLS params is loaded and combined with transport params:

```haskell
newEnv config = do
Expand Down Expand Up @@ -129,7 +128,7 @@ key: /etc/opt/simplex/web.key
# key: /etc/letsencrypt/live/smp.hostname.tld/privkey.pem
```

When `TRANSPORT.port` matches `WEB.https` the transport server becomes shared.
When `TRANSPORT.port` matches `WEB.https` the transport router becomes shared.

Perhaps a more desirable option would be explicit configuration resulting in additional transported to run:

Expand All @@ -148,16 +147,16 @@ key: /etc/opt/simplex/web.key

## Caveats

Serving static files and the protocols togother may pose a problem for those who currently use dedicated web servers as they should switch to embedded http handlers.
Serving static files and the protocols together may pose a problem for those who currently use dedicated web servers as they should switch to embedded http handlers.

As before, using embedded HTTP server is increasing attack surface.

Users who want to run everything on a single host will have to add and extra IP address and bind servers to specific IPs instead of 0.0.0.0.
An amalgamated server binary can be provided that would contain both SMP and XFTP servers, where transport will dispatch connections by handshake ALPN.
Users who want to run everything on a single host will have to add an extra IP address and bind routers to specific IPs instead of 0.0.0.0.
An amalgamated router binary can be provided that would contain both SMP and XFTP routers, where transport will dispatch connections by handshake ALPN.

## Alternative: Use transports routable with reverse-proxies

An "industrial" reverse proxy may do the ALPN routing, serving HTTP by itself and delegating `smp` and `xftp` to protocol servers.
Same with the `websockets`.

Since this in effect does TLS termination, the protocol servers will have to rely on credentials from protocol handshakes.
Since this in effect does TLS termination, the protocol routers will have to rely on credentials from protocol handshakes.
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@

# Expiring messages in journal storage

## Problem

The journal storage servers recently migrated to do not delete delivered or expired messages, they only update pointers to journal file lines. The messages are actually deleted when the whole journal file is deleted (when fully deleted or fully expired).
The journal storage routers recently migrated to do not delete delivered or expired messages, they only update pointers to journal file lines. The messages are actually deleted when the whole journal file is deleted (when fully deleted or fully expired).

The problem is that in case the queue stops receiving the new messages then writing of messages won't switch to the new journal file, and the current journal file containing delivered or expired messages would never be deleted.

Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@

# Fix subQ deadlock: blocking writeTBQueue inside connLock

## Problem
Expand Down
Loading
Loading