
Swift Stream Parsing, an Incremental Byte-by-Byte Parsing Library


Say we have a structured output from an LLM.


{
  "key": "value",
  "key2": "value2",
  // More fields...
}
        

Given that many LLM applications stream data incrementally, how can we stream a structured format such as JSON incrementally? After all, one cannot fully parse JSON without having all the bytes up front.

One approach is NDJSON, in which we simply re-stream the entire JSON blob, one complete blob per line, as each token arrives.


{ "key": "v" }
{ "key": "va" }
{ "key": "val" }
{ "key": "valu" }
{ "key": "value" }
{ "key2": "v" }
// And so on...
        

Another approach is to complete the incomplete JSON string and decode with JSONDecoder. In fact, there’s a library called PartialJSONDecoder that does just this.


{ "key": "value", "key2": "va
// Gets translated to
{ "key": "value", "key2": "va"}
        

Of course, the issue with this approach is that one must re-decode the entire JSON payload every time a significant new byte arrives.

Lastly, one could just parse the JSON byte-by-byte until reaching an invalid character or the end of the stream. At the time of writing, there is at least one working library implementation of this approach.
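
To make that concrete, here’s a heavily simplified sketch of such a scanner. It tracks only string/escape state and container depth, and rejects any byte that closes more containers than were opened; a real parser needs a full state machine, and none of these names come from an actual library.

struct JSONScanState {
  var depth = 0
  var inString = false
  var escaped = false

  // Returns false when `byte` cannot possibly continue the document.
  mutating func accept(_ byte: UInt8) -> Bool {
    if inString {
      if escaped { escaped = false }
      else if byte == UInt8(ascii: "\\") { escaped = true }
      else if byte == UInt8(ascii: "\"") { inString = false }
      return true
    }
    switch byte {
    case UInt8(ascii: "\""): inString = true
    case UInt8(ascii: "{"), UInt8(ascii: "["): depth += 1
    case UInt8(ascii: "}"), UInt8(ascii: "]"):
      depth -= 1
      if depth < 0 { return false }
    default: break
    }
    return true
  }
}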

Regardless of the format and parsing technique, Decodable will not work because it generally requires the full data payload to be present. If we want to support incremental and lazy parsing without all the data being available, we need a completely different interface altogether. That’s why today I would like to announce Swift Stream Parsing, a cross-platform library for stream parsing in Swift.

As always, this article is about the library’s design and, this time around, its quite weird development process. You can check the README and documentation for usage examples.


Some Basics

Not a JSONDecoder/Decodable Replacement

This library is meant to parse streams of data byte-by-byte. The advantage is that it allows one to parse data incrementally, but the disadvantage is that parsing an entire payload byte-by-byte is often quite slow (orders of magnitude slower than libraries like simdjson). Having all the bytes available enables certain kinds of optimizations that one doesn’t get with a byte-by-byte approach, such as far less branching and the use of SIMD.

If you have all the bytes available, or don’t need to wait too long for them to arrive, JSONDecoder is a far better option than the JSON parser built into this library.

Additionally, the library has zero interop with Decodable. This is because Decodable is generally only meant for decoding entire data payloads, and not incremental streaming. The equivalent protocol in the library, StreamParseableValue, is quite limited by comparison. Types that expect primitive data in a certain format like URL and Date are completely unsupported by the library because there’s no way to incrementally parse them (the incremental values likely wouldn’t be valid instances of those types).

There are some niceties that you get with the JSON parser in this library, such as being able to toggle certain forms of syntax on and off (e.g. comments, hex numbers, trailing commas, etc.). This syntax toggling was largely needed due to the tendency of local LLMs to not output properly formatted JSON. The parser can also automatically convert snake case to camel case just like JSONDecoder if you enable it (and you can use a custom key decoding strategy, also just like JSONDecoder).
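
For a rough idea of what this looks like at the call site, here’s a sketch that uses the PartialsStream type covered later in this article. The parser options below are hypothetical names invented for illustration, so check the README for the real configuration surface; MyValue stands in for any type annotated with the library’s macro.

// Hypothetical option names, for illustration only; see the README.
var stream = PartialsStream(
  initialValue: MyValue.Partial(),
  parser: .json(
    syntax: [.comments, .hexNumbers, .trailingCommas], // hypothetical option
    keyDecodingStrategy: .convertFromSnakeCase         // hypothetical option
  )
)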

In essence, this library is not a JSONDecoder replacement, in the same way that the @Generable macro from FoundationModels doesn’t replace JSONDecoder either. Of course, FoundationModels does use JSON, but it had to create its own incremental parsing logic since JSONDecoder wasn’t a viable option.

In fact, you could say that this library is an open source implementation of the parsing capabilities of @Generable, though @Generable must also generate a JSON Schema that the underlying LLM must understand.

Dependencies

The library itself has no dependencies, not even on Foundation: just pure Swift with the standard library. Given that it ships with a built-in JSON parser, this means I had the pleasure of writing that parser from scratch.

However, while the core library itself has no dependencies, there are package traits that add support for a few different libraries: Foundation, Swift Collections, and Tagged. The package trait for Foundation support is even enabled by default.
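
As a sketch, enabling a non-default trait from a consuming package might look like this. SwiftPM’s package-traits API shipped with Swift 6.1; the URL and trait names below are placeholders rather than the package’s actual ones.

// Package.swift (swift-tools-version: 6.1+). Placeholder URL and trait names.
dependencies: [
  .package(
    url: "https://github.com/<author>/swift-stream-parsing",
    from: "1.0.0",
    traits: ["SwiftCollectionsSupport", "TaggedSupport"]
  )
]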


It’s Just as Easy as Conforming to Decodable!

Of course, I wanted to make the interface nearly as convenient as implementing Decodable on a struct, which naturally means we have to have the compiler generate code. I wonder what feature of Swift can do that? It turns out, it’s called a macro!


// See it's just as easy as Decodable!
@StreamParseable
struct Streamable {
  var property: String
  var property2: Int
}
        

Don’t get ahead of yourself, however. The types of all properties on Streamable are required to conform to a protocol called StreamParseable. And due to the nature of Decodable requiring the full value to be present at decode time, the library unfortunately has no overlap with it.


Stream Parseable

The StreamParseable protocol requires one to declare a Partial associated type: the type of value that can be parsed from a stream of bytes. The @StreamParseable macro of course generates this Partial type for you. For the record, this is exactly how @Generable from FoundationModels works as well.
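
As a point of reference, Apple’s version looks roughly like this (paraphrased from the FoundationModels documentation, so treat the details as approximate): @Generable synthesizes a nested PartiallyGenerated type whose fields are optional, playing the same role as Partial does here.

import FoundationModels

@Generable
struct Itinerary {
  var title: String
  var days: Int
}

// Streaming a session response yields Itinerary.PartiallyGenerated
// snapshots, the moral equivalent of this library's Partial values.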

The Partial type must then conform to another protocol called StreamParseableValue, which requires the conforming type to register key paths to the fields that can be parsed incrementally.

The @StreamParseable macro generates this conformance.


extension Streamable: StreamParseable {
  struct Partial: StreamParsingCore.StreamParseable,
    StreamParsingCore.StreamParseableValue {
    var property: String.Partial?
    var property2: Int.Partial?

    init(
      property: String.Partial? = nil,
      property2: Int.Partial? = nil
    ) {
      self.property = property
      self.property2 = property2
    }

    static func initialParseableValue() -> Self {
      Self()
    }

    static func registerHandlers(in handlers: inout some StreamParsingCore.StreamParserHandlers<Self>) {
      handlers.registerKeyedHandler(forKey: "property", \.property)
      handlers.registerKeyedHandler(forKey: "property2", \.property2)
    }
  }
}
        

The interesting bit is registerHandlers, which effectively tells the parser that the data type is interested in parsing the values for the "property" and "property2" keys, which are then assigned at parse time via the key paths passed to registerKeyedHandler.

Similarly, more primitive types can register handlers as well. Take the conformance of String, for instance.


extension String: StreamParseableValue {
  public static func initialParseableValue() -> Self {
    ""
  }

  public static func registerHandlers(in handlers: inout some StreamParserHandlers<Self>) {
    handlers.registerStringHandler(\.self)
  }
}
        

This depends on a primitive handler registration method. There are a number of these primitive handler registration methods, and generally speaking they mirror all the primitive types (including Int128 and UInt128) supported by the standard library’s Decoder protocol.
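
For instance, a numeric conformance would presumably look much like the String one above. The registerIntHandler name below is an assumption mirroring registerStringHandler; the actual method names may differ.

extension Int: StreamParseableValue {
  public static func initialParseableValue() -> Self {
    0
  }

  public static func registerHandlers(in handlers: inout some StreamParserHandlers<Self>) {
    handlers.registerIntHandler(\.self) // assumed name, mirroring registerStringHandler
  }
}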

More compound types such as Optional, or even Tagged (the latter of which is only available behind a package trait), can be registered via the registerScopedHandlers method.


extension Optional: StreamParseableValue where Wrapped: StreamParseableValue {
  public static func initialParseableValue() -> Wrapped? {
    Wrapped.initialParseableValue()
  }

  public static func registerHandlers(in handlers: inout some StreamParserHandlers<Self>) {
    handlers.registerScopedHandlers(on: Wrapped.self, \.streamParsingWrappedValue)
    handlers.registerNilHandler(\.self)
  }

  private var streamParsingWrappedValue: Wrapped {
    get { self ?? Wrapped.initialParseableValue() }
    set { self = newValue }
  }
}
        

Notice how, since the library requires handlers to be registered with key paths, we can use a computed property to map to and from the wrapped value.

Data is another interesting case of this.


extension Data: StreamParseableValue {
  public static func initialParseableValue() -> Self {
    Self()
  }

  public static func registerHandlers(in handlers: inout some StreamParserHandlers<Self>) {
    handlers.registerStringHandler(\.streamParsingStringValue)
  }

  private var streamParsingStringValue: String {
    get { String(decoding: self, as: UTF8.self) }
    set { self = Data(newValue.utf8) }
  }
}
        

Now what about collection types like Array? Since there are more collection types than just Array, the library actually has a protocol called StreamParseableArrayObject that Array conforms to. The protocol’s requirements cover the minimal functionality necessary to parse array objects incrementally, and this minimalism is what allows types from Swift Collections, like BitArray and Deque, to be registered with the handlers directly.


public protocol StreamParseableArrayObject<Element>: StreamParseableValue {
  associatedtype Element: StreamParseableValue

  subscript(index: Int) -> Element { get set }
  mutating func append(contentsOf sequence: some Sequence<Element>)
}

extension StreamParseableArrayObject {
  public static func registerHandlers(in handlers: inout some StreamParserHandlers<Self>) {
    handlers.registerArrayHandler(\.self)
  }
}

// Array

extension Array: StreamParseableValue where Element: StreamParseableValue {
  public static func initialParseableValue() -> [Element] {
    []
  }
}

extension Array: StreamParseableArrayObject where Element: StreamParseableValue {}

// Deque

extension Deque: StreamParseableValue where Element: StreamParseableValue {
  public static func initialParseableValue() -> Deque<Element> {
    []
  }
}

extension Deque: StreamParseableArrayObject where Element: StreamParseableValue {}
        

Dictionaries and closely related types like OrderedDictionary use a similar dedicated protocol called StreamParseableDictionaryObject.
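
The library’s exact definition isn’t shown here, but by analogy with StreamParseableArrayObject you would expect something in this spirit (a sketch with assumed requirements, not the actual protocol):

public protocol StreamParseableDictionaryObject<Key, Value>: StreamParseableValue {
  associatedtype Key: Hashable
  associatedtype Value: StreamParseableValue

  subscript(key: Key) -> Value? { get set }
}

extension StreamParseableDictionaryObject {
  public static func registerHandlers(in handlers: inout some StreamParserHandlers<Self>) {
    handlers.registerDictionaryHandler(\.self) // assumed, mirroring registerArrayHandler
  }
}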

Under the hood, the Handlers type for each parser knows exactly how to append key paths together in order to construct the proper key path to write to at parse time. This process is quite ugly, involving many hacks with AnyKeyPath, but thankfully you don’t have to worry about it!
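
The core standard library building block here is AnyKeyPath’s appending(path:) method, which joins two type-erased key paths at runtime and returns nil when the value and root types don’t line up. The types below are just for illustration:

struct Inner { var text = "" }
struct Outer { var inner = Inner() }

let outerPath: AnyKeyPath = \Outer.inner
let innerPath: AnyKeyPath = \Inner.text

// appending(path:) returns nil unless innerPath's root type matches
// outerPath's value type, so the result must be checked and downcast.
if let joined = outerPath.appending(path: innerPath),
   let writable = joined as? WritableKeyPath<Outer, String> {
  var value = Outer()
  value[keyPath: writable] = "hello"
}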


Parsers

The library ships with a built-in JSON parser, and because of the no-dependencies philosophy, it had to be implemented from scratch! How fun!

If you want another format, just ask Claude to create a parser for your desired format; it’s that easy! (Ok, enough joking…) All your custom format parser has to do is conform to the StreamParser protocol, and you’re set.

The StreamParser protocol is quite simple.


public protocol StreamParser<Value> {
  associatedtype Value: StreamParseableValue
  associatedtype Handlers: StreamParserHandlers<Value>

  mutating func parse(bytes: some Sequence<UInt8>, into reducer: inout Value) throws

  mutating func finish(reducer: inout Value) throws

  mutating func registerHandlers()
}
        

As we can see from the protocol definition, the parser owns the Handlers that are passed to the registerHandlers static method on Value.

The protocol also works over an arbitrary sequence of bytes, which represents a chunk of bytes from a source. If you want to optimize parsing based on the concrete type of the sequence, you have the ability to do so, but generally speaking you should expect that your parser will have to do things byte by byte.
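
One such optimization is the standard library’s withContiguousStorageIfAvailable, which exposes a raw buffer when the sequence’s storage happens to be contiguous and returns nil otherwise. A sketch, with a hypothetical process function standing in for the parser’s state machine:

func consume(_ bytes: some Sequence<UInt8>) {
  let handled: Void? = bytes.withContiguousStorageIfAvailable { buffer in
    // Fast path: scan raw contiguous memory directly.
    for byte in buffer { process(byte) }
  }
  if handled == nil {
    // Generic fallback: consume the sequence one byte at a time.
    for byte in bytes { process(byte) }
  }
}

func process(_ byte: UInt8) { /* feed the parser's state machine */ }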


Algorithms

Generally speaking, you won’t use the StreamParser interface directly, as the library provides a PartialsStream type that wraps the parser and the current value being parsed. Additionally, there are extensions on Sequence and AsyncSequence that map a stream of bytes to a stream of parsed values.


// PartialsStream

@StreamParseable
struct Payload {
  // ...
}

let json = "{ ... }"

var stream = PartialsStream(initialValue: Payload.Partial(), parser: .json())

for byte in json.utf8 {
  let partial = stream.next(byte)
  // ...
}

let finalPartial = stream.finish()
        

// Sequence

let payloads: [Payload.Partial] = try json.utf8.partials(of: Payload.Partial.self, using: .json())
        

// AsyncSequence

struct AsyncBytesSequence: AsyncSequence {
  typealias Element = UInt8
  // ...
}

let json = AsyncBytesSequence(...)
let partials = json.partials(of: Payload.Partial.self, using: .json())
for try await payload in partials {
  print(payload)
}
        

Under the hood, the sequence and async sequence helpers are of course just using PartialsStream.


Performance

As I mentioned earlier, this library is far from a JSONDecoder replacement, and performance is the primary reason.

There are two reasons for this: byte-by-byte parsing forgoes the branch-reduction and SIMD optimizations that are available when all the bytes are present up front, and every parsed fragment is written through dynamically composed key paths, which adds overhead.

However, my only threshold was to be many orders of magnitude faster than PartialJSONDecoder, which I can gladly say was achieved.

Don’t get me wrong, there’s certainly a lot of room for optimization (probably via a method that extracts pointers to each primitive field directly rather than going through key paths), but I don’t see it ever getting to the point where it suffices as a JSONDecoder replacement, simply due to the nature of the required approach. Right now, the performance is good enough for my needs (parsing small-to-mid-sized payloads from LLMs), so I likely won’t revisit this until a need for better performance arises.


Conclusion

As of writing, the library is now available. I don’t suspect that I’ll need to make many updates to it, since it handles a pretty simple and stable concern. At the end of the day, I suspect this will mainly be used by other libraries and tools rather than by applications directly.

— 1/30/26