Alternative and custom formats (experimental)

This is the sixth chapter of the Kotlin Serialization Guide. It goes beyond JSON, covering alternative and custom formats. Unlike JSON, which is stable, these are currently experimental features of Kotlin Serialization.

CBOR (experimental)

CBOR is one of the standard compact binary encodings for JSON, so it supports a superset of JSON features and is generally very similar to JSON in use, but produces binary data.

CBOR support is (experimentally) available in a separate org.jetbrains.kotlinx:kotlinx-serialization-cbor:<version> module.

The Cbor class has Cbor.encodeToByteArray and Cbor.decodeFromByteArray functions. Let us take the basic example from the JSON encoding, but encode it using CBOR.

@Serializable
data class Project(val name: String, val language: String)

fun main() {
    val data = Project("kotlinx.serialization", "Kotlin") 
    val bytes = Cbor.encodeToByteArray(data)   
    println(bytes.toAsciiHexString())
    val obj = Cbor.decodeFromByteArray<Project>(bytes)
    println(obj)
}

You can get the full code here.

We print a filtered ASCII representation of the output, writing non-ASCII data in hex, so we see how all the original strings are directly represented in CBOR, but the format delimiters themselves are binary.

{BF}dnameukotlinx.serializationhlanguagefKotlin{FF}
Project(name=kotlinx.serialization, language=Kotlin)

In CBOR hex notation, the output is equivalent to the following:

BF                                      # map(*)
   64                                   # text(4)
      6E616D65                          # "name"
   75                                   # text(21)
      6B6F746C696E782E73657269616C697A6174696F6E # "kotlinx.serialization"
   68                                   # text(8)
      6C616E6775616765                  # "language"
   66                                   # text(6)
      4B6F746C696E                      # "Kotlin"
   FF                                   # primitive(*)
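The toAsciiHexString helper used in the example above is not defined in this chapter. A minimal sketch of such a helper is shown below, assuming that printable ASCII bytes (codes 32 to 126) are emitted as characters and every other byte as {XX}. This is a hypothetical reimplementation for illustration; the helper in the guide's sample sources may differ in detail.

```kotlin
// Hypothetical reimplementation of the helper used in the examples;
// the version shipped with the guide's samples may differ in detail.
fun ByteArray.toAsciiHexString(): String = joinToString("") { b ->
    val code = b.toInt() and 0xFF              // treat the byte as unsigned
    if (code in 32..126) {
        code.toChar().toString()               // printable ASCII: keep as-is
    } else {
        "{" + code.toString(16).uppercase().padStart(2, '0') + "}"
    }
}

fun main() {
    // 0xBF and 0xFF are binary delimiters, 0x64 and 0x6E are the letters 'd' and 'n'.
    val bytes = byteArrayOf(0xBF.toByte(), 0x64, 0x6E, 0xFF.toByte())
    println(bytes.toAsciiHexString()) // {BF}dn{FF}
}
```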

Note that CBOR as a format, unlike JSON, supports maps with non-trivial keys (see the Allowing structured map keys section for JSON workarounds). Kotlin maps are serialized as CBOR maps, but some parsers (like jackson-dataformat-cbor) don't support such keys.

Ignoring unknown keys

CBOR format is often used to communicate with IoT devices where new properties could be added as a part of a device's API evolution. By default, unknown keys encountered during deserialization produce an error. This behavior can be configured with the ignoreUnknownKeys property.

val format = Cbor { ignoreUnknownKeys = true }

@Serializable
data class Project(val name: String)

fun main() {
    val data = format.decodeFromHexString<Project>(
        "bf646e616d65756b6f746c696e782e73657269616c697a6174696f6e686c616e6775616765664b6f746c696eff"
    )
    println(data)
}

You can get the full code here.

It decodes the object, despite the fact that Project is missing the language property.

Project(name=kotlinx.serialization)

In CBOR hex notation, the input is equivalent to the following:

BF                                      # map(*)
   64                                   # text(4)
      6E616D65                          # "name"
   75                                   # text(21)
      6B6F746C696E782E73657269616C697A6174696F6E # "kotlinx.serialization"
   68                                   # text(8)
      6C616E6775616765                  # "language"
   66                                   # text(6)
      4B6F746C696E                      # "Kotlin"
   FF                                   # primitive(*)

Byte arrays and CBOR data types

Per the RFC 7049 Major Types section, CBOR supports the following data types:

  • Major type 0: an unsigned integer
  • Major type 1: a negative integer
  • Major type 2: a byte string
  • Major type 3: a text string
  • Major type 4: an array of data items
  • Major type 5: a map of pairs of data items
  • Major type 6: optional semantic tagging of other major types
  • Major type 7: floating-point numbers and simple data types that need no content, as well as the "break" stop code
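The major type occupies the high three bits of a data item's initial byte, and the low five bits carry additional information such as a short length (0 to 23) or the indefinite-length marker (31). The sketch below shows how the initial bytes seen in this chapter's hex dumps can be derived; cborInitialByte and INDEFINITE are illustrative names and not part of the Cbor API.

```kotlin
// Initial byte of a CBOR data item: high 3 bits = major type,
// low 5 bits = additional information (RFC 7049, section 2).
// Illustrative helper only, not part of kotlinx.serialization.
fun cborInitialByte(majorType: Int, additionalInfo: Int): Int {
    require(majorType in 0..7) { "major type must be in 0..7" }
    require(additionalInfo in 0..31) { "additional info must be in 0..31" }
    return (majorType shl 5) or additionalInfo
}

// Additional info 31 means "indefinite length" (or "break" for major type 7).
const val INDEFINITE = 31

fun main() {
    println("%02X".format(cborInitialByte(3, 4)))           // 64: text(4)
    println("%02X".format(cborInitialByte(3, 21)))          // 75: text(21)
    println("%02X".format(cborInitialByte(2, 4)))           // 44: bytes(4)
    println("%02X".format(cborInitialByte(5, INDEFINITE)))  // BF: map(*)
    println("%02X".format(cborInitialByte(4, INDEFINITE)))  // 9F: array(*)
    println("%02X".format(cborInitialByte(7, INDEFINITE)))  // FF: break
}
```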

By default, Kotlin ByteArray instances are encoded as major type 4. When major type 2 is desired, then the @ByteString annotation can be used.

@Serializable
data class Data(
    @ByteString
    val type2: ByteArray, // CBOR Major type 2
    val type4: ByteArray  // CBOR Major type 4
)        

fun main() {
    val data = Data(byteArrayOf(1, 2, 3, 4), byteArrayOf(5, 6, 7, 8)) 
    val bytes = Cbor.encodeToByteArray(data)   
    println(bytes.toAsciiHexString())
    val obj = Cbor.decodeFromByteArray<Data>(bytes)
    println(obj)
}

You can get the full code here.

As we see, the CBOR byte that precedes the data is different for different types of encoding.

{BF}etype2D{01}{02}{03}{04}etype4{9F}{05}{06}{07}{08}{FF}{FF}
Data(type2=[1, 2, 3, 4], type4=[5, 6, 7, 8])

In CBOR hex notation, the output is equivalent to the following:

BF               # map(*)
   65            # text(5)
      7479706532 # "type2"
   44            # bytes(4)
      01020304   # "\x01\x02\x03\x04"
   65            # text(5)
      7479706534 # "type4"
   9F            # array(*)
      05         # unsigned(5)
      06         # unsigned(6)
      07         # unsigned(7)
      08         # unsigned(8)
      FF         # primitive(*)
   FF            # primitive(*)

ProtoBuf (experimental)

Protocol Buffers is a language-neutral binary format that normally relies on a separate ".proto" file that defines the protocol schema. It is more compact than CBOR, because it assigns integer numbers to fields instead of names.

Protocol buffers support is (experimentally) available in a separate org.jetbrains.kotlinx:kotlinx-serialization-protobuf:<version> module.

Kotlin Serialization uses proto2 semantics, where all fields are explicitly required or optional. For a basic example, we change our example to use the ProtoBuf class with its ProtoBuf.encodeToByteArray and ProtoBuf.decodeFromByteArray functions.

@Serializable
data class Project(val name: String, val language: String)

fun main() {
    val data = Project("kotlinx.serialization", "Kotlin") 
    val bytes = ProtoBuf.encodeToByteArray(data)   
    println(bytes.toAsciiHexString())
    val obj = ProtoBuf.decodeFromByteArray<Project>(bytes)
    println(obj)
}

You can get the full code here.

{0A}{15}kotlinx.serialization{12}{06}Kotlin
Project(name=kotlinx.serialization, language=Kotlin)

In ProtoBuf hex notation, the output is equivalent to the following:

Field #1: 0A String Length = 21, Hex = 15, UTF8 = "kotlinx.serialization"
Field #2: 12 String Length = 6, Hex = 06, UTF8 = "Kotlin"

Field numbers

By default, field numbers in the Kotlin Serialization ProtoBuf implementation are automatically assigned, which does not provide the ability to define a stable data schema that evolves over time. That is normally achieved by writing a separate ".proto" file. However, with Kotlin Serialization we can get this ability without a separate schema file, instead using the ProtoNumber annotation.

@Serializable
data class Project(
    @ProtoNumber(1)
    val name: String, 
    @ProtoNumber(3)
    val language: String
)

fun main() {
    val data = Project("kotlinx.serialization", "Kotlin") 
    val bytes = ProtoBuf.encodeToByteArray(data)   
    println(bytes.toAsciiHexString())
    val obj = ProtoBuf.decodeFromByteArray<Project>(bytes)
    println(obj)
}

You can get the full code here.

We see in the output that the number for the first property name did not change (as it is numbered from one by default), but it did change for the language property.

{0A}{15}kotlinx.serialization{1A}{06}Kotlin
Project(name=kotlinx.serialization, language=Kotlin)

In ProtoBuf hex notation, the output is equivalent to the following:

Field #1: 0A String Length = 21, Hex = 15, UTF8 = "kotlinx.serialization" (total 21 chars)
Field #3: 1A String Length = 6, Hex = 06, UTF8 = "Kotlin"
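The byte that precedes each field in the output combines the field number with a wire type. The sketch below reproduces the key bytes seen in this chapter, assuming the standard Protocol Buffers rule that the key is the field number shifted left by three bits, combined with the wire type; fieldKey and the constants are illustrative names, not library API.

```kotlin
// Protocol Buffers wire types that appear in this chapter's examples.
const val WIRE_VARINT = 0
const val WIRE_LENGTH_DELIMITED = 2
const val WIRE_FIXED32 = 5

// A field key is (field number shl 3) or wire type, itself written as a varint.
// For field numbers 1..15 the key fits in a single byte.
fun fieldKey(fieldNumber: Int, wireType: Int): Int = (fieldNumber shl 3) or wireType

fun main() {
    println("%02X".format(fieldKey(1, WIRE_LENGTH_DELIMITED))) // 0A: string in field 1
    println("%02X".format(fieldKey(2, WIRE_LENGTH_DELIMITED))) // 12: string in field 2
    println("%02X".format(fieldKey(3, WIRE_LENGTH_DELIMITED))) // 1A: string in field 3
    println("%02X".format(fieldKey(1, WIRE_VARINT)))           // 08: varint in field 1
    println("%02X".format(fieldKey(3, WIRE_FIXED32)))          // 1D: fixed32 in field 3
}
```

Changing @ProtoNumber from 2 to 3 therefore changes only the key byte (12 becomes 1A), while the length and payload stay the same.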

Integer types

Protocol buffers support various integer encodings optimized for different ranges of integers. They are specified using the ProtoType annotation and the ProtoIntegerType enum. The following example shows all three supported options.

@Serializable
class Data(
    @ProtoType(ProtoIntegerType.DEFAULT)
    val a: Int,
    @ProtoType(ProtoIntegerType.SIGNED)
    val b: Int,
    @ProtoType(ProtoIntegerType.FIXED)
    val c: Int
)

fun main() {
    val data = Data(1, -2, 3) 
    println(ProtoBuf.encodeToByteArray(data).toAsciiHexString())
}

You can get the full code here.

  • The default is a varint encoding (intXX) that is optimized for small non-negative numbers. The value of 1 is encoded in one byte 01.
  • The signed is a signed ZigZag encoding (sintXX) that is optimized for small signed integers. The value of -2 is encoded in one byte 03.
  • The fixed encoding (fixedXX) always uses a fixed number of bytes. The value of 3 is encoded as four bytes 03 00 00 00.

uintXX and sfixedXX protocol buffer types are not supported.

{08}{01}{10}{03}{1D}{03}{00}{00}{00}

In ProtoBuf hex notation the output is equivalent to the following:

Field #1: 08 Varint Value = 1, Hex = 01
Field #2: 10 Varint Value = 3, Hex = 03
Field #3: 1D Fixed32 Value = 3, Hex = 03-00-00-00
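The three integer encodings described above can be sketched with plain Kotlin. The functions below are illustrative only (zigZag32, encodeVarint, and fixed32 are names chosen for this sketch, not library API) and follow the standard Protocol Buffers definitions of ZigZag, varint, and little-endian fixed32.

```kotlin
// ZigZag maps signed integers to unsigned ones so that small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fun zigZag32(n: Int): Int = (n shl 1) xor (n shr 31)

// Varint: 7 payload bits per byte, least significant group first;
// the high bit of each byte signals that more bytes follow.
fun encodeVarint(value: Long): ByteArray {
    val out = ArrayList<Byte>()
    var v = value
    while ((v and 0x7FL.inv()) != 0L) {
        out.add(((v and 0x7FL) or 0x80L).toByte())
        v = v ushr 7
    }
    out.add(v.toByte())
    return out.toByteArray()
}

// Fixed32: always four bytes, little-endian.
fun fixed32(n: Int): ByteArray = ByteArray(4) { i -> (n ushr (8 * i)).toByte() }

fun ByteArray.hex(): String = joinToString(" ") { "%02X".format(it) }

fun main() {
    println(encodeVarint(1).hex())                     // 01
    println(encodeVarint(zigZag32(-2).toLong()).hex()) // 03
    println(fixed32(3).hex())                          // 03 00 00 00
}
```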

Lists as repeated fields

By default, Kotlin lists and other collections are represented as repeated fields. In protocol buffers, an empty list produces no elements in the stream with the corresponding field number. With Kotlin Serialization you must explicitly specify a default of emptyList() for any property of a collection or map type. Otherwise you will not be able to deserialize an empty list, which is indistinguishable in protocol buffers from a missing field.

@Serializable
data class Data(
    val a: List<Int> = emptyList(),
    val b: List<Int> = emptyList()
)

fun main() {
    val data = Data(listOf(1, 2, 3), listOf())
    val bytes = ProtoBuf.encodeToByteArray(data)
    println(bytes.toAsciiHexString())
    println(ProtoBuf.decodeFromByteArray<Data>(bytes))
}

You can get the full code here.

{08}{01}{08}{02}{08}{03}
Data(a=[1, 2, 3], b=[])

In ProtoBuf diagnostic mode the output is equivalent to the following:

Field #1: 08 Varint Value = 1, Hex = 01
Field #1: 08 Varint Value = 2, Hex = 02
Field #1: 08 Varint Value = 3, Hex = 03

Packed fields

Collection types (not maps) can be written as packed fields when annotated with the @ProtoPacked annotation. Per the standard, packed fields can only be used on primitive numeric types. The annotation is ignored on other types.

Per the format description, the parser does not rely on the annotation when decoding: it reads a list in either packed or repeated format.
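To make the difference between the two layouts concrete, the sketch below hand-encodes the list [1, 2, 3] in field 1 both ways, following the standard Protocol Buffers wire format. It is an illustration only: encodeRepeated and encodePacked are hypothetical helpers, and they assume every value and the list size fit in a single varint byte (0 to 127).

```kotlin
// Assumption: all values and the list size are in 0..127, so each varint is one byte.
fun encodeRepeated(fieldNumber: Int, values: List<Int>): ByteArray {
    require(values.all { it in 0..127 }) { "sketch supports one-byte varints only" }
    val key = ((fieldNumber shl 3) or 0).toByte()    // wire type 0: varint
    // Repeated layout: the key is written again before every element.
    return values.flatMap { listOf(key, it.toByte()) }.toByteArray()
}

fun encodePacked(fieldNumber: Int, values: List<Int>): ByteArray {
    require(values.size <= 127 && values.all { it in 0..127 }) { "sketch supports one-byte varints only" }
    val key = ((fieldNumber shl 3) or 2).toByte()    // wire type 2: length-delimited
    val payload = values.map { it.toByte() }
    // Packed layout: one key, the payload length in bytes, then all elements back to back.
    return (listOf(key, payload.size.toByte()) + payload).toByteArray()
}

fun ByteArray.hex(): String = joinToString(" ") { "%02X".format(it) }

fun main() {
    println(encodeRepeated(1, listOf(1, 2, 3)).hex()) // 08 01 08 02 08 03
    println(encodePacked(1, listOf(1, 2, 3)).hex())   // 0A 03 01 02 03
}
```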

ProtoBuf schema generator (experimental)

As mentioned above, when working with protocol buffers you usually use a ".proto" file and a code generator for your language. This includes the code to serialize your message to an output stream and deserialize it from an input stream. When using Kotlin Serialization this step is not necessary because your @Serializable Kotlin data types are used as the source for the schema.

This is very convenient for Kotlin-to-Kotlin communication, but makes interoperability between languages complicated. Fortunately, you can use the ProtoBuf schema generator to output the ".proto" representation of your messages. You can keep your Kotlin classes as a source of truth and use traditional protoc compilers for other languages at the same time.

As an example, we can display the following data class's ".proto" schema as follows.

@Serializable
data class SampleData(
    val amount: Long,
    val description: String?,
    val department: String = "QA"
)
fun main() {
  val descriptors = listOf(SampleData.serializer().descriptor)
  val schemas = ProtoBufSchemaGenerator.generateSchemaText(descriptors)
  println(schemas)
}

You can get the full code here.

Which would output as follows.

syntax = "proto2";


// serial name 'example.exampleFormats08.SampleData'
message SampleData {
  required int64 amount = 1;
  optional string description = 2;
  // WARNING: a default value decoded when value is missing
  optional string department = 3;
}

Note that since default values are not represented in ".proto" files, a warning is generated when one appears in the schema.

See the documentation for ProtoBufSchemaGenerator for more information.

Properties (experimental)

Kotlin Serialization can serialize a class into a flat map with String keys via the Properties format implementation.

Properties support is (experimentally) available in a separate org.jetbrains.kotlinx:kotlinx-serialization-properties:<version> module.

@Serializable
class Project(val name: String, val owner: User)

@Serializable
class User(val name: String)

fun main() {
    val data = Project("kotlinx.serialization",  User("kotlin"))
    val map = Properties.encodeToMap(data)
    map.forEach { (k, v) -> println("$k = $v") }
}

You can get the full code here.

The resulting map has dot-separated keys representing keys of the nested objects.

name = kotlinx.serialization
owner.name = kotlin
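The dot-separated key scheme can be illustrated without the library. The sketch below flattens nested maps in the same style; flatten is a hypothetical helper written for this illustration, whereas the Properties format itself works from serial descriptors rather than from maps.

```kotlin
// Flattens nested maps into a single map with dot-separated keys.
// Illustration only: this is not how the Properties format is implemented.
@Suppress("UNCHECKED_CAST")
fun flatten(source: Map<String, Any?>, prefix: String = ""): Map<String, Any?> {
    val result = LinkedHashMap<String, Any?>()
    for ((key, value) in source) {
        val path = if (prefix.isEmpty()) key else "$prefix.$key"
        if (value is Map<*, *>) {
            // Nested object: recurse, extending the key prefix.
            result.putAll(flatten(value as Map<String, Any?>, path))
        } else {
            result[path] = value
        }
    }
    return result
}

fun main() {
    val nested = mapOf(
        "name" to "kotlinx.serialization",
        "owner" to mapOf("name" to "kotlin")
    )
    flatten(nested).forEach { (k, v) -> println("$k = $v") }
    // name = kotlinx.serialization
    // owner.name = kotlin
}
```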

Custom formats (experimental)

A custom format for Kotlin Serialization must provide an implementation for the Encoder and Decoder interfaces that we saw used in the Serializers chapter.
These are pretty large interfaces. For convenience the AbstractEncoder and AbstractDecoder skeleton implementations are provided to simplify the task. In AbstractEncoder most of the encodeXxx methods have a default implementation that delegates to encodeValue(value: Any) — the only method that must be implemented to get a basic working format.

Basic encoder

Let us start with a trivial format implementation that encodes the data into a single list of primitive constituent objects in the order they were written in the source code. To start, we implement a simple Encoder by overriding encodeValue in AbstractEncoder.

class ListEncoder : AbstractEncoder() {
    val list = mutableListOf<Any>()

    override val serializersModule: SerializersModule = EmptySerializersModule()

    override fun encodeValue(value: Any) {
        list.add(value)
    }
}

Now we write a convenience top-level function that creates an encoder that encodes an object and returns a list.

fun <T> encodeToList(serializer: SerializationStrategy<T>, value: T): List<Any> {
    val encoder = ListEncoder()
    encoder.encodeSerializableValue(serializer, value)
    return encoder.list
}

For even more convenience, to avoid the need to explicitly pass a serializer, we write an inline overload of the encodeToList function with a reified type parameter using the serializer function to retrieve the appropriate KSerializer instance for the actual type.

inline fun <reified T> encodeToList(value: T) = encodeToList(serializer(), value)

Now we can test it.

@Serializable
data class Project(val name: String, val owner: User, val votes: Int)

@Serializable
data class User(val name: String)

fun main() {
    val data = Project("kotlinx.serialization",  User("kotlin"), 9000)
    println(encodeToList(data))
}

You can get the full code here.

As a result, we got all the primitive values in our object graph visited and put into a list in serial order.

[kotlinx.serialization, kotlin, 9000]

By itself, that's a useful feature if we need to compute some kind of hashcode or digest for all the data that is contained in a serializable object tree.
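As a rough sketch of that idea, the flat list of primitives could be fed into a standard digest. The digestOf function below is a hypothetical helper; it assumes that hashing each value's string form, followed by a zero byte as a separator, is good enough for the use case. Note that this loses type information (the Int 9000 and the String "9000" hash the same).

```kotlin
import java.security.MessageDigest

// Feeds each primitive's string form into SHA-256, followed by a zero byte as a separator,
// so that ["ab", "c"] and ["a", "bc"] produce different digests.
fun digestOf(values: List<Any>): String {
    val md = MessageDigest.getInstance("SHA-256")
    for (v in values) {
        md.update(v.toString().toByteArray(Charsets.UTF_8))
        md.update(0.toByte())
    }
    return md.digest().joinToString("") { "%02x".format(it) }
}

fun main() {
    // Stands in for the result of encodeToList(data) from the example above.
    val primitives: List<Any> = listOf("kotlinx.serialization", "kotlin", 9000)
    println(digestOf(primitives)) // 64 hex characters
}
```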

Basic decoder

A decoder needs to implement more substance.

  • decodeValue — returns the next value from the list.
  • decodeElementIndex — returns the next index of a deserialized value. In this primitive format deserialization always happens in order, so we keep track of the index in the elementIndex variable. See the Hand-written composite serializer section on how it ends up being used.
  • beginStructure — returns a new instance of the ListDecoder, so that each structure that is being recursively decoded keeps track of its own elementIndex state separately.

class ListDecoder(val list: ArrayDeque<Any>) : AbstractDecoder() {
    private var elementIndex = 0

    override val serializersModule: SerializersModule = EmptySerializersModule()

    override fun decodeValue(): Any = list.removeFirst()
    
    override fun decodeElementIndex(descriptor: SerialDescriptor): Int {
        if (elementIndex == descriptor.elementsCount) return CompositeDecoder.DECODE_DONE
        return elementIndex++
    }

    override fun beginStructure(descriptor: SerialDescriptor): CompositeDecoder =
        ListDecoder(list)
}

A couple of convenience functions for decoding.

fun <T> decodeFromList(list: List<Any>, deserializer: DeserializationStrategy<T>): T {
    val decoder = ListDecoder(ArrayDeque(list))
    return decoder.decodeSerializableValue(deserializer)
}

inline fun <reified T> decodeFromList(list: List<Any>): T = decodeFromList(list, serializer())

That is enough to start encoding and decoding basic serializable classes.

fun main() {
    val data = Project("kotlinx.serialization",  User("kotlin"), 9000)
    val list = encodeToList(data)
    println(list)
    val obj = decodeFromList<Project>(list)
    println(obj)
}

You can get the full code here.

Now we can convert a list of primitives back to an object tree.

[kotlinx.serialization, kotlin, 9000]
Project(name=kotlinx.serialization, owner=User(name=kotlin), votes=9000)

Sequential decoding

The decoder we have implemented keeps track of the elementIndex in its state and implements decodeElementIndex. This means that it is going to work with an arbitrary serializer, even the simple one we wrote in the Hand-written composite serializer section. However, this format always stores elements in order, so this bookkeeping is not needed and undermines decoding performance. All auto-generated serializers on the JVM support the Sequential decoding protocol (experimental), and the decoder can indicate its support by returning true from the CompositeDecoder.decodeSequentially function.

class ListDecoder(val list: ArrayDeque<Any>) : AbstractDecoder() {
    private var elementIndex = 0

    override val serializersModule: SerializersModule = EmptySerializersModule()

    override fun decodeValue(): Any = list.removeFirst()
    
    override fun decodeElementIndex(descriptor: SerialDescriptor): Int {
        if (elementIndex == descriptor.elementsCount) return CompositeDecoder.DECODE_DONE
        return elementIndex++
    }

    override fun beginStructure(descriptor: SerialDescriptor): CompositeDecoder =
        ListDecoder(list) 

    override fun decodeSequentially(): Boolean = true
}        

You can get the full code here.

Adding collection support

This basic format, so far, cannot properly represent collections. It encodes them, but it does not keep track of how many elements there are in the collection or where it ends, so it cannot properly decode them. First, let us add proper support for collections to the encoder by implementing the Encoder.beginCollection function. The beginCollection function takes a collection size as a parameter, so we encode it to add it to the result. Our encoder implementation does not keep any state, so it just returns this from the beginCollection function.

class ListEncoder : AbstractEncoder() {
    val list = mutableListOf<Any>()

    override val serializersModule: SerializersModule = EmptySerializersModule()

    override fun encodeValue(value: Any) {
        list.add(value)
    }                               

    override fun beginCollection(descriptor: SerialDescriptor, collectionSize: Int): CompositeEncoder {
        encodeInt(collectionSize)
        return this
    }                                                
}

The decoder, for our case, needs to only implement the CompositeDecoder.decodeCollectionSize function in addition to the previous code.

The formats that store collection size in advance have to return true from decodeSequentially.

class ListDecoder(val list: ArrayDeque<Any>, var elementsCount: Int = 0) : AbstractDecoder() {
    private var elementIndex = 0

    override val serializersModule: SerializersModule = EmptySerializersModule()

    override fun decodeValue(): Any = list.removeFirst()

    override fun decodeElementIndex(descriptor: SerialDescriptor): Int {
        if (elementIndex == elementsCount) return CompositeDecoder.DECODE_DONE
        return elementIndex++
    }

    override fun beginStructure(descriptor: SerialDescriptor): CompositeDecoder =
        ListDecoder(list, descriptor.elementsCount)

    override fun decodeSequentially(): Boolean = true

    override fun decodeCollectionSize(descriptor: SerialDescriptor): Int =
        decodeInt().also { elementsCount = it }
}

That is all that is needed to support collections and maps.

@Serializable
data class Project(val name: String, val owners: List<User>, val votes: Int)

@Serializable
data class User(val name: String)

fun main() {
    val data = Project("kotlinx.serialization",  listOf(User("kotlin"), User("jetbrains")), 9000)
    val list = encodeToList(data)
    println(list)
    val obj = decodeFromList<Project>(list)
    println(obj)
}

You can get the full code here.

We see the size of the list added to the result, letting the decoder know where to stop.

[kotlinx.serialization, 2, kotlin, jetbrains, 9000]
Project(name=kotlinx.serialization, owners=[User(name=kotlin), User(name=jetbrains)], votes=9000)
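The size-prefix idea can be shown in isolation, without the serialization machinery. In the sketch below, encodeSizedList and decodeSizedList are hypothetical helpers that write the element count before the elements and read exactly that many back from the front of a queue.

```kotlin
// Writes the element count first, then the elements themselves.
fun encodeSizedList(items: List<String>, out: MutableList<Any>) {
    out.add(items.size)
    out.addAll(items)
}

// Reads the count, then removes exactly that many elements from the front.
fun decodeSizedList(input: ArrayDeque<Any>): List<String> {
    val size = input.removeFirst() as Int
    return List(size) { input.removeFirst() as String }
}

fun main() {
    val flat = mutableListOf<Any>("kotlinx.serialization")
    encodeSizedList(listOf("kotlin", "jetbrains"), flat)
    flat.add(9000)
    println(flat) // [kotlinx.serialization, 2, kotlin, jetbrains, 9000]

    val input = ArrayDeque(flat)
    val name = input.removeFirst()
    val owners = decodeSizedList(input) // stops after two elements, leaving 9000 in the queue
    val votes = input.removeFirst()
    println("$name $owners $votes") // kotlinx.serialization [kotlin, jetbrains] 9000
}
```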

Adding null support

Our trivial format does not support null values so far. For nullable types we need to add some kind of "null indicator", telling whether the upcoming value is null or not.

In the encoder implementation we override Encoder.encodeNull and Encoder.encodeNotNullMark.

    override fun encodeNull() = encodeValue("NULL")
    override fun encodeNotNullMark() = encodeValue("!!")

In the decoder implementation we override Decoder.decodeNotNullMark.

    override fun decodeNotNullMark(): Boolean = decodeString() != "NULL"

Let us test nullable properties both with not-null and null values.

@Serializable
data class Project(val name: String, val owner: User?, val votes: Int?)

@Serializable
data class User(val name: String)

fun main() {
    val data = Project("kotlinx.serialization",  User("kotlin") , null)
    val list = encodeToList(data)
    println(list)
    val obj = decodeFromList<Project>(list)
    println(obj)
}

You can get the full code here.

In the output we see how the not-null mark !! and the NULL mark are used.

[kotlinx.serialization, !!, kotlin, NULL]
Project(name=kotlinx.serialization, owner=User(name=kotlin), votes=null)
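The marker scheme can also be tried on its own. The sketch below uses hypothetical encodeNullable and decodeNullable helpers over a plain list: every nullable slot starts with a mark, and a value follows only after the not-null mark.

```kotlin
// Every nullable slot begins with a mark; a value follows only after "!!".
fun encodeNullable(value: Any?, out: MutableList<Any>) {
    if (value == null) {
        out.add("NULL")
    } else {
        out.add("!!")
        out.add(value)
    }
}

// Consumes the mark first, then the value if one is present.
fun decodeNullable(input: ArrayDeque<Any>): Any? =
    if (input.removeFirst() == "NULL") null else input.removeFirst()

fun main() {
    val out = mutableListOf<Any>()
    encodeNullable("kotlin", out)
    encodeNullable(null, out)
    println(out) // [!!, kotlin, NULL]

    val input = ArrayDeque(out)
    println(decodeNullable(input)) // kotlin
    println(decodeNullable(input)) // null
}
```

Because the mark is always read before the value, a real string value that happens to equal "NULL" is not confused with the null mark.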

Efficient binary format

Now we are ready for an example of an efficient binary format. We are going to write data to a java.io.DataOutput implementation. Instead of encodeValue, we must override the individual encodeXxx functions for each of the ten primitives in the encoder.

class DataOutputEncoder(val output: DataOutput) : AbstractEncoder() {
    override val serializersModule: SerializersModule = EmptySerializersModule()
    override fun encodeBoolean(value: Boolean) = output.writeByte(if (value) 1 else 0)
    override fun encodeByte(value: Byte) = output.writeByte(value.toInt())
    override fun encodeShort(value: Short) = output.writeShort(value.toInt())
    override fun encodeInt(value: Int) = output.writeInt(value)
    override fun encodeLong(value: Long) = output.writeLong(value)
    override fun encodeFloat(value: Float) = output.writeFloat(value)
    override fun encodeDouble(value: Double) = output.writeDouble(value)
    override fun encodeChar(value: Char) = output.writeChar(value.code)
    override fun encodeString(value: String) = output.writeUTF(value)
    override fun encodeEnum(enumDescriptor: SerialDescriptor, index: Int) = output.writeInt(index)

    override fun beginCollection(descriptor: SerialDescriptor, collectionSize: Int): CompositeEncoder {
        encodeInt(collectionSize)
        return this
    }

    override fun encodeNull() = encodeBoolean(false)
    override fun encodeNotNullMark() = encodeBoolean(true)
}

The decoder implementation mirrors the encoder's implementation, overriding all the primitive decodeXxx functions.

class DataInputDecoder(val input: DataInput, var elementsCount: Int = 0) : AbstractDecoder() {
    private var elementIndex = 0
    override val serializersModule: SerializersModule = EmptySerializersModule()
    override fun decodeBoolean(): Boolean = input.readByte().toInt() != 0
    override fun decodeByte(): Byte = input.readByte()
    override fun decodeShort(): Short = input.readShort()
    override fun decodeInt(): Int = input.readInt()
    override fun decodeLong(): Long = input.readLong()
    override fun decodeFloat(): Float = input.readFloat()
    override fun decodeDouble(): Double = input.readDouble()
    override fun decodeChar(): Char = input.readChar()
    override fun decodeString(): String = input.readUTF()
    override fun decodeEnum(enumDescriptor: SerialDescriptor): Int = input.readInt()

    override fun decodeElementIndex(descriptor: SerialDescriptor): Int {
        if (elementIndex == elementsCount) return CompositeDecoder.DECODE_DONE
        return elementIndex++
    }

    override fun beginStructure(descriptor: SerialDescriptor): CompositeDecoder =
        DataInputDecoder(input, descriptor.elementsCount)

    override fun decodeSequentially(): Boolean = true

    override fun decodeCollectionSize(descriptor: SerialDescriptor): Int =
        decodeInt().also { elementsCount = it }

    override fun decodeNotNullMark(): Boolean = decodeBoolean()
}

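We can now serialize and deserialize arbitrary data, for example the same classes that were used in the CBOR (experimental) and ProtoBuf (experimental) sections. The encodeTo and decodeFrom helpers used below are not shown here; they wrap DataOutputEncoder and DataInputDecoder in the same way that encodeToList and decodeFromList wrap the list-based classes.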

@Serializable
data class Project(val name: String, val language: String)

fun main() {
    val data = Project("kotlinx.serialization", "Kotlin")
    val output = ByteArrayOutputStream()
    encodeTo(DataOutputStream(output), data)
    val bytes = output.toByteArray()
    println(bytes.toAsciiHexString())
    val input = ByteArrayInputStream(bytes)
    val obj = decodeFrom<Project>(DataInputStream(input))
    println(obj)
}

You can get the full code here.

As we can see, the result is a dense binary format that only contains the data that is being serialized. It can be easily tweaked for any kind of domain-specific compact encoding.

{00}{15}kotlinx.serialization{00}{06}Kotlin
Project(name=kotlinx.serialization, language=Kotlin)
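The {00}{15} and {00}{06} pairs in the output come from DataOutput.writeUTF, which writes a two-byte length before the string bytes. The short sketch below checks this with the JDK alone; utfBytes is a hypothetical helper introduced only for this illustration.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.DataOutputStream

// Returns the bytes that DataOutput.writeUTF produces for the given string.
fun utfBytes(s: String): ByteArray {
    val buffer = ByteArrayOutputStream()
    DataOutputStream(buffer).use { it.writeUTF(s) }
    return buffer.toByteArray()
}

fun main() {
    val bytes = utfBytes("Kotlin")
    println(bytes.size) // 8: two length bytes plus six ASCII characters
    println(bytes[0])   // 0
    println(bytes[1])   // 6
}
```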

Format-specific types

A format implementation might provide special support for data types that are not among the list of primitive types in Kotlin Serialization, and do not have a corresponding encodeXxx/decodeXxx function. In the encoder this is achieved by overriding the encodeSerializableValue(serializer, value) function.

In our DataOutput format example we might want to provide a specialized efficient data path for serializing an array of bytes since DataOutput has a special method for this purpose.

Detection of the type is performed by looking at the serializer.descriptor, not by checking the type of the value being serialized, so we fetch the builtin KSerializer instance for ByteArray type.

This is an important difference. This way our format implementation properly supports Custom serializers that a user might specify for a type that just happens to be internally represented as a byte array, but needs a different serial representation.

private val byteArraySerializer = serializer<ByteArray>()

Specifically for byte arrays, we could have also used the builtin ByteArraySerializer function.

We add the corresponding code to the Encoder implementation of our Efficient binary format. To make our ByteArray encoding even more efficient, we add a trivial implementation of the encodeCompactSize function that uses only one byte to represent a size of up to 254 bytes.

    override fun <T> encodeSerializableValue(serializer: SerializationStrategy<T>, value: T) {
        if (serializer.descriptor == byteArraySerializer.descriptor)
            encodeByteArray(value as ByteArray)
        else
            super.encodeSerializableValue(serializer, value)
    }

    private fun encodeByteArray(bytes: ByteArray) {
        encodeCompactSize(bytes.size)
        output.write(bytes)
    }
    
    private fun encodeCompactSize(value: Int) {
        if (value < 0xff) {
            output.writeByte(value)
        } else {
            output.writeByte(0xff)
            output.writeInt(value)
        }
    }            

Similar code is added to the Decoder implementation. Here we override the decodeSerializableValue function.

    @Suppress("UNCHECKED_CAST")
    override fun <T> decodeSerializableValue(deserializer: DeserializationStrategy<T>, previousValue: T?): T =
        if (deserializer.descriptor == byteArraySerializer.descriptor)
            decodeByteArray() as T
        else
            super.decodeSerializableValue(deserializer, previousValue)

    private fun decodeByteArray(): ByteArray {
        val bytes = ByteArray(decodeCompactSize())
        input.readFully(bytes)
        return bytes
    }

    private fun decodeCompactSize(): Int {
        val byte = input.readByte().toInt() and 0xff
        if (byte < 0xff) return byte
        return input.readInt()
    }

Now everything is ready to perform serialization of some byte arrays.

@Serializable
data class Project(val name: String, val attachment: ByteArray)

fun main() {
    val data = Project("kotlinx.serialization", byteArrayOf(0x0A, 0x0B, 0x0C, 0x0D))
    val output = ByteArrayOutputStream()
    encodeTo(DataOutputStream(output), data)
    val bytes = output.toByteArray()
    println(bytes.toAsciiHexString())
    val input = ByteArrayInputStream(bytes)
    val obj = decodeFrom<Project>(DataInputStream(input))
    println(obj)
}

You can get the full code here.

As we can see, our custom byte array format is being used, with the compact encoding of its size in one byte.

{00}{15}kotlinx.serialization{04}{0A}{0B}{0C}{0D}
Project(name=kotlinx.serialization, attachment=[10, 11, 12, 13])
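The boundary behavior of the compact size encoding can be checked in isolation. The sketch below mirrors the encodeCompactSize and decodeCompactSize logic as standalone functions over DataOutput and DataInput; the names writeCompactSize, readCompactSize, and compactSizeBytes are chosen for this illustration only.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInput
import java.io.DataInputStream
import java.io.DataOutput
import java.io.DataOutputStream

// Sizes below 255 take one byte; larger sizes take the 0xff marker plus a four-byte int.
fun writeCompactSize(output: DataOutput, value: Int) {
    if (value < 0xff) {
        output.writeByte(value)
    } else {
        output.writeByte(0xff)
        output.writeInt(value)
    }
}

fun readCompactSize(input: DataInput): Int {
    val byte = input.readByte().toInt() and 0xff
    return if (byte < 0xff) byte else input.readInt()
}

// Encodes a single size and returns the resulting bytes.
fun compactSizeBytes(value: Int): ByteArray {
    val buffer = ByteArrayOutputStream()
    DataOutputStream(buffer).use { writeCompactSize(it, value) }
    return buffer.toByteArray()
}

fun main() {
    for (size in listOf(4, 254, 255, 1000)) {
        val bytes = compactSizeBytes(size)
        val decoded = readCompactSize(DataInputStream(ByteArrayInputStream(bytes)))
        println("$size -> ${bytes.size} byte(s), decoded as $decoded")
    }
    // 4 -> 1 byte(s), decoded as 4
    // 254 -> 1 byte(s), decoded as 254
    // 255 -> 5 byte(s), decoded as 255
    // 1000 -> 5 byte(s), decoded as 1000
}
```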

This chapter concludes the Kotlin Serialization Guide.