DIP-273 Content Addressing (#280)
Problem
=======
DSNP should provide affordances for finding content that do not rely on
a single DNS-based point of failure for content hosting.
More in the original discussion: #273 

Solution
========
Enhance the specification to treat URLs in Announcements as suggestions
rather than canonical locations for content.
Provide a simple and well-specified set of hashes and encodings that can
be used consistently throughout the protocol.
Use IPFS CIDv1 specifically for locating profiles.

Change summary:
---------------
* Broaden ProfileResource definition so that different types of profile
resources can use different, possibly distributed, file systems via a
generic `contentAddress` field
* Simplify multihashes to use base32 encoding only and sha2-256 or
blake3 as the hashing algorithm
* Update various example hashes in line with this
* Update to pre-1.3.0 versioning and sync prerelease changelogs for
other recent additions to the spec
* Change spec language to expand on how to treat content hash + URL
pairs.
* Announcements use the same base32 encoded multihash for the
`contentHash` and `targetContentHash` fields.

---------

Co-authored-by: Wes Biggs <[email protected]>
wesbiggs and Wes Biggs authored Aug 13, 2024
1 parent 30e8e28 commit 0a81372
Showing 16 changed files with 104 additions and 110 deletions.
12 changes: 8 additions & 4 deletions .spellcheckerdict.txt
@@ -1,15 +1,17 @@
Alexa
announcementType
Avro
base32
Base58
BLAKE2b
BLAKE3
Brötli
CalVer
CC0
Changelog
changeType
(cid|CID)s?
[Cc]odec('s)?
contentAddress
contentHash
cryptographic
cryptographically
@@ -19,7 +21,7 @@ CtxSharedSecretBob
Curve25519
decrypt(ed)?
Delegator
Deserialize
[Dd]eserialize(d)?
Diffie-Hellman
discoverability
[Dd]iscoverable
@@ -56,7 +58,7 @@ mdBook
MDX
MP[34]
MSA
multibase
[Mm]ultibase
multicodec
[Mm]ultihash
multihash-encoded
@@ -71,6 +73,8 @@ PNG
Polkadot
Poly1305
pre-configured
Prepending
Prerelease
PRId([ABs])?
ProfileResource
[Pp]ublicKey
@@ -110,6 +114,6 @@ W3C
WebM
WebP
websocket
whitepaper
[Ww]hitepaper
X25519
XSalsa20
9 changes: 5 additions & 4 deletions pages/ActivityContent/Associated/Attachments.md
@@ -44,7 +44,7 @@
"href": "https://upload.wikimedia.org/wikipedia/commons/d/d9/Wilhelm_Scream.ogg",
"mediaType": "audio/ogg",
"hash": [
"QmQrGdv6Ky5sJhaVdw27y4aod5pdfihDkBTxiBkRaSGJJ7"
"bdyqbcji3okmzxobvaqgduz5prixmumyndzopyufultmslndi4pdebii"
]
}
],
@@ -73,7 +73,7 @@
| `type` | [Activity Vocabulary 2.0](https://www.w3.org/TR/activitystreams-vocabulary/#dfn-type) | YES | Identifies the type of the object | MUST be set to `Link` |
| `href` | [Activity Vocabulary 2.0](https://www.w3.org/TR/activitystreams-vocabulary/#dfn-href) | YES | The URL for the given image | MUST be a [Supported URL Schema](../Overview.md#supported-url-schema) |
| `mediaType` | [Activity Vocabulary 2.0](https://www.w3.org/TR/activitystreams-vocabulary/#dfn-mediatype) | YES | MIME type of `href` content | |
| `hash` | [DSNP 1.0](Hash.md) | YES | Array of hashes for linked content validation | MUST include at least one [supported hash](Hash.md#supported-algorithms) |
| `hash` | [DSNP 1.0](Hash.md) | YES | Array of hashes for linked content validation | MUST include at least one [supported hash](../../DSNP/Identifiers.md#supported-hashing-algorithms) |
| `height` | [Activity Vocabulary 2.0](https://www.w3.org/TR/activitystreams-vocabulary/#dfn-height) | no | A hint as to the rendering height in device-independent pixels | |
| `width` | [Activity Vocabulary 2.0](https://www.w3.org/TR/activitystreams-vocabulary/#dfn-width) | no | A hint as to the rendering width in device-independent pixels | |

@@ -106,7 +106,8 @@
"height": 228,
"mediaType": "image/jpg",
"hash": [
"2Drjgb5yoVWTpubcWmDBLJqkxrFkZamekzJoYLSWwM2ezpFkab"
"bciqjiqcidmzuqpvrl5cocu3l4z2uhj22xqruht3d5kx7qijvfbnjlda",
"bdyqbjzt5drgji5w7xhsddsynusgx2vdmakcsrr4sfin5fyfkwlpup6q"
]
}
]
@@ -169,7 +170,7 @@
"height": 2250,
"mediaType": "video/webm",
"hash": [
"2Drjgb4a8eC4XheBKCBcbAcaVdEWcKjMbCSZ2L2c9CQs4x98jf"
"bdyqed7dnok3batd7tr64trqmovfxam5tqsgxkiv2op5765pq43swtui"
]
}
]
19 changes: 5 additions & 14 deletions pages/ActivityContent/Associated/Hash.md
@@ -3,27 +3,18 @@
*NOT* part of the Activity Streams 2.0 Vocabulary.

Activity objects linking to external content such as audio, image or video files must include a `"hash"` field for users to validate linked content.
The value of this `"hash"` field must be an array of strings, each representing a hash output using a specific algorithm.
AT LEAST ONE hash in the array MUST be one of the [supported algorithms](#supported-algorithms), although others may also be used.

Hashes MUST be encoded using the [multihash](https://github.com/multiformats/multihash) specification, and serialized as a [multibase](https://github.com/multiformats/multibase) string.

### Supported Algorithms

| Algorithm | Multihash Name | Leading bytes (as [varint](https://github.com/multiformats/unsigned-varint)) | Reference | DSNP Version Added |
| --- | --- | --- | --- | --- |
| SHA-256 | `sha2-256` | `0x1220` | [RFC 6234](https://tools.ietf.org/html/rfc6234) | 1.2.0 |
| BLAKE2b | `blake2b-256` | `0xa0e40220` | [RFC 7693](https://tools.ietf.org/html/rfc7693) | 1.2.0 |
The value of this `"hash"` field must be an array of strings.
Each item in the array MUST be a valid [DSNP Content Hash](../../DSNP/Identifiers.md#dsnp-content-hash) for the content associated with the hash.

### Example

This example gives SHA-256 and BLAKE2b hashes for the [PDF version of the DSNP whitepaper](https://github.com/LibertyDSNP/papers/raw/main/whitepaper/dsnp_whitepaper.pdf).
This example gives SHA-256 and BLAKE3 hashes for the [PDF version of the DSNP whitepaper](https://github.com/LibertyDSNP/papers/raw/main/whitepaper/dsnp_whitepaper.pdf).

```json
{
"hash": [
"QmQNHNfHnbgJJ6nK4UPx2VtTUCafAKCbqZJ6ZRYUGjoeFj",
"2DrjgbGgSsXRhTiBWckoVwBFC6H4qiBWWNumSsRwdUt82YnTdN"
"bciqdnu347gcfmxzbkhgoubiobphm6readngitfywktdtbdocgogop2q",
"bdyqhwoxp2mc6oyaqpqyd2fvaxralslk32ggazv6nxpp342iec6652tq"
]
}
```
7 changes: 4 additions & 3 deletions pages/ActivityContent/Overview.md
@@ -1,5 +1,5 @@
# Activity Content Specification
__Version 1.2.0__
__Version pre-1.3.0__

Content references shared via the DSNP consist of URLs pointing to documents containing Activity Streams JSON objects.
For the purposes of the DSNP, restrictions are placed on the [Activity Streams 2.0](https://www.w3.org/TR/activitystreams-core/) specification.
@@ -45,9 +45,10 @@ URLs in DSNP-compatible Activity Content MUST use one of the following URL schem
| [LibertyDSNP/activity-content-java](https://github.com/LibertyDSNP/activity-content-java) | Java/Kotlin |
| [LibertyDSNP/activity-content-swift](https://github.com/LibertyDSNP/activity-content-swift) | Swift |

<!--- Uncomment for pre-release changes and prefix the version with `pre-[next version]`
<!--- Uncomment for pre-release changes and prefix the version with `pre-[next version]` --->
## Prerelease Changelog
--->

- DIP-273 Content Addressing

## Releases

12 changes: 6 additions & 6 deletions pages/ActivityContent/Types/Profile.md
@@ -27,22 +27,22 @@ Profiles are used to provide additional user information display.
"icon": [
{
"type": "Link",
"href": "https://placekitten.com/256/256",
"href": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1d/1-month-old_kittens_32.jpg/256px-1-month-old_kittens_32.jpg",
"mediaType": "image/jpeg",
"width": "256",
"height": "256",
"height": "171",
"hash": [
"QmVmUqGYtHcVgpTFR64bHNcLGGFEeWxmUP6pV2C2RbWpKT"
"bdyqphnphmjdoumkxqbsuspribxvlsx2hx6525u3fh2dkr5bxnqritzi"
]
},
{
"type": "Link",
"href": "https://placekitten.com/64/64",
"href": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1d/1-month-old_kittens_32.jpg/64px-1-month-old_kittens_32.jpg",
"mediaType": "image/jpeg",
"width": "64",
"height": "64",
"height": "43",
"hash": [
"QmcAh1rov5GcddekCffGeRnaSyiji6ATmfGWpxXYJHgJZx"
"bdyqktr7p5hc27bx4ernmngs6tj7uyukfb4atrtq44mdmx4yntuh2s5y"
]
}
],
8 changes: 6 additions & 2 deletions pages/DSNP/Announcements.md
@@ -34,8 +34,12 @@ Implementations MUST provide a way to validate that the [identifier](Identifiers
## External Content URLs and Hashes

Where Announcements refer to external documents (such as Activity Content documents), these are referenced by both a URL and a [DSNP Content Hash](Identifiers.md#dsnp-content-hash).
The content hash MUST be generated by applying a [Supported Hashing Algorithm](Identifiers.md#supported-hashing-algorithms) to the full, unaltered contents retrieved from the URL.
When readers retrieve content referenced in an Announcement, they can validate the authenticity of the content by regenerating the hash output and comparing it with the content hash recorded in the Announcement.
The content hash MUST be generated by applying a [Supported Hashing Algorithm](Identifiers.md#supported-hashing-algorithms) to the full, unaltered contents of the document.

The URL associated with a content hash should be construed as a hint to initially locate a document matching the content hash, but is in no way meant to be the only way to locate the indicated document.
Over time, a URL may cease to reference the specified document, or might have its contents altered; therefore, the content hash should be considered the authoritative value and the URL only one of many possible ways of locating a document.
For example, services may cache documents or retrieve them from a content-addressed file system by applying the content hash (or a value derived from the content hash, such as a CID).
When readers retrieve content referenced in an Announcement, they can validate the authenticity of the content, regardless of where it is hosted, by regenerating the hash output and comparing it with the content hash recorded in the Announcement.
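
A minimal sketch of this validation in Python, assuming the document bytes have already been fetched from some source (the original URL, a cache, or a content-addressed store). The names `decode_content_hash` and `content_matches` are hypothetical and not part of the specification, and only `sha2-256` is handled here because BLAKE3 is not in the Python standard library.

```python
import base64
import hashlib

SHA2_256_PREFIX = bytes([0x12, 0x20])  # multihash code 0x12, digest length 0x20


def decode_content_hash(content_hash: str) -> bytes:
    """Decode a base32 multibase string into raw multihash bytes."""
    if not content_hash.startswith("b"):
        raise ValueError("missing 'b' multibase prefix")
    body = content_hash[1:]
    if body != body.lower() or "=" in body:
        raise ValueError("must be lowercase and unpadded")
    # Python's decoder expects uppercase input padded to a multiple of 8 chars.
    padded = body.upper() + "=" * (-len(body) % 8)
    return base64.b32decode(padded)


def content_matches(content: bytes, content_hash: str) -> bool:
    """Return True if `content` hashes to the digest carried in `content_hash`."""
    multihash = decode_content_hash(content_hash)
    prefix, digest = multihash[:2], multihash[2:]
    if prefix != SHA2_256_PREFIX:
        raise ValueError("this sketch only handles sha2-256")
    return hashlib.sha256(content).digest() == digest
```

Because the check depends only on the bytes and the recorded hash, the same function applies no matter which host served the document.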

## Duplicate Handling

14 changes: 7 additions & 7 deletions pages/DSNP/BatchPublications.md
@@ -43,13 +43,13 @@ See also [Announcement Types](Announcements.md).

#### Columns with Bloom Filters

| Column | Parquet Type |
| ------ | ---- |
| contentHash | `BYTE_ARRAY` |
| emoji | `BYTE_ARRAY` |
| fromId | `BYTE_ARRAY` |
| inReplyTo | `BYTE_ARRAY` |
| objectId | `BYTE_ARRAY` |
| Column | Primitive Type | Logical Type | Converted Type (deprecated) |
| ------ | ---- | ---- | --- |
| contentHash | `BYTE_ARRAY` | `STRING` | `UTF8` |
| emoji | `BYTE_ARRAY` | `STRING` | `UTF8` |
| fromId | `INT64` | `INT(64, false)` | `UINT_64` |
| inReplyTo | `BYTE_ARRAY` | `STRING` | `UTF8` |
| targetContentHash | `BYTE_ARRAY` | `STRING` | `UTF8` |

## Non-Normative

46 changes: 21 additions & 25 deletions pages/DSNP/Identifiers.md
@@ -11,15 +11,28 @@ Graph connections are formed through the DSNP User Id.

## DSNP Content Hash

- Variable length byte array (fixed length for a given hashing algorithm)
- MUST be a valid [multihash](https://github.com/multiformats/multihash) encoding of the hash output for the bytes of the content, generated with a [Supported Hashing Algorithm](Announcements.md#supported-hashing-algorithms)
- MUST be a multibase string using the `base32` encoding
- MUST represent a valid [multihash](https://github.com/multiformats/multihash) encoding of the hashing algorithm output for the bytes of the content
- MUST use a [Supported Hashing Algorithm](#supported-hashing-algorithms)

### Serialization Steps

1. Apply the Supported Hashing Algorithm to create a digest of the content.
2. Prepend the leading bytes from the table below, which identify the hashing algorithm (per the multicodec table) and the length of the hash output.
3. Serialize as a [base32 multibase](./Serializations.md#base32-multibase) string.

### Example

1. Applying the BLAKE3 algorithm to the [DSNP Whitepaper](https://dsnp.org/dsnp_whitepaper.pdf) yields the following 32 bytes: `0x3a0393e3ee6c6fec1b13885763225fd0927884b2d431ed262899523ade281cb4`.
2. Prepending the multihash indicator (`0x1e` for `blake3`) and hash length (`0x20` for 32 bytes) gives `0x1e203a0393e3ee6c6fec1b13885763225fd0927884b2d431ed262899523ade281cb4`.
3. Serializing as a base32 multibase string gives us the final DSNP Content Hash of `bdyqdua4t4pxgy37mdmjyqv3dejp5betyqsznimpneyujsur23yubzna`.
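
The serialization steps can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the function names `to_content_hash` and `sha256_content_hash` are hypothetical, and because BLAKE3 is not part of the Python standard library, the BLAKE3 digest is assumed to be computed elsewhere and passed in as bytes.

```python
import base64
import hashlib

# Leading bytes: multihash code followed by digest length (0x20 = 32 bytes).
SHA2_256_PREFIX = bytes([0x12, 0x20])
BLAKE3_PREFIX = bytes([0x1E, 0x20])


def to_content_hash(prefix: bytes, digest: bytes) -> str:
    """Steps 2 and 3: prepend the leading bytes, then encode as base32 multibase."""
    multihash = prefix + digest
    # RFC 4648 base32, lowercased, padding stripped, with the 'b' multibase prefix.
    encoded = base64.b32encode(multihash).decode("ascii").lower().rstrip("=")
    return "b" + encoded


def sha256_content_hash(content: bytes) -> str:
    """Steps 1 to 3 for sha2-256, using only the standard library."""
    return to_content_hash(SHA2_256_PREFIX, hashlib.sha256(content).digest())


# The BLAKE3 digest quoted in the example above, supplied as precomputed bytes.
whitepaper_digest = bytes.fromhex(
    "3a0393e3ee6c6fec1b13885763225fd0927884b2d431ed262899523ade281cb4"
)
print(to_content_hash(BLAKE3_PREFIX, whitepaper_digest))
```

Feeding the example digest through `to_content_hash` yields the same `bdyqdua4t...` string as step 3 of the example, while any `sha2-256` hash produced this way begins with `bciq`.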

### Supported Hashing Algorithms

| Algorithm | Multihash Name | Leading bytes (as [varint](https://github.com/multiformats/unsigned-varint)) | Reference | DSNP Version Added |
| Algorithm | Multihash Name | Leading bytes | Reference | DSNP Version Added |
| --- | --- | --- | --- | --- |
| SHA-256 | `sha2-256` | `0x1220` | [RFC 6234](https://tools.ietf.org/html/rfc6234) | 1.2.0 |
| BLAKE2b | `blake2b-256` | `0xa0e40220` | [RFC 7693](https://tools.ietf.org/html/rfc7693) | 1.2.0 |
| SHA-256 | `sha2-256` | `0x1220` | [RFC 6234](https://tools.ietf.org/html/rfc6234) | 1.2 |
| BLAKE3 | `blake3` | `0x1e20` | [blake3.io](https://blake3.io) | 1.3 |

## DSNP Protocol Scheme

@@ -46,32 +59,15 @@ The DSNP Content URI consists of three parts: the scheme, the user id, and the c
It is used to uniquely identify an Announcement from a given user with content.

Any [Announcement Types](Announcements.md#announcement-types) with a `fromId` and `contentHash` have a DSNP Content URI.
When encoding a DSNP Content URI, the `contentHash` field MUST be serialized exactly as it appears in the Announcement (that is, as a base32 multihash string).

### Example
```
dsnp://78187493520/QmQNHNfHnbgJJ6nK4UPx2VtTUCafAKCbqZJ6ZRYUGjoeFj
dsnp://78187493520/bdyqdua4t4pxgy37mdmjyqv3dejp5betyqsznimpneyujsur23yubzna
```

| part | value |
| ---- | ----- |
| Scheme | `dsnp://` |
| User Id | `78187493520` |
| Content Hash | `QmQNHNfHnbgJJ6nK4UPx2VtTUCafAKCbqZJ6ZRYUGjoeFj` |

## DSNP CID

A DSNP CID is a valid [Content IDentifier](https://github.com/multiformats/cid) generated using the following parameters.

### Supported CID Parameters

The CID specification allows CIDs to be generated with a wide and ever-growing range of possible hashing algorithms, string encodings, and block sizes.
In order for DSNP applications to interoperate, the required functionality is limited as follows:

- CID version: MUST be version 1, in order to distinguish CIDs from simple multihash values in situations where either may be used
- Hash algorithm: MUST be `sha2-256` or `blake2b-256`
- Encoding: MUST be `base58btc` or `base32`
- Codec: MUST be `dag-pb` for data 256*1024 bytes or longer; `raw` for data less than 256*1024 bytes
- Chunking: Non-leaf nodes MUST be 256*1024 bytes

The rationale for these options is to allow consuming applications to attempt to generate a matching CID from a byte stream for validation purposes without the need to reprocess the stream.
These options are intentionally aligned to interoperate with the default output of the [Kubo](https://github.com/ipfs/kubo) IPFS command line utility when invoked as `ipfs add --cid-version=1 ...`.
| Content Hash | `bdyqdua4t4pxgy37mdmjyqv3dejp5betyqsznimpneyujsur23yubzna` |
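
As a rough illustration of the three-part structure, the following Python sketch builds and parses a DSNP Content URI. The helper names and the regular expression are assumptions made for this example (decimal user id, base32 multibase content hash), not definitions taken from the specification.

```python
import re
from typing import Tuple

# Assumed shape: scheme, decimal user id, then a base32 multibase content hash.
_CONTENT_URI_RE = re.compile(r"dsnp://([0-9]+)/(b[a-z2-7]+)")


def build_content_uri(user_id: int, content_hash: str) -> str:
    """Join the scheme, user id, and content hash exactly as given."""
    return f"dsnp://{user_id}/{content_hash}"


def parse_content_uri(uri: str) -> Tuple[int, str]:
    """Split a DSNP Content URI into (user id, content hash)."""
    match = _CONTENT_URI_RE.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a DSNP Content URI: {uri!r}")
    return int(match.group(1)), match.group(2)


user_id, content_hash = parse_content_uri(
    "dsnp://78187493520/bdyqdua4t4pxgy37mdmjyqv3dejp5betyqsznimpneyujsur23yubzna"
)
```

The content hash is passed through unchanged in both directions, in line with the requirement that it be serialized exactly as it appears in the Announcement.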
9 changes: 6 additions & 3 deletions pages/DSNP/Overview.md
@@ -1,5 +1,5 @@
# DSNP Specification
__Version 1.2.0__
__Version pre-1.3.0__

DSNP (Decentralized Social Networking Protocol) is a social networking protocol designed to run on a blockchain.
It specifies a set of social primitives along with requirements for interoperability.
@@ -29,9 +29,12 @@ Compliant DSNP system specifications MUST document how each of the required DSNP

A compliant specification MUST specify a mapping from its system-specific state change data (for example, the events emitted by a blockchain) to the DSNP State Change Records that data represents.

<!--- Uncomment for pre-release changes and prefix the version with `pre-[next version]`
<!--- Uncomment for pre-release changes and prefix the version with `pre-[next version]` --->
## Prerelease Changelog
--->

- DIP-263 User Data for Public Keys
- DIP-267 User Data for Profile Resources
- DIP-273 Content Addressing

## Releases

27 changes: 14 additions & 13 deletions pages/DSNP/Serializations.md
@@ -14,24 +14,25 @@ Strings are used to avoid issues with different implementations of numbers.

| Invalid | Why | Valid |
| --- | --- | --- |
| `0x123` | Must be decimal | `"291"` |
| 291 | Must be a string | `"291"` |
| `291n` | `BigInt(291)` serialization appends an `n` | `"291"` |
| `"0x123"` | Must be decimal | `"291"` |
| `291` | Must be a string | `"291"` |
| `"291n"` | `BigInt(291)` serialization appends an `n` | `"291"` |

## hexadecimal
## base32 multibase

Used to represent bytes.
A base32 multibase string is self-identifying and always begins with the `b` character.
The [Multibase Table](https://github.com/multiformats/multibase?tab=readme-ov-file#multibase-table) describes this encoding as "RFC4648 case-insensitive - no padding".

- MUST use 0-9 and a-f representation
- MUST use [RFC4648](https://datatracker.ietf.org/doc/html/rfc4648) §6 alphabet `abcdefghijklmnopqrstuvwxyz234567`
- MUST be lowercase
- MUST be prefixed with a `0x`
- MUST be prefixed with `b`
- MUST NOT have spaces or separators
- MUST have two characters per byte in addition to the `0x` characters
- MUST NOT end with or contain padding characters (`=`)

| Bytes | Invalid | Valid |
| Invalid | Why | Valid |
| --- | --- | --- |
| 2 | `0x123` | `0x0123` |
| 2 | `123h` | `0x0123` |
| 2 | `0x0ABC` | `0x0abc` |
| 8 | `0xabc` | `0x0000000000000abc` |
| 32 | `0x3e34c4325f4461b9355027b314f3eb56d31af549f7da7bd9ef1ce951651e` | `0x00003e34c4325f4461b9355027b314f3eb56d31af549f7da7bd9ef1ce951651e` |
| `BDYQDUA4T` | Must use lowercase | `bdyqdua4t` |
| `dyqdua4t` | Missing `b` prefix | `bdyqdua4t` |
| `b3og3k0sj` | Wrong alphabet (`base32hex` was used) | `bdyqdua4t` |
| `bdyqdua4t=` | Must not have padding characters | `bdyqdua4t` |
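
The syntactic rules above can be checked with a short Python sketch. The function name `is_base32_multibase` is hypothetical, and the check covers only the string format (prefix, alphabet, case, padding); it does not confirm that the decoded bytes form a valid multihash.

```python
import re

# 'b' prefix followed by one or more lowercase RFC 4648 base32 characters.
_BASE32_MULTIBASE_RE = re.compile(r"b[a-z2-7]+")


def is_base32_multibase(value: str) -> bool:
    """Return True if `value` satisfies the base32 multibase string rules."""
    return _BASE32_MULTIBASE_RE.fullmatch(value) is not None


for candidate in ["BDYQDUA4T", "dyqdua4t", "b3og3k0sj", "bdyqdua4t=", "bdyqdua4t"]:
    print(candidate, is_base32_multibase(candidate))
```

Only the last candidate passes; the first four correspond to the invalid rows in the table.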
4 changes: 2 additions & 2 deletions pages/DSNP/Types/Broadcast.md
@@ -7,7 +7,7 @@ A Broadcast Announcement is a way to send a public message to everyone.
| Field | Description | Data Type | Serialization | Parquet Type | Bloom Filter |
| ----- | ----------- | --------- | ------------- | ------------ | ------------ |
| announcementType | Announcement Type Enum (`2`) | enum | [decimal](../Serializations.md#decimal) | `INT32` | no |
| contentHash | multihash-encoded hash of content stored at URL | variable length byte array | [hexadecimal](../Serializations.md#hexadecimal) | `BYTE_ARRAY` | YES |
| contentHash | [DSNP Content Hash](../Identifiers.md#dsnp-content-hash) of content | UTF-8 | [base32 multibase](../Serializations.md#base32-multibase) | `UTF8` | YES |
| fromId | id of the user creating the Announcement | 64-bit unsigned integer | [decimal](../Serializations.md#decimal) | `UINT_64` | YES |
| url | content URL | UTF-8 | [UTF-8](https://datatracker.ietf.org/doc/html/rfc3629) | `UTF8` | no |

@@ -19,7 +19,7 @@ A Broadcast Announcement is a way to send a public message to everyone.

### contentHash

- MUST be a [DSNP Content Hash](../Identifiers.md#dsnp-content-hash)
- MUST be a valid [DSNP Content Hash](../Identifiers.md#dsnp-content-hash)

### fromId
