# Samizdat

A browser-based solution to Web censorship, implemented as a JavaScript library to be deployed easily on any website. Samizdat uses [ServiceWorkers](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API/Using_Service_Workers) and a suite of non-standard in-browser delivery mechanisms, with a strong focus on decentralized tools like [IPFS](https://ipfs.io/).

Ideally, users should not need to install any special software or change any settings: as soon as they have been able to access a Samizdat-enabled site *once*, they should be able to keep accessing it even after it gets blocked.

## Current status

Samizdat is currently considered *alpha*: the code works, but major rewrites and API changes are coming. It has been tested on Firefox, Chromium and Chrome on desktop, as well as Firefox for mobile on Android, but it should work in any browser implementing the ServiceWorker API.

Feel free to test it, but be aware that it might not work as expected. If you'd like to get in touch, please email us at `rysiek+samizdat[at]hackerspace.pl`, create an [issue](https://0xacab.org/rysiek/samizdat/-/issues/new), or contact us [on the Fediverse](https://mastodon.social/tags/samizdat).

## Rationale

While a number of censorship circumvention technologies exist, these typically require those who want to access the blocked content (readers) to install specific tools (applications, browser extensions, VPN software, etc.), or change their settings (DNS servers, HTTP proxies, etc.). This approach does not scale.

At the same time, large-scale Internet censorship solutions are deployed in places like the UK, Azerbaijan, or Tajikistan, effectively blocking whole nations from accessing information deemed *non grata* by the relevant governments. And with the ever-increasing centralization of the Web, censorship has never been easier.

This project explores the possibility of solving this in a way that would not require visitors to install any special software or change any settings; the only things that are needed are a modern Web browser and the ability to visit a website once, so that the JavaScript ServiceWorker kicks in.

You can read a more in-depth overview of Samizdat [here](./docs/OVERVIEW.md). The philosophy influencing the project's goals and the relevant technical decisions is described [here](./docs/PHILOSOPHY.md).

## Architecture

A [ServiceWorker](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API) is used as a way to persist the censorship circumvention library after the initial visit to the participating website.

After the ServiceWorker is downloaded and activated, it handles all `fetch()` events by first trying to use the regular HTTPS request to the original website. If that fails for whatever reason (be it a timeout or a `4xx`/`5xx` error), the plugins kick in, attempting to fetch the content via any means available.
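
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `fetchWithFallback`, the `plugins` array, and the assumption that each plugin exposes a `fetch(url)` method returning a Promise of a `Response` are all hypothetical.

```javascript
// Sketch only: `fetchWithFallback` and the plugin shape are assumptions.
// Each plugin is assumed to expose fetch(url) returning a Promise<Response>.
async function fetchWithFallback(request, plugins, regularFetch = fetch) {
  try {
    const response = await regularFetch(request);
    // Treat 4xx/5xx responses as failures, just like network errors.
    if (response.status < 400) return response;
    throw new Error(`HTTP ${response.status}`);
  } catch (originError) {
    // Regular HTTPS failed: let each plugin try in turn.
    for (const plugin of plugins) {
      try {
        return await plugin.fetch(request.url);
      } catch (pluginError) {
        // This plugin failed too; move on to the next one.
      }
    }
    // Nothing worked: surface the original error.
    throw originError;
  }
}

// Register the handler only when actually running as a ServiceWorker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  const plugins = []; // would be filled with the configured plugins
  self.addEventListener('fetch', (event) => {
    event.respondWith(fetchWithFallback(event.request, plugins));
  });
}
```

Timeout handling is left out of this sketch; a real handler would also need to bound how long the regular request may take before the plugins are tried.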

A more complete overview of the architecture and technicalities of Samizdat is available [here](./docs/ARCHITECTURE.md).

## Draft API

The plan is to have an API to enable the use of different strategies for getting content. There are two basic functions a plugin needs to perform:
 - **resolution**  
   *where* a given piece of content (image, stylesheet, script, HTML file, etc.) is to be found
 - **delivery**  
   *how* to get it

These need to be closely integrated. For example, if using Gun and IPFS, resolution is performed using Gun, and delivery is performed using IPFS. However, Gun needs to resolve content to something that is usable with IPFS. If, alternatively, we're also using Gun to resolve content available on BitTorrent, that will have to be a separate namespace in the Gun graph, since it will have to resolve to magnet links.

Therefore, it doesn't seem to make sense to separate resolution and delivery. Thus, a Samizdat plugin would need to implement the whole pipeline, and work by receiving a URL and returning a Promise that resolves to a valid Response object containing the content.
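
To make the "URL in, Promise of a Response out" contract concrete, here is a sketch of what a very simple plugin might look like. Everything about it is an assumption made for illustration: the `createMirrorPlugin` factory, the `name`, `resolve`, and `fetch` fields, and the idea of a static mirror at another origin. It is not the draft API itself.

```javascript
// Hypothetical plugin: resolves a URL to the same path on a mirror origin
// and delivers it over plain HTTPS. All names here are illustrative.
function createMirrorPlugin(mirrorOrigin, fetchImpl = fetch) {
  // Resolution: *where* the content is to be found.
  const resolve = (url) => {
    const original = new URL(url);
    return new URL(original.pathname + original.search, mirrorOrigin).href;
  };

  return {
    name: 'example-mirror',
    resolve,
    // Delivery: *how* to get it. Receives a URL, returns a Promise<Response>.
    async fetch(url) {
      const response = await fetchImpl(resolve(url));
      if (!response.ok) {
        throw new Error(`mirror returned HTTP ${response.status}`);
      }
      return response;
    },
  };
}
```

Keeping resolution and delivery inside one object means callers only ever deal with `plugin.fetch(url)`, whatever the plugin does internally.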

It should be possible to chain the plugins (try the first one, in case of error try the next, and so on), or run them in parallel (fire requests using all available plugins and return the first complete successful response). Running in parallel might offer a better user experience, but will also be more resource-intensive.
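
The two strategies could look roughly like this. The function names are hypothetical, and plugins are again assumed to expose a `fetch(url)` method that returns a Promise.

```javascript
// Chained: try plugins one after another, return the first success.
async function fetchChained(url, plugins) {
  let lastError = new Error('no plugins configured');
  for (const plugin of plugins) {
    try {
      return await plugin.fetch(url);
    } catch (error) {
      lastError = error; // remember the failure and try the next plugin
    }
  }
  throw lastError;
}

// Parallel: start all plugins at once, return whichever succeeds first.
// Promise.any() rejects only if every plugin fails.
function fetchParallel(url, plugins) {
  return Promise.any(plugins.map((plugin) => plugin.fetch(url)));
}
```

Note that in this sketch the parallel variant does not cancel the slower requests once a winner is found, which is part of why it is more resource-intensive.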

An additional part of the API is going to deal with reporting the status of the plugins, their versions, and how a given piece of content was fetched (using which plugin). This will require modifying actual content from the ServiceWorker to pass that data to the DOM.

### Content versioning

Implementing content versioning might be necessary. Some delivery mechanisms (IPFS, BitTorrent) might be slow to pick up newly published content, and while information about this might be available, it might be faster to fetch and display older content that has already propagated across multiple peers or network nodes, with a message informing the reader that new content is available and that they might want to retry fetching it.

An important consideration related to content versioning is that it needs to be consistent across a full set of published pieces of content.

For example, consider a simple site that consists of an `index.html`, `style.css`, and `script.js`. Non-trivial changes in `index.html` will render older versions of `style.css` and `script.js` broken. A particular version of the whole published site needs to be fetched, otherwise things will not work as expected.

This will probably need to be fleshed out later on, but the initial API needs to be designed in a way where content versioning can be introduced without breaking backwards compatibility with plugins.

### Status information

Status information should be available to users, informing them that the content is being retrieved using non-standard means that might take longer.

Samizdat information is kept per-request in the ServiceWorker, meaning it is transient and does not survive ServiceWorker restarts, which might happen multiple times over the lifetime of an open tab. However, each update is communicated to the browser window context that is relevant for a given request via [`client.postMessage`](https://developer.mozilla.org/en-US/docs/Web/API/Client/postMessage) calls. This is also how information on ServiceWorker commit SHAs and available plugins is made available to the browser window context.

The data provided (per each requested URL handled by the ServiceWorker) is:
 - `clientId` – the [Client ID](https://developer.mozilla.org/en-US/docs/Web/API/FetchEvent/clientId) for the request (that is, the Client ID of this browser window)
 - `url` – the URL of the request
 - `serviceWorker` – the commit SHA of the ServiceWorker that handled the request
 - `fetchError` – `null` if the request completed successfully via regular HTTPS; otherwise the error message
 - `method` – the method by which the request was completed: `fetch` means a regular HTTPS `fetch()`, `gun-ipfs` means Gun and IPFS were used, etc.
 - `state` – the state of the request (`running`, `error`, `success`)
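
As an illustration, a status record with the fields listed above could be built and sent to the relevant window like this. The helper names (`makeStatus`, `updateStatus`, `reportStatus`) and the initial values are assumptions made for the sketch; only the field names come from the list.

```javascript
// Build the initial record for a request (helper names are hypothetical).
function makeStatus(clientId, url, serviceWorker) {
  return {
    clientId,
    url,
    serviceWorker,      // commit SHA of the ServiceWorker
    fetchError: null,   // stays null if regular HTTPS succeeds
    method: 'fetch',    // assumption: the first attempt is a regular fetch()
    state: 'running',
  };
}

// Return an updated copy rather than mutating the record in place.
function updateStatus(status, changes) {
  return { ...status, ...changes };
}

// Send the record to the window that issued the request.
// `clientsApi` defaults to the ServiceWorker's own `self.clients`.
async function reportStatus(status, clientsApi = self.clients) {
  const client = await clientsApi.get(status.clientId);
  if (client) client.postMessage(status);
}
```

A request that failed over regular HTTPS and was then served by a plugin might be reported with `updateStatus(status, { fetchError: 'HTTP 503', method: 'gun-ipfs', state: 'success' })`.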

The code in the browser window context is responsible for keeping a more permanent record of the URLs requested, the methods used, and the status of each, if needed.
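
On the window side, such a record could be kept along these lines. This is only a sketch: `requestLog` and `recordStatus` are hypothetical names, and keeping just the latest status per URL is one possible choice among several.

```javascript
// Latest known status per requested URL (hypothetical bookkeeping).
const requestLog = new Map();

function recordStatus(message) {
  requestLog.set(message.url, message);
  return requestLog;
}

// Listen for status messages only where the ServiceWorker API is available,
// that is, in a browser window context.
if (typeof navigator !== 'undefined' && navigator.serviceWorker) {
  navigator.serviceWorker.addEventListener('message', (event) => {
    recordStatus(event.data);
  });
}
```

Because this map lives in the page rather than in the ServiceWorker, it is not lost when the ServiceWorker restarts.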

## Review of possible resolution/delivery methods

 - **[Gun](https://gun.eco/)**  
   Better suited for resolution than for delivery, although it could handle both. A fairly new project under active development. No global network of public peers is currently available. Content is cryptographically signed.

 - **[IPNS](https://docs.ipfs.io/guides/concepts/ipns/)**  
   Only suitable for resolution. Experimental, not fully functional in the browser yet. Fits like a hand in a glove with IPFS.

 - **[DNSLink](https://docs.ipfs.io/guides/concepts/dnslink/)**  
   Only suitable for resolution. Deployed, stable, and well-documented. Fits like a hand in a glove with IPFS. The downside is that it requires publishing of DNS records to work (every time any new content is published), which means it might not be useful in most situations where censorship is involved – depending on where the DNSLink-to-IPFS address resolution happens.

 - **[IPFS](https://ipfs.io/)**  
   Only suitable for delivery, since it is content-addressed. Resolution of a content URI to an IPFS address needs to be handled by some other technology (like Gun or IPNS, or using [gateways](https://ipfs.github.io/public-gateway-checker/)). Deployed and well-documented, with a large community of developers. Redeploying a new content package with certain files unchanged does not change the addresses of the unchanged files, meaning that small changes in content do not lead to the whole content tree needing to be re-seeded.

 - **[WebTorrent](https://github.com/webtorrent/webtorrent)**  
   Only suitable for content delivery. It seems possible to fetch a particular file from a given torrent, so as not to have to download a torrent of the whole website just to display a single page with some CSS and JS. Requires a resolver to point to the newest torrent since torrents are immutable. Even small changes (for example, only a few files changed in the whole website tree) require creating a new torrent and re-seeding, which is obviously less than ideal.

 - **Plain files via HTTPS**  
   This delivery method is obvious if we're talking simply about the originating site and it serving the files, but this can also mean non-standard strategies like pushing static HTML+CSS+JS to CloudFront or Wasabi, and having a minimal resolver kick in if the originating site is blocked, to fetch content seamlessly from alternative locations (effectively implementing domain fronting and collateral freedom in the browser). However, this will require some thought being put into somehow signing content deployed to third-party locations – perhaps the resolver (like Gun) could be responsible for keeping SHA sums of known good content, or perhaps we should just address it using the hashes, effectively imitating IPFS.
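
For the plain-files case, checking fetched content against a known good hash might look like the sketch below. It assumes the expected SHA-256 sum is supplied by some resolver, which is not shown. The function names are hypothetical, and the `require` fallback exists only so the snippet also runs in older Node.js versions outside a browser.

```javascript
// Web Crypto is built into browsers; fall back to Node's copy elsewhere.
const subtle = globalThis.crypto?.subtle ?? require('node:crypto').webcrypto.subtle;

// Hex-encoded SHA-256 of an ArrayBuffer or typed array.
async function sha256Hex(buffer) {
  const digest = await subtle.digest('SHA-256', buffer);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}

// Fetch from an alternative location and reject content whose hash
// does not match the value the (hypothetical) resolver provided.
async function fetchVerified(url, expectedSha256, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  // Hash a clone so the original body is still unread when returned.
  const body = await response.clone().arrayBuffer();
  if ((await sha256Hex(body)) !== expectedSha256) {
    throw new Error(`hash mismatch for ${url}`);
  }
  return response;
}
```

With a check like this, a third-party host can withhold content but cannot silently alter it, as long as the expected hashes themselves come from a trusted source.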

## Limitations

There are certain limitations to what can be done with Samizdat:

### Service worker cannot be updated if origin is blocked

The ServiceWorker script apparently cannot be delivered using any of the censorship circumvention plugins, [since](https://gist.github.com/Rich-Harris/fd6c3c73e6e707e312d7c5d7d0f3b2f9#the-new-service-worker-isnt-fetched-by-the-old-one):

> when you call `navigator.serviceWorker.register('service-worker.js')` the request for service-worker.js isn't intercepted by any service worker's fetch event handler.

So, the ServiceWorker script will be un-updateable via Samizdat in case the origin site is blocked, unless we find a way to hack around it with caches etc.

### JS implementations of decentralized protocols are still bootstrapped using servers

Gun and IPFS (and probably other potential Samizdat strategies) still use bootstrapping servers (STUN/TURN, and other kinds of public nodes), so technically it would be possible to block all of these along with origin sites, thus rendering Samizdat ineffective. This is a limitation of browsers and is related to IPv4 and NATs.

One way to deal with this is to have a large list of such public nodes and send only 2-3 each time Samizdat calls home (including via already working decentralized means), so that finding out all the possible nodes would become prohibitively complicated for a censor.
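
Handing out only a few nodes at a time could be as simple as the sketch below. The function name and parameters are hypothetical, and `Math.random` is used only for brevity; whether a stronger source of randomness is needed here is an open question.

```javascript
// Return up to `count` distinct nodes picked at random from `allNodes`.
function pickBootstrapNodes(allNodes, count = 3, random = Math.random) {
  const pool = [...allNodes]; // copy so the full list is left untouched
  const n = Math.min(count, pool.length);
  // Partial Fisher-Yates shuffle: only the first `n` slots are needed.
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}
```

Each client then learns only a small, different slice of the full list, so enumerating every node requires observing many clients.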

Plus, the ever-increasing adoption of IPv6 will also partially fix this.

Finally, [NetBlocks](https://netblocks.org/) deployed a very similar tool (a ServiceWorker pulling content from a few specific IP addresses in case of upstream domain blocking) and reportedly it worked rather well; the fallback IP addresses apparently were not blocked, which suggests that censors move slowly.

## Related developments

 - https://ipfs.io/ipfs/QmNhFJjGcMPqpuYfxL62VVB9528NXqDNMFXiqN5bgFYiZ1/its-time-for-the-permanent-web.html
 - https://blog.archive.org/2015/02/11/locking-the-web-open-a-call-for-a-distributed-web/
 - https://censorship.no/  
   This seems to be a browser, and as such it requires users to download specific software (i.e. the browser) before censorship circumvention can kick in.
 - https://netblocks.org/   
   Former(?) Lazarus project. Basically the same idea as Samizdat: a ServiceWorker that tries to fetch content from the website, and if it's unavailable, fetches it from somewhere else (in the case of Lazarus, from a few specified IP addresses). Used to be deployed in production and used successfully by users in the field.

## Special thanks and acknowledgements

The name "Samizdat" [was suggested for the project by Doc Edward Morbius](https://mastodon.cloud/@dredmorbius/102949927295700792) and was clearly the right choice. There were many other great suggestions (see [the relevant thread](https://mastodon.social/@rysiek/102916750160299480)). We'd like to thank everyone who suggested names, or took part in the poll!