```
 _____ _____ _____ ___
|     |  _  |_   _|_  |  Keep your data,
| | | |     | | | |  _|     trash your meta!
|_|_|_|__|__| |_| |___|

```

# Metadata and privacy

Metadata consist of information that characterizes data.
Metadata are used to provide documentation for data products.
In essence, metadata answer who, what, when, where, why, and how about
every facet of the data that are being documented.

Metadata within a file can tell a lot about you.
Cameras record data about when a picture was taken and what
camera was used. Office documents, like PDF or Office files, automatically
get author and company information added to them.
Maybe you don't want to disclose that information.

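To make this concrete, here is a minimal Python sketch, unrelated to mat2's own code, that builds a 1x1 PNG carrying a `tEXt` chunk with an `Author` field and then reads the field back. The helper names (`png_chunk`, `tiny_png_with_author`, `read_text_chunks`) are hypothetical; the point is only that such fields travel silently inside the file and are invisible when the image is displayed.

```python
import struct
import zlib


def png_chunk(chunk_type, data):
    """Encode one PNG chunk: length, type, data, then CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def tiny_png_with_author(author):
    """Build a 1x1 grayscale PNG that carries a tEXt 'Author' field."""
    signature = b"\x89PNG\r\n\x1a\n"
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale),
    # compression 0, filter 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    # A single scanline: filter byte 0 followed by one grey pixel.
    idat = zlib.compress(b"\x00\x80")
    text = b"Author\x00" + author.encode("latin-1")
    return (signature
            + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"tEXt", text)
            + png_chunk(b"IDAT", idat)
            + png_chunk(b"IEND", b""))


def read_text_chunks(png):
    """Collect every tEXt keyword/value pair stored in a PNG byte string."""
    found = {}
    pos = 8  # skip the 8-byte PNG signature
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        if chunk_type == b"tEXt":
            data = png[pos + 8:pos + 8 + length]
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
    return found


image = tiny_png_with_author("Jane Doe")
print(read_text_chunks(image))  # {'Author': 'Jane Doe'}
```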
This is precisely the job of mat2: getting rid of as much metadata as possible.

mat2 provides a command line tool, and graphical user interfaces via a service
menu for Dolphin, the default file manager of KDE, and an extension for
Nautilus, the default file manager of GNOME.

# Requirements

- `python3-mutagen` for audio support
- `python3-gi-cairo` and `gir1.2-poppler-0.18` for PDF support
- `gir1.2-gdkpixbuf-2.0` for image support
- `gir1.2-rsvg-2.0` for SVG support
- `FFmpeg`, optionally, for video support
- `libimage-exiftool-perl` for everything else
- `bubblewrap`, optionally, for sandboxing

Please note that mat2 requires at least Python 3.5.

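mat2's own `--check-dependencies` flag is the authoritative way to verify an installation. As a rough standalone approximation, the Python sketch below checks the interpreter version and looks for the requirements listed above. The module names (`mutagen`, `gi`) and executable names (`exiftool`, `ffmpeg`, `bwrap`) are assumptions about what those packages install, and the GObject typelibs (Poppler, GdkPixbuf, Rsvg) are not inspected at all.

```python
import importlib.util
import shutil
import sys

# Assumed names: Python modules provided by python3-mutagen and python3-gi,
# and executables shipped by libimage-exiftool-perl, FFmpeg and bubblewrap.
PYTHON_MODULES = ["mutagen", "gi"]
EXECUTABLES = ["exiftool", "ffmpeg", "bwrap"]


def check_requirements():
    """Return a {requirement: found?} mapping; GObject typelibs are not checked."""
    report = {"python>=3.5": sys.version_info >= (3, 5)}
    for module in PYTHON_MODULES:
        report[module] = importlib.util.find_spec(module) is not None
    for program in EXECUTABLES:
        report[program] = shutil.which(program) is not None
    return report


if __name__ == "__main__":
    for name, ok in sorted(check_requirements().items()):
        print("{:<12} {}".format(name, "found" if ok else "missing"))
```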
# Requirements setup on macOS (OS X) using [Homebrew](https://brew.sh/)

```bash
brew install exiftool cairo pygobject3 poppler gdk-pixbuf librsvg ffmpeg
```

# Running the test suite

```bash
$ python3 -m unittest discover -v
```

And if you want to see the coverage:

```bash
$ python3-coverage run --branch -m unittest discover -s tests/
$ python3-coverage report -m --include 'libmat2/*'
```

# How to use mat2

```
usage: mat2 [-h] [-V] [--unknown-members policy] [--inplace] [--no-sandbox]
            [-v] [-l] [--check-dependencies] [-L | -s]
            [files [files ...]]

Metadata anonymisation toolkit 2

positional arguments:
  files                 the files to process

optional arguments:
  -h, --help            show this help message and exit
  -V, --verbose         show more verbose status information
  --unknown-members policy
                        how to handle unknown members of archive-style files
                        (policy should be one of: abort, omit, keep) [Default:
                        abort]
  --inplace             clean in place, without backup
  --no-sandbox          Disable bubblewrap's sandboxing
  -v, --version         show program's version number and exit
  -l, --list            list all supported fileformats
  --check-dependencies  check if mat2 has all the dependencies it needs
  -L, --lightweight     remove SOME metadata
  -s, --show            list harmful metadata detectable by mat2 without
                        removing them
```

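The `--unknown-members` policy only matters for archive-style files. The Python sketch below is a hypothetical illustration of what the three policies mean, not mat2's implementation: the set of "supported" extensions and the `select_members` helper are made up for the example. With `abort` an unknown member stops processing, with `omit` it is dropped, and with `keep` it is copied into the output as-is, presumably along with whatever metadata it contains.

```python
import io
import os
import zipfile

# Hypothetical: member types this toy cleaner knows how to handle.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".xml", ".txt"}
POLICIES = ("abort", "omit", "keep")


def select_members(names, policy="abort"):
    """Return the archive members to carry over, applying an unknown-member policy."""
    if policy not in POLICIES:
        raise ValueError("policy should be one of: abort, omit, keep")
    selected = []
    for name in names:
        if os.path.splitext(name)[1].lower() in SUPPORTED_EXTENSIONS:
            selected.append(name)  # known type: would be cleaned
        elif policy == "keep":
            selected.append(name)  # unknown type: copied as-is, uncleaned
        elif policy == "abort":
            raise RuntimeError("unknown member: " + name)
        # policy == "omit": the unknown member is silently dropped
    return selected


# Build a small archive in memory with one known and one unknown member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("notes.txt", "hello")
    archive.writestr("blob.bin", b"\x00\x01")

with zipfile.ZipFile(buf) as archive:
    names = archive.namelist()

print(select_members(names, "omit"))  # ['notes.txt']
print(select_members(names, "keep"))  # ['notes.txt', 'blob.bin']
```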
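Note that by default mat2 **will not** clean files in place, but will produce
a cleaned copy instead: for example, a file named "myfile.png" yields a
cleaned version named "myfile.cleaned.png". Pass `--inplace` to overwrite
the original file instead (no backup is kept).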

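If you drive mat2 from a script, a thin wrapper can build the command line from the flags listed in `mat2 --help`. The Python sketch below is hypothetical (`cleaned_name`, `mat2_command` and `clean` are not part of mat2). `cleaned_name` mirrors only the documented "myfile.png" example; for names with several extensions, such as `archive.tar.gz`, it yields `archive.tar.cleaned.gz`, which is an assumption that may not match what mat2 actually produces.

```python
import os
import shutil
import subprocess


def cleaned_name(path):
    """Documented naming convention: myfile.png becomes myfile.cleaned.png."""
    root, ext = os.path.splitext(path)
    return root + ".cleaned" + ext


def mat2_command(files, lightweight=False, inplace=False, unknown_members=None):
    """Build a mat2 argv list using only flags shown in `mat2 --help`."""
    cmd = ["mat2"]
    if lightweight:
        cmd.append("-L")
    if inplace:
        cmd.append("--inplace")
    if unknown_members is not None:
        if unknown_members not in ("abort", "omit", "keep"):
            raise ValueError("policy should be one of: abort, omit, keep")
        cmd += ["--unknown-members", unknown_members]
    return cmd + list(files)


def clean(files, **options):
    """Run mat2 on `files`; raises CalledProcessError if mat2 exits non-zero."""
    if shutil.which("mat2") is None:
        raise FileNotFoundError("mat2 is not installed or not on PATH")
    subprocess.run(mat2_command(files, **options), check=True)


print(cleaned_name("myfile.png"))  # myfile.cleaned.png
print(mat2_command(["photo.jpg"], lightweight=True))  # ['mat2', '-L', 'photo.jpg']
```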
## Web interface

It's possible to run mat2 as a web service, via
[mat2-web](https://0xacab.org/jvoisin/mat2-web).

## Desktop GUI

For GNU/Linux desktops, it's possible to use the
[Metadata Cleaner](https://gitlab.com/rmnvgr/metadata-cleaner) GTK application.

# Supported formats

The following formats are supported: avi, bmp, css, epub/ncx, flac, gif, jpeg,
m4a/mp2/mp3/…, mp4, odc/odf/odg/odi/odp/ods/odt/…, ogg/opus/oga/spx/…, pdf,
png, ppm, pptx/xlsx/docx/…, svg/svgz/…, tar/tar.gz/tar.bz2/tar.xz/…, tiff,
torrent, wav, wmv, zip, …
  
# Notes about detecting metadata

While mat2 does its very best to display metadata when the `--show` flag is
passed, a file isn't necessarily free of metadata just because mat2 doesn't
show any. There is no reliable way to detect every possible kind of metadata
in complex file formats.

This is why you shouldn't rely on the presence of metadata to decide whether
your file must be cleaned or not.

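As a toy illustration of why an empty report proves nothing, the Python sketch below uses a hypothetical detector that only inspects a ZIP archive's global comment. It reports nothing, even though the single member still carries a timestamp and a personal comment. The detector is not mat2's; it only shows how a file can look clean to any check that misses one of the places where metadata hides.

```python
import io
import zipfile


def naive_detector(archive):
    """Toy detector (hypothetical): only inspects the archive-wide comment."""
    return {"comment": archive.comment} if archive.comment else {}


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    info = zipfile.ZipInfo("report.txt", date_time=(2018, 5, 4, 13, 37, 0))
    info.comment = b"drafted by jane@example.org"  # per-member comment
    archive.writestr(info, "quarterly numbers")

with zipfile.ZipFile(buf) as archive:
    print(naive_detector(archive))  # {}
    member = archive.infolist()[0]
    # ...yet the member still carries a timestamp and a personal comment.
    print(member.date_time, member.comment)  # (2018, 5, 4, 13, 37, 0) b'drafted by jane@example.org'
```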
# Notes about the lightweight mode

By default, mat2 might slightly alter the data of your files in order to remove
as much metadata as possible. For example, text in PDFs might no longer be
selectable, compressed images might get compressed again, …
Since some users might be willing to trade the presence of some metadata for
the guarantee that mat2 won't modify the data of their files, the `-L`
(`--lightweight`) flag does precisely that.

# Related software

- The first iteration of [MAT](https://mat.boum.org)
- [Exiftool](https://sno.phy.queensu.ca/~phil/exiftool/mat)
- [pdf-redact-tools](https://github.com/firstlookmedia/pdf-redact-tools), that
	tries to deal with *printer dots* too.
- [pdfparanoia](https://github.com/kanzure/pdfparanoia), that removes
	watermarks from PDF.
- [Scrambled Exif](https://f-droid.org/packages/com.jarsilio.android.scrambledeggsif/),
	an open-source Android application to remove metadata from pictures.
- [Dangerzone](https://dangerzone.rocks/), designed to sanitize harmful documents
  into harmless ones.

# Contact

If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues)
or the [mailing list](https://www.autistici.org/mailman/listinfo/mat-dev).
Should a more private contact be needed (e.g. for reporting security issues),
you can email Julien (jvoisin) Voisin at `julien.voisin+mat2@dustri.org`,
using the GPG key `9FCDEE9E1A381F311EA62A7404D041E8171901CC`.

# Donations

If you want to donate some money, please give it to [Tails](https://tails.boum.org/donate/?r=contribute).

# License

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

Copyright 2018 Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org>  
Copyright 2016 Marie-Rose for mat2's logo

The `tests/data/dirty_with_nsid.docx` file is licensed under GPLv3,
and was borrowed from the Calibre project: https://calibre-ebook.com/downloads/demos/demo.docx

The `narrated_powerpoint_presentation.pptx` file is in the public domain.

# Thanks

mat2 wouldn't exist without:

- the [Google Summer of Code](https://summerofcode.withgoogle.com/);
- the fine people from [Tails](https://tails.boum.org);
- friends

Many thanks to them!