mat2 issues
https://0xacab.org/jvoisin/mat2/-/issues
2018-10-02T12:54:58Z

https://0xacab.org/jvoisin/mat2/-/issues/72
Should we warn about local references in office documents?
2018-10-02T12:54:58Z jvoisin
I just stumbled upon a document with the following `word/_rels/settings.xml.rels` file:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship
Id="rId1"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate"
Target="file:///C:\DOCUME~1\easte\LOCALS~1\Temp\TCD59E.tmp\Employment%20application.dot"
TargetMode="External"
/>
</Relationships>
```
The `Target` attribute leaks some information by referring to a local file, and although this is not metadata in the strict sense, I think it's worth discussing what mat2 should do about it.
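To make the discussion concrete, here is a minimal sketch of how such a relationship could be filtered out of a `.rels` part. It is not mat2's code: the function name `drop_local_external_relationships` is hypothetical, and the rule it applies (drop entries that have `TargetMode="External"` and a `file:` target) is only an assumption about what "local reference" should mean.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def drop_local_external_relationships(rels_xml: bytes) -> bytes:
    """Return the .rels XML without external relationships pointing at file: URIs.

    Hypothetical helper, for illustration only.
    """
    # Serialize with the relationships namespace as the default one,
    # so the output keeps unprefixed element names.
    ET.register_namespace("", RELS_NS)
    root = ET.fromstring(rels_xml)
    # findall() returns a list, so removing children while looping is safe.
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        is_external = rel.get("TargetMode") == "External"
        is_local = rel.get("Target", "").lower().startswith("file:")
        if is_external and is_local:
            root.remove(rel)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Relationships with internal targets (for example `styles.xml`) are left untouched, so only the dead link to the local template would disappear.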
In my opinion, mat2 should remove the whole `Relationship`, since it's a dead link anyway.
1.0 - Pony

https://0xacab.org/jvoisin/mat2/-/issues/91
Add support for RTF files
2019-02-21T00:19:36Z jvoisin
Apparently, [it's possible](https://en.wikipedia.org/wiki/Rich_Text_Format) to embed metadata in RTF files.
1.0 - Pony

https://0xacab.org/jvoisin/mat2/-/issues/117
Change the way we're dealing with "backup" file
2019-10-25T23:34:27Z jvoisin
Currently, when running `mat2` on `myfile.jpg`, two files are output:
- `myfile.jpg`: the original file
- `myfile.cleaned.jpg`: the cleaned file
I think that it would make more sense to have this instead:
- `myfile.jpg`: the cleaned file
- `myfile.jpg.bak`: the original file
The main drawback is that if the cleaning process fails, the user will be left with a `myfile.jpg.bak` and might wonder where their file is.
The reason why I'm suggesting this change is that some users have been confused by the current scheme, and I think that the new one makes more sense.
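For illustration, the proposed scheme could be sketched as below. This is an assumption-laden sketch, not mat2's implementation: `clean_in_place` and the `clean(src, dst)` callback are hypothetical names. Cleaning into a temporary file first is one way to soften the drawback mentioned above, since the original is only renamed once cleaning has succeeded.

```python
import os
import tempfile
from typing import Callable


def clean_in_place(path: str, clean: Callable[[str, str], None]) -> str:
    """Clean `path` in place and keep the original as `path + '.bak'`.

    `clean(src, dst)` is a hypothetical callback that writes a cleaned
    copy of `src` to `dst`. Returns the path of the backup file.
    """
    backup = path + ".bak"
    directory = os.path.dirname(os.path.abspath(path))
    # Write the cleaned copy next to the original, under a temporary name.
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        clean(path, tmp)
    except Exception:
        os.unlink(tmp)  # cleaning failed: the original is still at `path`
        raise
    os.replace(path, backup)  # myfile.jpg -> myfile.jpg.bak
    os.replace(tmp, path)     # cleaned copy takes the original name
    return backup
```

With this ordering, a failed run leaves `myfile.jpg` untouched and no stray `.bak` file behind.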
Any opinions?
1.0 - Pony

https://0xacab.org/jvoisin/mat2/-/issues/118
Randomize xml:id in LibreOffice documents
2020-06-17T21:08:42Z georg
Reading #71, I learnt that LibreOffice has a similar problem, which we should probably take care of.
http://officeopenxml.com/WPnumbering.php
1.0 - Pony

https://0xacab.org/jvoisin/mat2/-/issues/152
[PDF] Invalid xref table
2022-09-13T02:15:02Z Habere Dispertire
PDF files generated from TeX (lualatex) report issues after `--lightweight` cleaning, such as:
- exiftool: Warning : Invalid xref table
- verapdf: WARNING: hello-world.pdf doesn't appear to be a valid PDF.
- pdfcrop: Syntax Error: Couldn't read xref table. Syntax Warning: PDF file is damaged - attempting to reconstruct xref table...
I'm using the brew version of mat2: 0.12.1. I have enclosed the source, a minimal working example and the resultant problematic file.
Thank you for this tool.
[hello-world.tex](/uploads/67e96ef69faea6009fd97e621004b6d7/hello-world.tex)
[hello-world.pdf](/uploads/940126c5d34c488a1a76001b01d4d381/hello-world.pdf)
[hello-world.cleaned.pdf](/uploads/f43babeeb71ed79676137271bc894d00/hello-world.cleaned.pdf)

https://0xacab.org/jvoisin/mat2/-/issues/157
Evaluate the relevance of mat2 wrt. the USA Library of Congress most used formats
2023-05-03T20:42:27Z jvoisin
There is a [really nice paper](https://osf.io/cxh9s/) ([local mirror](/uploads/726ca748875f2aaa54a01068c823cc09/39_Mark_Cooper_LP.pdf)) about the most used file formats at the USA's Library of Congress. We should take a look at it, and implement formats used there but not supported by mat2.
It boils down to:
- [ ] jp2
- [x] tif
- [x] jpg
- [ ] xml - we can't really support it
- [x] pdf
- [x] txt
- [x] gif
- [x] gz
- [ ] i41
- [ ] mxf
- [ ] mpg
- [ ] wav
- [ ] mov
- [ ] iso
  - https://github.com/clalancette/pycdlib
- [ ] dv
- [x] gz
- [x] zip
- [ ] rar - Python's library for handling this format, [rarfile](https://rarfile.readthedocs.io/api.html), doesn't provide enough control to remove all the metadata.
- [x] tar

https://0xacab.org/jvoisin/mat2/-/issues/163
Dolphin (KDE) right click menu on Manjaro doesn't work
2022-01-10T23:15:26Z Pol GZ
Hi, guys
I've been a long-time user of this useful tool. I've recently installed Manjaro KDE on my laptop and I can't get the right-click menu entry "remove metadata" to work from the file explorer.
I can run mat2 perfectly well from the terminal and it works as expected. But it is supposed to be integrated with both Dolphin and Nautilus in KDE/GNOME, right?
I've asked in the Manjaro Matrix room, and another user installed it and confirmed the same behaviour: clicking on "remove metadata" doesn't seem to do anything. If you then list the file's metadata with `mat2 -s` from the terminal, it still shows everything, and a `.cleaned` copy is not created (I know Dolphin doesn't "refresh", but closing and reopening it doesn't reveal any new file).
- 5.15.7-1-MANJARO
- mat2 0.12.2
- dolphin 21.12.0

https://0xacab.org/jvoisin/mat2/-/issues/164
ICO format support
2022-03-16T19:35:13Z Romain
A user on Metadata Cleaner's Matrix channel [asked about metadata in ICO files](https://matrix.to/#/!XDGcWYIURqLwjtlmIW:gnome.org/$bR5Rtm3k7KfkwcOHrDyGDv1gsujgX1aABPJdrir0hfA?via=matrix.org&via=gnome.org&via=t2bot.io) (the format used by Windows applications for their icons and websites for their favicons).
According to Wikipedia, the format can [store PNG images](https://en.wikipedia.org/wiki/ICO_(file_format)#PNG_format), so in theory metadata could be passed that way.
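As a rough way to check whether a given file is affected, the sketch below walks the ICO/CUR directory and reports which entries hold PNG data. It relies only on the documented container layout (a 6-byte header followed by 16-byte directory entries); the function name `png_entries` is made up for this example, and it does not inspect or remove any metadata itself.

```python
import struct
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_entries(ico: bytes) -> List[int]:
    """Return the indexes of directory entries whose image data is PNG-encoded."""
    reserved, kind, count = struct.unpack_from("<HHH", ico, 0)
    if reserved != 0 or kind not in (1, 2):  # 1 = icon, 2 = cursor
        raise ValueError("not an ICO/CUR file")
    found = []
    for index in range(count):
        # The last 8 bytes of each 16-byte entry are the image size
        # and the offset of the image data from the start of the file.
        size, offset = struct.unpack_from("<II", ico, 6 + 16 * index + 8)
        if ico[offset:offset + len(PNG_SIGNATURE)] == PNG_SIGNATURE:
            found.append(index)
    return found
```

Entries reported by this function are the ones where PNG chunks carrying metadata could, in theory, survive.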
After some testing with [icoutils](https://www.nongnu.org/icoutils/), when creating an ICO from a PNG containing metadata with the raw option (`icotool -c -r source.png -o icon.ico`) and extracting it again (`icotool -x icon.ico -o extracted.png`), the metadata is still present in the extracted PNG. However, the generated ICO appears to be broken. Without the raw option, metadata is stripped, but the ICO works.
It may be possible for mat2 to use `icotool` to extract images from `.ico` and `.cur` files and create a new file from them, but that would require some testing and a study of the command-line options, as the format can include several images at different sizes plus some metadata, which differs depending on whether the file is an icon or a cursor.

https://0xacab.org/jvoisin/mat2/-/issues/165
WebP support?
2023-03-07T11:36:44Z Rachel Veer
Came across a webp image I wanted *clean*, but mat2 says webp isn't supported. A simple workaround is saving in another format (such as .jpg/.png) and cleaning from there, though I still wanted to inquire: will there be any webp support in the future?

https://0xacab.org/jvoisin/mat2/-/issues/166
Keep links in PDF documents
2022-03-16T19:35:34Z Romain
A user of Metadata Cleaner [reported that links are removed after cleaning](https://gitlab.com/rmnvgr/metadata-cleaner/-/issues/22).
Of course it is impossible to have links with the regular cleaning, but I wonder if there's a way with lightweight cleaning. Maybe by using [`Poppler.Page.render()`](https://poppler.freedesktop.org/api/glib/poppler-Poppler-Page.html#poppler-page-render) instead of `Poppler.Page.render_for_printing()`? I haven't tried it and don't know the implications of using it.

https://0xacab.org/jvoisin/mat2/-/issues/169
Processing of mp4 files with attached pictures fails + workaround
2022-04-03T22:47:02Z David Schmidt
Hey,
I just tried to use mat2 with some video files which (apparently) have an attached picture that causes mat2/ffmpeg to fail.
I am referring to mat2 0.12.3 with ffmpeg version n5.0 and Python 3.10.4 under Arch Linux with kernel 5.17.1-zen1-1-zen.
Stream description: `Stream #0:15[0x0]: Video: bmp, bgra, 340x192, 90k tbr, 90k tbn (attached pic)`
Error message when loglevel panic is commented out:
```
Could not find tag for codec bmp in stream #15, codec not currently supported in container
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:15 --
```
I played around with the command in the video.py file and found a workaround that fixes the issue for me. According to [the documentation](https://ffmpeg.org/ffmpeg.html#toc-Stream-specifiers-1), the stream specifier "’V’ only matches video streams which are not attached pictures", which is exactly what is needed to get rid of the issue. Therefore, I suggest replacing the line `'-map', '0',` in the command definition in video.py with
```
'-map', '0:V?', # copy video sections which are not attached pictures from input to output
'-map', '0:a?', # copy audio from input to output
'-map', '0:s?', # copy subtitles from input to output
'-map', '0:d?', # copy data from input to output
'-map', '0:t?', # copy attachments from input to output
```
as attached images might also be seen as metadata. Maybe even the last two lines should be left out as I am not sure video files with data or actual attachments will work...
However, thanks in advance and thanks for the great tool!
dvs23

https://0xacab.org/jvoisin/mat2/-/issues/171
Excel sheets' contents are being wiped out
2022-04-20T07:59:59Z RJGC
I'm trying to clean the metadata from Excel files on Ubuntu 20 with mat2 0.12.3, but the resulting cleaned files have not only the metadata but also the content of the sheets wiped out. If it's a three-sheet Excel file, the three sheets will remain, but they will be empty.
I've also tried the `--lightweight` flag, but it didn't help. The `--check-dependencies` flag confirms everything is installed, including optional dependencies. `--verbose` doesn't give any info. mat2 apparently creates the `.cleaned.xlsx` file successfully, but the data is gone. I've always been running it from the console.
The weird thing is that, judging by the size of the cleaned results, the data should still be there. E.g. (with `--lightweight`), a 28kB .xlsx returns a 19kB .cleaned.xlsx. A 14kB one turns into 12kB cleaned.
I've also tried saving these cleaned XLSX files as CSV to see if the data emerged, but just got empty CSVs.

https://0xacab.org/jvoisin/mat2/-/issues/172
CR2 (RAW) image format support
2022-04-19T22:40:16Z Jay Kok
It would be nice to have support for Canon CR2 (mime-type: image/x-canon-cr2) and other RAW image formats. They usually contain a lot of sensitive information from the camera (camera/lens type and serial numbers, GPS location, etc.).
Exiftool declares r/w support for the CR2 format (apart from a message "Warning: [minor] Can't delete IFD0 from CR2").
Thanks for your valuable time and effort.

https://0xacab.org/jvoisin/mat2/-/issues/173
Things to do when we switch to Python3.8
2022-05-05T19:59:42Z jvoisin
- Use [`typing.Final`](https://docs.python.org/3/library/typing.html#typing.Final) for constants.
jvoisin

https://0xacab.org/jvoisin/mat2/-/issues/175
Request to make mat2 installation easy for non technical Windows users and Installation docs for windows users.
2022-08-16T13:01:41Z ANISH M CODE
Hi, mat2 is listed as OS-independent on PyPI. However, while installing mat2 with pip on Windows, it asks for Visual Studio Build Tools. I installed Visual Studio Build Tools in 2022 but I'm still getting a build error. The error log is attached here. I am using Python 3.10.6.
```
error: subprocess-exited-with-error
Building wheel for PyGObject (pyproject.toml) did not run successfully.
exit code: 1
[41 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-310
creating build\lib.win-amd64-cpython-310\pygtkcompat
copying pygtkcompat\generictreemodel.py -> build\lib.win-amd64-cpython-310\pygtkcompat
copying pygtkcompat\pygtkcompat.py -> build\lib.win-amd64-cpython-310\pygtkcompat
copying pygtkcompat\__init__.py -> build\lib.win-amd64-cpython-310\pygtkcompat
creating build\lib.win-amd64-cpython-310\gi
copying gi\docstring.py -> build\lib.win-amd64-cpython-310\gi
copying gi\importer.py -> build\lib.win-amd64-cpython-310\gi
copying gi\module.py -> build\lib.win-amd64-cpython-310\gi
copying gi\pygtkcompat.py -> build\lib.win-amd64-cpython-310\gi
copying gi\types.py -> build\lib.win-amd64-cpython-310\gi
copying gi\_constants.py -> build\lib.win-amd64-cpython-310\gi
copying gi\_error.py -> build\lib.win-amd64-cpython-310\gi
copying gi\_gtktemplate.py -> build\lib.win-amd64-cpython-310\gi
copying gi\_option.py -> build\lib.win-amd64-cpython-310\gi
copying gi\_ossighelper.py -> build\lib.win-amd64-cpython-310\gi
copying gi\_propertyhelper.py -> build\lib.win-amd64-cpython-310\gi
copying gi\_signalhelper.py -> build\lib.win-amd64-cpython-310\gi
copying gi\__init__.py -> build\lib.win-amd64-cpython-310\gi
creating build\lib.win-amd64-cpython-310\gi\repository
copying gi\repository\__init__.py -> build\lib.win-amd64-cpython-310\gi\repository
creating build\lib.win-amd64-cpython-310\gi\overrides
copying gi\overrides\Gdk.py -> build\lib.win-amd64-cpython-310\gi\overrides
copying gi\overrides\GdkPixbuf.py -> build\lib.win-amd64-cpython-310\gi\overrides
copying gi\overrides\GIMarshallingTests.py -> build\lib.win-amd64-cpython-310\gi\overrides
copying gi\overrides\Gio.py -> build\lib.win-amd64-cpython-310\gi\overrides
copying gi\overrides\GLib.py -> build\lib.win-amd64-cpython-310\gi\overrides
copying gi\overrides\GObject.py -> build\lib.win-amd64-cpython-310\gi\overrides
copying gi\overrides\Gtk.py -> build\lib.win-amd64-cpython-310\gi\overrides
copying gi\overrides\keysyms.py -> build\lib.win-amd64-cpython-310\gi\overrides
copying gi\overrides\Pango.py -> build\lib.win-amd64-cpython-310\gi\overrides
copying gi\overrides\__init__.py -> build\lib.win-amd64-cpython-310\gi\overrides
running build_ext
pycairo: trying include directory: 'C:\\Users\\Ukraine\\AppData\\Local\\Temp\\pip-build-env-3f8eggr8\\overlay\\Lib\\site-packages\\cairo\\include'
pycairo: found 'C:\\Users\\Ukraine\\AppData\\Local\\Temp\\pip-build-env-3f8eggr8\\overlay\\Lib\\site-packages\\cairo\\include\\py3cairo.h'
building 'gi._gi' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for PyGObject
ERROR: Could not build wheels for PyGObject, which is required to install pyproject.toml-based projects
```

https://0xacab.org/jvoisin/mat2/-/issues/179
support for MKV ?
2022-11-29T17:07:03Z a beta-tester
It looks like mat2 cannot remove metadata from MKV files (or from the mp4 files included in them).

https://0xacab.org/jvoisin/mat2/-/issues/181
Skip errors instead of exiting program
2022-12-31T11:06:53Z Megamind
I have come across a few instances (#180) where the program would throw an error for one attribute, but would otherwise be able to clean the many other ones. When this happens, the program just exits. This can be a bit annoying, so I suggest removing every metadata attribute that can be removed, while skipping the failing ones and notifying the user about the errors. This way, all the metadata that can be removed would be.
2.0 - Eagle

https://0xacab.org/jvoisin/mat2/-/issues/189
Datetime fails
2023-03-15T19:15:44Z Mistress Chief
Hey there! It's a wonderful project. I liked it.
Here's a small issue with dates:
https://0xacab.org/jvoisin/mat2/-/blob/master/libmat2/archive.py#L433
```python
if member.date_time != (1980, 1, 1, 0, 0, 0):
    metadata['date_time'] = str(datetime.datetime(*member.date_time))
```
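A guarded variant of the snippet above might look like the following sketch. It assumes the failure is the `ValueError` that `datetime.datetime` raises for out-of-range fields (for example a month or day of 0 in a malformed archive member); the helper name `format_member_date` and the choice of falling back to the raw tuple are illustrative assumptions, not what mat2 does.

```python
import datetime
from typing import Optional, Tuple

ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)


def format_member_date(date_time: Tuple[int, int, int, int, int, int]) -> Optional[str]:
    """Return a printable timestamp, or None when the member uses the ZIP epoch.

    Hypothetical helper, for illustration only.
    """
    if date_time == ZIP_EPOCH:
        return None
    try:
        return str(datetime.datetime(*date_time))
    except ValueError:
        # Out-of-range field (e.g. month 0): report the raw tuple
        # instead of crashing. The fallback value is arbitrary.
        return str(date_time)
```

Reporting the raw tuple keeps the odd timestamp visible to the user, so it still shows up as metadata rather than being silently ignored.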
I just wrapped it in a `try`/`except` block with some arbitrary fallback value.

https://0xacab.org/jvoisin/mat2/-/issues/190
CI: reduce code duplication
2023-03-19T18:31:42Z georg
georg

https://0xacab.org/jvoisin/mat2/-/issues/194
Non-utf8 PDF can't be lightweight-cleaned
2023-09-07T14:51:11Z jvoisin
A user reported privately that the following file can't be cleaned: [fail_b_올해_상반기_한국_입국_탈북민_99명-향후_입국_추이_지켜봐야____RFA_자유아시아방송_Safari.pdf](/uploads/b35f7f486813a9b232d71323810b4b5b/fail_b_올해_상반기_한국_입국_탈북민_99명-향후_입국_추이_지켜봐야____RFA_자유아시아방송_Safari.pdf)
```console
$ mat2 ./mat2 -L ./fail\ b\ 올해\ 상반기\ 한국\ 입국\ 탈북민\ 99명-향후\ 입국\ 추이\ 지켜봐야”\ —\ RFA\ 자유아시아방송\ Safari.pdf
[-] ./fail b 올해 상반기 한국 입국 탈북민 99명-향후 입국 추이 지켜봐야” — RFA 자유아시아방송 Safari.pdf can't be cleaned: input string not valid UTF-8
[255]
$
```
It seems that cairo/poppler do not like CJK stuff, sigh.