diff --git a/doc/implementation_notes.md b/doc/implementation_notes.md
index bc836712ffdff89737ae500f6be7d05595a48607..b3856598bd30f7f0b59f3126b839507adaf3bd28 100644
--- a/doc/implementation_notes.md
+++ b/doc/implementation_notes.md
@@ -25,6 +25,10 @@ handle PDF. But apparently, people are ok with
 [pdf redact tools](https://github.com/firstlookmedia/pdf-redact-tools), that
 simply transform the PDF into images. So this is what's MAT2 is doing too.
 
+Of course, it would be possible to detect images in a PDF file and process them
+with MAT2, but since a PDF can contain a lot of things, like images, videos,
+JavaScript, embedded PDFs, blobs, …, this is the easiest and safest way to clean them.
+
 Images handling
 ---------------
 