Option / help request: "--keep-essential-metadata" or "--extralightweight" mode for specific use cases
Could a mat2 --keep-essential-metadata option be introduced (for MAT2 4) that only strips (direct) user identifiers and preserves some functionality and information that some users might want to keep: the creation and last-modified dates, and the ability to continue using the track changes function?
The background is a discussion about use cases for MAT2 in archives, where anonymising records is sometimes necessary but preserving the material features of those records is also a goal; see the discussion here: dkg/mat2#1 (closed)
As after five or more years the technical context (the concrete machine and data context) that could lead to identification (probably) no longer exists, preserving basic functionality (RSID tags) and metadata that places the record historically (creation, last modified) becomes a valuable and legitimate resource for historical research (see "Good" reasons: https://33bits.wordpress.com). This would be a much lower level of anonymisation than what MAT2 usually provides, and it would certainly not prevent identification by mosaic effects or other indirect identification via context (which, in the archive situation, can in most cases no longer be reconstructed). Total metadata erasure is not the focus in the archive; for various reasons, this lower level is just the right level of protection in such a case.
Could this be done? I am working with an archive that has large born-digital assets, and I would like to propose MAT2 as a standard tool to anonymise their records so researchers can access them. Giving the archive the option to provide researchers with either completely anonymised records (default) or records anonymised with essential metadata retained (for a specific research interest) would make the archivists' work easier.
Related help request:
I have to make this happen for myself in my research project as well. For my use case, I have already stripped MAT2 4.0 (which works best, producing docx output that opens without error messages) of its RSID removal function. Could someone help me with a code snippet that preserves the above-mentioned metadata (revision count, creation date, last-modified date) in core.xml? MAT2 4.0 scrubs the content of this file completely, and I am struggling to repurpose the regex function I found elsewhere in the code.
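To make the request more concrete, here is a minimal standalone sketch of the kind of filtering I have in mind. It does not use MAT2's own code or API: the function name keep_essential_core_properties and the KEEP allowlist are hypothetical names I made up for illustration, and it uses an XML parser rather than a regex. It assumes the file in question is the standard docProps/core.xml part of a docx, where the revision count is stored in cp:revision and the dates in dcterms:created and dcterms:modified. Everything else at the top level (dc:creator, cp:lastModifiedBy, title, and so on) is dropped.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the OPC core properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}

# Register the usual prefixes so the output keeps "dcterms:" etc. instead of
# auto-generated ones; the xsi:type="dcterms:W3CDTF" attribute value on the
# date elements relies on the "dcterms" prefix being declared.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# Hypothetical allowlist: the only top-level properties that survive.
KEEP = {
    "{%s}revision" % NS["cp"],
    "{%s}created" % NS["dcterms"],
    "{%s}modified" % NS["dcterms"],
}


def keep_essential_core_properties(xml_bytes: bytes) -> bytes:
    """Return core.xml content with every property outside KEEP removed."""
    root = ET.fromstring(xml_bytes)
    # Iterate over a copy, because children are removed while looping.
    for child in list(root):
        if child.tag not in KEEP:
            root.remove(child)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

The idea would be to pass the bytes of docProps/core.xml (read from the docx with Python's zipfile module, for example) through this function instead of emptying the file. Where exactly this would hook into MAT2 4.0 is the part I cannot work out, so I have left that out rather than guess. An allowlist seemed safer than a blocklist here, since any property I have not thought of is removed by default.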