@joe pointed out that MAT2 doesn't handle (as in "remove") revisions from office files.
What should we do about this? Shall we keep the revisions and pretend that they are data, or shall we only keep the latest one?
I asked some colleagues during lunch today (my day job is something like "computer security consultant"), and we thought that, according to the principle of least astonishment, MAT2 must remove the revisions.
Either our users aren't aware of the revisions, and thus they should be deleted. Think about journalists who edit a document to erase mentions of their sources.
Or they are aware of them, and will likely not expect MAT2 to keep the revisions, which are basically traces of how, when, and by whom the document was edited.
(Please don't even think about hacking something ugly like rebasing revisions to the Epoch and incrementing the timestamp by one second for each of them.)
The changes are tracked in the content.xml file for LibreOffice documents, inside the <text:tracked-changes> element:
<?xml version="1.0" ?><office:document-content office:version="1.2" xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:css3t="http://www.w3.org/TR/css3-text/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:drawooo="http://openoffice.org/2010/draw" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:officeooo="http://openoffice.org/2009/office" xmlns:ooo="http://openoffice.org/2004/office" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:rpt="http://openoffice.org/2005/report" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><office:scripts/><!-- blablabla --><office:body><office:text><office:forms form:apply-design-mode="false" form:automatic-focus="false"/><text:tracked-changes><text:changed-region text:id="ct94517144844320" xml:id="ct94517144844320"><text:insertion><office:change-info><dc:creator>Unknown Author</dc:creator><dc:date>2018-06-22T21:31:15</dc:date></office:change-info></text:insertion></text:changed-region><text:changed-region text:id="ct94517132807168" xml:id="ct94517132807168"><text:deletion><office:change-info><dc:creator>Unknown Author</dc:creator><dc:date>2018-06-22T21:31:15</dc:date></office:change-info><text:p text:style-name="P1">Hello World</text:p></text:deletion></text:changed-region></text:tracked-changes><text:sequence-decls><text:sequence-decl text:display-outline-level="0" text:name="Illustration"/><text:sequence-decl text:display-outline-level="0" text:name="Table"/><text:sequence-decl text:display-outline-level="0" text:name="Text"/><text:sequence-decl text:display-outline-level="0" text:name="Drawing"/></text:sequence-decls><draw:frame draw:name="graphics1" draw:style-name="fr1" draw:z-index="0" svg:height="0.4047in" svg:width="0.4063in" svg:x="3.7346in" svg:y="1.2354in" text:anchor-page-number="1" text:anchor-type="page"><draw:image loext:mime-type="image/png" xlink:actuate="onLoad" xlink:href="Pictures/100000000000003200000031BCB5162265471AD2.png" xlink:show="embed" xlink:type="simple"/></draw:frame><text:p text:style-name="P1"><text:change-start text:change-id="ct94517144844320"/><text:span text:style-name="T1">This is an edited document, you shouldn’t be able to see an other text than this one.</text:span></text:p><text:p text:style-name="P1"><text:change-end text:change-id="ct94517144844320"/><text:change text:change-id="ct94517132807168"/></text:p></office:text></office:body></office:document-content>
Now I need to learn some xpath-fu to select and remove this element using one of Python's XML libraries.
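For what it's worth, here is a minimal sketch of how that selection and removal could look with the standard library's xml.etree.ElementTree. It is only an illustration, not what MAT2 does: the function name remove_tracked_changes and the tiny SAMPLE_ODT document are made up for the example, and only the namespace URIs are taken from the dump above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the content.xml dump above.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical, heavily trimmed content.xml used only for this example.
SAMPLE_ODT = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:text>'
    '<text:tracked-changes><text:changed-region text:id="ct1">'
    '<text:deletion><text:p>Hello World</text:p></text:deletion>'
    '</text:changed-region></text:tracked-changes>'
    '<text:p>visible text</text:p>'
    '</office:text></office:body></office:document-content>'
)


def remove_tracked_changes(root: ET.Element) -> int:
    """Remove every <text:tracked-changes> element below root.

    Returns the number of elements removed.
    """
    removed = 0
    # ElementTree elements have no parent pointer, so select the parents
    # of the elements we want to drop with the ".." XPath step.
    for parent in root.findall(".//text:tracked-changes/..", NS):
        for child in parent.findall("text:tracked-changes", NS):
            parent.remove(child)
            removed += 1
    return removed


if __name__ == "__main__":
    tree_root = ET.fromstring(SAMPLE_ODT)
    print("removed:", remove_tracked_changes(tree_root))
    print(ET.tostring(tree_root, encoding="unicode"))
```

Note that this only drops the <text:tracked-changes> block. The <text:change-start>, <text:change-end> and <text:change> markers in the body would still reference the removed ids, so they probably need the same treatment.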
<?xml version="1.0" ?><w:document mc:Ignorable="w14 wp14" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"><w:body><w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr/></w:pPr><w:ins w:author="Unknown Author" w:date="2018-06-28T23:47:58Z" w:id="0"><w:r><w:rPr/><w:t xml:space="preserve">This is a </w:t></w:r></w:ins><w:ins w:author="Unknown Author" w:date="2018-06-28T23:48:00Z" w:id="1"><w:r><w:rPr/><w:t>modified text. You should only see this one and nothing else.</w:t></w:r></w:ins></w:p><w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr/></w:pPr><w:del w:author="Unknown Author" w:date="2018-06-28T23:47:58Z" w:id="2"><w:r><w:rPr/><w:delText>I'm a text document : please love me.</w:delText></w:r></w:del></w:p><w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr/></w:pPr><w:del w:author="Unknown Author" w:date="2018-06-28T23:47:58Z" w:id="3"><w:r><w:rPr/></w:r></w:del></w:p><w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr/></w:pPr><w:r><w:rPr/></w:r></w:p><w:sectPr><w:type w:val="nextPage"/><w:pgSz w:h="15840" w:w="12240"/><w:pgMar w:bottom="1134" w:footer="0" w:gutter="0" w:header="0" w:left="1134" w:right="1134" w:top="1134"/><w:pgNumType w:fmt="decimal"/><w:formProt w:val="false"/><w:textDirection w:val="lrTb"/><w:docGrid w:charSpace="0" w:linePitch="100" w:type="default"/></w:sectPr></w:body></w:document>
I have no idea how we could handle this, to be honest, except maybe trashing the delText items?
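One possible way to go a bit further than trashing the delText items, sketched below with xml.etree.ElementTree: drop every <w:del> element entirely and unwrap every <w:ins> so its runs stay in place while the author and date attributes disappear. This is an assumption about what "handling" could mean, not a settled design. The name strip_revisions and the SAMPLE_DOCX snippet are hypothetical, and only the w namespace URI comes from the dump above.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace, as declared in the document.xml dump above.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DEL = f"{{{W}}}del"
INS = f"{{{W}}}ins"

# Hypothetical, heavily trimmed document.xml used only for this example.
SAMPLE_DOCX = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:ins w:author="Unknown Author" w:id="0">'
    '<w:r><w:t>kept</w:t></w:r></w:ins></w:p>'
    '<w:p><w:del w:author="Unknown Author" w:id="1">'
    '<w:r><w:delText>gone</w:delText></w:r></w:del></w:p>'
    '</w:body></w:document>'
)


def strip_revisions(elem: ET.Element) -> None:
    """Recursively drop <w:del> elements and unwrap <w:ins> elements."""
    new_children = []
    for child in list(elem):
        if child.tag == DEL:
            # Deleted content: discard the whole subtree.
            continue
        # Clean the subtree first so nested revisions are handled too.
        strip_revisions(child)
        if child.tag == INS:
            # Inserted content: keep the children, lose the wrapper
            # (and with it the author, date and id attributes).
            new_children.extend(list(child))
        else:
            new_children.append(child)
    # Replace the children of elem with the cleaned list.
    for child in list(elem):
        elem.remove(child)
    elem.extend(new_children)


if __name__ == "__main__":
    doc_root = ET.fromstring(SAMPLE_DOCX)
    strip_revisions(doc_root)
    print(ET.tostring(doc_root, encoding="unicode"))
```

The paragraphs that only contained deleted runs are left behind empty, so the document structure can still hint at how much was removed.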
LibreOffice can do that, but it will leak the number of revisions:
<?xml version="1.0" ?><w:document mc:Ignorable="w14 wp14" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"><w:body><w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr/></w:pPr><w:r><w:rPr/><w:t>This is a modified text. You should only see this one and nothing else.</w:t></w:r></w:p><w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr/></w:pPr><w:r><w:rPr/></w:r></w:p><w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr/></w:pPr><w:r><w:rPr/></w:r></w:p><w:p><w:pPr><w:pStyle w:val="Normal"/><w:rPr/></w:pPr><w:r><w:rPr/></w:r></w:p><w:sectPr><w:type w:val="nextPage"/><w:pgSz w:h="15840" w:w="12240"/><w:pgMar w:bottom="1134" w:footer="0" w:gutter="0" w:header="0" w:left="1134" w:right="1134" w:top="1134"/><w:pgNumType w:fmt="decimal"/><w:formProt w:val="false"/><w:textDirection w:val="lrTb"/><w:docGrid w:charSpace="0" w:linePitch="100" w:type="default"/></w:sectPr></w:body></w:document>