When reading any ZIP part, normalize the entry name and reject any containing .. or absolute paths. 5. Reference Implementation: A Secure Streaming Downloader We present a pseudo-code implementation for a REST endpoint that generates a simple .docx report from JSON data, using streaming and security best practices. 5.1 System Architecture [JSON Input] -> [Streaming XML Writer] -> [In-memory ZIP stream] -> [HTTP Response] ^ | | (direct write) v [Content Types & Relationships] [File Download] 5.2 Core Implementation (C#/.NET Core Example) [HttpGet("report.docx")] public IActionResult GenerateDocx(string title, string content)
| Method | Peak Memory (MB) | Time (s) | Max Concurrent Requests | | :--- | :--- | :--- | :--- | | (deprecated) | 1,200 | 62 | 2 (serialized) | | Open XML SDK + DOM | 890 | 28 | 8 | | Open XML SDK + Streaming (our method) | 230 | 22 | 35 | office open xml download
<!DOCTYPE doc [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]> <w:p><w:r><w:t>&xxe;</w:t></w:r></w:p> Always disable external entities and DTDs in your XML parser. When reading any ZIP part, normalize the entry
Critical note: Microsoft Office defaults to , whereas some open-source parsers prefer Strict . For maximum compatibility in download scenarios, target Transitional. 3. The "Download" Problem: Generation and Delivery 3.1 Two Primary Strategies | Strategy | Description | Pros | Cons | | :--- | :--- | :--- | :--- | | Template-based | Load a pre-created .docx template, replace placeholders (e.g., name ). | Preserves complex formatting. | Requires template management; large memory if using DOM. | | Programmatic build | Build XML trees (e.g., using DocumentBuilder libraries). | Full control; scalable. | Steeper learning curve for complex layouts. | 3.2 Performance Bottleneck: DOM vs. Streaming Most naive implementations load the entire document.xml into an XML DOM (Document Object Model). For a 50-page report, this may be ~10 MB; for a 500,000-row Excel sheet, this can exceed 2 GB of RAM. Set a maximum decompression ratio (e.g.
var stream = new MemoryStream(); using (var archive = new ZipArchive(stream, ZipArchiveMode.Create, true)) // 1. [Content_Types].xml var ctEntry = archive.CreateEntry("[Content_Types].xml"); using (var ctWriter = new StreamWriter(ctEntry.Open())) ctWriter.Write(@"<?xml version='1.0' encoding='UTF-8'?> <Types xmlns='http://schemas.openxmlformats.org/package/2006/content-types'> <Default Extension='rels' ContentType='application/vnd.openxmlformats-package.relationships+xml'/> <Default Extension='xml' ContentType='application/xml'/> <Override PartName='/word/document.xml' ContentType='application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'/> </Types>"); // 2. Relationships (.rels) var relsEntry = archive.CreateEntry("_rels/.rels"); using (var relsWriter = new StreamWriter(relsEntry.Open())) relsWriter.Write(@"<?xml version='1.0'?> <Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'> <Relationship Id='rId1' Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument' Target='word/document.xml'/> </Relationships>");
Office Open XML, OOXML, Document Generation, File Download, XML Security, ZIP Compression, REST API. 1. Introduction In enterprise web applications, generating downloadable office documents from structured data (e.g., invoices, reports, spreadsheets) is a ubiquitous requirement. Prior to OOXML, server-side generation often relied on binary formats ( .doc , .xls ) via COM interop (unreliable and non-scalable) or HTML-to-PDF converters (loss of semantic fidelity). The introduction of OOXML solved this by providing an open, royalty-free, XML-based standard.
Set a maximum decompression ratio (e.g., ZipFile.Extract with ExtractEntry limits). For generation, do not decompress untrusted archives. 4.3 Path Traversal in ZIP Entries Evil entries like ../../config/secret.xml inside a ZIP can overwrite files.