I am extending the Apache Tika DWG parser so that it can extract text from DWG files.

Someone has already written basic metadata parsing, which is great, but newer DWG versions are not yet supported.

I would like to add that support, and extend what is possible to parse.

From what I have read online, the DWG file format specification is proprietary. Does that mean there is no open specification document?

Any help with parsing DWG files would be appreciated!

While DWG is a proprietary binary format owned by Autodesk, at some point Autodesk gave the Open Design Alliance some visibility into it and allowed them to publish an open specification:

https://www.opendesign.com/files/guestdownloads/OpenDesign_Specification_for_.dwg_files.pdf
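A practical first step for supporting newer releases is telling them apart. The spec's file header section describes a six-byte ASCII version string at the very start of every DWG file (for example `AC1015` for R2000 or `AC1032` for R2018). Below is a minimal Java 11+ sketch that reads that string and maps it to a release name so the parser can dispatch to version-specific code. The class and method names are hypothetical (not part of Tika's API), and the table only covers the releases listed; anything else comes back as "unknown".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/**
 * Sketch: identify a DWG file's release from its leading version string.
 * Hypothetical helper, not part of Apache Tika.
 */
public final class DwgVersionSniffer {

    // Version string -> release name (each release name also covers the
    // later releases that kept the same file format).
    private static final Map<String, String> RELEASES = Map.of(
            "AC1012", "R13",
            "AC1014", "R14",
            "AC1015", "R2000",
            "AC1018", "R2004",
            "AC1021", "R2007",
            "AC1024", "R2010",
            "AC1027", "R2013",
            "AC1032", "R2018");

    private DwgVersionSniffer() {}

    /** Reads the first six bytes of the stream and decodes them as ASCII. */
    public static String readVersionString(InputStream in) throws IOException {
        byte[] magic = in.readNBytes(6);
        if (magic.length < 6) {
            throw new IOException("Stream too short to be a DWG file");
        }
        return new String(magic, StandardCharsets.US_ASCII);
    }

    /** Maps a version string to a release name, or "unknown" if not in the table. */
    public static String releaseFor(String versionString) {
        return RELEASES.getOrDefault(versionString, "unknown");
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DwgVersionSniffer drawing1.dwg drawing2.dwg ...
        for (String arg : args) {
            try (InputStream in = Files.newInputStream(Path.of(arg))) {
                String version = readVersionString(in);
                System.out.println(arg + ": " + version + " (" + releaseFor(version) + ")");
            }
        }
    }
}
```

Keeping the version check separate from the rest of the parsing means the existing metadata code can stay on the releases it already handles, while newer versions get their own code path as you work through the corresponding sections of the spec.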

It's not perfect. In places they clearly weren't given enough information to define something fully, for example:

We have occasionally seen other values here but their meaning (and importance) is unclear.

But it's better than nothing!