April 1, 2020

What Happens When a PDF is Uploaded into CQ DAM

The following happens when a PDF file named water_quality_in_Ontario.pdf is uploaded to /content/dam/pdf/ in CQ Digital Asset Management (DAM):

1) The PDF is saved to /content/dam/pdf/water_quality_in_Ontario.pdf/jcr:content/renditions/original/

2) CQ's Java Content Repository (JCR) fires a “Node Created” event.

3) The workflow launcher configured to listen for “Node Created” events under /content/dam(/.*/)renditions/original/ starts the workflow model called “DAM Update Asset”. The worksteps of “DAM Update Asset”, which run serially, are shown below.

[Figure: serial worksteps of the “DAM Update Asset” workflow model]
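To make the launcher in step 3 concrete, here is a minimal sketch that tests event paths against the pattern quoted above. It assumes the pattern behaves like a `java.util.regex` expression matched against the whole node path; the helper `originalRenditionPath` and the thumbnail rendition name used for comparison are illustrative assumptions, not CQ APIs.

```java
import java.util.regex.Pattern;

// Sketch only: checks whether a "Node Created" event path matches the launcher
// pattern from step 3. Full-path regex matching is an assumption about how the
// launcher evaluates its path.
class UpdateAssetLauncher {
    static final Pattern ORIGINAL_RENDITION =
            Pattern.compile("/content/dam(/.*/)renditions/original");

    // Hypothetical helper: where the uploaded file's original rendition is stored (step 1).
    static String originalRenditionPath(String folder, String fileName) {
        return folder + "/" + fileName + "/jcr:content/renditions/original";
    }

    // True if a "Node Created" event at this path would start "DAM Update Asset".
    static boolean startsUpdateAsset(String eventPath) {
        return ORIGINAL_RENDITION.matcher(eventPath).matches();
    }

    public static void main(String[] args) {
        String original = originalRenditionPath("/content/dam/pdf", "water_quality_in_Ontario.pdf");
        // Assumed thumbnail rendition name, used only as a non-matching example.
        String thumbnail = "/content/dam/pdf/water_quality_in_Ontario.pdf"
                + "/jcr:content/renditions/cq5dam.thumbnail.48.48.png";
        System.out.println(startsUpdateAsset(original));
        System.out.println(startsUpdateAsset(thumbnail));
    }
}
```

Under this reading of the pattern, only the original rendition matches, so thumbnail renditions created later in the workflow would not start the workflow again.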

The “Continue Updating” step exists solely to check whether the workflow was triggered because an earlier version of the digital asset was just restored. If so, the step ends the workflow immediately.
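The early exit can be pictured with a small model of a serial workflow in which any step may stop the run. This is a hypothetical sketch: it includes only the three steps named in this post (the real model has more), assumes “Continue Updating” runs before the other two, and uses an invented `restoredFromVersion` flag in place of however CQ actually detects a restore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of a serial workflow: steps run in order, and a step
// returns false to end the workflow early.
class SerialWorkflow {
    // Stand-in for the asset; restoredFromVersion is an assumed flag.
    record Asset(String path, boolean restoredFromVersion) {}

    // A named step; the predicate returns true to continue, false to stop.
    record Step(String name, Predicate<Asset> action) {}

    // Runs the steps in order and returns the names of those that executed.
    static List<String> run(List<Step> steps, Asset asset) {
        List<String> executed = new ArrayList<>();
        for (Step step : steps) {
            executed.add(step.name());
            if (!step.action().test(asset)) {
                break; // this step ended the workflow
            }
        }
        return executed;
    }

    static List<Step> updateAssetSteps() {
        return List.of(
            // Stops the workflow if it was triggered by restoring an earlier version.
            new Step("Continue Updating", a -> !a.restoredFromVersion()),
            new Step("Metadata extraction", a -> true),
            new Step("Thumbnail Creation", a -> true));
    }

    public static void main(String[] args) {
        String path = "/content/dam/pdf/water_quality_in_Ontario.pdf";
        System.out.println(run(updateAssetSteps(), new Asset(path, false)));
        System.out.println(run(updateAssetSteps(), new Asset(path, true)));
    }
}
```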

4) The “Metadata extraction” workstep extracts the PDF document’s metadata and saves it to /content/dam/pdf/water_quality_in_Ontario.pdf/jcr:content/metadata

5) Saving the metadata modifies the metadata node, so the workflow launcher configured to listen for “Node Modified” events under /content/dam(/.*/)metadata starts the workflow model called “DAM Metadata Writeback”. This creates additional … The serial worksteps of “DAM Metadata Writeback” are shown below.

[Figure: serial worksteps of the “DAM Metadata Writeback” workflow model]
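Steps 3 and 5 together describe two launchers, each pairing an event type with a path pattern and a workflow model. The sketch below models them as a small lookup table; the enum, record, and method names are hypothetical, and full-path regex matching is again an assumption.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical launcher table: event type + path pattern -> workflow model.
// The two entries mirror the launchers described in steps 3 and 5.
class LauncherTable {
    enum EventType { NODE_CREATED, NODE_MODIFIED }

    record Launcher(EventType type, Pattern path, String model) {}

    static final List<Launcher> LAUNCHERS = List.of(
        new Launcher(EventType.NODE_CREATED,
            Pattern.compile("/content/dam(/.*/)renditions/original"), "DAM Update Asset"),
        new Launcher(EventType.NODE_MODIFIED,
            Pattern.compile("/content/dam(/.*/)metadata"), "DAM Metadata Writeback"));

    // Returns the workflow model the first matching launcher would start, if any.
    static Optional<String> modelFor(EventType type, String path) {
        return LAUNCHERS.stream()
            .filter(l -> l.type() == type && l.path().matcher(path).matches())
            .map(Launcher::model)
            .findFirst();
    }

    public static void main(String[] args) {
        String asset = "/content/dam/pdf/water_quality_in_Ontario.pdf";
        // Per step 5, saving metadata modifies this node and starts the writeback model.
        System.out.println(modelFor(EventType.NODE_MODIFIED, asset + "/jcr:content/metadata"));
        // The same path with a different event type matches no launcher.
        System.out.println(modelFor(EventType.NODE_CREATED, asset + "/jcr:content/metadata"));
    }
}
```

Keeping the event type in the lookup key is what lets the two launchers coexist: a metadata write starts only the writeback model, and only the original rendition's creation starts “DAM Update Asset”.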

6) The “Thumbnail Creation” workstep creates three thumbnail renditions with the following dimensions (width : height), which are editable properties of the workstep:

- [140 pixels : 100 pixels]
- [48 pixels : 48 pixels]
- [319 pixels : 319 pixels]
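Each configured size acts as a bounding box. The sketch below assumes the step scales the rendered page to fit inside each box while preserving its aspect ratio, and that renditions are named `cq5dam.thumbnail.<width>.<height>.png`; both are assumptions to verify against your CQ version. The class and method names are hypothetical.

```java
// Sketch: fit a page image inside each thumbnail box without distortion.
// Fit-inside scaling and the rendition naming pattern are assumptions.
class ThumbnailSizes {
    // The three boxes configured on the workstep (width, height).
    static final int[][] BOXES = { {140, 100}, {48, 48}, {319, 319} };

    // Largest width and height that fit inside the box at the source aspect ratio.
    static int[] fit(int srcW, int srcH, int boxW, int boxH) {
        double scale = Math.min((double) boxW / srcW, (double) boxH / srcH);
        int w = Math.max(1, (int) Math.round(srcW * scale));
        int h = Math.max(1, (int) Math.round(srcH * scale));
        return new int[] { w, h };
    }

    // Assumed rendition node name for a given box.
    static String renditionName(int boxW, int boxH) {
        return "cq5dam.thumbnail." + boxW + "." + boxH + ".png";
    }

    public static void main(String[] args) {
        // A US Letter page rendered at 72 dpi is 612 x 792 pixels.
        for (int[] box : BOXES) {
            int[] size = fit(612, 792, box[0], box[1]);
            System.out.println(renditionName(box[0], box[1]) + " -> " + size[0] + "x" + size[1]);
        }
    }
}
```

Because a portrait page is taller than the 140 : 100 box is wide, height is the limiting side there and the resulting image is narrower than 140 pixels.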

7) The remaining worksteps are skipped because they apply only to images and InDesign documents.
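One way to picture the skipping in step 7 is to give each step a predicate over the asset's MIME type. This is a hypothetical sketch: the two "-only" step names are placeholders for the image and InDesign steps in the model, and `application/x-indesign` is an assumed MIME type.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: each step declares which MIME types it handles,
// and steps that do not apply to the asset are skipped.
class StepFilter {
    record Step(String name, Predicate<String> appliesTo) {}

    static final List<Step> STEPS = List.of(
        new Step("Metadata extraction", mime -> true),
        new Step("Thumbnail Creation", mime -> true),
        new Step("Image-only step (placeholder)", mime -> mime.startsWith("image/")),
        new Step("InDesign-only step (placeholder)", mime -> mime.equals("application/x-indesign")));

    // Names of the steps that would run for an asset with this MIME type.
    static List<String> stepsFor(String mime) {
        return STEPS.stream().filter(s -> s.appliesTo().test(mime)).map(Step::name).toList();
    }

    public static void main(String[] args) {
        System.out.println(stepsFor("application/pdf"));
        System.out.println(stepsFor("image/jpeg"));
    }
}
```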

8) Behind the scenes, Apache Tika converts the contents of the PDF into plain text.

9) Apache Lucene then indexes the extracted plain text so that full text search of the PDF document contents is possible.
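Why extracted plain text makes the PDF searchable can be shown with a tiny inverted index. This deliberately swaps in a standard-library toy rather than Lucene itself: real Lucene analysis and scoring are far richer, and the sample text below is invented to stand in for what Tika would extract. `TinyIndex` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (not Lucene): maps each term to the documents containing it.
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lower-cases text and splits it on anything that is not a letter or digit.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
    }

    // Records that every term of the extracted plain text occurs in docPath.
    void add(String docPath, String plainText) {
        for (String term : tokenize(plainText)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docPath);
            }
        }
    }

    // Returns the documents containing every query term (AND search).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            if (term.isEmpty()) continue;
            Set<String> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Set.of() : result;
    }

    // One-document index; the text is invented to stand in for extracted PDF text.
    static TinyIndex sample() {
        TinyIndex index = new TinyIndex();
        index.add("/content/dam/pdf/water_quality_in_Ontario.pdf",
                  "Water quality in Ontario lakes and rivers.");
        return index;
    }

    public static void main(String[] args) {
        System.out.println(sample().search("Ontario rivers"));
    }
}
```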

All of the above will create the following JCR structure:

[Figure: JCR structure of the uploaded asset water_quality_in_Ontario.pdf]
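The node layout described in steps 1, 4 and 6 can be sketched as an in-memory tree and printed as an outline. Node names come from this post, except the thumbnail names, which assume the `cq5dam.thumbnail.<width>.<height>.png` pattern; node types and properties are omitted. `AssetTree` and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the asset's node layout, printed with two spaces of indent per level.
class AssetTree {
    record Node(String name, List<Node> children) {
        Node(String name) { this(name, new ArrayList<>()); }
        Node add(Node child) { children.add(child); return this; }
    }

    static Node pdfAsset() {
        Node renditions = new Node("renditions")
            .add(new Node("original").add(new Node("jcr:content"))) // holds jcr:data
            .add(new Node("cq5dam.thumbnail.140.100.png"))         // assumed names
            .add(new Node("cq5dam.thumbnail.48.48.png"))
            .add(new Node("cq5dam.thumbnail.319.319.png"));
        Node content = new Node("jcr:content").add(renditions).add(new Node("metadata"));
        return new Node("water_quality_in_Ontario.pdf").add(content);
    }

    // Appends one line per node, indented by depth.
    static void print(Node node, int depth, StringBuilder out) {
        out.append("  ".repeat(depth)).append(node.name()).append('\n');
        for (Node child : node.children()) {
            print(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        print(pdfAsset(), 0, out);
        System.out.print(out);
    }
}
```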

The “metadata” JCR node will have the following properties (the property values will of course vary with the document):

[Figure: properties of the metadata node]

The jcr:content node under the “original” node has a property called jcr:data of type Binary, which holds the binary contents of the PDF. Clicking the “view” link shown as the property value downloads the PDF. See below:

[Figure: jcr:data property of original/jcr:content with its “view” link]
aem4beginner.blogspot


By aem4beginner
