Hi,
I have books in PDF format which were processed using OCR. For each page of a book I have an hOCR file and an image file. I am going to use Mirador as the previewer, with IIIF support. What is the recommended approach to storing these files? I read that InvenioRDM limits the number of files per record to 100. The manifest.json and the PDF will be stored as records. However, I am not sure how to store the images and hOCR files. Could you advise on this? What are the trade-offs if I don't store the hOCR and image files as records?
Hi AlABarazi - welcome and sorry not to have spotted your message sooner.
This is all very possible, but still requires a little bit of customisation. I have set up a number of instances with Mirador previewing image content via RDM’s IIIF implementation.
You can easily increase the maximum number of files with these two settings in your invenio.cfg file:
RDM_RECORDS_MAX_FILES_COUNT = 500
APP_RDM_DEPOSIT_FORM_QUOTA['maxFiles'] = RDM_RECORDS_MAX_FILES_COUNT
As you may know, if you have images attached to a record, RDM will automatically generate an IIIF manifest for them, which is perfect for Mirador. RDM doesn't currently include a Mirador previewer, but you can write a custom previewer to embed it. I can give you some pointers on this.
Please feel free to reach out to me on the Invenio Discord channel.
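As a rough illustration of consuming that manifest, here is a minimal Python sketch. It assumes the manifest is served at `/api/iiif/record:<id>/manifest` and follows the IIIF Presentation API 2.x layout (`sequences` containing `canvases`); both are assumptions to check against your own instance. The host name, record id, and function names are made up for the example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def manifest_url(base_url: str, record_id: str) -> str:
    """Build the manifest URL for a record.

    The '/api/iiif/record:<id>/manifest' pattern is an assumption;
    verify it against your own InvenioRDM instance.
    """
    return f"{base_url.rstrip('/')}/api/iiif/record:{quote(record_id)}/manifest"


def canvas_labels(manifest: dict) -> list[str]:
    """Collect canvas labels from a Presentation API 2.x style manifest."""
    labels = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            labels.append(canvas.get("label", ""))
    return labels


def fetch_manifest(base_url: str, record_id: str) -> dict:
    """Download and parse the manifest (needs network access)."""
    with urlopen(manifest_url(base_url, record_id)) as response:
        return json.load(response)
```

Calling `canvas_labels(fetch_manifest("https://repo.example.org", "abcde-12345"))` should list one label per image attached to the record, which is a quick way to confirm that every page made it into the manifest before pointing Mirador at it.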
Hi Dan,
Thank you! I figured out these settings after some trial and error, and I can now upload the files to the record. We are planning to use the PDF file directly with Cantaloupe instead of converting the PDF to images. Cantaloupe can render pages on the fly directly from the PDF file. It is going to be slower than IIPImage, but our PDF files are produced by us (unified PDFs), so they are not going to be large.
I have developed a Mirador previewer plugin for InvenioRDM, but it is not ready for production yet.
Could you please point me to some documentation about RDM's manifest generation process after the images are uploaded?
P.S. I sent you a friend request on Discord.
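For reference, the requests Mirador would send to the image server follow the IIIF Image API pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. Below is a minimal sketch of building such a URL in Python. The base URL, the `/iiif/2` prefix, and the identifier are placeholders, and how a specific PDF page is selected depends on the Cantaloupe version and configuration, so that part is deliberately left out.

```python
from urllib.parse import quote


def iiif_image_url(
    base_url: str,
    identifier: str,
    region: str = "full",
    size: str = "max",
    rotation: str = "0",
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Build an IIIF Image API request URL.

    The identifier is percent-encoded in full (including '/') because
    it must occupy a single path segment.
    """
    encoded = quote(identifier, safe="")
    return (
        f"{base_url.rstrip('/')}/{encoded}/"
        f"{region}/{size}/{rotation}/{quality}.{fmt}"
    )


# Hypothetical server and file name, for illustration only.
url = iiif_image_url("https://images.example.org/iiif/2", "books/vol1.pdf")
print(url)
```

Changing `region` and `size` in this sketch gives the tile requests a viewer issues when zooming, which is where on-the-fly PDF rendering is likely to cost the most time.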
Best regards,
Ala