Extract Metadata Using Smart Scan When Uploading a PDF

Learn how to extract available metadata from a PDF while uploading it using the 'Smart Scan' option.

Updated yesterday

When you upload a PDF, you can use the 'Smart Scan' option to extract metadata from its text content and apply it as attributes to the selected file in your Asite project. 'Smart Scan' uses Smart Optical Character Recognition (OCR) technology to extract text from a PDF, which saves time and effort by reducing or eliminating manual data entry.

To extract text while uploading a PDF, you draw a rectangle around the area of the file that contains the required text. You then map the text value extracted from that area to the attribute you want to apply to the selected file.

To get the desired output from your file content, keep the following points in mind when using the 'Smart Scan' feature:

  1. Supported Formats

    • Only the PDF file format is supported.

  2. Minimum Character Selection

    • Extraction requires selecting more than two characters.

  3. Font Simplicity

    • Use simple, standard fonts such as Times New Roman. The recommended minimum text size is 10 points.

  4. Small Text Selection

    • Small text selections may result in incorrect extractions.

  5. Rotation Consistency

    • All documents must share the same rotation angle.

  6. Unrotated Text

    • The text for extraction must be horizontal.

  7. Zooming for Clarity

    • Zoom in until the text is clearly legible before selecting it.

  8. Image Quality

    • Low-resolution images can result in poor text extraction. Ensure that images in the file are clear and of high quality. The recommended minimum resolution is 300 DPI (dots per inch).

  9. Uniform Document Dimensions

    • All documents must have consistent height and width.

  10. Lighting Conditions

    • Poor lighting, shadows, or overexposure in scanned images may lead to errors in text recognition. Please ensure the image is evenly lit.
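
Some of the pointers above are measurable, so they can be checked before you upload. The following is a minimal sketch of such a check, not part of Asite: the `DocInfo` record and the `preflight` function are hypothetical names, and the values they hold are assumed to be collected by you (for example, from your scanner settings). Only the thresholds (PDF format, 10 points, 300 DPI, consistent rotation and dimensions) come from the list above.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the pointers listed above.
MIN_FONT_PT = 10
MIN_DPI = 300


@dataclass
class DocInfo:
    """Hypothetical summary of one document; not an Asite data structure."""
    filename: str
    width: float     # page width, in any consistent unit
    height: float    # page height, in the same unit
    rotation: int    # rotation angle in degrees
    dpi: int         # resolution of the scanned images
    font_pt: float   # smallest text size you intend to select


def preflight(docs: List[DocInfo]) -> List[str]:
    """Return human-readable warnings; an empty list means no issue was found."""
    warnings = []
    # Checks that compare the documents with each other.
    if len({d.rotation for d in docs}) > 1:
        warnings.append("Documents do not share the same rotation angle.")
    if len({(d.width, d.height) for d in docs}) > 1:
        warnings.append("Documents do not have consistent height and width.")
    # Checks that apply to each document on its own.
    for d in docs:
        if not d.filename.lower().endswith(".pdf"):
            warnings.append(f"{d.filename}: only the PDF file format is supported.")
        if d.dpi < MIN_DPI:
            warnings.append(f"{d.filename}: {d.dpi} DPI is below the recommended {MIN_DPI}.")
        if d.font_pt < MIN_FONT_PT:
            warnings.append(f"{d.filename}: {d.font_pt} pt text is below the recommended {MIN_FONT_PT} pt.")
    return warnings
```

Pointers that depend on visual judgement, such as lighting and legibility, are left out of this sketch on purpose.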

Below are the steps to extract metadata using the 'Smart Scan' option when publishing a PDF:

  1. Choose a PDF from your local device storage whose text content contains the required metadata, and follow the usual steps to publish the selected file.

    • Select only one PDF file to get the 'Smart Scan' option.

  2. On the 'Upload' window, click the 'Smart Scan' option at the bottom right.

  3. A layered screen titled 'Smart Scan' appears.

    It lets you view the file you selected for publishing, along with a list of applicable text attributes displayed on the right side. By default, the file is open for viewing.

    • Click the 'Right' arrow icon on the left-hand side to see the left panel containing details of the PDF selected for publishing. To hide the left panel, click the 'Left' arrow icon on the left-hand side.

    • Below is a description of the available tools for Smart Scan and Smart Split:

      • Smart Scan

        • The 'Hand' tool is for dragging/moving the file across the frame.

        • The 'Rectangle' tool is for mapping an area. To use it, click the tool and drag across the part of the file you want to map.

      • Smart Split

        • The 'Split' tool splits a bundled PDF into individual documents using the document metadata.

  4. Select the 'Rectangle' icon in the bottom bar. Then, on the file page open in the viewer, select the area containing the text for the required attribute.

    • Text in the selected area should be clear and legible to avoid extraction issues.

    • For better clarity, you can zoom in on the file page by clicking the 'Plus' icon at the bottom if required.

    In this example, we have chosen the area of the file page containing the word 'Vision'.

    • If no valid text content is found in the selected area of the file, an error message appears.

  5. A prompt appears. Select and confirm a file attribute to map with the text extracted from the chosen area of the file.

    In this example, we have selected 'Doc Ref' as the file attribute to map.

  6. Click 'Assign' to assign the extracted text value from the chosen area of your file to the selected attribute for the file under publishing. Otherwise, click 'Cancel' to go back to the previous screen.
    The text extracted from your chosen area of the file page is mapped to the selected file attribute and displayed in the right panel.

    In our example, the 'Doc Ref' value is updated to 'Vision', which matches the text we selected in step 4.
    For system attributes of a file, such as 'Status' and 'Purpose of Issue', if the provided attribute text value doesn't match any available values in the project configuration, a new value is added based on the provided text, depending on your access. For example, if 'Work In Progress' is the provided text value for 'Status' and if it is not configured as a file status in the project configuration at the time of upload, then 'Work In Progress' will be created as a new file status automatically in the selected project.

    Follow similar steps to update the required value for the remaining attributes. Using the rectangle shape, select the area of the file page and map it with a chosen related file attribute.

    • If the extracted text output for the mapped attribute is not as expected, you can remove the current mapping by clicking the 'Cross' icon next to the attribute field in the right panel. You can zoom in again, use the rectangle shape, and try choosing the related file area more precisely to get the required value.

    To split the selected file into multiple PDF files based on a mapped attribute, click the 'Split File' icon at the top right. It opens the 'Smart Split' screen.

    If multiple attributes are mapped, select the attribute by which you want to split the file.

    Click 'Confirm' once done.
    From the left panel on the 'Smart Split' screen, you can see the thumbnail for each page from the selected file with its mapped text value and page size. You can also click each thumbnail to view the pages in the chosen file as needed.
    From the top right, click the 'Confirm Split' option to continue scanning the selected file or the 'Cancel Split' option to cancel scanning the selected file.
    Once you confirm the split, you are directed back to the 'Smart Scan' screen. The pages of the selected file are split into new PDF files, which appear in the left panel under the 'Unconfirmed' tab.

    You can view each split file from the left panel by clicking the file names individually. Alternatively, you can use the 'Next File' and 'Previous File' links at the top right.
    The files are categorised as follows:

    • The 'File with the Zip' icon represents the main parent file.

    • The 'File' icon represents the split files.

    Mouse over the yellow exclamation icon in the file list to view the status of the corresponding file.
    You can apply the attribute choices captured in the original document to the rest of the split files. To do this:

    • Click on the 'Edit' icon next to 'Split Files'.

    • In the edit view, select the files to which the attributes should be applied.

    Apply Attributes to Selected Files

    • After selecting the files, click 'Extract' at the bottom of the panel.

    • Once the extraction process is complete on all files, click 'Done'.

    Review and Confirm Attribute Assignments

    • After the attributes are applied, the files appear in the left-hand panel.

    • Review the attributes for each file. If correct, click 'Confirm' to finalise.

  7. The files with confirmed attributes will appear in the 'Confirmed' tab in the left panel.

    • An error message appears if no valid text is found in the chosen area for the selected files. To fix such errors, open each affected file individually and select an area that contains valid text. Then follow the same mapping steps and try again.

    Files whose attributes can't be confirmed remain in the 'Unconfirmed' tab.

  8. Once the required attributes are updated and file splitting is complete (if applicable), you can continue the file upload process by clicking the 'Continue to Upload' icon at the top right. Otherwise, click the 'Cancel' icon to cancel the Smart Scan process and go to the previous 'Upload' screen.

  9. If you continue, the 'Upload' window displays the file attributes assigned during the Smart Scan process, and you can complete the upload.

    • After you click the 'Continue to Upload' icon, you can't return to the 'Smart Scan' screen because the attributes have already been applied. However, you can cancel the file upload from the 'Upload' screen and start again if needed.
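
Two behaviours described in step 6 can be illustrated with a short sketch. It is not Asite's implementation, and every name in it is hypothetical. `resolve_system_value` mirrors the rule for system attributes such as 'Status': a matching configured value is used, and otherwise a new value is created if your access allows it. `split_by_attribute` shows one plausible way pages could be grouped into split files. The article does not state the exact matching or grouping rules, so exact text matching and grouping of consecutive pages are assumptions.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def resolve_system_value(extracted: str, configured: List[str],
                         can_create: bool) -> Optional[str]:
    """Return the value to apply, adding it to `configured` when allowed.

    Assumption: values are compared exactly after trimming whitespace.
    """
    value = extracted.strip()
    if value in configured:
        return value
    if can_create:
        # The text becomes a new configured value, as in the
        # 'Work In Progress' example from step 6.
        configured.append(value)
        return value
    # Assumption: without the required access, nothing is applied.
    return None


def split_by_attribute(page_values: List[str]) -> List[Tuple[str, List[int]]]:
    """Group consecutive pages that share the same extracted value.

    `page_values[i]` is the text extracted from page i + 1. Each returned
    tuple is (value, page numbers) for one split file.
    """
    groups = []
    numbered = enumerate(page_values, start=1)
    for value, run in groupby(numbered, key=lambda item: item[1]):
        groups.append((value, [page for page, _ in run]))
    return groups
```

For example, under these assumptions a four-page file whose pages yield 'A', 'A', 'B', 'A' would be split into three files, because the last page does not follow the earlier 'A' pages directly.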

FAQs

Q1. Using the Smart Scan feature, how many attributes can I update in one go?

A1. You can update one attribute at a time, based on the text extracted from your selected area. To update the remaining attributes, repeat the same steps, selecting the relevant area for each attribute.

Q2. Why can't I see the 'Smart Scan' option when using the drag-and-drop method to upload a PDF?

A2. This may happen if the 'Simple Upload' option is enabled in the folder selected for uploading the file, and no mandatory attributes are defined on the folder. However, in such scenarios, you can use the 'Files' option from the 'Upload' menu at the top right corner of the files listing to access the 'Smart Scan' option.
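
The conditions mentioned in this article for seeing the 'Smart Scan' option can be summarised as a small decision function. This is only a sketch under stated assumptions: the function and parameter names are hypothetical, and it treats the drag-and-drop case from A2 as always hiding the option, whereas the article only says this may happen.

```python
from typing import List


def smart_scan_available(filenames: List[str],
                         via_drag_and_drop: bool,
                         simple_upload_enabled: bool,
                         folder_has_mandatory_attributes: bool) -> bool:
    """Estimate whether the 'Smart Scan' option would be shown."""
    # Exactly one file must be selected.
    if len(filenames) != 1:
        return False
    # Only the PDF file format is supported.
    if not filenames[0].lower().endswith(".pdf"):
        return False
    # Drag-and-drop into a 'Simple Upload' folder without mandatory
    # attributes can hide the option (see A2).
    if (via_drag_and_drop and simple_upload_enabled
            and not folder_has_mandatory_attributes):
        return False
    return True
```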


