OCR Data fields

This reference describes the attributes of the ocr_data object for each type of W-2 and 1099 form supported by Argyle's OCR for Documents solution.

General attributes


form_type string optional

The type of form that was processed with OCR.

Possible values: W-2, W-2c, 1099-NEC, 1099-SSA, 1099-MISC, 1099-INT, 1099-R, 1099-G, 1099-K.
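A client application that branches on form_type might first check the value against the list above. The following Python sketch does that. SUPPORTED_FORM_TYPES and is_supported_form_type are hypothetical names, not part of Argyle's API.

```python
from typing import Optional

# Values listed for form_type in this reference (hypothetical constant name).
SUPPORTED_FORM_TYPES = frozenset({
    "W-2", "W-2c", "1099-NEC", "1099-SSA", "1099-MISC",
    "1099-INT", "1099-R", "1099-G", "1099-K",
})


def is_supported_form_type(form_type: Optional[str]) -> bool:
    """Return True if form_type is one of the documented values.

    form_type is optional, so a missing value (None) counts as unsupported.
    """
    return form_type is not None and form_type in SUPPORTED_FORM_TYPES


print(is_supported_form_type("1099-NEC"))  # True
print(is_supported_form_type("1099-DIV"))  # False
```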


omb_no string optional

The OMB (Office of Management and Budget) control number of the form type that was processed with OCR.


year string optional

The calendar year the form is valid for.


recipient.name string optional

The employee’s legal name. All form types contain this field.


recipient.address object

Contains the employee’s address. All form types contain this field.

Data fields within the recipient.address object:

  • address.city
  • address.country
  • address.line1
  • address.line2
  • address.postal_code
  • address.state
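As a rough illustration, a client could collapse an address object into a single display line while skipping null fields. This is a minimal Python sketch. format_address and its output layout are assumptions, not an Argyle-provided helper.

```python
from typing import Optional


def format_address(address: Optional[dict]) -> str:
    """Join the non-null parts of an address object into one display line.

    Expects the keys line1, line2, city, state, postal_code, and country;
    any of them may be missing or null (None).
    """
    if not address:
        return ""
    street = ", ".join(p for p in (address.get("line1"), address.get("line2")) if p)
    # Build "State PostalCode", then "City, State PostalCode", skipping gaps.
    state_zip = " ".join(p for p in (address.get("state"), address.get("postal_code")) if p)
    locality = ", ".join(p for p in (address.get("city"), state_zip) if p)
    return ", ".join(p for p in (street, locality, address.get("country")) if p)


recipient_address = {
    "city": "Norton", "country": None, "line1": "4321 S. Jackson St Apt 987C",
    "line2": None, "postal_code": "45678", "state": "MA",
}
print(format_address(recipient_address))
# 4321 S. Jackson St Apt 987C, Norton, MA 45678
```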

recipient_tin string optional

The employee’s Taxpayer Identification Number. All form types contain this field.


payer.name string optional

The employer’s name. All form types contain this field, except for 1099-SSA.


payer.address object

Contains the employer’s address. All form types contain this field, except for 1099-SSA.

Data fields within the payer.address object:

  • address.city
  • address.country
  • address.line1
  • address.line2
  • address.postal_code
  • address.state

payer_tin string optional

The employer's Taxpayer Identification Number. All form types contain this field, except for 1099-SSA.
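Because 1099-SSA forms have no payer fields, client code should not assume they are present. A minimal Python sketch, assuming the parsed result has the shape shown in the example that follows (form-specific fields nested under form). payer_name is a hypothetical helper.

```python
from typing import Optional

# Per this reference, 1099-SSA forms have no payer.name, payer.address, or payer_tin.
FORMS_WITHOUT_PAYER = frozenset({"1099-SSA"})


def payer_name(result: dict) -> Optional[str]:
    """Return the payer's name, or None if the form type has no payer or it was not read."""
    if result.get("form_type") in FORMS_WITHOUT_PAYER:
        return None
    # Assumes form-specific fields are nested under "form", as in the example.
    payer = (result.get("form") or {}).get("payer") or {}
    return payer.get("name")


print(payer_name({"form_type": "1099-SSA", "form": {}}))  # None
print(payer_name({"form_type": "1099-NEC",
                  "form": {"payer": {"name": "Peak Performance Publishing LLC"}}}))
# Peak Performance Publishing LLC
```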


{
   "form_type":"1099-NEC",
   "omb_no":"1545-0116",
   "year":"2020",
   "form":{
      "recipient":{
         "name":"John Smith",
         "address":{
            "city":"Norton",
            "country":null,
            "line1":"4321 S. Jackson St Apt 987C",
            "line2":null,
            "postal_code":"45678",
            "state":"MA"
         }
      },
      "recipient_tin":"***-**-0193",
      "payer":{
         "name":"Peak Performance Publishing LLC",
         "address":{
            "city":"Ambler",
            "country":null,
            "line1":"118 Mary Ambler Way",
            "line2":null,
            "postal_code":"19002",
            "state":"PA"
         }
      },
      "payer_tin":"46-5237939"
   }
}
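In the example above, form_type, omb_no, and year sit at the top level, while the remaining fields are nested under form. The Python sketch below resolves a documented field name such as recipient.address.city against that layout. It assumes every response follows the same nesting as the example; get_field and TOP_LEVEL_FIELDS are hypothetical names.

```python
import json
from typing import Any

# Fields that sit at the top level in the example; everything else is under "form".
TOP_LEVEL_FIELDS = ("form_type", "omb_no", "year")


def get_field(result: dict, path: str) -> Any:
    """Resolve a documented field name such as "recipient.address.city".

    Returns None if any part of the path is missing or null.
    """
    node: Any = result if path in TOP_LEVEL_FIELDS else result.get("form")
    for key in path.split("."):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


example = json.loads("""
{"form_type": "1099-NEC", "omb_no": "1545-0116", "year": "2020",
 "form": {"recipient": {"name": "John Smith",
                        "address": {"city": "Norton", "state": "MA"}},
          "payer_tin": "46-5237939"}}
""")
print(get_field(example, "recipient.address.city"))  # Norton
print(get_field(example, "year"))                    # 2020
print(get_field(example, "payer.name"))              # None
```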

Quality attributes

The ocr_data object provides details on how successful the data retrieval through OCR was. Whenever the OCR process encounters an issue, it raises a warning. Some of these warnings are errors; others only provide information, for example about an empty or obfuscated field value.


The type of the warning is given in the warnings.message field and determines the severity of that warning. The warnings.severity field indicates whether a warning is considered an error. Based on the number of errors raised during the process, OCR reports an estimated level of parsing success in the confidence field.


warnings object

This object contains the list of warnings, one entry per warning raised for a field.


warnings.field_name string optional

Indicates the name of the field where a warning was raised.


warnings.message string optional

Includes the field_name and the type of warning that affects the field.

Possible values:

  • not found signifies that a field was not found within the document.
  • value empty/bad format signifies that the field was found, but its value was empty, misread, or in an incorrect format.

These warning types are used only for the recipient_tin field and indicate the level of obfuscation of the Taxpayer Identification Number:

  • obfuscated is used when the value of recipient_tin is completely obfuscated.
  • obfuscated, 2 last digits given is used when the last 2 digits of recipient_tin are visible.
  • obfuscated, 4 last digits given is used when the last 4 digits of recipient_tin are visible.
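To compare a masked recipient_tin value with these warnings, a client could count how many trailing digits remain visible. A minimal Python sketch: visible_trailing_digits is hypothetical, and it assumes masking characters are non-digits such as * (as in the example value ***-**-0193). The warning itself remains the authoritative indicator.

```python
from typing import Optional


def visible_trailing_digits(tin: Optional[str]) -> int:
    """Count the digits left visible at the end of a masked TIN.

    Hyphens are ignored, and counting stops at the first non-digit character
    (for example "*") when scanning from the right.
    """
    if not tin:
        return 0
    count = 0
    for ch in reversed(tin.replace("-", "")):
        if not ch.isdigit():
            break
        count += 1
    return count


print(visible_trailing_digits("***-**-0193"))  # 4
print(visible_trailing_digits("***-**-**93"))  # 2
print(visible_trailing_digits("***-**-****"))  # 0
```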

The following warning types are used when numeric validation fails for a field that is known to contain strictly numeric values. The OCR solution attempts to remove the non-numeric characters and, depending on the result, raises one of these warnings. Each warning includes the affected field's name and the field's original value:

  • <field_name> had non numeric chars removed, original value: <value> is used when the OCR solution successfully filtered the invalid characters from the numeric field.
  • <field_name> invalid - non numeric value, original value: <value> is used when the OCR solution did not succeed in filtering the invalid characters from the numeric field.
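Since the message formats above are fixed phrases that follow the field name, a client could classify warnings by matching those phrases. The Python sketch below assumes every message starts with the field name followed by one of the documented phrases. classify_warning is hypothetical, and the returned kind labels (not_found, obfuscated_last_2, and so on) are made up for illustration.

```python
import re
from typing import Optional, Tuple

# "<field_name> had non numeric chars removed, original value: <value>" and
# "<field_name> invalid - non numeric value, original value: <value>"
_NUMERIC_WARNING = re.compile(
    r"^\S+ (?P<kind>had non numeric chars removed|invalid - non numeric value), "
    r"original value: (?P<value>.*)$"
)

# Documented phrases that end the remaining messages, with illustrative labels.
_SUFFIXES = (
    ("obfuscated, 2 last digits given", "obfuscated_last_2"),
    ("obfuscated, 4 last digits given", "obfuscated_last_4"),
    ("obfuscated", "obfuscated"),
    ("value empty/bad format", "value_empty_or_bad_format"),
    ("not found", "not_found"),
)


def classify_warning(message: str) -> Tuple[str, Optional[str]]:
    """Return (kind, original_value); original_value is set only for numeric warnings."""
    match = _NUMERIC_WARNING.match(message)
    if match:
        removed = match.group("kind").startswith("had")
        kind = "non_numeric_removed" if removed else "non_numeric_invalid"
        return kind, match.group("value")
    for suffix, kind in _SUFFIXES:
        if message.endswith(suffix):
            return kind, None
    return "other", None


print(classify_warning("benefits_3 had non numeric chars removed, original value: *1507.40"))
# ('non_numeric_removed', '*1507.40')
print(classify_warning("recipient_tin obfuscated, 2 last digits given"))
# ('obfuscated_last_2', None)
```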

warnings.severity number optional

Indicates whether a warning is considered an error or not.

  • ERROR is shown as 1.0. For example, a not found warning is considered an error.
  • NOT ERROR is shown as 0.0. For example, value empty/bad format and obfuscated warnings are not considered errors.
  • UNKNOWN is shown as 0.5. UNKNOWN is used when a field or its value is not found, but it is not clear whether the information should be present in the document.
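A client that displays warnings might translate the numeric severity back into the labels above. A minimal Python sketch: severity_label and summarize_severities are hypothetical, and the fallback labels MISSING and UNRECOGNIZED are assumptions for values this reference does not describe.

```python
from collections import Counter
from typing import Iterable, Optional

# Documented warnings.severity values.
SEVERITY_LABELS = {1.0: "ERROR", 0.0: "NOT ERROR", 0.5: "UNKNOWN"}


def severity_label(severity: Optional[float]) -> str:
    """Map a numeric severity to its documented label."""
    if severity is None:
        return "MISSING"  # assumption: severity is optional and may be absent
    return SEVERITY_LABELS.get(float(severity), "UNRECOGNIZED")


def summarize_severities(warnings: Iterable[dict]) -> Counter:
    """Count warnings per severity label."""
    return Counter(severity_label(w.get("severity")) for w in warnings)


print(severity_label(0.5))  # UNKNOWN
print(summarize_severities([{"severity": 1.0}, {"severity": 0.0}, {"severity": 0.0}]))
```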

confidence string optional

Indicates an estimated level of parsing success, based on the number of errors found.

Possible values:

  • EXACT_MATCH - No error was found.
  • HIGH - 1 error was found.
  • MEDIUM - 2 errors were found.
  • LOW - 3 errors were found.
  • NO_MATCH - 4 or more errors were found, which usually means that the document is the wrong type.
{ 
   "warnings":[
      {
         "field_name":"recipient_tin",
         "message":"recipient_tin obfuscated, 2 last digits given",
         "severity":0.0
      },
      {
         "field_name":"benefits_3",
         "message":"benefits_3 had non numeric chars removed, original value: *1507.40",
         "severity":0.0
      },
      {
         "field_name":"state_inc_7",
         "message":"state_inc_7 value empty/bad format",
         "severity":0.0
      }
   ],
   "confidence":"EXACT_MATCH"
}
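The confidence thresholds listed above can be expressed as a small lookup, for example to sanity-check a response. This Python sketch assumes that only warnings with severity 1.0 (ERROR) count toward the error total; this reference does not say how UNKNOWN (0.5) warnings are treated. The function names are hypothetical.

```python
import json

CONFIDENCE_BY_ERRORS = ("EXACT_MATCH", "HIGH", "MEDIUM", "LOW")


def confidence_for_error_count(errors: int) -> str:
    """0 -> EXACT_MATCH, 1 -> HIGH, 2 -> MEDIUM, 3 -> LOW, 4 or more -> NO_MATCH."""
    if errors < 0:
        raise ValueError("error count cannot be negative")
    return CONFIDENCE_BY_ERRORS[errors] if errors < len(CONFIDENCE_BY_ERRORS) else "NO_MATCH"


def expected_confidence(ocr_data: dict) -> str:
    """Re-derive confidence, ASSUMING only severity 1.0 (ERROR) counts as an error."""
    warnings = ocr_data.get("warnings") or []
    errors = sum(1 for w in warnings if w.get("severity") == 1.0)
    return confidence_for_error_count(errors)


example = json.loads("""
{"warnings": [
   {"field_name": "recipient_tin", "severity": 0.0,
    "message": "recipient_tin obfuscated, 2 last digits given"},
   {"field_name": "benefits_3", "severity": 0.0,
    "message": "benefits_3 had non numeric chars removed, original value: *1507.40"},
   {"field_name": "state_inc_7", "severity": 0.0,
    "message": "state_inc_7 value empty/bad format"}
 ],
 "confidence": "EXACT_MATCH"}
""")
print(expected_confidence(example) == example["confidence"])  # True
```

Applied to the example above, all three warnings have severity 0.0, so the derived value matches the reported EXACT_MATCH.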

error string optional

When the OCR process fails, the error field returns the specific type of error that occurred.

Possible values:

  • type_mismatch - The document type the user selected while uploading does not match the type of the uploaded document. For example, the user selected W-2 but uploaded a 1099 document.

  • unrecognized_document_type - The user uploaded a document type that cannot be recognized as any of the currently supported document types. For example, a 1095 form would be considered unrecognized.

  • unsupported_document_subtype - The user uploaded a currently unsupported subtype of an otherwise supported document type. For example, the most common 1099 documents are supported, but the 1099-DIV subtype is not currently supported.

    Consult Supported document types to see all supported document subtypes.

  • invalid_document - The document is not valid. It could be incomplete, cropped incorrectly so that required fields are missing, or not filled in at all.

  • unknown - An internal issue occurred during document processing. Argyle is investigating.

{
   "ocr_data":{
      "error":"type_mismatch"
   }
}
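Client code should check the error field before reading any form data. A minimal Python sketch, assuming the value of ocr_data (the inner object in the example above) is passed in as a dictionary. OcrFailed and check_ocr_data are hypothetical names, and mapping unrecognized codes to unknown is a design choice, not documented behavior.

```python
from typing import Optional

# Documented values of the error field.
KNOWN_ERRORS = frozenset({
    "type_mismatch",
    "unrecognized_document_type",
    "unsupported_document_subtype",
    "invalid_document",
    "unknown",
})


class OcrFailed(Exception):
    """Hypothetical exception carrying the error code reported by OCR."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def check_ocr_data(ocr_data: dict) -> dict:
    """Raise OcrFailed if ocr_data has an error; otherwise return it unchanged."""
    code: Optional[str] = ocr_data.get("error")
    if code:
        # Design choice: treat codes not listed in this reference as "unknown".
        raise OcrFailed(code if code in KNOWN_ERRORS else "unknown")
    return ocr_data


try:
    check_ocr_data({"error": "type_mismatch"})
except OcrFailed as exc:
    print(exc.code)  # type_mismatch
```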