Comment: Reverted from v. 21

...

File Format: Allows you to configure the file format and source schema. Select the required file format from the dropdown.

Info

Once the File Format is selected, all the related configurations will appear.

For example: Parquet, Delimited, JSON.

  • Parquet

A Parquet file is a column-oriented file format in which the data in each column is stored independently. It stores the data in a table-type structure.

  • Delimited

A delimited file is a sequential text file with column delimiters. The lines in the file represent rows, and the columns are created by separating the values on each line with a specific character such as a comma, tab, or pipe.

Header Row: Specifies that the first row of the file is treated as a header row. The values in the header row, separated by the delimiter, are used as the column names.

Delimiter: The character that separates the values stored in each row. You can choose a predefined delimiter or define a custom one. Select the required delimiter from the dropdown.

  • Comma: Selects a comma (,) as the delimiter for the data.

  • Tab: Selects a tab character as the delimiter for the data.

  • Custom: Allows you to define a custom delimiter. For example: semicolon, colon, pipe, forward slash.

Info

Selecting Custom enables the Custom Delimiter field.

Custom Delimiter: Allows you to define a new custom delimiter for the data.

You must select Custom from the Delimiter dropdown before you can define a custom delimiter.
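The Header Row and Delimiter settings can be illustrated with a minimal Python sketch. It is not the product's parser. The function name `read_delimited`, the fallback column names `col_0`, `col_1`, and the sample data are assumptions made only for this example.

```python
import csv
import io


def read_delimited(text, delimiter=",", header_row=True):
    """Parse delimited text into a list of dicts keyed by column name."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if header_row:
        # The first row supplies the column names.
        columns, data = rows[0], rows[1:]
    else:
        # No header row: use positional names (hypothetical naming scheme).
        columns = [f"col_{i}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(columns, row)) for row in data]


# Pipe is used here as a custom delimiter.
sample = "id|name|city\n1|Asha|Pune\n2|Ravi|Delhi\n"
records = read_delimited(sample, delimiter="|", header_row=True)
print(records[0])
```

With the header row enabled, each record is keyed by the names in the first line; with it disabled, the first line is treated as data.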

Escape Character: An escape character, like a text qualifier, forms a sequence that the parser recognizes and does not treat as a field boundary. Its purpose is to allow the data to contain characters that would otherwise be seen as delimiter occurrences.

Info

The character preceded by a backslash (\) is known as an escaped character; the backslash itself is the escape character.

An escape character comes into play when the data itself contains a delimiter. It is used to escape the delimiter only. Using an escape character also minimises the need to switch quotation marks when enclosing strings that contain special punctuation marks: you can enclose the string in any quotation marks and escape the collision that occurs in the middle by using the escape character.

It also avoids delimiter collision.
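Delimiter collision and its handling with an escape character can be sketched with Python's standard `csv` module. This is only an illustration under the assumption that a backslash is the configured escape character; the sample data is made up.

```python
import csv
import io

# The comment field contains the delimiter, escaped with a backslash.
raw = "id,comment\n1,Hello\\, world\n"

# With an escape character, the escaped comma stays inside the field.
escaped = list(
    csv.reader(
        io.StringIO(raw),
        delimiter=",",
        escapechar="\\",
        quoting=csv.QUOTE_NONE,
    )
)

# Without one, the same comma is read as a field boundary.
naive = list(csv.reader(io.StringIO(raw), delimiter=",", quoting=csv.QUOTE_NONE))

print(escaped[1])
print(len(naive[1]))
```

In the first parse the data row keeps two fields; in the second the same row splits into three, which is the collision the escape character prevents.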

Text Qualifier: A text qualifier is used when a delimiter is contained within a row cell. If the cell contains a delimiter and no text qualifier is used, the data that occurs after the delimiter spills into the next column.

Info

Text qualifiers are single (' ') or double (" ") quotation marks placed around data elements to identify the element as text- or character-based data.

A text qualifier is a character used at the beginning and end of a field value.

...

Note

If the data uses single or double quotation marks to enclose the delimiter character or any special character, the same quotation mark must be defined as the text qualifier.
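The effect of a text qualifier can be sketched with Python's standard `csv` module, where the `quotechar` parameter plays the role of the text qualifier. The sample rows are invented for this illustration, and the product's own parser may behave differently in edge cases.

```python
import csv
import io

# The address cell contains the delimiter and is enclosed in double quotes.
raw_double = 'id,address\n1,"12 Main St, Springfield"\n'
# The same data enclosed in single quotes.
raw_single = "id,address\n1,'12 Main St, Springfield'\n"

# The text qualifier must match the quotation mark used in the data.
double_rows = list(csv.reader(io.StringIO(raw_double), delimiter=",", quotechar='"'))
single_rows = list(csv.reader(io.StringIO(raw_single), delimiter=",", quotechar="'"))

# Without a text qualifier the cell spills into the next column.
spilled = list(csv.reader(io.StringIO(raw_double), delimiter=",", quoting=csv.QUOTE_NONE))

print(double_rows[1])
print(spilled[1])
```

When the qualifier matches, the address stays in one column; when no qualifier is applied, the row gains an extra column and the quotation marks remain in the values.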

  • JSON

A JSON file is a plain text file written in JSON syntax.

Supported Structure

  1. The supported JSON structure has one record per row.

Code Block
{"first_name": "Bradley", "priority": 1683, "subscribe": true, "income": 955289.05, "address": {"City": "Nicolestad", "State": "Massachusetts"}, "countries_visited": ["Turks and Caicos Islands", "Spain", "New Caledonia"], "date_of_birth": "1988-02-19 00:00:00", "null_key": null}
{"first_name": "Jennifer", "priority": 2756, "subscribe": true, "income": 15248.17, "address": {"City": "Burnsborough", "State": "Idaho"}, "countries_visited": ["Mauritania", "Turkey", "Guinea"], "date_of_birth": "1994-08-31 00:00:00", "null_key": null}
{"first_name": "Tyler", "priority": 2628, "subscribe": false, "income": 248173.49, "address": {"City": "Ericahaven", "State": "California"}, "countries_visited": ["Sudan", "Afghanistan", "Chad"], "date_of_birth": "1978-06-30 00:00:00", "null_key": null}
{"first_name": "Lisa", "priority": 1518, "subscribe": false, "income": 338300.85, "address": {"City": "Tracyton", "State": "Oklahoma"}, "countries_visited": ["Honduras", "Samoa", "Congo"], "date_of_birth": "1991-08-06 00:00:00", "null_key": null}
{"first_name": "William", "priority": 1714, "subscribe": false, "income": 950738.18, "address": {"City": "Lake Tina", "State": "Nevada"}, "countries_visited": ["Seychelles", "Vietnam", "Lebanon"], "date_of_birth": "1981-02-09 00:00:00", "null_key": null}

This is a widely used format for data ingestion.
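A one-record-per-row file can be read line by line, as in the following minimal Python sketch. The function name `read_json_lines` and the shortened sample records are assumptions for illustration only; they are not part of the product.

```python
import json


def read_json_lines(text):
    """Parse text that holds one JSON object per row."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank rows
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"row {line_number}: expected a JSON object")
        records.append(record)
    return records


sample = (
    '{"first_name": "Bradley", "priority": 1683, "address": {"City": "Nicolestad"}, "null_key": null}\n'
    '{"first_name": "Jennifer", "priority": 2756, "address": {"City": "Burnsborough"}, "null_key": null}\n'
)
records = read_json_lines(sample)
print(len(records))
```

Because every row is a complete JSON object, each row can be parsed independently of the others.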

Unsupported Structure

  1. JSON files with formatted records (records that span multiple rows).

    Code Block
    {
       "first_name":"Rachel",
       "priority":2619,
       "subscribe":false,
       "income":435324.12,
       "address":{
          "City":"Smithstad",
          "State":"Michigan"
       },
       "countries_visited":[
          "Belize",
          "Eritrea",
          "Egypt"
       ],
       "date_of_birth":"1976-06-19 00:00:00",
       "null_key":null
    }
  2. JSON files that contain data in arrays.

    Code Block
    [{"id":1,"name":"John Doe","email":"john.doe@example.com"},{"id":2,"name":"Jane Doe","email":"jane.doe@example.com"},{"id":3,"name":"Mike Smith","email":"mike.smith@example.com"}]
    [{"id":7,"name":"Peter Green","email":"peter.green@example.com"},{"id":8,"name":"Susan Black","email":"susan.black@example.com"},{"id":9,"name":"Michael White","email":"michael.white@example.com"},{"id":10,"name":"Jessica Green","email":"jessica.green@example.com"}]
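Files in either of these structures can usually be converted to one record per row before upload. The following Python sketch shows one possible way to do that; the function name `flatten_json` is hypothetical, and the sketch assumes the whole file fits in memory.

```python
import json


def flatten_json(text):
    """Rewrite formatted objects or arrays as one JSON object per row."""
    try:
        # A single formatted object or a single array.
        chunks = [json.loads(text)]
    except json.JSONDecodeError:
        # Several JSON values, one per row (for example one array per row).
        chunks = [json.loads(line) for line in text.splitlines() if line.strip()]
    rows = []
    for chunk in chunks:
        records = chunk if isinstance(chunk, list) else [chunk]
        rows.extend(json.dumps(record) for record in records)
    return "\n".join(rows)


formatted = '{\n   "first_name": "Rachel",\n   "address": {\n      "City": "Smithstad"\n   }\n}'
arrays = '[{"id":1},{"id":2}]\n[{"id":3}]'

print(flatten_json(formatted))
print(flatten_json(arrays))
```

The output of `flatten_json` holds each record on its own row, which matches the supported structure.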

Upload: Click the Upload button to upload the data file. Select the file to upload and click Upload.

Note

The selected file must be smaller than 2 MB.

We recommend that you avoid uploading a Gzip file unless it has been decompressed.
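A sample file can be checked against these notes before upload. The sketch below is a hypothetical pre-flight check, not a product feature: it assumes that 2 MB means 2 × 1024 × 1024 bytes and detects Gzip files by their two-byte signature.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"          # first two bytes of every gzip stream
MAX_BYTES = 2 * 1024 * 1024       # assumption: "2 MB" read as 2 MiB


def check_upload(data: bytes):
    """Return a list of problems found in the file content (empty if none)."""
    problems = []
    if len(data) >= MAX_BYTES:
        problems.append("file must be smaller than 2 MB")
    if data[:2] == GZIP_MAGIC:
        problems.append("file looks like a Gzip file")
    return problems


plain = b"id,name\n1,Asha\n"
print(check_upload(plain))
print(check_upload(gzip.compress(plain)))
```

To check a file on disk, read it in binary mode and pass the bytes to `check_upload`.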

Info

The View Batch Schema button is enabled after the selected file is uploaded.

...

Column list: As required, you can select or deselect the parent column names in the list in the left-hand side drawer.

By default, all the columns available in the source data format are selected.

You must select at least one column to save the schema in the batch listener configuration.

In case of nested data structures, only the parent column is displayed on the left side.

JSON: The JSON schema updates dynamically when columns/keys are selected or deselected in the left-hand side drawer. It displays the standardized schema structure that needs to be mapped in the project schema.
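The idea of selecting parent columns can be sketched in Python as follows. This only illustrates the concept on a single record; it does not reproduce the schema format the product generates, and the function name `select_parent_columns` is hypothetical.

```python
def select_parent_columns(record, selected):
    """Keep only the selected top-level (parent) keys of a record."""
    if not selected:
        # Mirrors the rule that at least one column must be selected.
        raise ValueError("select at least one column")
    return {key: value for key, value in record.items() if key in selected}


record = {
    "first_name": "Bradley",
    "priority": 1683,
    "address": {"City": "Nicolestad", "State": "Massachusetts"},
}

# Nested structures are selected through their parent key only.
print(select_parent_columns(record, {"first_name", "address"}))
```

Selecting `address` keeps the whole nested object, which corresponds to only the parent column being shown for nested data.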

...

  • After making changes, copy the entire generated schema and paste it into the defined project schema in the right-hand side drawer.

  • Click Save Changes to save the configuration.

...

Related Topics

Batch Listener

Parsing Inputs

Path Pattern (Regex)

Interval

Output