How should I structure my tabular data before uploading it to the platform?

Modified on Thu, 17 Aug 2023 at 10:00 AM

Although the platform supports multiple file formats, an uploaded file must follow a specific structure for models to be generated successfully. Machine learning conventions for tabular data are not always compatible with typical engineering tabular practices. Ideally, the data should be uploaded as tabular or 3D data. More details on file formats can be found here.

Requirements

When uploading tabular data to Monolith’s platform, please make sure it satisfies the following requirements.

General

  • Each column should contain only one variable
  • Each row should correspond to one experiment or sequence stamp (e.g. test number, simulation ID, time)
  • Each column is classified as a single data type (e.g. string, float), so mixing data types within one column is not recommended
  • When using categorical data (i.e. strings), consistency is key. The platform is case sensitive and treats spaces as significant characters, so “MonolithAI”, “Monolith AI” and “Monolith ai” are classified as three different categories
  • Column labels should occupy a single row only (i.e. nested categorization should not be used in labels)
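
The type and consistency rules above can be checked locally before uploading. The following is a minimal sketch, not part of the platform: the function names `classify` and `check_table` are hypothetical, the data is assumed to be available as CSV text, and only the Python standard library is used. It flags columns that mix numbers with strings, and categorical columns whose labels differ only by case or spacing.

```python
import csv
import io
from collections import defaultdict


def classify(value):
    """Return 'empty', 'number' or 'string' for one raw cell."""
    text = (value or "").strip()
    if not text:
        return "empty"
    try:
        float(text)
        return "number"
    except ValueError:
        return "string"


def check_table(csv_text):
    """List columns that mix types or hold near-duplicate category labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    problems = []
    for column in reader.fieldnames or []:
        values = [row[column] or "" for row in rows]
        kinds = {classify(v) for v in values} - {"empty"}
        if len(kinds) > 1:
            problems.append(f"{column}: mixes numbers and strings")
        elif kinds == {"string"}:
            # Group labels that become identical once case and spaces are ignored.
            groups = defaultdict(set)
            for v in values:
                if v.strip():
                    groups["".join(v.split()).lower()].add(v)
            for variants in groups.values():
                if len(variants) > 1:
                    problems.append(f"{column}: inconsistent labels {sorted(variants)}")
    return problems


sample = """Failure,Height,Force
Compression,138.57,1362
compression,117.11,n/a
Tension,108.28,1475
"""
for problem in check_table(sample):
    print(problem)
```

This check only catches labels that differ by case or whitespace. Variants with extra words, such as "in compression" versus "compression", still need to be reviewed by hand.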

Excel files

  • The uploaded sheet should not have merged cells
  • Font colour, font size and font type will not affect the data
  • If the uploaded file contains multiple sheets, only the first sheet will be uploaded
  • Uploaded files should not contain images

Examples

The examples below highlight in red the regions of the data that do not satisfy the requirements:

  • Nested categorization of columns:

                 Front side                       Rear side
    Time         Height       Force  Temp         Height       Force  Temp
    0.898260434  149.1872944  1610   62.50950145  102.9695565  1435   69.99225872
    0.06824658   126.9437422  1743   69.81139095  100.1677129  1719   72.39496438
    0.223245541  131.2395906  1704   70.92918715  104.4267423  1355   79.02104831
    0.198662073  138.8477895  1347   65.37057207  146.4874195  1380   62.5873467
    0.584823038  132.2786137  1406   70.06819137  131.7884798  1765   61.55247709
  • Cells merged (and missing data):
    Time         F_Height     F_Force  F_Temp       R_Height     R_Force  R_Temp
    0.383839842  121.1055164  1688     63.7233866   101.4612084  1496     (empty)
    0.484601597  125.1104664  1638     (empty)      141.8651638  1495     (empty)
    0.830386192  103.6675687  1773     (empty)      136.6600568  1798     60.14558905
    0.93157364   122.4155221  1791     (empty)      137.486305   1338     78.97866845
    0.067163228  105.3975624  1392     72.02537429  139.4831724  1501     (empty)
  • Inconsistent strings in a categorical column:
    Failure         Height       Force  Temp
    Compression     138.5704731  1362   76.61910893
    compression     117.1162992  1422   62.95551895
    Tension         108.2897529  1475   74.16961723
    in compression  104.5295069  1677   61.75505769
    tension         123.3339419  1770   78.61567623
  • The example below is a typical format for software output. For the platform, however, each column must hold one variable and each row must be a sequential stamp (e.g. time) or a separate experiment:
    Step 1
    Time 0.72615801
    Height 104.614631 Force: 1756 Temp: 72.38695876
    Step 2
    Time 0.452290895
    Height 139.1372833 Force: 1442 Temp: 74.92448148
    Step 3
    Time 0.624463615
    Height 147.5907217 Force: 1330 Temp: 64.89705344
    Step 4
    Time 0.323668304
  • The example below follows the correct structure for data uploaded to the platform:
    Time         F_Height     F_Force  F_Temp       R_Height     R_Force  R_Temp
    0.519042911  123.9004088  1567     71.39432069  120.7590165  1679     70.3345152
    0.902826988  124.4544405  1382     61.637524    121.0811838  1470     64.33987705
    0.869587281  109.1762983  1342     74.69825678  119.9179117  1707     62.49954938
    0.083198722  145.9851669  1407     60.32229119  144.0284513  1367     72.07843594
    0.18762859   122.550792   1584     65.540395    110.247237   1778     67.59115131
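
Two of the problems shown in the examples, nested column labels and step-based software output, can often be fixed with a small script before uploading. The sketch below is one possible approach and is not provided by the platform. The helper names `flatten_headers` and `steps_to_rows` are hypothetical, and the step-based input is assumed to be plain text in which every block starts with a line of the form "Step N" followed by name and value pairs.

```python
import re

# A name, an optional colon, then a number, e.g. "Height 104.6" or "Force: 1756".
PAIR = re.compile(r"([A-Za-z_]+)\s*:?\s*(-?\d+(?:\.\d+)?)")


def flatten_headers(group_row, name_row):
    """Merge a two-row header into a single row of unique labels.

    An empty group cell inherits the last non-empty group to its left
    (assumed here to be how a merged header cell is exported).
    """
    flat, current = [], ""
    for group, name in zip(group_row, name_row):
        current = group.strip() or current
        prefix = current.replace(" ", "_")
        flat.append(f"{prefix}_{name}" if prefix else name)
    return flat


def steps_to_rows(text):
    """Turn step-based output into one dictionary (table row) per step."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.lower().startswith("step"):
            # Start a new row; assumes the step number follows the word "Step".
            rows.append({"Step": int(line.split()[1])})
        elif rows:
            for name, number in PAIR.findall(line):
                rows[-1][name] = float(number)
    return rows


print(flatten_headers(
    ["", "Front side", "", "", "Rear side", "", ""],
    ["Time", "Height", "Force", "Temp", "Height", "Force", "Temp"],
))
print(steps_to_rows("Step 1\nTime 0.72615801\nHeight 104.614631 Force: 1756 Temp: 72.38695876\n"))
```

Each dictionary returned by `steps_to_rows` corresponds to one row of the final table, with the keys as column labels. A step that lacks some values, like Step 4 in the example above, yields a row with missing keys, which should be reviewed before upload.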
