Data Ingestion Best Practices
Overview
Optimized data management is at the core of every successful e-commerce operation. For fabric users managing extensive product catalogs, finely tuned data ingestion is paramount.
Following fabric’s best practices ensures the fastest processing, optimal resource usage, and more accurate results when you import your data.
This topic covers file size restrictions, import actions, error reconciliation, and, most importantly, the preferred method of data ingestion: delta updates.
File Size and Upload Guidelines
Before you upload your first file, it’s important to understand file size restrictions and how fabric handles files that exceed those restrictions.
- No files larger than 300MB: Limit each upload to 300MB or less.
- Split files larger than 300MB into smaller ones: Splitting large files before uploading them is the quickest way to import large amounts of data. For the fastest processing, aim for files of 80 to 100MB each.
- Parallel processing: fabric can process multiple files in parallel. The number of files processed in parallel depends on your package. When that limit is reached, additional files stay in “scheduled” status until they move into the queue. Reach out to your account representative to learn more.
- Automatic file chunking: In select packages, fabric can automatically split files larger than 300MB into smaller chunks to improve performance. Reach out to your account representative to learn more.
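If your package doesn't include automatic chunking, you can split a large CSV yourself before uploading. The following Python sketch shows one way to do it, assuming your import file is a UTF-8 CSV with a single header row that should appear at the top of every part (confirm this against the template for your data type). The function name `split_csv` and the roughly 90MB default target are illustrative choices, not part of fabric.

```python
import csv
import io
import os

# Illustrative target inside the recommended 80-100MB range (well under the 300MB limit).
DEFAULT_PART_BYTES = 90 * 1024 * 1024


def _encode_row(row):
    """Serialize one row with the csv module so quoting and embedded commas/newlines survive."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().encode("utf-8")


def split_csv(path, out_dir, max_bytes=DEFAULT_PART_BYTES):
    """Split a CSV into parts of at most max_bytes, repeating the header in every part.

    A single row larger than max_bytes still gets a part of its own. Returns the part paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    parts = []
    out = None
    size = 0
    try:
        with open(path, newline="", encoding="utf-8") as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:
                return parts  # empty file: nothing to split
            header_bytes = _encode_row(header)
            for row in reader:
                row_bytes = _encode_row(row)
                # Start a new part if this row would push the current one past the limit.
                if out is None or size + len(row_bytes) > max_bytes:
                    if out is not None:
                        out.close()
                    part_path = os.path.join(out_dir, f"{base}_part{len(parts) + 1:03d}.csv")
                    out = open(part_path, "wb")
                    out.write(header_bytes)
                    size = len(header_bytes)
                    parts.append(part_path)
                out.write(row_bytes)
                size += len(row_bytes)
    finally:
        if out is not None:
            out.close()
    return parts
```

For example, `split_csv("catalog.csv", "parts")` writes `catalog_part001.csv`, `catalog_part002.csv`, and so on. Rows are always written whole, so no record is cut across two files. Rows are re-serialized with Python's default CSV dialect (minimal quoting, `\r\n` line endings), which may differ cosmetically from the source file.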
Delta Updates
A delta update involves transmitting only the changed data fields when making an update. This is in contrast to the more traditional “full feed” updates that send the entire dataset. By sending only the changed data fields, fabric can process updates without reprocessing unchanged data.
Delta updates are the preferred method for all uploads.
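To build a delta update, you can compare your current catalog export with the one you last uploaded and keep only new items and changed fields. The Python sketch below assumes each row has a unique key column (a hypothetical `sku` column here) and that both exports share the same columns. How the resulting changes are laid out in an upload file depends on fabric's template for that data type and isn't shown here.

```python
import csv


def load_rows(path, key="sku"):
    """Read a CSV export into {key value: row dict}. The key column name is an assumption."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[key]: row for row in csv.DictReader(f)}


def compute_delta(previous, current):
    """Return {key: {field: new_value}} containing only new items and changed fields.

    previous and current map a unique key (for example, a SKU) to a dict of fields.
    Items that exist only in current are returned with all of their fields.
    Items missing from current are not reported; removals are handled separately.
    """
    delta = {}
    for key, row in current.items():
        old = previous.get(key)
        if old is None:
            delta[key] = dict(row)  # new item: every field counts as changed
            continue
        changed = {field: value for field, value in row.items() if old.get(field) != value}
        if changed:
            delta[key] = changed
    return delta
```

Items deleted from the catalog don't appear in the result; those would be sent separately, for example as rows that use the DELETE action described under Import Actions.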
Delta updates vs. full feed updates
Aspect | Full Feed Data Updates | Delta Data Updates |
---|---|---|
Resource Usage | Requires more resources | Requires fewer resources |
Processing Time | Longer processing time | Shorter processing time |
Data Transmission | Transmits the entire dataset | Transmits only modified data fields |
Network Bandwidth | Consumes more network bandwidth | Consumes less network bandwidth |
Storage | Requires more storage space | Requires less storage space |
Error Handling | More prone to errors because the full dataset is transmitted | Less prone to errors because updates are focused |
Scalability | Less scalable for large datasets | More scalable, especially for large datasets |
Data Accuracy | Potential for data redundancy and inconsistency | Improves data accuracy by focusing on changes |
Operational Efficiency | Lower, due to higher resource consumption | Higher, due to optimized resource usage |
Incremental Updates | Updates the entire dataset each time | Updates only modified data fields incrementally |
Ways to Import Data
You can import data into fabric using the following methods:
- CSV import via API
- Import via RESTful APIs
- CSV import via the Copilot interface
Whichever import method you choose, uploading smaller files and using delta updates results in quicker processing, better resource management, and a higher degree of accuracy.
Data Formatting
Before you start an upload, make sure your dataset is accurate and matches fabric’s formatting requirements. To avoid errors, validate your data: review the file to identify any changes since the last upload, and confirm that the data’s structure and format are correct. See the following pages for formatting guidelines:
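Some of these structural checks are easy to automate before each upload. The following Python sketch, which is not a fabric tool, reports missing required columns, rows with the wrong number of fields, blank required values, and exact duplicate rows. The required column names depend on your template; the ones used with it below are hypothetical.

```python
import csv


def validate_csv(path, required_columns):
    """Return a list of problems found in a CSV file; an empty list means none were found.

    Row numbers count the header as row 1.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        missing = [c for c in required_columns if c not in header]
        if missing:
            # Without the required columns, row-level checks aren't meaningful.
            return [f"missing required columns: {', '.join(missing)}"]

        index = {name: i for i, name in enumerate(header)}
        first_seen = {}  # full row contents -> row number where it first appeared
        problems = []
        for row_no, row in enumerate(reader, start=2):
            if len(row) != len(header):
                problems.append(f"row {row_no}: expected {len(header)} fields, found {len(row)}")
                continue
            for col in required_columns:
                if not row[index[col]].strip():
                    problems.append(f"row {row_no}: '{col}' is blank")
            signature = tuple(row)
            if signature in first_seen:
                problems.append(f"row {row_no}: duplicate of row {first_seen[signature]}")
            else:
                first_seen[signature] = row_no
        return problems
```

The duplicate check deliberately compares whole rows rather than a single key, because some actions (such as attaching several variants to one SKU) may legitimately repeat a key across rows.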
Import Actions
The actions you use when importing items, bundles, categories, and collections tell fabric how you are modifying your data. The following actions are available:
- UPSERT: Creates a new product if it doesn’t exist, or updates the existing product.
- CREATE: Creates a new product.
- UPDATE: Updates an existing product.
- PUBLISH: Publishes an existing product that is in draft state to the storefront.
- UNPUBLISH_KEEP_DRAFT: Unpublishes an existing product. If the product already has a draft version, the live version is unpublished and discarded. If the product does not already have a draft version, the live version is unpublished and saved as a draft.
- UNPUBLISH_KEEP_LIVE: Unpublishes an existing product. If the product already has a draft version, the draft version is discarded.
- DELETE: Deletes an existing product.
- ATTACH_VARIANT: Adds variants to an existing parent product. The variant column holds the variant to attach to the SKU.
- DETACH_VARIANT: Detaches existing variants. The variant column holds the variant to detach from the SKU.
- CHANGE_CATEGORY: Updates the category of an existing product.
- ATTACH_CHANNELS: Appends the listed channels to the product, making it available on those sales channels. Specify the channels to attach in the Channels column. After the action completes, verify that the channels were added to the product.
- DETACH_CHANNELS: Removes the listed channels from a product. Specify the channels to detach in the Channels column. This action is useful when a product needs to be removed from specific sales channels while remaining available on others. After the action completes, verify that the channels were removed from the product.
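The action descriptions above suggest a few checks you can run on an import file before uploading it. The Python sketch below verifies that each row uses one of the listed actions and that the variant and channel actions have a value in the column they read from. It assumes rows are dictionaries (as produced by `csv.DictReader`) and that the columns are named `Action`, `Variant`, and `Channels`; those header names are assumptions, so match them to your template. Actions are matched exactly as written in the list above.

```python
# Actions listed in this topic, matched exactly (no case folding).
VALID_ACTIONS = {
    "UPSERT", "CREATE", "UPDATE", "PUBLISH",
    "UNPUBLISH_KEEP_DRAFT", "UNPUBLISH_KEEP_LIVE", "DELETE",
    "ATTACH_VARIANT", "DETACH_VARIANT", "CHANGE_CATEGORY",
    "ATTACH_CHANNELS", "DETACH_CHANNELS",
}

# Actions that read their target from a specific column.
# The header names here are assumptions; adjust them to your import template.
REQUIRED_COLUMN_FOR_ACTION = {
    "ATTACH_VARIANT": "Variant",
    "DETACH_VARIANT": "Variant",
    "ATTACH_CHANNELS": "Channels",
    "DETACH_CHANNELS": "Channels",
}


def check_actions(rows, action_column="Action"):
    """Return problems for rows with unknown actions or missing action targets.

    Rows are numbered from 1, starting with the first data row.
    """
    problems = []
    for n, row in enumerate(rows, start=1):
        action = (row.get(action_column) or "").strip()
        if action not in VALID_ACTIONS:
            problems.append(f"row {n}: unknown action {action!r}")
            continue
        needed = REQUIRED_COLUMN_FOR_ACTION.get(action)
        if needed and not (row.get(needed) or "").strip():
            problems.append(f"row {n}: {action} requires a value in '{needed}'")
    return problems
```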
Reconciling Errors
If there are errors during processing, download the error file and review each error to identify the problem. Correct the errors by updating the CSV file with the necessary changes and validate the corrected CSV file before re-importing.
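When only some rows fail, you may prefer to correct and re-import just those rows instead of the whole file, in keeping with the delta approach. The Python sketch below copies the header and the matching rows into a new file. It assumes you have already collected the keys of the failed rows (for example, SKUs) from the error file. The error file's layout isn't described here, so how you read it, and the `sku` key column, are assumptions.

```python
import csv


def extract_failed_rows(original_path, failed_keys, out_path, key_column="sku"):
    """Write the header plus only the rows whose key is in failed_keys.

    Returns the number of data rows written.
    """
    written = 0
    with open(original_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        if reader.fieldnames is None:
            return 0  # empty source file
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[key_column] in failed_keys:
                writer.writerow(row)
                written += 1
    return written
```

Once the extracted rows are corrected, validate the smaller file before re-importing it.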