Large Projects Crash Import From Spreadsheet
Introduction
Importing a large project from a spreadsheet can crash partway through ingestion, leaving the project incomplete or only partially imported. This article describes the problem, how to reproduce it, which projects are affected, and potential solutions.
Describe the Bug
When uploading a large spreadsheet (i.e., one with more than 10,000 biomaterials), ingestion stops partway through, producing a 104 error (connection reset by peer) and/or a partially imported project. Which entities or links are missing depends on when the import stopped. A partial import risks losing data and delays ongoing projects.
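For context, 104 is the Linux errno value for ECONNRESET ("connection reset by peer"). The sketch below shows one way a client script could recognise that error and retry with exponential backoff. It is an illustration only: `call_with_retry`, `is_connection_reset`, and their parameters are hypothetical names, not part of the ingest service, and retrying does not address the underlying cause.

```python
import errno
import time


def is_connection_reset(exc: BaseException) -> bool:
    """Return True if exc represents 'connection reset by peer'."""
    if isinstance(exc, ConnectionResetError):
        return True
    return isinstance(exc, OSError) and exc.errno == errno.ECONNRESET


def call_with_retry(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on connection resets.

    Hypothetical helper: `operation` is any zero-argument callable, for
    example a function that uploads one part of a spreadsheet.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Re-raise anything that is not a reset, or the final failure.
            if not is_connection_reset(exc) or attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x ... base_delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Retrying is only safe if the retried operation is idempotent; otherwise a repeated upload could create duplicate entities.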
To Reproduce
To replicate this behavior, follow these steps:
Step 1: Go to Ingest
Navigate to the ingest page at https://contribute.data.humancellatlas.org/.
Step 2: Click on "Register a Project"
Click on the "Register a project" button to initiate the project registration process.
Step 3: Use the Provided Spreadsheet
Use the example spreadsheet provided at https://docs.google.com/spreadsheets/d/1h1qVfzqUMws1vXDS5q_V3Tf8i76VLc3H/edit?usp=drive_link&ouid=101176945353039944995&rtpof=true&sd=true.
Step 4: Wait for Ingest to Crash
Wait for the ingestion process to crash, resulting in an error message and/or an incomplete submission.
Step 5: Observe the Error Message
Observe the error message, which typically indicates a connection reset by peer (104 error).
Expected Behaviour
The import should run to completion without interruption, creating every entity and every link defined in the spreadsheet.
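One way to check whether an import matches this expectation is to compare the number of entities in the spreadsheet with the number actually created. The sketch below assumes both sets of counts are already available as dictionaries; `find_missing`, the entity type names, and the figures are illustrative and do not come from the affected projects.

```python
from typing import Dict


def find_missing(expected: Dict[str, int], imported: Dict[str, int]) -> Dict[str, int]:
    """Return, per entity type, how many expected entities were not imported."""
    missing = {}
    for entity_type, expected_count in expected.items():
        shortfall = expected_count - imported.get(entity_type, 0)
        if shortfall > 0:
            missing[entity_type] = shortfall
    return missing


# Made-up counts for illustration only.
expected = {"biomaterials": 12000, "processes": 9000, "files": 4000}
imported = {"biomaterials": 12000, "processes": 6150}
print(find_missing(expected, imported))
# → {'processes': 2850, 'files': 4000}
```

An empty result means every expected entity type reached its expected count. Link completeness would need a similar comparison.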
Affected Projects
The following projects are affected by this issue:
HeartReconstructionPostHF
- Project ID: 41aee104-4013-4f51-be3b-0d8fccaa3869
- Ingest Link: https://contribute.data.humancellatlas.org/projects/detail?uuid=41aee104-4013-4f51-be3b-0d8fccaa3869
- Project Type: Heart Bionetwork
- Urgency: High (to be ready before Atlas release Q2 2025)
dcp1514250681821643
- Project ID: 497007a4-1881-43a1-a417-32958f2bdc77
- Ingest Link: https://contribute.data.humancellatlas.org/projects/detail?uuid=497007a4-1881-43a1-a417-32958f2bdc77&tab=project
- Project Type: Pancreas Bionetwork
- Urgency: High (to be ready before Atlas release Q2 2025)
Fetal/ Maternal Interface
- Project ID: 42a4e912-0f8e-4d23-9481-eb45790d589b
- Ingest Link: https://contribute.data.humancellatlas.org/submissions/detail?uuid=42a4e912-0f8e-4d23-9481-eb45790d589b&project=f83165c5-e2ea-4d15-a5cf-33f3550bffde
- Project Type: Not in any bionetwork
- Urgency: Low (no urgency at present)
Environment
The issue is observed in the production environment.
Browser
The issue is observed in Chrome.
Potential Solutions
To address this critical issue, the following potential solutions can be explored:
1. Optimize Ingestion Process
Optimize the ingestion process to handle large spreadsheets more efficiently, reducing the likelihood of crashes and errors.
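One possible optimization, offered here as an assumption rather than a description of how the ingest service works, is to process spreadsheet rows in fixed-size batches instead of all at once. A minimal sketch, where `batched` and `batch_size` are hypothetical names:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With a batch size of 1,000, a spreadsheet of 10,500 rows would be handled as ten full batches and one batch of 500, so no single request carries the whole project.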
2. Implement Load Balancing
Implement load balancing to distribute the ingestion workload across multiple servers, ensuring that no single server becomes overwhelmed and crashes.
3. Increase Server Resources
Increase the server resources (e.g., CPU, memory, and storage) to handle large projects and prevent crashes.
4. Develop a Backup System
Develop a backup system to ensure that data is not lost in case of a crash or error during ingestion.
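One way to limit the damage from a crash is to record progress in a checkpoint so that an interrupted import can resume instead of starting over. The sketch below is an assumption about how this might look, not the ingest service's design; the checkpoint file format and the names `load_checkpoint`, `save_checkpoint`, and `import_with_checkpoint` are all hypothetical.

```python
import json
from pathlib import Path


def load_checkpoint(path: Path) -> int:
    """Return the number of batches already imported (0 if no checkpoint exists)."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["batches_done"]


def save_checkpoint(path: Path, batches_done: int) -> None:
    """Write the checkpoint via a temporary file so a crash cannot leave it half-written."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps({"batches_done": batches_done}))
    tmp.replace(path)


def import_with_checkpoint(batches, import_batch, checkpoint_path: Path) -> None:
    """Import batches in order, skipping those recorded as done in the checkpoint."""
    done = load_checkpoint(checkpoint_path)
    for index, batch in enumerate(batches):
        if index < done:
            continue  # imported during an earlier run
        import_batch(batch)  # hypothetical callable that imports one batch
        save_checkpoint(checkpoint_path, index + 1)
```

If a run fails while importing the third batch, the checkpoint still records two completed batches, and the next run starts again from the third.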
5. Provide Real-time Feedback
Provide real-time feedback to users during the ingestion process, allowing them to monitor the progress and identify potential issues early on.
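Progress feedback could be as simple as reporting after each completed unit of work. The sketch below assumes the import is already split into batches and uses a callback to report progress; `import_with_progress`, `print_progress`, and the callback signature are hypothetical and shown for illustration only.

```python
from typing import Callable, Sequence


def import_with_progress(
    batches: Sequence,
    import_batch: Callable,
    report: Callable[[int, int], None],
) -> None:
    """Import each batch in order and report (done, total) after each one."""
    total = len(batches)
    for done, batch in enumerate(batches, start=1):
        import_batch(batch)
        report(done, total)


def print_progress(done: int, total: int) -> None:
    """Example reporter that prints a one-line status message."""
    percent = 100 * done // total
    print(f"Imported batch {done}/{total} ({percent}%)")
```

Because the last report identifies the last batch that completed, a user watching the output can tell roughly where an import stopped.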
Introduction
In our previous article, we discussed large projects crashing during import from a spreadsheet, which leaves projects incomplete or partially imported. This article answers frequently asked questions (FAQs) about the issue and its potential solutions.
Q&A
Q1: What is the cause of the crash during large project import?
A1: The root cause has not been confirmed. The observed symptom is a connection reset by peer (104 error) while a large spreadsheet is being ingested, which suggests that the ingestion process is overwhelmed by the volume of data.
Q2: What are the affected projects, and how can I identify them?
A2: The affected projects are HeartReconstructionPostHF, dcp1514250681821643, and Fetal/ Maternal Interface. You can identify them by checking the project IDs and ingest links provided in our previous article.
Q3: What is the expected behavior during large project import?
A3: The import should run to completion without interruption, creating every entity and every link defined in the spreadsheet.
Q4: How can I prevent the crash during large project import?
A4: The following measures, all of which are changes to the ingest service, may reduce the likelihood of a crash:
- Optimize the ingestion process to handle large spreadsheets more efficiently.
- Implement load balancing to distribute the ingestion workload across multiple servers.
- Increase server resources (e.g., CPU, memory, and storage) to handle large projects.
- Develop a backup system to ensure that data is not lost in case of a crash or error during ingestion.
Q5: What are the potential solutions to address this issue?
A5: The potential solutions to address this issue include:
- Optimizing the ingestion process.
- Implementing load balancing.
- Increasing server resources.
- Developing a backup system.
- Providing real-time feedback to users during the ingestion process.
Q6: How can I report issues related to large project import?
A6: To report issues related to large project import, please contact our support team at support@humancellatlas.org. We will do our best to assist you and provide a resolution to the issue.
Q7: What is the estimated timeline for resolving this issue?
A7: There is no fixed timeline yet. Resolution depends on the complexity of the issue and the availability of resources. The two high-urgency projects need to be ready before the Atlas release in Q2 2025.
Q8: How can I stay updated on the progress of resolving this issue?
A8: To stay updated on the progress of resolving this issue, please follow our social media channels (e.g., Twitter, Facebook, and LinkedIn) or sign up for our newsletter to receive regular updates.
Conclusion
Large projects crashing during import from a spreadsheet is a critical problem, and two of the three affected projects must be ready before the Atlas release in Q2 2025. Understanding the symptoms, the affected projects, and the candidate solutions is the first step toward reliable ingestion of large spreadsheets. We will continue working on the issue and provide regular updates to our users.