Kinship
Empowering Pet Care with Scalable Data Engineering for Kinship's Canine Health Platform
2019 - New York, USA
Pet Care Services
51-200 employees
Overview
Kinship, a division of Mars Petcare, is a leader in pet care technology, leveraging data to enhance pet health insights. Their Pet Insight platform collects vast volumes of activity data from IoT sensors embedded in pet collars, transforming raw information into actionable health insights. As data demands grew, Kinship partnered with Folio3, a trusted technology partner, to re-engineer its data infrastructure. This collaboration optimized data workflows and processing capabilities, enabling faster and more accurate health insights while laying the foundation for continued innovation in pet healthcare.
The Challenge – Efficiently Processing and Scaling Pet Data
Kinship's goal was to enhance its Pet Insight platform by efficiently processing large datasets from IoT sensors, which tracked canine activity over extended time frames. However, the system faced significant obstacles:
Scalability Bottlenecks: Processing data for thousands of dogs over months and years overwhelmed the existing infrastructure.
Slow Data Retrieval: Data older than 90 days took excessive time to fetch, limiting the ability to perform timely analyses.
Data Duplication Risks: The system lacked a reliable mechanism to prevent data duplication, leading to wasted storage and compromised data integrity.
Inconsistent Data Processing: The system struggled to convert raw sensor data into usable formats, such as PetInsightTimeData (PITD) objects, slowing the flow of actionable information to machine learning models.
The Solution – Cloud-Driven Data Optimization for Scalable Insights
Folio3 conducted a thorough audit of Kinship’s existing infrastructure, pinpointing critical inefficiencies and scalability limitations. The outcome was a data engineering solution that used PySpark on Databricks to optimize data processing workflows and improve overall performance. This approach addressed Kinship’s immediate data processing challenges and laid the groundwork for future scalability and advanced analytics capabilities. Its main components were:
Databricks for Advanced Data Engineering
The entire solution was architected on Databricks, leveraging its unified analytics platform to integrate data engineering and machine learning workflows. PySpark’s distributed processing capabilities enabled the efficient transformation and analysis of massive datasets in near real-time, supporting Kinship’s requirement for agile data-driven decision-making.
Multi-Threaded Data Ingestion with PySpark
A multi-threaded data ingestion system was designed using PySpark on Databricks to parallelize the processing of data from multiple dogs simultaneously, significantly boosting throughput and reducing retrieval time.
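The parallel, per-dog ingestion described above can be sketched with a thread pool. This is a minimal illustration only, not Folio3's implementation: `ingest_dogs`, `fake_load_dog`, the dog IDs, and the record fields are all hypothetical, and the per-dog loader is a stand-in for whatever Spark read the real pipeline performs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List


def ingest_dogs(
    dog_ids: Iterable[str],
    load_dog: Callable[[str], List[dict]],
    max_workers: int = 8,
) -> Dict[str, List[dict]]:
    """Run load_dog for every dog ID on a thread pool and collect results by ID."""
    results: Dict[str, List[dict]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per dog so several loads are in flight at once.
        futures = {pool.submit(load_dog, dog_id): dog_id for dog_id in dog_ids}
        for future in as_completed(futures):
            # .result() re-raises any exception raised in the worker thread.
            results[futures[future]] = future.result()
    return results


def fake_load_dog(dog_id: str) -> List[dict]:
    """Hypothetical stand-in for a per-dog read of sensor logs."""
    return [{"dog_id": dog_id, "minute": m, "activity": m * 2} for m in range(3)]


ingested = ingest_dogs(["dog-1", "dog-2", "dog-3"], fake_load_dog, max_workers=2)
```

Threads suit this pattern when each task mostly waits on I/O or on a cluster job; the pool size here is an arbitrary placeholder that would need tuning against the actual workload.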
AWS S3 Integration
Folio3 utilized AWS S3 for highly scalable, secure storage of canine activity logs. This allowed for efficient data retrieval and ensured high availability across the platform.
Data Duplication Prevention
Folio3 introduced a mechanism that checked existing logs in the Delta Table to ensure only unique data was processed, minimizing redundancies and optimizing storage.
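The uniqueness check can be illustrated in plain Python as a filter against the set of logs already recorded. This is a sketch under assumptions: the source does not say what key identifies a log, so the file paths below are hypothetical, and `select_unprocessed` is an illustrative name. In a Spark pipeline, a comparable effect could be achieved with a left anti join between the incoming batch and the Delta table.

```python
from typing import Iterable, List, Set


def select_unprocessed(incoming_paths: Iterable[str], processed_paths: Set[str]) -> List[str]:
    """Return incoming paths that are not yet processed, keeping first-seen order.

    Repeats inside the incoming batch are also dropped.
    """
    seen = set(processed_paths)  # copy so the caller's set is not mutated
    fresh: List[str] = []
    for path in incoming_paths:
        if path not in seen:
            fresh.append(path)
            seen.add(path)
    return fresh


# Hypothetical log paths; the real key used for the check is not stated.
already_processed = {"logs/dog-1/2019-01-01.bin", "logs/dog-1/2019-01-02.bin"}
batch = [
    "logs/dog-1/2019-01-02.bin",  # already in the table
    "logs/dog-1/2019-01-03.bin",
    "logs/dog-1/2019-01-03.bin",  # repeated within the batch
    "logs/dog-2/2019-01-01.bin",
]
new_paths = select_unprocessed(batch, already_processed)
```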
PITD Object Processing and Conversion
Raw sensor data was dynamically converted into PITD objects and subsequently processed into Pandas DataFrames using PySpark, ensuring smooth, efficient handling of complex datasets.
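The conversion step might look roughly like the sketch below. The real PITD schema is not described in this case study, so every field here (`dog_id`, `timestamp`, `activity_level`) and the raw record layout are assumptions made purely for illustration. The final list of dictionaries is a shape that `pandas.DataFrame(rows)` accepts directly.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PetInsightTimeData:
    # Hypothetical fields; the actual PITD definition is not given in the source.
    dog_id: str
    timestamp: datetime
    activity_level: float


def to_pitd(raw: Dict[str, str]) -> PetInsightTimeData:
    """Parse one raw sensor record (all string values) into a typed PITD object."""
    return PetInsightTimeData(
        dog_id=raw["dog_id"],
        timestamp=datetime.fromtimestamp(int(raw["epoch_s"]), tz=timezone.utc),
        activity_level=float(raw["activity"]),
    )


def to_rows(records: Iterable[PetInsightTimeData]) -> List[dict]:
    """Flatten PITD objects into row dictionaries suitable for a DataFrame."""
    return [asdict(record) for record in records]


# Hypothetical raw samples as they might arrive from a collar sensor log.
raw_samples = [
    {"dog_id": "dog-1", "epoch_s": "1546300800", "activity": "0.42"},
    {"dog_id": "dog-1", "epoch_s": "1546300860", "activity": "0.57"},
]
rows = to_rows(to_pitd(sample) for sample in raw_samples)
```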
DynamoDB for Data Querying
To streamline data access, Folio3 implemented DynamoDB for querying log file paths based on parameters like dog ID and timeframes, enabling real-time access to critical data.
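A lookup by dog ID and timeframe could be expressed as a DynamoDB Query request like the one built below. The table name, the attribute names (`dog_id`, `log_date`), and the choice of dog ID as partition key with an ISO date as sort key are all assumptions; the case study does not describe the actual table design. The function only builds the request parameters and does not contact AWS.

```python
from datetime import date
from typing import Any, Dict


def build_log_path_query(
    dog_id: str,
    start: date,
    end: date,
    table_name: str = "pet-log-index",  # hypothetical table name
) -> Dict[str, Any]:
    """Build DynamoDB Query parameters for one dog's logs in a date range."""
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "TableName": table_name,
        # Partition key equality plus a range condition on the sort key.
        "KeyConditionExpression": "#dog = :dog AND #day BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#dog": "dog_id", "#day": "log_date"},
        "ExpressionAttributeValues": {
            ":dog": {"S": dog_id},
            # ISO dates sort lexicographically, so string comparison is safe.
            ":start": {"S": start.isoformat()},
            ":end": {"S": end.isoformat()},
        },
    }


params = build_log_path_query("dog-1", date(2019, 1, 1), date(2019, 3, 31))
```

With boto3 installed and credentials configured, a dictionary of this shape could be passed to `boto3.client("dynamodb").query(**params)`; the returned items would then carry whatever attribute holds the S3 log file path.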
Scalable and High-Performance Data Pipelines
The scalable architecture enabled by Databricks and PySpark allowed Kinship to handle increasing data volumes without compromising performance. The system dynamically allocated resources based on workload demands, ensuring optimal data throughput and processing efficiency.
Technologies Involved In This Case
Amazon S3
DynamoDB
Databricks
Delta Lake
PySpark
Pandas
Results & Achievements
Rapid Data Retrieval
With the newly implemented solution, data retrieval times were reduced from hours to minutes, allowing Kinship to access historical records faster than ever before.
Scalable Architecture
The introduction of horizontal auto-scaling ensured Kinship’s platform could effortlessly scale to handle increasing data volumes from thousands of animals, future-proofing the system for continued growth.
Elimination of Data Duplication
By preventing data duplication, Folio3 optimized storage usage and ensured clean, accurate data for Kinship’s machine learning models.
Accelerated Time-to-Insight
The revamped data pipeline enabled Kinship’s data scientists to process and analyze large datasets in near real time, delivering faster, more accurate insights for improved pet care.