Challenge
The previous job scheduling service was built on a GCP-based stack and frequently ran jobs late, which defeated the purpose of scheduling them. It also lacked critical features such as configurable blackout days, which prevent jobs from running during specified periods (e.g., holidays). The client needed a new, highly scalable solution that would integrate seamlessly with their core platform and provide comprehensive monitoring and alerting.
Our Solution
To address these challenges, we developed a new Schedule Service and deployed it on AWS using Amazon Elastic Kubernetes Service (EKS).
The service is built on a modern technology stack:
- Backend: Kotlin/Java with Spring Boot
- Databases: PostgreSQL and Amazon DynamoDB
- Messaging & Processing: Kafka for job queuing, with a third-party job executor running the jobs
- Monitoring & Logging: Splunk, New Relic, and application-specific alerts integrated with Opsgenie
The new Schedule Service provides a collection of RESTful APIs that enable various microservices to schedule jobs with flexible configurations, including minute, hour, day, month, and year-based intervals, as well as one-time executions triggered by core application events.
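The interval-based scheduling described above can be illustrated with a minimal Java sketch. It is not the service's actual code: the class name IntervalSchedule, the method nextRunAfter, and the rule that runs are anchored to a fixed start time are all assumptions made for illustration. It only shows one way a next execution time could be derived from a minute, hour, day, month, or year interval.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical model of a recurring schedule: "every <amount> <unit> from <start>".
final class IntervalSchedule {
    private final ChronoUnit unit;
    private final long amount;

    IntervalSchedule(ChronoUnit unit, long amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        // Only the interval types named in the text are accepted.
        switch (unit) {
            case MINUTES: case HOURS: case DAYS: case MONTHS: case YEARS:
                break;
            default:
                throw new IllegalArgumentException("unsupported unit: " + unit);
        }
        this.unit = unit;
        this.amount = amount;
    }

    // Returns the first scheduled run strictly after 'now'
    // (or 'start' itself if the schedule has not begun yet).
    LocalDateTime nextRunAfter(LocalDateTime start, LocalDateTime now) {
        if (now.isBefore(start)) {
            return start;
        }
        long elapsedUnits = unit.between(start, now); // whole units elapsed
        long steps = elapsedUnits / amount + 1;
        LocalDateTime candidate = start.plus(steps * amount, unit);
        // Month/year arithmetic clamps day-of-month (e.g. Jan 31 + 1 month),
        // so step forward until the candidate is really after 'now'.
        while (!candidate.isAfter(now)) {
            steps++;
            candidate = start.plus(steps * amount, unit);
        }
        return candidate;
    }
}
```

For example, a schedule of every 15 minutes starting at 09:00, queried at 09:20, yields 09:30 as the next run. Each candidate is computed from the start time rather than from the previous run, so clamped month-end dates do not drift over time.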
A key enhancement was the introduction of blackout days, allowing organizations to define periods when scheduled jobs should not run, improving compliance and operational control.
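A blackout-day check could look like the following Java sketch. The class BlackoutCalendar and its methods are hypothetical names, and the deferral policy (moving a blocked run to the same time on the next allowed day) is an assumption. The source does not say whether blocked jobs are skipped or deferred.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Set;

// Hypothetical per-organization calendar of days on which jobs must not run.
final class BlackoutCalendar {
    private final Set<LocalDate> blackoutDays;

    BlackoutCalendar(Set<LocalDate> blackoutDays) {
        // Defensive, immutable copy so later changes by the caller have no effect.
        this.blackoutDays = Set.copyOf(blackoutDays);
    }

    // True if the run time falls on a configured blackout day.
    boolean isBlackedOut(LocalDateTime runTime) {
        return blackoutDays.contains(runTime.toLocalDate());
    }

    // Assumed policy: defer a blocked run to the same time of day on the
    // next day that is not blacked out. The set is finite, so this terminates.
    LocalDateTime nextAllowed(LocalDateTime runTime) {
        LocalDateTime candidate = runTime;
        while (isBlackedOut(candidate)) {
            candidate = candidate.plusDays(1);
        }
        return candidate;
    }
}
```

With 25 and 26 December configured as blackout days, a job due at 08:00 on 25 December would be moved to 08:00 on 27 December, while a job due on 24 December would be left unchanged.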
Results
The newly developed AWS-based Schedule Service significantly improved job execution reliability and performance.
Key outcomes included:
- Elimination of execution delays, ensuring jobs are processed on time.
- Scalability to handle enterprise-wide scheduling needs efficiently.
- Enhanced monitoring and alerting through Splunk, New Relic, and Opsgenie, providing real-time visibility and proactive issue resolution.
- Seamless integration with the client’s microservices architecture, enabling organization-wide adoption.
Today, the Schedule Service is an integral component of the client’s platform, supporting a variety of use cases, including notification dispatch and workflow automation. The solution has set a new standard for reliability, scalability, and operational efficiency in job scheduling.