Harnessing Scalable Job Scheduling for Enterprise Efficiency

Digital Natives
A scalable job scheduling service that makes enterprise job scheduling more efficient.
99%
Fewer Scheduling Errors
100%
Visibility into Job Execution Status

Summary

Industry

Digital Natives

Challenge

The existing scheduler service lacked critical features and frequently ran jobs late, delaying the processing of critical data.

Highlights

  • Developed a cloud-native scheduling service.
  • Provided guided navigation for scheduling jobs.
  • Captured business user preferences and delivered an automation service that prioritizes jobs based on schedule, blackout periods, and job priority.
  • Integrated with Splunk, New Relic, and Opsgenie for application-specific monitoring, logging, and alerting.
  • Enabled organization-wide adoption through seamless integration.
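
The prioritization described in the highlights can be illustrated with a minimal Java sketch. It is not the client's implementation: `ScheduledJob` and `JobPrioritizer` are hypothetical names, a higher integer is assumed to mean higher priority, and on a blackout day the sketch simply dispatches nothing.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical job model: id, scheduled run time, and priority (higher value = more urgent).
record ScheduledJob(String id, LocalDateTime runAt, int priority) {}

final class JobPrioritizer {
    // Earliest scheduled time first; among jobs due at the same time, higher priority first.
    static final Comparator<ScheduledJob> DISPATCH_ORDER =
            Comparator.comparing(ScheduledJob::runAt)
                    .thenComparing(Comparator.comparingInt(ScheduledJob::priority).reversed());

    private JobPrioritizer() {}

    /** Returns the jobs due at or before 'now' in dispatch order; nothing runs on a blackout day. */
    static List<ScheduledJob> dueJobs(List<ScheduledJob> jobs, LocalDateTime now, Set<LocalDate> blackoutDays) {
        List<ScheduledJob> ordered = new ArrayList<>();
        if (blackoutDays.contains(now.toLocalDate())) {
            return ordered; // blackout day: hold every job back
        }
        PriorityQueue<ScheduledJob> queue = new PriorityQueue<>(DISPATCH_ORDER);
        for (ScheduledJob job : jobs) {
            if (!job.runAt().isAfter(now)) { // due now or overdue
                queue.add(job);
            }
        }
        while (!queue.isEmpty()) {
            ordered.add(queue.poll());
        }
        return ordered;
    }
}
```

Ordering by run time first keeps overdue jobs from being starved by a stream of high-priority ones; reversing the comparator order would instead let priority dominate.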

Challenge

The previous job scheduling service was built on a GCP-based stack and frequently experienced execution delays, defeating the purpose of timely job completion. The service also lacked critical features such as configurable blackout days to prevent job execution during specified periods (e.g., holidays). The client needed a new, highly scalable solution that would seamlessly integrate with their core platform and provide comprehensive monitoring and alerting capabilities.

Our Solution

We developed a new Schedule Service and deployed it on AWS using Amazon Elastic Kubernetes Service (EKS).

To address these challenges, we leveraged a modern technology stack:

  • Backend: Kotlin/Java with Spring Boot
  • Database: PostgreSQL, AWS DynamoDB
  • Messaging & Processing: Kafka for job queuing and a third-party job executor for running jobs
  • Monitoring & Logging: Splunk, New Relic, and application-specific alerts integrated with Opsgenie

The new Schedule Service provides a collection of RESTful APIs that enable various microservices to schedule jobs with flexible configurations, including minute, hour, day, month, and year-based intervals, as well as one-time executions triggered by core application events.
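
To illustrate how such interval configurations might be evaluated, the following minimal Java sketch computes the next run time for a recurring or one-time schedule. `ScheduleSpec`, `IntervalUnit`, and `nextRunAfter` are hypothetical names, not the service's actual API; the sketch assumes a one-time job (for example, one created in response to a core application event) is modeled as a schedule with no interval unit.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.Optional;

// Hypothetical interval units mirroring the minute/hour/day/month/year options.
enum IntervalUnit {
    MINUTE(ChronoUnit.MINUTES), HOUR(ChronoUnit.HOURS), DAY(ChronoUnit.DAYS),
    MONTH(ChronoUnit.MONTHS), YEAR(ChronoUnit.YEARS);

    final ChronoUnit chronoUnit;

    IntervalUnit(ChronoUnit chronoUnit) { this.chronoUnit = chronoUnit; }
}

/** Hypothetical schedule definition; a null unit means a one-time run at 'start'. */
record ScheduleSpec(LocalDateTime start, int every, IntervalUnit unit) {

    ScheduleSpec {
        if (unit != null && every <= 0) {
            throw new IllegalArgumentException("'every' must be positive for recurring schedules");
        }
    }

    static ScheduleSpec oneTime(LocalDateTime at) { return new ScheduleSpec(at, 0, null); }

    /** First occurrence strictly after 'after', or empty if the schedule has no more runs. */
    Optional<LocalDateTime> nextRunAfter(LocalDateTime after) {
        if (start.isAfter(after)) {
            return Optional.of(start);
        }
        if (unit == null) {
            return Optional.empty(); // one-time run is already in the past
        }
        // Count whole intervals from 'start' instead of stepping from the previous run,
        // so month- and year-based schedules do not drift at month ends.
        long k = unit.chronoUnit.between(start, after) / every;
        LocalDateTime candidate = start.plus(k * every, unit.chronoUnit);
        while (!candidate.isAfter(after)) {
            k++;
            candidate = start.plus(k * every, unit.chronoUnit);
        }
        return Optional.of(candidate);
    }
}
```

Anchoring every occurrence to the original start time keeps a monthly job that starts on January 31 running on February 29 (in 2024) and then on March 31, rather than drifting to the 29th of each following month.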

A key enhancement was the introduction of blackout days, allowing organizations to define periods when scheduled jobs should not run, improving compliance and operational control.
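
Blackout handling can be sketched as a calendar of inclusive date ranges that adjusts a planned run time. The names (`BlackoutPeriod`, `BlackoutCalendar`, `adjust`) are hypothetical, and the policy of deferring a run to the next allowed day is an assumption; a real service might instead skip the occurrence entirely.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical blackout period, inclusive on both ends (e.g. a holiday week). */
record BlackoutPeriod(LocalDate from, LocalDate to) {
    BlackoutPeriod {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not precede 'from'");
        }
    }

    boolean contains(LocalDate date) {
        return !date.isBefore(from) && !date.isAfter(to);
    }
}

final class BlackoutCalendar {
    private final List<BlackoutPeriod> periods = new ArrayList<>();

    BlackoutCalendar add(LocalDate from, LocalDate to) {
        periods.add(new BlackoutPeriod(from, to));
        return this;
    }

    boolean isBlackedOut(LocalDate date) {
        return periods.stream().anyMatch(p -> p.contains(date));
    }

    /**
     * Assumed "defer" policy: a run that lands on a blackout day moves to the same time
     * of day on the first following day that is not blacked out. Checking day by day
     * also covers overlapping or back-to-back periods.
     */
    LocalDateTime adjust(LocalDateTime plannedRun) {
        LocalDateTime run = plannedRun;
        while (isBlackedOut(run.toLocalDate())) {
            run = run.plusDays(1);
        }
        return run;
    }
}
```

For example, with one blackout from December 24 to 26 and another on December 27, a run planned for 02:00 on December 25 is moved to 02:00 on December 28.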

Results

The newly developed AWS-based Schedule Service significantly improved job execution reliability and performance.

Key outcomes included:

  • Elimination of execution delays, ensuring jobs are processed on time.
  • Scalability to handle enterprise-wide scheduling needs efficiently.
  • Enhanced monitoring and alerting through Splunk, New Relic, and Opsgenie, providing real-time visibility and proactive issue resolution.
  • Seamless integration with the client’s microservices architecture, enabling organization-wide adoption.

Today, the Schedule Service is an integral component of the client’s platform, supporting a variety of use cases, including notification dispatch and workflow automation. The solution has set a new standard for reliability, scalability, and operational efficiency in job scheduling.
