
Sr. Systems Operations Engineer

Company: The Trade Desk
Location: Boston
Posted on: November 13, 2024

Job Description:

Who We Are
At The Trade Desk, we recognize that a seamless customer experience is driven by operational excellence. In pursuit of constantly improving the reliability of our platform, we are establishing a global Systems Operations team. This team's core mission will be to vigilantly monitor The Trade Desk platform services, refine our incident response methodologies, and guarantee a robust and highly available customer experience. If you're passionate about ensuring system reliability, process improvement, and making an essential customer impact, we invite you to play a critical role in this next evolution of our on-call experience.

What You'll Do

  • Act as a technical expert and advisor to more junior Associate Systems Operations Engineers
  • At an escalated tier, monitor the state and stability of platform services via telemetry and alerts; triage issues and escalate to engineering teams as needed
  • Work collaboratively with development teams to facilitate issue remediation
  • Manage remediation task workflow
  • Proactively update and improve Systems Operations documentation and runbooks
  • Increase the effectiveness of the incident response process by defining and measuring relevant metrics
  • There may be periodic weekend coverage requirements

Who We Are Looking For

  • Bachelor's Degree from a four-year university or relevant substitute experience
  • 6+ years relevant work experience in Technical and/or Application Support with strong knowledge of services support and troubleshooting

Technical Proficiency:

  • Understanding of large-scale distributed system architectures (e.g., databases, web services, application services).
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana, Nagios).
  • Ability to configure and fine-tune alerts.
  • Proficiency in, or ability to learn, programming languages including C# and SQL.

Incident Management and Troubleshooting:

  • Ability to prioritize and manage incidents based on severity, with a focus on customer impact.
  • Ability to remain calm under pressure and quickly diagnose issues.
  • Understanding of system logs, metrics, and telemetry.

Communication Skills:

  • Ability to communicate effectively with stakeholders during an incident.
  • Clear and concise documentation skills.
  • Ability to maintain and update troubleshooting guides (TSGs) and operational documentation.
  • Ability to translate complex technical issues and platform outages to non-technical stakeholders.

Automation & Scripting:

  • Ability to automate repetitive tasks.
  • Proficiency in scripting languages (e.g., Python, Bash) is a plus.


