My name is Pete Shima and I have a passion for availability, monitoring and operational excellence. I have done operational and development work across companies large and small from systems with ten nodes to millions of nodes. I am an engineer at heart but have had many positions in leadership or management roles and I like to stay connected to the details.

HashiCorp

Site Reliability Engineer (SRE), Remote

October 2015-Present

HashiCorp builds tools to power the modern datacenter. HashiCorp's most popular tools are Vagrant (run local virtual machines easily), Packer (build images for distribution) and Terraform (infrastructure as code), which are developer tools used to create and manage infrastructure. HashiCorp also has several runtime tools such as Consul (service discovery and key value store), Vault (secrets management), and Nomad (cluster/container scheduler). I am currently the team lead on the site reliability team at HashiCorp, which is responsible for the reliability of Atlas (the software as a service product) and Private Atlas installations.

  • Created and built the on-call program spanning multiple development teams with escalation workflows and critical alarming with runbooks.
  • Designed and implemented a post mortem process to reduce repeat issues and help with company operational growth.
  • Manage and own the infrastructure as code through Terraform across multiple environments.
  • Migrated Atlas, the SaaS product, from instances to Nomad, a cluster scheduler for running containerized processes.
  • Built, designed, and implemented a process to run the SaaS product on customer premises.
  • Work directly on-site with large customers to set up Private Installations.
  • Push for operational excellence across all the tools and platform.
  • Built staging stacks used for pre-production testing that can be created from scratch in minutes.
  • Implemented centralized logging for all production services.
  • Built canary systems to help detect and measure faults or unexpected issues.
  • Create and manage a roadmap for the reliability team.


Amazon Elastic Load Balancing (ELB)

Manager, Operations, Seattle

August 2014-October 2015

"Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances in the cloud. It enables you to achieve greater levels of fault tolerance in your applications, seamlessly providing the required amount of load balancing capacity needed to distribute application traffic." - http://aws.amazon.com/elasticloadbalancing/

In August 2014 I transferred from the Amazon S3 team to the Amazon ELB team as a Systems Engineering manager. I developed and grew a team of System and Support engineers to solve the problems of a massive scale service used in the majority of AWS architectures. I met directly with customers, and wrote and delivered multiple externally facing post mortems for large scale events. I managed the capacity for the service at scale and created a roadmap and charter to define the team as it grew.

  • Managed a team of 4-12 engineers including managing performance.
  • Managed capacity for systems measured in hundreds of thousands across 400+ dimensions.
  • Piloted and created a customer experience team to engage with customers directly.
  • Worked virtually and on-site with multimillion-dollar and Fortune 500 companies.
  • Hired and recruited for multiple positions including roles requiring security clearance.
  • Piloted an AWS-wide Systems Engineering community program.
  • Mentored and developed staff inside and outside of direct organization.
  • Developed a charter and roadmap for the team creating an identity for Systems Engineering.


Amazon S3

Manager, Operations, Seattle

October 2013-August 2014

"Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers." - http://aws.amazon.com/s3/

I was a Systems Engineering manager for the team running the S3 indexing and metadata service. The indexing and metadata service is responsible for handling hundreds of thousands of requests per second across a large distributed system. I built and grew a new team of Systems Engineers/Ops/SRE to handle fleet management, administration, scaling, and capacity of the S3 indexing services as well as develop and adopt new programs to improve reliability and performance.

  • Managed a team of 3-6 including managing performance.
  • Led and piloted a change management board approving 800+ production changes.
  • Managed a project of 10+ engineers to deploy a mission-critical, time-sensitive update across every running production host and service with no outages.
  • Developed a team charter and goals.
  • Hiring and recruiting.
  • Fleet management.
  • Capacity management.


Amazon S3

Systems Engineer, Operations, Seattle

May 2012-October 2013

"Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers." - http://aws.amazon.com/s3/

As a member of the S3 Ops team, I helped keep S3 running from the front end to the back end.

  • Automation.
  • Working at large scale.
  • Solving deep technical problems.
  • Measurement and alarming.


King of the Web

Operations Engineer, Seattle

July 2011-May 2012

My role at King of the Web is to ensure our architecture is online, secure, and scaled to our needs. Being a viral video site and showing multi-million real-time vote counts presents interesting challenges. We use Ruby on Rails and we don't mess around.

Major Goals:

  • Migration off Engine Yard PaaS onto a private hybrid cloud. Migration completed 5 months after start date.
  • Use Chef as a configuration management tool to set up end to end infrastructure, ongoing documentation and management.
  • Auto scale production web site to meet viral video traffic spikes which typically increase site traffic by 500%+ in minutes.
  • Provide a stable environment that can be handled by a single staff member and limited on-call. High focus on visibility and KISS (keep it simple, stupid).
  • Develop and see from start to finish projects that span multiple departments.
  • Be an enabler, not a barrier, for development and business ops. Build tools and write maintainable code to automate whatever possible and keep dev happy.
  • Release management and zero downtime deploys to production.
  • Integration with multiple APIs and development of tools that create solutions to business challenges.


Rockstar Games

Consultant, Seattle

April 2011-July 2011

As a remote employee in Seattle, Washington, my focus is to provide expert advice and resources across the Rockstar Games development process. Using my diverse skillset and newly available technology, I help to keep Rockstar on the bleeding edge of modern day development tools.

Major Goals:

  • Work with technology directors and executive staff to keep development needs at the forefront of IT focus.
  • Review existing workflows for data movement and provide recommendations along with end to end completion of agreed solutions.
  • Investigate business development ideas and provide advice on 3rd party software including business case scenarios and potential return on investments.
  • Provide documented architecture design complete with proof of concept or alpha implementations of desired feature sets or tools.


Take-Two Interactive Software Europe

Infrastructure Manager (Label Technology), London

2010-2011

In our centrally located London office I was responsible for designing and implementing solutions with a goal to improve speed of the game development process through technology. Being the sole member of this department I utilized my in-depth experience within the company to identify and implement solutions to help developers.

Major Accomplishments:

  • Designed and implemented a cross-studio global file transfer platform with Aspera technology across 20 different studios and many game titles.
  • Designed, developed and implemented a custom web portal to securely manage deployment of game builds and common development and publishing functions including SDK upgrades.
  • Developed back end systems for a global video sharing solution integrated into an existing large scale development toolset.
  • Worked with development staff including producers, programmers, artists, studio heads and more along with internal/external IT teams to define needed tools for the future.


Take-Two Interactive Software Europe

Infrastructure Manager (Corporate), London

2008-2010

At the European headquarters for Take-Two Interactive (NASDAQ:TTWO) I was responsible for managing end to end infrastructure across London, Windsor, Germany, Spain, France, Netherlands, Italy, Geneva, Singapore, and Australia. I also worked closely with other sites in the European time zone and interfaced with global IT leaders to develop strategy and provide leadership for IT staff.

Major Accomplishments:

  • Over 12 months integrated all European and Pacific Rim sites into global Active Directory and Exchange Forest.
  • Implemented hardware, software and configuration standards across a disparate infrastructure.
  • Hired and developed new European IT team consisting of support staff and engineers.
  • Transitioned European headquarters from Geneva, Switzerland to Windsor, UK including staff and datacenter.
  • Consolidated core services such as email, BlackBerry and backups into European headquarters.
  • Implemented the largest internal company SharePoint system and provided maintenance and support.
  • Renegotiated and consolidated vendor contracts providing cost savings along with realigning budgets.
  • Developed and implemented a global level 2/3 service desk for "follow the sun" support.


Take-Two Interactive Software Inc.

Senior Systems Engineer, NYC

2007-2008

The focus of Senior Systems Engineer was to take a global approach to architecture and engineering across the company. I was responsible for level 2/3 support, architecture and implementation across all sites in the global Active Directory forest. In addition to that I played a big part in the continual improvement of our public facing datacenter.

Major Accomplishments:

  • Continued integration of international sites into global Active Directory and Exchange Forest from a wide range of independently run studios.
  • Implemented hardware, software and configuration standards across disparate infrastructure.
  • Developed a global centralized event logging system monitoring logs across servers, network gear, and more.
  • Provided level 2/3 support and training to local IT staff in various locations.
  • Worked with developers to update and improve web servers and databases for public facing websites.
  • Resolved difficult technical issues with sensitive time frames.


Rockstar Games

Systems Engineer, NYC

2004-2007

In my position as systems engineer I was responsible not only for the administration and maintenance of the server infrastructure but also for providing leadership for global IT architecture. I also provided level 1-3 support for local staff.

Major Accomplishments:

  • Discovery, design and launch of a global Active Directory and Exchange Forest.
  • Architected and set up a new centralized spam filtering solution which blocks over 5 million spam messages a month.
  • Designed and implemented a global DFS structure allowing all staff globally to map a single network drive.
  • Developed global technology standards used across the organization.


Hello, how are you?

Please don't spam me

But I'd love to hear from you.


Social:

Twitter: petey5k

LinkedIn: Pete Shima

GitHub: pshima

Pete Shima

Fair and abiding citizen

Seattle, WA 98199
P: 206-450-1021
Email: me@peteshima.com