
Our apps are made up of 5-15 (micro)services. I'm not sure if this approach would scale to hundreds of services managed by different teams.

We store the source code for all services in subfolders of the same monorepo (one repo <-> one app). Whenever a change in any service is merged to master, the CI rebuilds _all_ the services and pushes new Docker images to our Docker registry. Thanks to Docker layers, if the source code for a service hasn't changed, the build for that service is super-quick, it just adds a new Docker tag to the _existing_ Docker image.

Then we use the Git commit hash to deploy _all_ services to the desired environment. Again, thanks to Docker layers, containers that haven't changed from the previous tag are recreated instantly because they are cached.

From the CI you can check the latest commit hash that was deployed to any environment, and you can use that commit hash to reproduce that environment locally.
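
Concretely, "deploy commit X" (or reproduce it locally) can look like this -- a sketch assuming a docker-compose setup, with the registry name and TAG variable made up:

    # The compose file is assumed to reference ${TAG} in each image name,
    # e.g. image: registry.example.com/service_a:${TAG}
    export TAG=89abcde        # the Git commit hash you want to deploy
    docker-compose pull       # fetch the images built for that commit
    docker-compose up -d      # recreate the containers from those images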

Things that I like:

- the Git commit hash is the single thing you need to know to describe a deployment, and it maps nicely to the state of the codebase at that Git commit.

Things that do not always work:

- if you don't write the Dockerfile in the right way, you end up rebuilding services that haven't changed --> build time increases (see the Dockerfile sketch after this list)

- containers for services that haven't changed get stopped and recreated --> short unnecessary downtime, unless you do blue-green
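
To illustrate the Dockerfile point: the usual fix is to copy the dependency manifests before the rest of the source, so the expensive install layer stays cached. A rough sketch of that ordering for a Node service (file names are just an example):

    cat > Dockerfile <<'EOF'
    FROM node:8
    WORKDIR /app
    # Copy only the manifests first: this layer and the install below are
    # reused as long as package.json / yarn.lock don't change.
    COPY package.json yarn.lock ./
    RUN yarn install --frozen-lockfile
    # Source changes only invalidate layers from here down.
    COPY . .
    CMD ["node", "index.js"]
    EOF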


At work we also use a monorepo that consists of subfolders of services. We use Kubernetes and we store the config files for each service (and each environment: dev/staging/prod) inside the same repo. The k8s config files are placed in directories following this pattern: `<service>/deployment/<environment>/<service>.yaml`.

To avoid rebuilding all services on every commit, we use Bazel to help determine which services need to be rebuilt. Note that we don't use Bazel as a build system, just as a tool to see which services changed -- essentially we only use the `filegroup` Bazel rule. After a push to the git repo, we basically (1) run `git diff --name-only <before> <after>` to get the changed files, (2) run `bazel query 'rdeps(..., set(list of changed files))'` at both the `<before>` and `<after>` commits, and (3) combine the results of `bazel query` and look for the affected services.
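
Roughly, in shell terms (a minimal sketch; the path-to-label munging is illustrative, not the actual script):

    before=abc1234   # commit before the push (e.g. from the CI webhook payload)
    after=def5678    # commit after the push
    changed=$(git diff --name-only "$before" "$after")
    # Turn changed file paths into Bazel source-file labels,
    # e.g. service_a/src/main.py -> //service_a/src:main.py
    labels=$(echo "$changed" | sed 's|^\(.*\)/\([^/]*\)$|//\1:\2|' | tr '\n' ' ')
    # Which targets (transitively) depend on those files? In practice we run
    # this at both commits and union the results.
    bazel query "rdeps(//..., set($labels))" --output package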

Once we know which services need to be rebuilt, we trigger the Jenkins jobs for those services. Each service has its own Jenkins job and Jenkinsfile (we use Pipeline). Here we also package the application as a Docker image and push it to the internal registry.

We keep track of what is released using a "production" branch for each service. Once we have a build to release, we (1) create a "release candidate" branch from the commit of the build, (2) update the k8s config file, (3) apply the k8s config, and (4) merge this branch into the production branch of the service if everything is ok. Then we merge the production branch back into the master branch.
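
In shell terms one release looks roughly like this (a sketch; the branch names and the kubectl invocation are illustrative, following the directory layout above):

    build_commit=89abcde
    git checkout -b rc/service_a "$build_commit"                # (1) release-candidate branch
    $EDITOR service_a/deployment/prod/service_a.yaml            # (2) point the k8s config at the new image tag
    git commit -am "Release service_a at $build_commit"
    kubectl apply -f service_a/deployment/prod/service_a.yaml   # (3) roll it out
    # (4) if everything is ok:
    git checkout production/service_a && git merge rc/service_a
    git checkout master && git merge production/service_a       # merge production back to master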


We follow a very similar pattern and we are at over 150 micro services right now (AWS Lambda).

A couple of things we do differently, since we are building and then deploying to AWS:

- Build only on dedicated deployment branches (beta, qa, preview, prod)

- Build all functions (transpile, yarn, lint, etc.) on every merge into the branch, but only deploy functions with different checksums, which saves on API calls to AWS (see the sketch after this list)

- We cache node_modules, but otherwise don't have any special build requirements and babel takes care of targeting node6.10 for Lambda
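
The checksum part is roughly this (a sketch, not the actual script; the manifest file and bundle paths are made up):

    # checksums.txt holds "<function> <sha256>" lines recorded by the previous deploy.
    for dir in src/*/*/ ; do
      fn=$(basename "$(dirname "$dir")")-$(basename "$dir")    # e.g. project1-function1
      sum=$(sha256sum "build/$fn.zip" | cut -d' ' -f1)
      if ! grep -q "^$fn $sum\$" checksums.txt 2>/dev/null; then
        echo "deploying $fn"
        aws lambda update-function-code --function-name "$fn" --zip-file "fileb://build/$fn.zip"
      fi
    done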

Total build time is between 8 and 13 minutes. There are some things we could do to speed up installs that we haven't done yet because it's not an issue yet, but here's a short list of things to note:

- Each function has its own package.json for its own packages. We maintain a list of npm packages that we download into a single folder first (which doesn't get deployed) so that yarn can use those files from its cache. We will eventually switch to an offline install for each function, which essentially just copies the package folder and sets up anything it needs (see the sketch after this list).

- We have a tarball package that includes all of our shared code / config files. Yarn seems to always want to download this file, regardless of whether we pre-download it.

- We deploy a single API endpoint for all of our micro-services through API Gateway, which cuts down on the time to deploy since API Gateway has a pretty hard throttle. This means we create a deployment on API Gateway on every merge. We have one APIG for each environment.
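
Back to the yarn point above: the offline install we'll eventually switch to is basically yarn's offline mirror (a sketch of the typical setup; the paths are illustrative):

    # Shared tarball cache that doesn't get deployed; yarn writes tarballs here on install/add.
    yarn config set yarn-offline-mirror ./npm-packages-offline-cache
    # Per-function installs then resolve from the mirror instead of the network.
    (cd src/project1/function1 && yarn install --offline)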


Just to be clear, all of your functions exist inside one mono-repo, correct? You don't use git submodules at all?

Looks like a pretty solid build process. Thanks for the insight!


Yes, we have all of our functions in a single mono repo, broken into projects, and then folders for each function, similar to this:

- src/project1/function1/

- src/project1/function2/

- src/project2/function1/

- src/project3/function1/

- src/project3/function2/

- src/project3/function3/

Deploying the functions is done by project, so we deploy all of one project, then move to the next, and so on and so forth.


That's a great model. Do you use CloudFormation for each deployment? If so, have you thought of creating a single CloudFormation template for the whole deployment, so you can do the entire deployment in one stack update?

Have you encountered any issues to watch out for when only using one APIG for each environment (150 micro-services)? Have you encountered any downsides to doing this versus 1 micro-service to 1 APIG? I'm also running into the Gateway throttle limits and I think deploying many micro-services (like you have done) to 1 APIG is the best solution.


We don't use CloudFormation because, honestly, it sucks. Its hard limits are a pain in the ass to get around (with 150 Lambda functions we need hundreds of resources, so that means nested stacks, which just suck), and its handling of the API Gateway just doesn't do what we want.

We have a custom script to deploy our own API Gateway using the AWS SDK and we generate a swagger file from simple json config files.
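
With the AWS CLI/SDK the core of that flow looks roughly like this (a sketch; the rest-api id, stage name and swagger generator are made up):

    ./generate-swagger.sh config/*.json > swagger.json    # illustrative generator, not a real tool
    # --mode can be "merge" (what we do in beta) or "overwrite" (other environments)
    aws apigateway put-rest-api --rest-api-id abc123 --mode overwrite --body fileb://swagger.json
    aws apigateway create-deployment --rest-api-id abc123 --stage-name prod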

For the API Gateway issues, so far, we have a few things that are something we have to watch out for.

- All lambda endpoints through APIG are lambda proxy type. This means we can have a framework handle standard request / response stuff. The downside is that we can't support binary endpoints easily because they haven't fixed that issue yet.

- HTTP proxy pass through endpoints have to be added to the swagger somehow before we deploy. This is a little annoying, but not a huge issue

- Merge vs Override for deployments. We merge in beta, and override in other environments. This allows us to keep endpoints exactly as they are, but allow flexible testing in beta

1 APIG for 1 micro service isn't great IMO at scale, since we run all our endpoints under one domain and mapping all of them would be a pain.


That's very valuable advice. Thank you. I've been following the serverless.com model of 1 APIG to 1 lambda, but that quickly puts you over the AWS limits when trying to manage hundreds or thousands of micro-services.


Yup yup, I went down that path and converted our then very basic deployment process to use serverless and instantly hit hard limits.


> Whenever a change in any service is merged to master, the CI rebuilds _all_ the services and pushes new Docker images to our Docker registry.

Why are you rebuilding _all_ the services? Wouldn't it make more sense to just rebuild the ones that have changed? You're now rebuilding perfectly working services without any new changes just because some other service changed, or am I misunderstanding something here?


Because we want to make sure that in the Docker registry we have _all_ services tagged with the latest commit.

For example you might have a Git history like this:

* 89abcde Fix bug in service_b

* 1234567 Initial commit including service_a and service_b

When 89abcde is pushed, the CI rebuilds both service_a and service_b, so we can simply "deploy 89abcde". You always have just one hash for all services, which is conveniently also the hash of the corresponding Git commit.

The trick to avoid rebuilding perfectly working services is to use Docker layer caching so that when you build service_a (that hasn't changed) Docker skips all steps and simply adds the new tag to the _existing_ Docker image. The second build for service_a should take about 1 second.

In our Docker registry we end up with:

service_a:1234567

service_a:89abcde

service_b:1234567

service_b:89abcde

But the two service_a Docker images are _the same image_, with two different tags.
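
The whole CI build step is then just a loop along these lines (a sketch; the registry name is made up):

    sha=$(git rev-parse --short HEAD)
    for svc in service_a service_b; do
      # If nothing in $svc changed, every layer is cached and this finishes in ~1s,
      # effectively just attaching the new tag to the existing image.
      docker build -t "registry.example.com/$svc:$sha" "./$svc"
      docker push "registry.example.com/$svc:$sha"
    done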


Why? Microservices are supposed to be truly independent.


For ease of deployment and to solve the problem of "what version of service_b is compatible with version x of service_a"?

IMHO this makes sense if the microservices are developed by the same team. If we're talking about services developed and managed by different teams... maybe it's not a good idea.


My guess is that it's because of the mono-repo. Since it would take some work to figure out what changed and what to build, they just did it the easy way and rebuilt everything :-)


> We store the source code for all services in subfolders of the same monorepo (one repo <-> one app).

So I'm curious, does each service instance have its own server, or do you have multiple services on one server instance?

I have some experience working with microservices. I saw the clear business benefits of being able to map design domain boundaries to repos and specific teams, and to let those teams be able to control their deployments while minimizing external dependencies.

But we seemed to be paying a lot in network chattiness, slow site response times, and networking costs. I'm wondering if we could have minimized those costs by sticking some of those microservices on the same server instance. Not really change service boundaries or interfaces, but change the methods that the microservice interfaces use to communicate.


Quick question: does Docker, or any other higher-level service, let you "tag" images? Ideally, you could build only the changed stuff and use that SHA to tag every image. That way you still get the benefit of one hash, with that hash representing the state of the codebase, while cutting down on build time.


Yes, but the real win comes in a different way than I think you're describing. With Docker, images are built in a layered fashion, with each 'step' of the build creating a new layer (think version control hashes). The benefit here is twofold:

First - If your change to the container is near the end of the build process (see the earlier comment about smart container design), then the rebuild will only change the final few layers, and Docker is smart enough not to rebuild the earlier ones.

Second - Hashes are global, so if you have multiple containers that start with the same base (say, Alpine Linux + Python + NPM + etc.), Docker will share the existing hashed layers. This means a much smaller distribution payload.

To (what I think is) your original question - you can tag the 'final' container itself. Tagging it with the Git hash is one way to get exactly what you're talking about.


How long do your deployments take on average?


Depends on many factors... It can vary from 5 to 30 minutes between the moment someone presses "merge" on a pull request and the moment that change is live on a test environment. The average is probably around 10-15 minutes.

The builds for all services happen in parallel, so the longest one determines the total time. Big Scala services take much longer than small React frontends. We cache both Maven and NPM modules from previous builds.

Ideally, if the pull request only modified a React component and didn't touch any Scala file, no Scala build is triggered because Docker finds a cached layer and skips the "sbt compile" step. To be honest, we are still working to make sure this always happens; we still trigger unnecessary sbt compiles because the Docker cache is not always used correctly.

