DevOps: A Software Architect's Perspective (SEI Series in Software Engineering) 1st Edition by Len Bass, Ingo Weber, Liming Zhu

This is a great entry-level book on DevOps. It effectively explains how microservices coordinate with each other, how to go about testing, deployment strategies, why we should have monitoring tools and how to go about it, and the context of certain logs.
It also teaches you how to manage your resources, e.g. auto-deprovisioning staging environments.
Oh, and I really like the idea of CDC* (consumer-driven contract) testing. It's less expensive than integration tests and probably faster, since you don't make actual API calls -- you can stub your provider.
*by Martin Fowler
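The idea of stubbing the provider can be sketched roughly like this (a minimal illustration, not the Pact framework or any real CDC tool; all names here are made up):

```python
# Consumer-driven contract test sketch: the consumer records what it
# expects from the provider, and the test runs against a stub instead
# of making a live API call.

CONTRACT = {
    "endpoint": "/users/42",
    "response": {"id": 42, "name": "Ada", "active": True},
}

def stub_provider(endpoint):
    """Stand-in for the real provider, answering from the contract."""
    assert endpoint == CONTRACT["endpoint"]
    return CONTRACT["response"]

def consumer_display_name(fetch):
    """Consumer logic under test; `fetch` is injected so tests can stub it."""
    user = fetch("/users/42")
    return user["name"] if user["active"] else "(inactive)"

# The consumer's expectation is verified without any network traffic;
# the same contract could later be replayed against the real provider.
assert consumer_display_name(stub_provider) == "Ada"
```

The provider team can then run the same contract against their real service, which is how the two sides stay honest without a full integration test.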
There's this blue/green (or red/black) deployment strategy the book talks about. Given that you have N VMs running version A, provision the same number N of VMs with version B. When the N version-B VMs are ready, route all requests to B. After a safe period of stability, de-provision version A entirely.
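The cutover step can be sketched as an atomic pointer flip (purely illustrative; the `Router` class and health check are made up, not any real load balancer API):

```python
# Blue/green cutover sketch: all traffic moves to the new pool at once.

class Router:
    def __init__(self, pools):
        self.pools = pools          # e.g. {"blue": [...], "green": [...]}
        self.active = "blue"

    def route(self):
        return self.pools[self.active]

def healthy(pool):
    # Assumed health check; a real one would probe each VM.
    return len(pool) > 0

router = Router({"blue": ["vm-a1", "vm-a2"], "green": ["vm-b1", "vm-b2"]})
if healthy(router.pools["green"]):
    router.active = "green"        # the switch: all requests now go to B
assert router.route() == ["vm-b1", "vm-b2"]
# After the stability window, the blue pool would be de-provisioned.
```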
Blue/green deployment is expensive, though, because you'll suddenly have 2N VMs for a while. So they propose an alternative -- the rolling upgrade. For every n version-B VMs you provision, decommission the same n version-A VMs. However, this can lead to race conditions while both versions are live. The book gets very specific about these race conditions, though I can't remember the details..
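A rolling upgrade can be sketched as a batched replacement loop (my own toy sketch, not the book's code); the comment marks the mixed-version window where those race conditions live:

```python
# Rolling upgrade sketch: replace instances in batches of `batch`.
# During the loop both versions serve traffic at once, which is where
# mixed-version race conditions can appear (e.g. version A writing data
# in a format version B no longer expects).

def rolling_upgrade(fleet, new_version, batch=1):
    fleet = list(fleet)
    for i in range(0, len(fleet), batch):
        for j in range(i, min(i + batch, len(fleet))):
            fleet[j] = new_version   # decommission old VM, provision new
        # <-- mixed-version window: A and B are both live here
    return fleet

assert rolling_upgrade(["A", "A", "A", "A"], "B", batch=2) == ["B"] * 4
```

Note the capacity advantage over blue/green: the fleet never exceeds N instances.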
Another approach is to route a small fraction of requests to version B (similar to canary testing), then gradually increase the share until all requests are routed to version B.
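The gradual shift can be sketched as weighted random routing (an illustration of the idea, not a real traffic-shaping API):

```python
import random

# Canary routing sketch: send a small, tunable share of requests to
# version B and ramp the share up as confidence grows.

def pick_version(canary_share):
    """Route one request; canary_share is the fraction sent to B."""
    return "B" if random.random() < canary_share else "A"

random.seed(0)
counts = {"A": 0, "B": 0}
for _ in range(10_000):
    counts[pick_version(0.05)] += 1   # 5% canary
# Roughly 5% of traffic should land on B.
assert 300 < counts["B"] < 700
```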
See the Paxos algorithm and ZooKeeper/ZAB.
Not sure I fully get this whole part ^ though... and I'm not sure how my current company does it either.
It does touch on security audits and RBAC, but nothing new really. Oh, one point that I remember -- do away with admins SSHing into boxes to dig out information and whatnot. Your ops tooling should be sophisticated enough to provide that info to admins and any auditor. Another method I came across (I don't know if it was this book or The DevOps Handbook) was a case study of a company, whose name I can't remember, that gives its security auditor a Kibana account. Ops don't need to work with devs to pull RBAC info and fill out the report. So that's pretty neat.
On to monitoring distributed systems:
1. Collate related items by time intervals. Clocks on different nodes in a single cluster differ by microseconds, so it doesn't make sense to rely on exact timestamps to debug. Use time intervals to determine relations.
2. Collate related items by context -- when doing a rolling upgrade, it's important to know the context of a log entry. Is this message from version A or version B?
3. Think about the volume of data. How much of the logs do you keep? Perhaps you can have fine-grained reports for the past X days and coarse-grained reports for archiving.
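Point 1 above can be sketched as grouping by a tolerance window instead of exact timestamps (a toy sketch with made-up data and window size):

```python
# Group log entries from different nodes by time *interval* rather than
# exact timestamp, since node clocks drift relative to each other.

def group_by_window(entries, window_ms=50):
    """entries: (timestamp_ms, node, message) tuples, pre-sorted by time."""
    groups, current, start = [], [], None
    for ts, node, msg in entries:
        if start is None or ts - start <= window_ms:
            current.append((ts, node, msg))
            start = start if start is not None else ts
        else:
            groups.append(current)
            current, start = [(ts, node, msg)], ts
    if current:
        groups.append(current)
    return groups

logs = [(1000, "n1", "req in"),
        (1003, "n2", "db read"),   # same event, slightly skewed clocks
        (1200, "n1", "req out")]
# The first two entries are related despite different timestamps.
assert len(group_by_window(logs)) == 2
```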
They also mention databases for keeping your logs, such as time-series databases (e.g. round-robin databases), Hadoop HDFS, and Amazon Glacier.
Oh yes and what are the key characteristics of effective logs??? Very important.
- Consistent format across services
- Should include explanations where helpful (e.g. an explanation of errors)
- Should include context info (i.e. datetime, source of log entry, PID, request ID, VM ID)
- Severity level
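A log entry carrying all of the characteristics above might look like this (the field names and service name are my own, not from the book):

```python
import json
import datetime

# Structured log entry sketch: consistent format, context info, severity.

def log_entry(severity, message, request_id, vm_id, pid):
    return json.dumps({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "severity": severity,       # e.g. DEBUG / INFO / WARN / ERROR
        "source": "checkout-service",
        "pid": pid,
        "request_id": request_id,   # lets you trace one request across nodes
        "vm_id": vm_id,             # tells you version A vs B during a rollout
        "message": message,
    })

entry = json.loads(log_entry("ERROR", "payment timeout", "req-123", "vm-b1", 4242))
assert entry["severity"] == "ERROR" and entry["request_id"] == "req-123"
```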
Alarms! They should include contextual information. You should aim to prevent both false positives and false negatives, and you should disable alarms during specific windows, such as planned upgrades.
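The "disable during specific windows" idea is just a time-range check (a minimal sketch; the window times are made up):

```python
import datetime

# Maintenance-window sketch: suppress alarms during a planned upgrade
# so they don't fire as false positives.

MAINTENANCE = (datetime.datetime(2024, 1, 10, 2, 0),
               datetime.datetime(2024, 1, 10, 4, 0))

def should_fire(alarm_time, window=MAINTENANCE):
    start, end = window
    return not (start <= alarm_time <= end)

assert should_fire(datetime.datetime(2024, 1, 10, 5, 0)) is True
assert should_fire(datetime.datetime(2024, 1, 10, 3, 0)) is False  # suppressed
```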
AND... how do you monitor your monitoring system? (i.e. Nikhil's KPIs)
1. Business metric
2. Cycle time (I'm not sure what this means here; presumably the time from code commit to running in production)
3. Mean time to detect errors
4. Mean time to report errors
5. Amount of scrap (rework)
Also, one piece of advice they give is to use a local mirror of remote services, so that you don't have to deal with different library versions when they upgrade.
Does AWS ELB (Elastic Load Balancer) act as a registry? Registries help track versions, determine ownership, and record SLAs.