5 Reasons Your Automated Tests Fail

  • Published on 25 Nov 2024

Comments • 91

  • @valentyn.kostiuk
    @valentyn.kostiuk 2 years ago +3

    I also have flickering tests for the UI. We test with Selenium plus test-browser providers (which I won't name here), and because of network instabilities (VPNs and other things) we have to rerun some tests twice. We automated this process.
    And yes, there were cases where it hid real problems. Over time you start turning a blind eye to these "flickering" tests.
    UI tests done right usually require more skill and ingenuity than tests for APIs, robots, and daemons.

  • @d3stinYwOw
    @d3stinYwOw 2 years ago +6

    Really great video!
    I have some points though:
    1. For the environment to fail, you sometimes don't need to break the configuration. It can simply be an issue with the data exchange between the HW debugger and the DUT. What do you do in that case?
    2. About versioning - what about more 'isolated' test systems, where the people who write the tests don't see the code and work using black-box/grey-box techniques?
    3. System behaviour - the DUT can stop behaving fully deterministically due to the immaturity of vendor libraries, like the Hardware Abstraction Layer, or a not-widely-used part of the RTOS ;)
    I know I'm writing from the more embedded side of testing, where you have dedicated HiL instances for various tasks, like data injection, or non-performant HiLs, but these are also valid things to consider when trying to deal with test intermittency :)

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +2

      1. Sure, hardware failures may break things, but I don't really class that as intermittent, because usually it will fail and then keep failing. The SW remains unchanged and you replace the HW!
      2. Don't do that, it doesn't work very well! Testing needs to be part of development; as soon as it is not, it is slower, lower-quality and telling you too late that your SW is no good. You need to build quality into the system, not try and add it later. I describe my thoughts on this in this video: th-cam.com/video/XhFVtuNDAoM/w-d-xo.html
      3. Yes, so either fix the problem in the library, or wrap it to isolate the problem so that you can test your code. The strategies that I describe work exceedingly well for embedded and electro-mechanical systems. This is how SpaceX and Tesla work. I have done quite a lot of work in this space, using these techniques. A lot of this is about starting to think of "testability" as a property of good design, rather than an afterthought. It is rather like designing a physical product, like an aeroplane: a good design is easy to maintain, as well as doing everything else that it needs to do. Testability is a tool that keeps our SW "easy to maintain", not just because we can run the tests to tell us it works, but also because designing for testability promotes the attributes of SW design that manage its complexity, and so make it easier to maintain.

    • @d3stinYwOw
      @d3stinYwOw 2 years ago +1

      @@ContinuousDelivery For point 1 though, the HW does not need to break to have this kind of issue. It might be the case that SW which we don't control does not behave the way we expect it to, and to replicate this you need a really specific setup.
      For point 3, that's only doable when the SW devs write the HAL. Sometimes you only get it as a library + headers to link against during the linking stage when building the SW.

  • @bernardobeninifantin509
    @bernardobeninifantin509 2 years ago +5

    One more great video!
    In two weeks I am going to present something new I learned at an internal event where we share side projects of ours or amazing things we did. In this first edition of the event, I am presenting a small project built with TDD, hoping my team will see some benefit in it. I've seen my teammates struggle with problems that could be handled very well by using TDD, and I do want to help. That's why I say thank you (for actually helping with the process of building a system)!

  • @scottfranco1962
    @scottfranco1962 2 years ago +3

    The reason a lot of manual testing goes on is because of management attitudes. "It will take too long to automate that test". Then, after the manual testing has been repeated dozens of times, "we are pretty sure we are at the end of testing... to automate it now would be a waste of time".
    Honestly, I got tired of pushing automated testing. It's too much work to overcome management biases, and if I succeed, often the result is that I get reassigned to test work, usually at a loss in pay and essentially volunteering to do a boring job.

    • @muyewahqomeyour7206
      @muyewahqomeyour7206 2 months ago

      @@scottfranco1962 I couldn’t agree more with you even if I tried.
      You’re absolutely spot on.

  • @elfnecromancer
    @elfnecromancer 2 years ago +1

    I've done both co-located test and app code, and test and app code in separate repos, and it's all about tradeoffs.
    Tests in the same repo are great for developer experience, especially for teams that are just discovering that a separate team testing their software is a bad idea. On the other hand, the test environment (where test code executes) is not the same as the application environment, and the dependencies are also different. Installing all the dependencies for commit-stage tests, the build itself, and then the acceptance tests slows down the pipeline a lot.
    Tests in a separate repo are a bit easier to forget for teams inexperienced with testing, but I've found that option to be better for optimizing pipeline performance. Your commit-stage testing and build stages don't need the dependencies used by your acceptance tests. The acceptance test repo can hold the configuration for its own environment and its own test job. This job can be run as part of the overall CI/CD pipeline, but it can also run on its own, when the tests change. After all, we don't want to rebuild and deploy the application if only the tests or the environment in which the test code runs have changed.

    • @PavelHenkin
      @PavelHenkin 2 years ago

      You can achieve the best of both worlds by having multiple build targets in the same repo. So (in c# land), one solution, multiple projects. Multiple Dockerfiles - one for Tests, one for Service.

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +1

      I think that the goal must be to test the code that will be released, rather than some close approximation of it. So I'd try and find other strategies than having different dependencies in the test environment, unless those are genuinely outside the scope of your system - your team is not responsible for building them, deploying them in step with the system you are testing, and running them. This means that they are in a separate process space, not dependencies that are compiled into your system, and so they use some form of inter-process comms. I fake them at this point in the scope of acceptance tests.
      If any of these things aren't true, then this code is part of the reality of the system that I want to test, so I want it in my acceptance test. So now I have to think harder about how to make that efficient. Incremental builds, incremental deploys, build bakery strategies and so on.

  • @andrewthompson8448
    @andrewthompson8448 2 years ago +2

    I would love to see a video about how striving for as much purity as possible can solve many of these testing issues. It is also a good strategy to help drive impurity to the edges of the system.

  • @Weaseldog2001
    @Weaseldog2001 2 years ago +1

    I love the T-shirt.
    Your arguments are completely logical, yet... these are the things that generate meetings, and too often get a pass from management.
    There's reluctance to let code quality get in the way of a scheduled release.
    But if we look at the cost of letting unreliable code go on to QA, we find the release dates are negatively impacted anyway.
    Either your QA or the client's QA is going to find that uncertainty, and now you've involved multiple teams in the delay.

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +1

      Yes, the approach that I describe is more efficient, and we have the data that backs that up for most of it, but it is different enough to challenge many people's assumptions about what works and what "good" looks like in SW dev, so it is hard to adopt. I don't know of a single company that has tried it that would ever willingly go back to the old way of doing things, though.

  • @MisFakapek
    @MisFakapek 2 years ago +1

    Ah, Dave, I finally spent the money and bought both of your books. Looking forward to seeing how good they are!

  • @valentyn.kostiuk
    @valentyn.kostiuk 2 years ago +1

    Let's share signs of potentially unstable tests.
    I'll begin:
    If your test has Thread.sleep in its workflow in one way or another, there's an 85% chance (subjectively) that it will be unstable and environment-dependent. It will require tuning for every developer. And in the end, even if it is stable, it will take more time to run on average than it otherwise could.
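    A minimal sketch of the alternative, assuming plain Java; the helper name, the poll interval, and the orderService in the usage comment are illustrative, not anything from the video. The idea is to poll for the condition with a deadline instead of sleeping for a fixed, environment-dependent time.

    ```java
    import java.time.Duration;
    import java.time.Instant;
    import java.util.function.BooleanSupplier;

    // Poll a condition until it holds or a deadline passes, instead of sleeping
    // for a fixed amount of time that has to be tuned per machine.
    public final class WaitUtil {

        public static void waitUntil(BooleanSupplier condition, Duration timeout) throws InterruptedException {
            Instant deadline = Instant.now().plus(timeout);
            while (!condition.getAsBoolean()) {
                if (Instant.now().isAfter(deadline)) {
                    throw new AssertionError("Condition not met within " + timeout);
                }
                Thread.sleep(50); // short poll interval; total wait adapts to the actual system speed
            }
        }
    }

    // Hypothetical usage in a test:
    //   orderService.submit(order);
    //   WaitUtil.waitUntil(() -> orderService.isProcessed(order.id()), Duration.ofSeconds(5));
    ```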

  • @ilovepickles7427
    @ilovepickles7427 2 years ago

    The environment is a dependency.
    I never thought of it that way before. So true. Great lesson.

    • @redhotbits
      @redhotbits 2 years ago

      Wasn't that obvious the whole time? Too much Uncle Bob is bad for your brain, I guess.

  • @dneary
    @dneary 1 year ago

    I've found test isolation to be difficult in situations where scaffolding requires loading a large data set or multi-step scaffolding of the software dependencies - is there a way to do something like "scaffold to base test environment for test collection X, run test X1, revert to base system, run test X2, revert to base system, etc"? You don't want to rebuild and tear down expensive scaffolding for each test, you just want to roll back any side effects of each test. I could never figure out an easy way to do this in the past.
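    One common way to get this "revert to base" behaviour, at least for the database part of the scaffolding, is to build the expensive base environment once and wrap each test in a transaction that is rolled back afterwards. A sketch assuming JUnit 5 and JDBC; the connection URL and credentials are placeholders, and this only works when the code under test goes through the connection the test controls.

    ```java
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import org.junit.jupiter.api.AfterEach;
    import org.junit.jupiter.api.BeforeEach;
    import org.junit.jupiter.api.Test;

    // Expensive scaffolding (schema, large data set) is created once, outside the tests;
    // each test's side effects live in a transaction that is rolled back afterwards.
    class IsolatedDatabaseTest {

        private Connection connection;

        @BeforeEach
        void beginTransaction() throws SQLException {
            connection = DriverManager.getConnection("jdbc:postgresql://localhost/testdb", "test", "test");
            connection.setAutoCommit(false); // everything the test writes stays uncommitted
        }

        @AfterEach
        void rollBack() throws SQLException {
            connection.rollback(); // discard the test's side effects, keep the base data
            connection.close();
        }

        @Test
        void testX1() throws SQLException {
            // exercise the system against 'connection' here; inserts and updates vanish on rollback
        }
    }
    ```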

    • @ContinuousDelivery
      @ContinuousDelivery  1 year ago +1

      Sorry to sound too salesy, but my ATDD course covers test isolation in detail. courses.cd.training/courses/atdd-from-stories-to-executable-specifications
      You may find this video interesting, and hopefully helpful, too: th-cam.com/video/vHBzZHE4tJ0/w-d-xo.html

  • @alexanderradzin1224
    @alexanderradzin1224 2 years ago

    IMHO system behaviour should be the first reason in the list. At least in my experience, it is the most frequent cause of test failures.

  • @scottfranco1962
    @scottfranco1962 2 years ago +3

    On my best projects, I keep the tests, the code, and the complete report generated by the test, which is time- and date-stamped, in the repo. When I worked at Cisco Systems, we went one better than that and kept the entire compiler chain in the repo, including compiler, linker, tools, etc.
    I teach the init-test-teardown model of individual tests, and one of the first things I do when entering a project is mix up the order of the individual tests in the run. This makes them fail depressingly often. Most test programmers don't realize that their tests often depend inadvertently on previous test runs to set state in the hardware or simulation. I do understand your point about running them in parallel, but I admit I would rather run them in series and then mix up their order. Why? Because running them in parallel can generate seemingly random errors, and more importantly, errors that aren't repeatable. For that reason I would run them in order, then in mixed order, and only lastly in parallel.
    Finally, many testers don't understand that testing needs to be both positive and negative. Most just test for positive results. Testing that the target should FAIL for bad inputs is as important as, or I would say MORE important than, positive tests, since it goes to the robustness of the system. Further, we need to borrow concepts from the hardware test world and adopt coverage and failure injection.
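    A small illustrative sketch of two of those points, assuming JUnit 5: MethodOrderer.Random is a real JUnit 5 feature for shuffling method order, while parsePositive is just a stand-in for code under test, and the negative tests check that bad input fails loudly.

    ```java
    import org.junit.jupiter.api.Assertions;
    import org.junit.jupiter.api.MethodOrderer;
    import org.junit.jupiter.api.Test;
    import org.junit.jupiter.api.TestMethodOrder;

    // Running methods in a random order helps expose tests that silently depend on
    // state left behind by an earlier test.
    @TestMethodOrder(MethodOrderer.Random.class)
    class OrderIndependentTests {

        // Stand-in for the code under test.
        static int parsePositive(String s) {
            int value = Integer.parseInt(s);
            if (value <= 0) {
                throw new IllegalArgumentException("must be positive");
            }
            return value;
        }

        @Test
        void acceptsWellFormedInput() {
            Assertions.assertEquals(42, parsePositive("42"));
        }

        @Test
        void rejectsMalformedInput() {
            // Negative test: the target should FAIL for bad input.
            Assertions.assertThrows(NumberFormatException.class, () -> parsePositive("not a number"));
        }

        @Test
        void rejectsNonPositiveInput() {
            Assertions.assertThrows(IllegalArgumentException.class, () -> parsePositive("-7"));
        }
    }
    ```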

    • @dhLotan
      @dhLotan 2 years ago

      Definitely seen my share of tests where the seventh test only succeeded because of test 3. Someone changes test 3, other tests break, and it makes me want to pull my hair out.

    • @Weaseldog2001
      @Weaseldog2001 2 years ago

      That is a very good point.
      Though I think that a test, when completed, should clean up after itself.
      Each test should be capable of running standalone.
      But concurrent tests are problematic, such as tests against a database. I have some simple ones that verify that an API writes certain records, according to higher-level input.
      At the end I do a "DELETE FROM <table>".
      In this case, I'm more concerned with keeping the test simple and fast than I am with supporting concurrency with other database tests. In our test environment, concurrency, and the added complexity it brings, is unlikely to buy a meaningful speed increase.
      Of course this may change with future conditions.
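      A minimal sketch of that style of cleanup, assuming JUnit 5, JDBC and an in-memory H2 database on the test classpath (all assumptions, not details from the comment): the teardown deletes the rows the test created, so the test can run standalone and in any order.

      ```java
      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.SQLException;
      import java.sql.Statement;
      import org.junit.jupiter.api.AfterEach;
      import org.junit.jupiter.api.BeforeEach;
      import org.junit.jupiter.api.Test;

      class OrderTableTest {

          private Connection connection;

          @BeforeEach
          void setUp() throws SQLException {
              connection = DriverManager.getConnection("jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1");
              try (Statement s = connection.createStatement()) {
                  s.executeUpdate("CREATE TABLE IF NOT EXISTS orders (id INT PRIMARY KEY, total INT)");
              }
          }

          @Test
          void writesExpectedRecord() throws SQLException {
              try (Statement s = connection.createStatement()) {
                  s.executeUpdate("INSERT INTO orders VALUES (1, 100)"); // stand-in for the API under test
              }
          }

          @AfterEach
          void cleanUp() throws SQLException {
              try (Statement s = connection.createStatement()) {
                  s.executeUpdate("DELETE FROM orders"); // remove every row this test created
              }
              connection.close();
          }
      }
      ```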

    • @scottfranco1962
      @scottfranco1962 2 years ago

      @@Weaseldog2001 Well, yes and no. Now we are into test theory. If a test problem needs to be fixed by better cleanup at the end of the test, does that not imply that the initialization of the next test is a problem? It is clearly not able to bring the system to a stable state before the test.
      We had a test unit "farm" at Arista; the idea was that there was a large pool of hardware test units, and the software could take a test run request, grab an available unit, run the tests, and release it again. The biggest issue with it was that machines regularly went "offline", meaning they were stuck in an indeterminate state and could no longer be used. This was even after cycling power for the unit and rebooting. The problem was solved by taking some pretty heroic steps to restart the machine, as I recall even rewriting the machine's firmware.

    • @Weaseldog2001
      @Weaseldog2001 2 years ago

      @@scottfranco1962 I understand. When your hardware isn't production-ready, that introduces a lot of unique challenges.

    • @Weaseldog2001
      @Weaseldog2001 2 years ago

      @@scottfranco1962 But yes, the test should be able to initialize its environment.
      This can be optimized by separating initialization and cleanup from the individual tests.
      When a new test starts, it could call the initialization routine, which would know if the system had already been started. It could keep a counter for recursive calls, so that it doesn't shut down until all tests have been released.
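      A sketch of that reference-counting idea; the class and method names are hypothetical. The first acquire() starts the shared environment, and only the last release() tears it down.

      ```java
      import java.util.concurrent.atomic.AtomicInteger;

      // Tests call acquire() before running and release() afterwards; the expensive
      // environment starts on first use and shuts down only when the last user releases it.
      public final class SharedEnvironment {

          private static final AtomicInteger users = new AtomicInteger(0);

          public static synchronized void acquire() {
              if (users.incrementAndGet() == 1) {
                  start(); // expensive one-off setup: services, baseline data, ...
              }
          }

          public static synchronized void release() {
              if (users.decrementAndGet() == 0) {
                  stop(); // last test out turns the lights off
              }
          }

          private static void start() { /* start services, load baseline data */ }

          private static void stop() { /* shut down services, free resources */ }
      }
      ```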

  • @thescourgeofathousan
    @thescourgeofathousan 2 years ago

    Co-locating code, tests and environment config in the one repo is not the simplest solution.
    It is only the simplest solution for one aspect of the problem.
    But it shifts complexity to another location rather than actually dealing with the complexity of the problem space.
    Every time you solve a problem of related SW entities by using the mono-repo method, you push the complexity of managing the relationship, and the need to rebuild and test run-time assets, into either the package repo, the build process, the provenance management, or all of the above.
    The best way to manage relationships between SW elements is via CI/CD pipelines that trigger each other due to events that happen to each related element.
    This keeps the logic of the relationship management out in the open so it can easily be seen (as opposed to, for example, buried in the build script - I'm looking at you, Gradle people) and makes the overall solution a "more of the same" affair rather than an undertaking of special-case handling.

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +1

      The best way to manage relationships is not by building CD pipelines that trigger each other. That delays feedback, and until the final integration stage that pulls everything together, you are not testing the reality of the system. What you have then is a monolithic system, and by pretending that you don't, you have eliminated a variety of approaches to optimising the build and evaluation process, some of which are MUCH more efficient. For example, Google operates a single-repo approach for everything, over 9.5 billion lines of code in a single repo. They use sophisticated, distributed, incremental build and test techniques to give, in essence, the verification of all 9.5 billion lines on every commit. They maintain "the truth of the system" at that level.
      As soon as you separate the system out into separate repos (sometimes still a good idea, but not if you have to test everything together later) then you add friction, latency and indeterminacy, and you increase the problem of tracking which versions work with which.

    • @thescourgeofathousan
      @thescourgeofathousan 2 years ago

      @@ContinuousDelivery Not everything has a dependency on everything else.
      And you don't necessarily have to wait until the end of the dependency's pipeline before you trigger the pipeline of the dependent element.
      A lot of assumptions that are not true have gone into the response above.
      I have seen the build processes of people who have tried to replicate the Google mega-repo method, and they are horrendous black boxes of complexity that, in my last engagement, very nearly caused the project to fail, as the number of bugs in the build process alarmingly quickly outweighed those in the code of the SW elements themselves.
      It was not until we came in and separated things out where necessary, and had the dependencies explicitly modelled and effectively influencing each other at the appropriate stages, that the CI/CD machinery stopped being the cause of failure and the bug count of the SW code started going down, because the SW itself was now the most complicated element and not the infrastructure that was supposed to make things easier.
      In every case I have encountered, I have NOT found "sophisticated" build code; I have found complexity, bugginess and a raft of special-case handling.
      Additionally, "appeal to authority" is bad enough as an argument, but using Google as the source of the authority earns extra negative points in my book.

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago

      @@thescourgeofathousan Sure, you can do it badly, but you can do anything badly. I have consulted with many companies that have implemented the strategy that you describe. Their commonest release schedule was measured in months because of the difficulty of assembling a collection of pieces that work together. They have thrown out all the advantages of CI.
      Sure, different collections of components, services and sub-systems have different levels of coupling, so I think that the best strategy is to define your scope of evaluation to align with "independently deployable units of software" and do CD at that level.
      I am sorry, I don't mean to be rude, but I don't buy the idea that "citing Google is an appeal to authority". I don't hold Google on a pedestal; we are talking about ways of working, and you said "The best way to manage relationships between SW elements is via CICD pipelines that trigger each other due to events that happen to each related element.".
      Google is a real-world example of not doing that, and of succeeding at a massive scale without it. So by what measure do you judge "best"? Not scale, because you reject my Google example. Maybe not speed either, because I can give you examples of orgs working much faster than you can with your strategy - Tesla can change the design of the car, and the factory that produces it, in under 3 hours - but is that an appeal to authority too?
      How about quality? The small team that I led built one of, if not the, highest-performance financial exchanges in the world. We could get the answer to "is our SW releasable" in under 1 hour for our entire enterprise system, for any change whatever its nature, and we were in prod for 13 months & 5 days before the first defect was noticed by a user.
      Finally there is data: read the State of DevOps reports, and the Accelerate book. They describe the most scientifically justifiable approach to analysing performance in our industry, based on over 33k respondents so far. They measure Stability & Throughput, and can predict outcomes, like whether your company will make more money or not, based on its approach to SW dev. They say that if you can't determine the releasability of your SW at least once per day, then your SW will statistically be lower quality (measured by Stability) and you will produce it more slowly (measured by Throughput).
      If your system is small and very simple, it is possible that you can build a system like you described, with chained pipelines, that can answer the question "is my change releasable" for everyone on the team once per day. But I don't believe that this approach scales to SW more complex than the very simple while still achieving that. I have not seen it work so far. The longer it takes to get that answer, the more difficult it is to stay on top of failures and understand what "the current version" of your system is.
      I really don't mean to be rude, but this really isn't the "best way"; at best it is "sometimes survivable" in my experience. The best way that I have seen working so far is to match the scope of evaluation to "independently deployable units of software", and the easiest way to do that is to have everything that constitutes that "deployable unit" in the same repo, whatever its scale.

    • @thescourgeofathousan
      @thescourgeofathousan 2 years ago

      @@ContinuousDelivery I can't even imagine you being rude, mate, so no worries there.
      Sorry if I came in a bit hot.
      I think there are a few misunderstandings between us on the scope or specificity of what I'm referring to, as a lot of the ways of working you are talking about above are a) ones I've watched you talk about before and b) ones I totally agree with you on, but c) not what I was really referring to.
      When it comes to what I would call non-intimate relationships, such as those between services for example, I don't advocate e2e testing or system integration testing at all. I advocate testing each independent element in isolation and focusing on ensuring adherence to interaction agreements. Which I believe you also espouse.
      What I was referring to specifically was intimate relationships, such as between a service and the environment it expects to run within, or a contained element and the hardened base container it depends on.
      So, I've been in this game for a long time too, and my experience has been that not only is it possible to do most things badly, but that when dealing with the complexity required to manage a multi-element repo, the resulting build code is more complex than the SW you are trying to build, with more bugs and a faster bug growth rate than the SW being managed, as the number of elements that are related but distinct, and that have differing build profiles from each other, increases.
      I don't believe I have been working with unusually talentless or unintelligent clients who have found themselves in this situation either.
      I've also got large, complex systems building and releasing multiple times a day with this method, so your conviction that scale and speed cannot be measures I'm using is incorrect.
      Now I can imagine that you have probably had a lot of success with fixing the problems experienced by correcting mistakes made in how the monorepo handling was implemented, but I just haven't found it necessary to support the added and hard-to-surface complexity it brings.
      What I mean by best is that it achieves the desired speed, scale and quality WITHOUT introducing an unnecessary and hidden amount of complexity into the build and package management that is out of proportion to the primary SW being developed itself.
      I've done this with a combination of using the highly visible and simple mechanism of the pipelines to build only as much complexity as represents the important relationships, and leaving the non-intimate relationships to testing techniques like contract testing and mocks.
      It's hard to see how lumping everything into the same repo and then using complex build code to handle the variations of impacts between all the elements contained therein can be seen as anything BUT monolithology from my perspective. That to me is just moving the monolith from deploy time to build time, as if it were inevitable and "had to go somewhere".
      As for Google not being an appeal to authority: the reason I see it as one is that the "well, they succeeded, so they must have done something right" idea is false. People succeed wildly with bad ideas all the time. Granted, this is an extreme example if it is one.
      They also succeed in doing something really hard due to various factors, like superior talent being brought to bear at a crucial time. It doesn't mean it was a good way to go. Success is never an indication or vindication of a method's validity or superiority. It's most often just something to be thankful for and analysed more deeply, to find which aspects helped and which were lucky not to have caused a downfall.
      This is why I'm never impressed by "Company X has had great success, therefore that proves assertion A".
      Only the deep details of HOW and WHY that success happened can do that.
      So "Google uses a mega-repo and they did specific thing A, changed to other specific thing X after D went pear-shaped, etc." is the only testament I'm interested in (and that would not be an appeal to authority).
      The appeal to authority lies in eliding the details of how something was accomplished in favour of the very fact that it was.
      Having said all of that, I do regret the use of the word "best", as I don't believe in a "best" way in general.
      I believe in "good" ways, where good means the most goals are met and problems solved without introducing new and/or worse ones, and "bad" ways that represent the inverse.

  • @softwaretestinglearninghub
    @softwaretestinglearninghub 2 years ago

    Great tips, thank you for sharing!

  • @dushyantchaudhry4654
    @dushyantchaudhry4654 several months ago

    Why might a functional test show a different result if the OS or database version is changed? Hope someone can help with clarity on this. I am asking about a version upgrade. Why might the result of the code change on a different OS version?

  • @danielevans1680
    @danielevans1680 2 years ago

    My most common source of intermittent tests is external contract tests - they fail on occasion due to network issues, due to downtime on the external system being tested against, or similar. How would you tackle that without introducing retries?
    In the full system, those retries would be handled by some higher-level error handling and retry logic. To include all of that in the contract test, rather than just the interface being tested, feels like a huge amount of bloat - especially when those retry mechanisms are provided by the cloud provider we're with, not our own code.

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +2

      If these are validating the contract for an external, 3rd party system then these aren't the same thing as acceptance tests, and so different rules apply. Sorry for not making that clear. The idea of these tests is just to see "did anything change that breaks my assumptions of that thing I have no control over". In that case, sure a retry is probably the first step in escalating the problem. You are in a nasty state now though. Are these "network issues" because someone was upgrading the beta version in a test env, or is this flakiness a real part of the behaviour of this system in production?
      When I have been in this position, we talked to the supplier to try to understand why these things happen, and figure out how to deal with them "The contract test for X has just failed" - "It's Thursday, they update the system on Thursdays, it will probably be ok in an hour or two".
      Automatically retrying is still a bit risky, but I confess contract tests weren't what I had in mind.

    • @danielevans1680
      @danielevans1680 2 years ago +1

      Thanks - good to know that there's an exception to the rule! The particular pain here is that it's testing our interface to large, widely used, freely available data services, with which we have no sort of dedicated supplier contract at all, making it difficult to dig into why the systems are sometimes unresponsive or down.
      One might hope that the likes of Microsoft or (inter-)national government agencies would provide services where flakiness isn't an issue in testing and usage, but that hasn't proven to be the case. On the other hand, it means we're now pretty aware of this as a potential problem!

    • @Weaseldog2001
      @Weaseldog2001 2 years ago

      That's a great question. I deal with the same issues... our QA environment is down at least 50% of the time, due to client connectivity / client-side configuration problems.
      At the moment, I don't have to deal with this in unit tests, but I foresee a time when it may become my problem.
      Off the cuff, accurate reporting in the code as to the cause of the failure at its origin is very helpful.
      If the test tells you that it's unable to connect to the client at URL xxx, then you have a place to start.
      Perhaps a small suite of diagnostic utilities could play a role.

    •  2 years ago

      Is mocking the external services possible?
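      For HTTP dependencies, one way to mock the external service is an in-process stub built on the JDK's built-in com.sun.net.httpserver; the /status path and the canned JSON body below are made up for illustration. The code under test is then pointed at http://localhost:<port> instead of the real dependency, so no network flakiness can leak into the test.

      ```java
      import com.sun.net.httpserver.HttpServer;
      import java.io.OutputStream;
      import java.net.InetSocketAddress;
      import java.nio.charset.StandardCharsets;

      // A tiny in-process stand-in for the external service: canned responses only.
      public final class FakeExternalService {

          private final HttpServer server;

          public FakeExternalService(int port) throws Exception {
              server = HttpServer.create(new InetSocketAddress(port), 0);
              server.createContext("/status", exchange -> {
                  byte[] body = "{\"status\":\"OK\"}".getBytes(StandardCharsets.UTF_8);
                  exchange.sendResponseHeaders(200, body.length);
                  try (OutputStream out = exchange.getResponseBody()) {
                      out.write(body);
                  }
              });
          }

          public void start() { server.start(); }

          public void stop() { server.stop(0); }
      }
      ```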

    • @Weaseldog2001
      @Weaseldog2001 2 years ago

      @ In my case, we can use simulators to get coverage for most testing.
      But when we are coding for new hardware features, the simulators won't have the new features.
      If we are meeting a new specification that requires client services, then they are usually developing the new service at the same time we are.
      As the hardware is extremely expensive, it gets shared by several teams.
      In crunch time, I'm often going in at 2 AM to get exclusive use of it for development.

  • @thiagoampj
    @thiagoampj 2 years ago

    I think async and reactive systems suffer more from resource use in tests.
    Async means your test relies on timeouts. Your build machine needs to have very predictable use so your tests don't fail while waiting for results from computations.

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +1

      My preference is to monitor events that signal the conclusion of some activity, or to poll and time out on results. That way you don't incur any significant slow-down, and the tests are more stable than relying only on timeouts.
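      A small illustration of the "monitor events" option, using a CountDownLatch as the completion event; the worker thread is a stand-in for whatever asynchronous component is under test. The test returns as soon as the event arrives, and only the upper bound is a fixed timeout.

      ```java
      import java.util.concurrent.CountDownLatch;
      import java.util.concurrent.TimeUnit;
      import org.junit.jupiter.api.Assertions;
      import org.junit.jupiter.api.Test;

      class CompletionEventTest {

          @Test
          void reportsCompletionWithinTimeout() throws InterruptedException {
              CountDownLatch done = new CountDownLatch(1);

              // Stand-in for an asynchronous component that signals when it has finished.
              new Thread(() -> {
                  // ... do the asynchronous work ...
                  done.countDown(); // notify conclusion
              }).start();

              // Wait for the "finished" event, but never longer than the timeout;
              // a fast run completes immediately instead of sleeping for a fixed time.
              Assertions.assertTrue(done.await(5, TimeUnit.SECONDS), "work did not complete in time");
          }
      }
      ```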

    • @albucc
      @albucc 2 years ago +2

      Lesson learned from a lot of time and frustration with tests: instead of trying to rely on sleeps and waits based on system clocks, just control the "clock". For example, in Java, if you rely on Thread.sleep, System.currentTimeMillis and so on, change the system to use a "clock" implementation that you actually control in your tests. Then you can set the time to whatever you want in the test case, or inject the timeout (or not) yourself. The idea is that, if you don't inject anything, the default implementation, with standard dates / System.currentTimeMillis, etc., kicks in. But you can override that in your tests with your special implementation that can "freeze the clock", set the clock to time X, and so on.
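      In modern Java, java.time.Clock is the usual hook for exactly this; the SessionExpiry class below is a made-up example of production code that asks an injected Clock for the time, so a test (or the main method here) can pin the clock wherever it likes instead of sleeping.

      ```java
      import java.time.Clock;
      import java.time.Duration;
      import java.time.Instant;
      import java.time.ZoneOffset;

      // Production code asks an injected Clock for the time instead of calling
      // System.currentTimeMillis() directly, so tests can pin or advance time at will.
      class SessionExpiry {

          private final Clock clock;
          private final Instant createdAt;
          private final Duration lifetime;

          SessionExpiry(Clock clock, Instant createdAt, Duration lifetime) {
              this.clock = clock;
              this.createdAt = createdAt;
              this.lifetime = lifetime;
          }

          boolean isExpired() {
              return Duration.between(createdAt, clock.instant()).compareTo(lifetime) > 0;
          }

          public static void main(String[] args) {
              Instant start = Instant.parse("2024-01-01T00:00:00Z");
              Clock before = Clock.fixed(start.plus(Duration.ofMinutes(29)), ZoneOffset.UTC);
              Clock after = Clock.fixed(start.plus(Duration.ofMinutes(31)), ZoneOffset.UTC);

              System.out.println(new SessionExpiry(before, start, Duration.ofMinutes(30)).isExpired()); // false
              System.out.println(new SessionExpiry(after, start, Duration.ofMinutes(30)).isExpired());  // true
          }
      }
      ```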

  • @timmartin325
    @timmartin325 2 years ago

    Please read some articles from Michael Bolton (developsense) on the topic of software testing, if you have time. He has some very interesting things to say about it.

  • @ntitcoder
    @ntitcoder 2 years ago

    Hi Dave, I'm working with an existing large code base that's not very well designed. I would like to add tests to assert the existing expected behaviours of the system, so that I can safely refactor the code.
    However, as the system is not well designed, it's not friendly for testing and it's hard to add tests for it.
    How do you approach resolving this chicken-and-egg problem?

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +2

      My recommendation is to adopt TDD for all new work, but don't try and retro-fit it to code that you aren't actively changing. For that, use approval testing to support refactoring, and acceptance testing to support releases and to offer protection for the areas where you are changing the code.
      The aim is to use the techniques of refactoring and anti-corruption layers of abstraction to give yourself the freedom to use TDD in the areas where you are doing new work. I cover this in more detail in my commercial training courses. I have this free course on refactoring, and this (very old) YT video - which, thinking about it, I should probably revisit.
      Training Courses: courses.cd.training (see TDD & ATDD courses)
      Free Refactoring Demo: courses.cd.training/courses/refactoring-tutorial
      "Changing Legacy Code with Continuous Delivery": th-cam.com/video/Z2c3sGUE2GA/w-d-xo.html

  • @stephendgreen1502
    @stephendgreen1502 2 years ago +1

    Not convinced about keeping a specific version of automated tests alongside a corresponding version of (codebase of) system under test. ATs are surely tied to released code and lag the release, since you are primarily regression testing, testing what is already released. So I would think it typical to develop ATs after releases have settled in so you can focus on stable, valuable features. This is decoupled from code in development or about to be released.

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +6

      I don't agree that acceptance tests should come later. They should be written first, as "executable specifications" for the system, and used to drive the dev process, to be at their most effective.

    • @stephendgreen1502
      @stephendgreen1502 2 years ago

      @@ContinuousDelivery OK. So yes, there are two types of automated test, or two particular purposes: regression testing and acceptance testing.

    • @d3stinYwOw
      @d3stinYwOw 2 years ago +1

      @@ContinuousDelivery It depends on the whole organization of a project, or even a company - system qualification/integration testing is not everywhere tightly connected with purely software integration/qualification testing. Even those can be disconnected from the tests written by the developers themselves :)

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +2

      @@stephendgreen1502 Sorry, that's not what I am saying. In my preferred approach, the Acceptance tests serve both purposes. So we create 1 test that acts as a specification for the work that we are about to do, and then from then on, verifies that the SW does it, and remains true even when the SW changes in design and implementation.
      They also act as documentation of what the system does. I have known some firms that literally use the text of the specification as the documentation that support people read when dealing with customer enquiries, and others that use the specification in automated compliance-mandated release documents.

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +3

      @@d3stinYwOw Yes, many organisations are poorly structured to do a good job, and this is one of the ways that this is true. Tesla released a change to the charge rate of the Tesla Model 3; this involved a physical re-routing of heavy-duty cabling in the car, amongst other changes. The change was made in software, the automated tests all passed, the change was released into production, re-configuring the robots in the factory that build the cars, and 3 hours after the change was committed the Tesla production line, world-wide I think, was producing cars with a higher maximum charge rate (from 200kW to 250kW).
      To achieve this kind of work, you have to treat the act of software development for what it really is, a holistic practice, and you have to optimise all of it. If organisations are mis-configured, dividing up work that you can't afford to divide, you have to fix that too, if you want to do a great job. The org structure is a tool; it was decided on by people, and people often make mistakes. Separating dev and testing is a mistake. It is not an obvious mistake, it sounds sensible, it just doesn't work very well. So great organisations are willing to spot these kinds of problems and fix them.

  • @jrgalyen
    @jrgalyen 1 year ago

    With tests that have flaky failures, why not rename them or give the test runner some way to identify them differently, removing them from the main run? These flaky tests can be scheduled, run after the first run, anything.
    Decouple bad tests from good tests, where both provide value. Like a functional test that calls an external resource in dev, such as a microservice that has outages while deploying. This test is only flaky in dev.
    It could be a quickly written bad test too. Switch it to this flaky run, fix it, then flip it back.

    • @jrgalyen
      @jrgalyen 1 year ago

      I actually like xUnit's indeterminate state. If the requirements to run a test are not available, don't pass or fail it. At least in dev. A more stable environment, like QA, should fail.
      1.) I don't think we should rearchitect our system with IoC to fix issues only seen in dev. We should have architected our system better. But since we didn't, I prefer working software.

    • @ContinuousDelivery
      @ContinuousDelivery  1 year ago

      I have used this strategy. Pragmatically it is useful on the way to something better, but ultimately one of the reasons why the test is flaky is because your system is flaky. While continuing to release with it being flaky may sometimes be a pragmatically useful choice, in the end you are releasing a system that's broken. So the high-quality approach is to treat flaky tests as failing tests; whether or not you decide to release while the test is failing is the same decision as for any other failing test. That seems somehow more "true" to me than treating it as "maybe passing".

  • @danielkemp9830
    @danielkemp9830 2 years ago +1

    I like your previous backdrop compared to this one. This one is a little distracting.

  • @EZINGARD
    @EZINGARD 2 years ago +40

    Automated tests won't fail if you don't write them.

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +36

      True, but your code will.

    • @PavelHenkin
      @PavelHenkin 2 years ago +15

      @@ContinuousDelivery not if you don't write the code either. You can't make a mistake if you do nothing.. ;)

    • @marikselazemaj3428
      @marikselazemaj3428 2 years ago +3

      @@PavelHenkin Then you are a failure 😂.
      Nothing personal 👍😘

    • @Redheadtama1
      @Redheadtama1 2 years ago +4

      @@PavelHenkin I like your thinking. I shall proceed to sit in a chair, stare at the wall and bask in my success!

    • @majorhumbert676
      @majorhumbert676 2 years ago +1

      @@PavelHenkin True, but your company will

  • @davrak711
    @davrak711 1 year ago

    Does anyone have any decent strategies for time-travel tests?

    • @ContinuousDelivery
      @ContinuousDelivery  1 year ago

      I cover time travel testing in my ATDD & BDD course 😁😎
      courses.cd.training/courses/atdd-from-stories-to-executable-specifications

  • @marcotroster8247
    @marcotroster8247 2 years ago

    What about intentional randomness, where there's an inherent need for it in what's being tested? I'm thinking of testing shuffle algorithms, mathematical distribution functions or even encryption strength.
    Usually I test such things with entropy-based properties. But I'm not so sure it's actually a good idea 🤔

    • @Weaseldog2001
      @Weaseldog2001 2 years ago

      You have conflicting goals when doing this.
      1. You want good test coverage, to ensure your code is actually getting a proper shakedown.
      2. You need to finish in a finite amount of time.
      An approach that can accomplish this is to use predetermined seeds for your random numbers.
      You can then define how many iterations will be run, so you can control how much time the tests take.
      As you effectively have an infinite number of permutations to deal with, you could also add in a finite number of 'random' seeds, and report those numbers in the test.
      If one of those numbers causes a test failure, you can then make that number a permanent part of the test, as it's proven to expose a flaw.
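      A sketch of that approach for a shuffle check; the seed values, the property being checked, and the class name are all illustrative. Known-troublesome seeds always run, and each fresh seed is printed so a failure can be reproduced exactly.

      ```java
      import java.util.Collections;
      import java.util.List;
      import java.util.Random;
      import java.util.stream.Collectors;
      import java.util.stream.IntStream;

      public final class ShuffleSeedCheck {

          static final long[] REGRESSION_SEEDS = {42L, 1337L}; // seeds that once exposed a bug

          public static void main(String[] args) {
              for (long seed : REGRESSION_SEEDS) {
                  checkShuffle(seed);
              }
              long freshSeed = new Random().nextLong();
              System.out.println("exploratory seed: " + freshSeed); // report it so a failure is reproducible
              checkShuffle(freshSeed);
          }

          static void checkShuffle(long seed) {
              List<Integer> items = IntStream.range(0, 100).boxed().collect(Collectors.toList());
              Collections.shuffle(items, new Random(seed)); // deterministic for a given seed
              // Property: a shuffle must keep exactly the same elements.
              if (items.stream().distinct().count() != 100) {
                  throw new AssertionError("shuffle lost or duplicated elements for seed " + seed);
              }
          }
      }
      ```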

    • @marcotroster8247
      @marcotroster8247 2 years ago

      @@Weaseldog2001 I see your point. But how can I know that the seed covers all the cases for the set of test parameters drawn from the pseudo-random series? 🤔
      And what do I gain from a precomputed seed? If the test framework logs the test parameters causing the bug, I really don't see much value in repeatability. It's rather dangerous to always test with the same seeds, as it might give me false confidence 🤔

    • @Weaseldog2001
      @Weaseldog2001 2 years ago

      @@marcotroster8247 Those are good questions.
      1. You'll need to evaluate the seeds you apply by examining function coverage. If your app has full function coverage using any seed, then that one seed might be good enough for the first iteration.
      If there is reason to believe that a seed may on occasion cause an exception, then you might repeatedly run the test using different seeds until the code you worry about is being executed with values that are troublesome.
      2. If code breaks because of a known seed, you want to keep using that seed, to make sure no one puts that bug back. One function of unit testing is to make sure that historic bugs do not reappear.
      Your goal is to front-load the analysis, to catch as many bug cases as you can with the fewest and fastest tests.
      As the unit tests add up, the build times will slow, which reduces productivity for the whole team.
      As I mentioned, using randomized seeds in conjunction with canned ones is likely a good approach, so long as your test reports the seeds it used.
      Then if the build fails, the engineer looking into it can add these seeds to the canned list and use them for troubleshooting.

    • @Weaseldog2001
      @Weaseldog2001 2 years ago

      @@marcotroster8247 One thing to consider: if your code changes the order in which it uses the random numbers, then your pre-canned numbers will no longer have historical value. At that point, deleting your values and running many dev tests on new random seeds may be necessary.

    • @Weaseldog2001
      @Weaseldog2001 2 years ago

      @@marcotroster8247 I keep thinking of things to add...
      In modern compilers, the seeds are applied per thread.
      If you set your seed to 11 on startup, then spin up a thread and call random, you'll start with a seed of 0, because you've changed context.
      And this means you could theoretically, in a simulation app, end up with a lot of seeds to deal with and record.
      And that brings you to an architectural problem.
      It's impossible to cover every permutation in a short comment.
      But a possible way to deal with this, if you are spawning a known number of threads, is to seed each thread based on its index, or order of creation:
      srand(INIT_SEED + Index)
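      A Java rendering of that per-thread idea (the base seed and thread count are made up): each worker gets its own Random seeded from the base seed plus its index, so a multi-threaded run stays reproducible without having to record a pile of unrelated seeds.

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.Random;

      public final class SeededWorkers {

          static final long INIT_SEED = 20240101L; // illustrative base seed

          public static void main(String[] args) throws InterruptedException {
              List<Thread> workers = new ArrayList<>();
              for (int index = 0; index < 4; index++) {
                  Random random = new Random(INIT_SEED + index); // deterministic per-thread stream
                  int threadIndex = index;
                  Thread worker = new Thread(() ->
                          System.out.println("thread " + threadIndex + " first draw: " + random.nextInt(1000)));
                  workers.add(worker);
                  worker.start();
              }
              for (Thread worker : workers) {
                  worker.join();
              }
          }
      }
      ```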

  • @zerettino
    @zerettino 2 years ago

    What about property-based testing, which typically uses random data?

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +3

      I am not a fan. The value of "tests" as tests is secondary in my opinion. The most effective strategy is to treat them as specifications of the behaviour that we want to add.
      I use such "tests" more as a design tool first, and only after that are they useful as tests.
      At which point the property-based tests don't add much value.

    • @michanackiewicz
      @michanackiewicz 2 years ago +2

      In my experience, if you are using tests that rely on random data it is a good idea to make sure that you are able to rerun your tests with the exact same test data (this can be useful when doing black-box testing) - this way you can rule out whether the generated input has anything to do with the tests failing.

    • @stephenJpollei
      @stephenJpollei 2 years ago +1

      Yes, I've written a few tests that use pseudorandom numbers, but when I do, I specify the seed. I haven't used fuzz testers like Google's AFL (American Fuzzy Lop). It uses an instrumentation-guided genetic algorithm, so kind of by definition the tests it produces will be heavily influenced by the program under test.

    • @zerettino
      @zerettino 2 years ago

      @@ContinuousDelivery If you go through the effort of writing them once, why wouldn't you keep them?

    • @zerettino
      @zerettino 2 years ago

      @@michanackiewicz Yes, sure! A good PBT framework will allow you to do that, by logging the random seed in case of failure, for instance, and allowing you to rerun the failed test with that seed.

  • @a544jh
    @a544jh 2 years ago

    So are you implying that if you do properly decoupled microservices there is no need to do integration testing for the system as a whole?

    • @ContinuousDelivery
      @ContinuousDelivery  2 years ago +2

      I am not implying it, I am saying it!
      It's in the definition of microservices. They are by definition "independently deployable". What does that mean in an org where more than one service is being developed at the same time? Suppose I change my service and you change yours. If we test them together, the only way we can be confident in releasing them is if we release them together. If I release mine and you don't release yours, even though we tested them together, mine may not work with the old version of your service, because I didn't test with that version. So even for 2 services, if we test them together before release, they aren't "independently deployable". So, by definition, you don't get to test microservices together!

  • @jrgalyen
    @jrgalyen 1 year ago

    Anything with less complexity than Linux kernel 1.2 - the debate where "monolith" was first used as a force for good - is not a monolith. The Linux kernel today is not a monolith! Or even if it is a modular monolith, that is just an ode to its complexity, having once been one.
    A microservice that exposes more than just HTTP REST endpoints, like gRPC, is no longer a microservice. It becomes just an API! HTTP REST endpoint vendor lock-in, begone!
    Hexagonal architecture can make the kind of API that can be consumed by both modular and load-balanced distributed architectures.