IT leaders in Australia and New Zealand are navigating a shifting reality: major disruptions are no longer edge cases, and the defining question has become how quickly you can stand your business back up.
With traditional disaster recovery models struggling to account for modern cyber events that hit primary, secondary and backup environments simultaneously, the playbook is changing. We spoke to Daniel Bowbyes, Associate Director of Strategy at Datacom, about how he approaches building resilient infrastructure for critical systems, reducing dependency risk and designing for disruption.
From my vantage point at Datacom, working across data centres, cloud and cyber with critical infrastructure providers, resilience is a core design principle, not an afterthought. Here’s how I approach it.
Traditional disaster recovery is built around regional events. A building burns down, a flood wipes out a site, staff cannot access the office, so you fail over to a secondary location. In modern cyber events, the primary, secondary and backup environments are often hit at the same time.
Sometimes, as in 2017 when the NotPetya malware crippled shipping giant Maersk’s global operations, a single exploit can render some or all of your locations inoperable.
Backup and disaster recovery (DR) systems are frequently among the first targets of attackers.
So, the mindset has to shift from ‘how do I fail over?’ to ‘how do I rebuild my business from zero?’ using a minimum viable business (MVB) lens. That means:
Agreeing, at executive level, which processes and systems must come back first to keep the organisation alive.
Being realistic about timeframes: some organisations will be rebuilding for weeks.
Accepting that some high-profile systems (for example, CRM) might not be in the first wave of recovery if they are not essential to survival in the first 7-14 days.
The key is to do this prioritisation work before an incident, so when the pressure comes, your CIO is not refereeing a queue of executives all claiming their system is ‘business critical.’
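As a rough illustration of what that pre-agreed prioritisation could look like once it is written down, here is a minimal Python sketch. The system names, the `survival_critical` flag, the target hours and the 48-hour cut-off are all hypothetical placeholders, not Datacom's method or anyone's real recovery plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    survival_critical: bool  # decided at executive level, before any incident
    target_hours: int        # agreed time by which the system must be back


def recovery_waves(systems, first_wave_hours=48):
    """Return (first_wave, later_waves) as lists of names, ordered by target."""
    ordered = sorted(systems, key=lambda s: s.target_hours)
    first = [s.name for s in ordered
             if s.survival_critical and s.target_hours <= first_wave_hours]
    later = [s.name for s in ordered if s.name not in first]
    return first, later


# Hypothetical systems and targets, for illustration only.
systems = [
    System("crm", survival_critical=False, target_hours=14 * 24),
    System("payments", survival_critical=True, target_hours=24),
    System("identity", survival_critical=True, target_hours=12),
]
first_wave, later_waves = recovery_waves(systems)
print(first_wave)   # ['identity', 'payments']
print(later_waves)  # ['crm']
```

The point of keeping the list in a reviewable form like this is that the ordering is settled by agreement in advance, not argued over during the incident.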
What would you put in your organisation’s top three “must be back within 48 hours” systems if you had to choose today?
Many organisations assume they are resilient because they are multi-cloud or have dual data centres in operation. In reality, there are often hidden single points of failure at the physical, logical and vendor levels. Examples that come up repeatedly:
Cloud regions that sit across the road from each other, or even on the other side of the same wall, undermining your “diverse” design.
Basic network services like Network Time Protocol (NTP) time synchronisation or the Domain Name System (DNS), which, if misconfigured or unavailable, can stop identity, authentication and cryptography from working even when your core apps are technically online.
Legacy integrations written years ago by a developer who has long since left. The organisation has no clear understanding of how data flows through those glue systems.
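One way to surface dependencies like these is to record them as a simple graph and look for components that every critical system ultimately relies on. The sketch below assumes a hand-maintained, hypothetical dependency map; the service names are invented for illustration, and a real mapping exercise would also need to cover physical and vendor dependencies that no graph of services captures on its own.

```python
def transitive_deps(graph, node, seen=None):
    """Collect everything `node` depends on, directly or indirectly."""
    seen = set() if seen is None else seen
    for dep in graph.get(node, []):
        if dep not in seen:          # the `seen` set also guards against cycles
            seen.add(dep)
            transitive_deps(graph, dep, seen)
    return seen


def shared_dependencies(graph, critical_systems):
    """Dependencies common to all critical systems: candidate single points of failure."""
    dep_sets = [transitive_deps(graph, s) for s in critical_systems]
    return set.intersection(*dep_sets) if dep_sets else set()


# Hypothetical dependency map, for illustration only.
dependencies = {
    "payments": ["identity", "dns"],
    "customer-portal": ["identity", "legacy-glue"],
    "identity": ["ntp", "dns"],
    "legacy-glue": ["dns"],
}
shared = shared_dependencies(dependencies, ["payments", "customer-portal"])
print(sorted(shared))  # ['dns', 'identity', 'ntp']
```

In this toy example, both critical systems depend on DNS, NTP and the identity service, even though only some of those links are declared directly.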
AI is accelerating both sides of the equation. Defensive teams are using it to identify vulnerabilities and monitor environments more effectively, but attackers are using automation and AI to move laterally faster and identify exploits at scale.
This is one reason why the separation between production environments and recovery environments matters so much. If the speed of compromise outpaces your ability to detect and contain it, your recovery vault and clean-room approach become the last line of defence – and they need to be designed accordingly.
Three factors are becoming crucial to doing this:
Separate backup from cyber recovery. Your cyber recovery vault must be immutable – once data is written, it cannot be altered within the retention window.
Define your clean room: a logically and physically separate environment where you rebuild from that immutable data, without bringing the infection back with you.
Pre-negotiate access to hardware. With lead times for compute and storage stretching to months and compromised assets potentially locked away as evidence, contracts that guarantee rapid access to replacement capacity are critical. The need to quarantine data for forensic purposes can also significantly increase storage costs post-incident, making pre-arranged capacity agreements even more important.
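To make the immutability requirement concrete, the following Python sketch models the rule "once data is written, it cannot be altered within the retention window". It is an in-memory illustration of the behaviour only: the class name, method names and the 30-day window are assumptions, and in a real vault this guarantee has to be enforced by the storage platform itself, not by application code.

```python
from datetime import datetime, timedelta, timezone


class RetentionLockedVault:
    """Toy model of write-once storage with a retention window (illustration only)."""

    def __init__(self, retention):
        self._retention = retention
        self._objects = {}  # key -> (data, time written)

    def _locked(self, key, now):
        entry = self._objects.get(key)
        return entry is not None and now < entry[1] + self._retention

    def write(self, key, data, now):
        if self._locked(key, now):
            raise PermissionError(f"{key} cannot be overwritten inside its retention window")
        self._objects[key] = (data, now)

    def delete(self, key, now):
        if self._locked(key, now):
            raise PermissionError(f"{key} cannot be deleted inside its retention window")
        self._objects.pop(key, None)

    def read(self, key):
        return self._objects[key][0]


# Hypothetical snapshot name and retention period.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
vault = RetentionLockedVault(retention=timedelta(days=30))
vault.write("db-snapshot", b"clean copy", now=t0)
```

An attempt to overwrite or delete `db-snapshot` one day later raises `PermissionError`; the same call after the 30-day window has passed succeeds.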
Humans make mistakes, especially under pressure. In a major incident, the last thing you want is a rebuild process that relies entirely on memory and heroics.
Embedding infrastructure as code and automated recovery runbooks changes the game:
Recovery steps for your MVB systems must be scripted, repeatable and kept up to date. You push a button and the environment appears with the right configuration, networking and security controls.
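As a minimal sketch of what "scripted and repeatable" can mean in practice, the Python below runs recovery steps in a fixed order and verifies each one before moving on. The `Step` structure, the step names and the in-memory `state` dictionary are hypothetical stand-ins; a real runbook would call your infrastructure-as-code tooling and run real health checks.

```python
from typing import Callable, NamedTuple


class Step(NamedTuple):
    name: str
    action: Callable[[], None]   # performs the rebuild step
    verify: Callable[[], bool]   # confirms the step actually worked


def run_runbook(steps):
    """Run steps in order; stop at the first step that fails verification."""
    completed = []
    for step in steps:
        step.action()
        if not step.verify():
            raise RuntimeError(f"verification failed after step: {step.name}")
        completed.append(step.name)
    return completed


# Hypothetical stand-ins for real provisioning and health checks.
state = {"network": False, "identity": False}


def build_network():
    state["network"] = True


def build_identity():
    if not state["network"]:
        raise RuntimeError("identity requires the network to be rebuilt first")
    state["identity"] = True


runbook = [
    Step("network", build_network, lambda: state["network"]),
    Step("identity", build_identity, lambda: state["identity"]),
]
completed = run_runbook(runbook)
print(completed)  # ['network', 'identity']
```

Encoding the order and the checks in the runbook itself means the rebuild does not depend on anyone remembering the sequence under pressure.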
At Datacom, this is paired with practical “hands and feet” capabilities in our data centres. We have engineers who can get to your hardware, swap components and restore access when you cannot physically reach the site, which becomes crucial in regional or global disruption scenarios.
The organisations doing resilience well, particularly in Australia under the Security of Critical Infrastructure Act 2018, have recognised that resilience is not a one-off compliance exercise. They:
Embed resilience and security criteria into architecture, procurement and vendor management processes.
Continuously review suppliers and architectures against evolving regulatory, cyber and geopolitical risk.
Empower their security and compliance functions to say “no” when a decision would increase exposure beyond agreed thresholds, with accountability that runs up to and including board level.
New Zealand is moving in the same direction, with critical infrastructure providers facing sharper expectations, potential penalties and increased oversight. The organisations that succeed will be the ones that treat resilience the way boards treat health and safety – an ongoing obligation, with clear roles, authority and reporting.
For Datacom, the focus is on helping customers build this resilience end to end, from sovereign data centres and private cloud, through public cloud architectures, to cyber recovery, software and application refactoring. This allows our clients to design for disruption rather than hoping to avoid it.
If you think about your own organisation, where does your resilience posture feel strongest today, and where does it feel uncomfortably dependent on a single vendor, region or legacy system that only a couple of people truly understand?
Ready to assess your organisation's resilience posture?
Book a session with our digital resilience team to map your minimum viable business, identify hidden dependencies across your infrastructure platform and design recovery capability that matches today's threat landscape.
Daniel Bowbyes shares a practical framework for building digital resilience – from minimum viable business thinking to dependency mapping, cyber recovery design and governance.