Shostack + Friends Blog Archive


Data driven pen tests

So I’m listening to the “Larry, Larry, Larry” episode of the Risk Hose podcast, and Alex is talking about data-driven pen tests. I want to posit that pen tests are already empirical. Pen testers know what techniques work for them, and start with those techniques.

What we could use are data-driven pen test reports. “We tried X, which works in 78% of attempts, and it failed.”

We could also use more shared data about what tests tend to work.
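A report line like the one above is easy to produce once technique outcomes are recorded consistently across engagements. Here is a minimal sketch in Python of what such record-keeping might look like; the technique name, counts, and `TechniqueStats` class are all invented for illustration, not any vendor's actual format:

```python
# Hypothetical sketch: aggregate per-technique success rates so a report
# can say "We tried X, which works in NN% of attempts, and it failed."
# All names and numbers here are made up for illustration.
from collections import defaultdict

class TechniqueStats:
    """Tracks attempts and successes per technique across engagements."""

    def __init__(self):
        self.attempts = defaultdict(int)
        self.successes = defaultdict(int)

    def record(self, technique, succeeded):
        """Log one attempt of a technique and whether it worked."""
        self.attempts[technique] += 1
        if succeeded:
            self.successes[technique] += 1

    def success_rate(self, technique):
        """Fraction of recorded attempts that succeeded (0.0 if untried)."""
        tried = self.attempts[technique]
        return self.successes[technique] / tried if tried else 0.0

    def report_line(self, technique, succeeded_this_engagement):
        """Render a data-driven report sentence for one engagement."""
        rate = self.success_rate(technique)
        outcome = "succeeded" if succeeded_this_engagement else "failed"
        return (f"We tried {technique}, which works in {rate:.0%} "
                f"of attempts, and it {outcome}.")

stats = TechniqueStats()
# Simulated historical data: 7 successes out of 9 attempts (~78%).
for ok in [True] * 7 + [False] * 2:
    stats.record("password spraying", ok)

print(stats.report_line("password spraying", succeeded_this_engagement=False))
```

The hard part, of course, is not the bookkeeping but getting testers and vendors to record outcomes in a shared, comparable vocabulary in the first place.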


11 comments on "Data driven pen tests"

  • dunsany says:

    Hear, hear. I’ve been waiting for some of the major pen-testers to VERISize their data. I’d love to know what techniques work in what circumstances, what controls consistently fail… and what changes in subsequent examinations.

  • Andre Gironda says:

    Yeah, welcome to Core Insight (yes, not Core Impact) or Trustwave PenTest Manager.

    Or The Dradis Framework. Or MagicTree.

    Or HoneyApps Conduit. Or SHODAN. Or Cayman services.

    Or AppSec Risk Management.

    Or Cigital ESP.

  • Yes, it makes sense. The interesting thing would be the appropriate definition of the tests (is the “X” in your example the same across different pentest providers?). The assumptions and prerequisites of each test would also be very hard to define, but it seems that it could be done.

    I wrote something similar a few months ago:



  • CG says:

    I like the idea, and I think it could be useful.

    However, they need to drop the pentest part. You are solidly into vulnerability assessment territory when you are talking about “OK, I tried 1, 2, 3, 4, and 5; 1 & 3 worked; on to the next set of tests.” That’s vulnerability assessment (with exploitation, if you want to get technical), not pentesting.

    Pentesting is about a human looking at the problem and figuring out how to break it, not some scanner. That’s going to be very hard to standardize and put hard numbers on, and I don’t think it’s going to be possible without tying up your tester’s time with bullshit.

  • Orac says:

    I think if we could start with pen tests that are repeatable, record the things that didn’t work, and record the things that weren’t tried, we might start getting more value.

    How about a test plan and test script agreed before the test starts? Maybe a draft attack tree in the proposal?

    What about a common set of definitions of negative tests for common system deployments, so we can compare different pen testers’ outputs to each other?

    The race to the bottom of commodity pricing leads to cheap tests that deliver little value. The plaintive pen-tester cry of “not enough time; your fault we couldn’t cover everything” drowns out the fact that the proposals for the work are cut to the bone to win on price, not to deliver value.

    Without any independent measurement of the value of pen testing, the financial customers (the ones paying, not the ones using the reports) have a simple metric to judge their decision: the cost.

  • hockey11 says:

    Unfortunately, testing exercises suffer from the same disparate roadblocks that scanners did before SCAP (or at least before a more unified testing input/output). Since so much depends on the scenario, platform, environment, etc., it might take a bit more work to standardize. For now, some tools have sanitized aggregate data that ‘might’ be useful. For instance, I could demonstrate to a customer that a successful exploit in their environment was also successful against many other customers, but the data sharing stops there and is 100% predicated upon the tool I used and the demographic information collected by the vendor. And that’s just a popular technical exploit, not inclusive of what I may find manually through my own process of ‘what works’ for me personally as a tester. Testing also depends in part on the tester’s awareness of all the scope items, separately and as part of the overall system, which makes it a bit more difficult to standardize as well.

    Testing can be a commodity, but in itself it’s still useful as ‘validation’ in a security program, which should inherently have value on its own in the right organization. But I do agree that there is a need for solid data to present and share above and beyond just that particular test itself. And, like anything else in this world, the risk of engaging a low-cost, poor-quality service can be mitigated by asking the vendor questions. If you don’t put some care into your goals and into who you engage, you run the risk of getting the same level of effort in return from those lowest bidders.

  • Mark Kelly says:

    I think to ask for a data-driven pen test is to not quite understand the intent of the exercise. A pen test typically takes the path of least resistance, so it is not meant to be “data driven.” I would think a vulnerability assessment tool would be much better suited to the desire to quantify vulnerabilities.

  • Arian says:

    That is how WhiteHat works, and how Sentinel works. We generate stats on every test battery, pattern-match every 24 hours, and analyze and optimize (the stats are sliced a dozen different ways: by technology tested, by mix of tests, etc., for better decision-tree making). We publish stats on the aggregate whole. People seem to really like that. I’d love to see HP or IBM publish some specific stats about their DAST or SAST testing accuracy and efforts.

    In general – I think your idea of sharing is great. I just don’t see many vendors opening the kimono.

  • Terrence says:

    Agreed, but I do feel like we can automate 80% of the manual investigation. We can standardize the fundamental approach of the pen test and then layer tools on top of that. Seems not unreasonable.

Comments are closed.