Security Data Science Learning Resources

Jason Trost
May 5, 2019


Image credit robohub.org

This short post catalogs some resources that may be useful to those interested in security data science. It is not meant to be an exhaustive list, but rather a curated one to help you get started.

Staying Current with Security Data Science

Here is my current strategy for keeping up with security data science research. It leans more heavily toward academic research, since that is what interests me at the moment.

  1. Google Scholar publication alerts on well-known, respected researchers.
  2. Google Scholar citation alerts on interesting or noteworthy papers.
  3. Follow security ML researchers on Twitter and Medium. They frequently share interesting and cutting-edge research papers / videos / blogs.
  4. Periodically review proceedings from noteworthy security conferences.
  5. Skim security conference videos published by Irongeek, looking for topics of interest.

Google Scholar alerts

Citation Alerts on these papers:

  • “Acing the IOC game: Toward automatic discovery and analysis of open-source cyber threat intelligence”
  • “AI^2: training a big data machine to defend”
  • “APT Infection Discovery using DNS Data”
  • “Beehive: Large-scale log analysis for detecting suspicious activity in enterprise networks”
  • “Deep neural network based malware detection using two dimensional binary program features”
  • “Detecting malicious domains via graph inference”
  • “Detecting malware based on DNS graph mining”
  • “Detecting structurally anomalous logins in Enterprise Networks”
  • “Discovering malicious domains through passive DNS data graph analysis”
  • “EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models”
  • “Enabling network security through active DNS datasets”
  • “Feature-based transfer learning for network security”
  • “Gotcha-Sly Malware!: Scorpion A Metagraph2vec Based Malware Detection System”
  • “Guilt by association: large scale malware detection by mining file-relation graphs”
  • “Identifying suspicious activities through dns failure graph analysis”
  • “Polonium: Tera-scale graph mining and inference for malware detection”
  • “Segugio: Efficient behavior-based tracking of malware-control domains in large ISP networks”

New article alerts on these authors, with the first three entries (the ones with descriptions) being the most relevant / interesting to me.

  • Alina Oprea — heavily focused on operational security ML.
  • Josh Saxe, Rich Harang, and Konstantin Berlin — heavily focused on malware detection/analytics using ML. Josh Saxe is also a published book author.
  • Manos Antonakakis and Roberto Perdisci — heavily focused on network security analytics using ML with a specialty in DNS traffic.
  • Marco Balduzzi
  • Battista Biggio
  • Chaz Lever
  • Christopher Kruegel
  • Damon McCoy
  • David Dagon
  • David Freeman
  • Gianluca Stringhini
  • Giovanni Vigna
  • Guofei Gu
  • Yufei Han
  • Hossein Siadati
  • Issa Khalil
  • Jason (Iasonas) Polakis
  • Michael Donald Bailey
  • Michael Iannacone
  • Nick Feamster
  • Niels Provos
  • Nir Nissim
  • Patrick McDaniel
  • Stefan Savage
  • Steven Noel
  • Terry Nelms
  • Ting-Fang Yen
  • Vern Paxson
  • Wenke Lee
  • Yacin Nadji
  • Yanfang (Fanny) Ye
  • Yizheng Chen
  • Yuval Elovici

Twitter

Twitter can be a gold mine of new and relevant ideas, blogs, presentations, etc., for security data science. You just need to make sure you keep following the right people. Here is a short list of thought leaders in this space (if I left you off, it is an oversight on my part, so please don’t take offense).

For a longer list of people I recommend following on Twitter, see this gist. It focuses on Threat Intel, Threat Hunting, Detection Engineering, IR, and Security Engineering. It is not exhaustive, but it is a good start.

Conferences

Below are several interesting security conferences where research on security data science topics is published. It is a good idea to be on the lookout for the proceedings from these events.

This page is also an excellent general resource for top academic security conferences: Top Academic Security conferences list. The major industry-focused security conferences like Black Hat, RSA, DEF CON, BSides*, DerbyCon, and ShmooCon frequently have talks relevant to security data science, but since this is not their primary focus, they are not explicitly called out above.

Learning Resources

These resources will help you build a baseline of knowledge in cybersecurity and machine learning.

Books

Security:

Security Data Science:

Machine Learning / Data Science:

Courses

Short Courses / Live Sessions

O’Reilly’s learning platform has some interesting security + ML / DS related “Live Training sessions”. These are usually just a few hours long, and all of them make their course materials available through O’Reilly’s GitLab instance, which is open to the public.

Security for Machine Learning by Katharine Jarmul [code]

Hands-on adversarial machine learning by Yacin Nadji [code]

Data science for security professionals by Charles Givre [code]

Enhanced machine learning for cybersecurity by Charles Givre [code]

I hope this is helpful, and I would be interested to hear about other resources that you find useful. Please leave a message here on Medium, or @ me on Twitter!

This blog post was originally published at my personal blog at http://www.covert.io/security-data-science-learning-resources/

–Jason
@jason_trost
