In this post, I share a few links to examples of weak supervision applied to cyber security use cases. I was surprised that I couldn’t find more examples.
According to Wikipedia:
Weak supervision is a branch of machine learning where noisy, limited, or imprecise sources are used to provide supervision signal for labeling large amounts of training data in a supervised learning setting. This approach alleviates the burden of obtaining hand-labeled data sets, which can be costly or impractical. Instead, inexpensive weak labels are employed with the understanding that they are imperfect, but can nonetheless be used to create a strong predictive model.
Weak supervision was popularized by the paper Snorkel: Rapid Training Data Creation with Weak Supervision and by the library/toolset of the same name: https://github.com/snorkel-team/snorkel
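To make the idea concrete, here is a minimal sketch of weak labeling for URLs in plain Python. It is not Snorkel's API: the labeling functions, the keyword list, and the allowlist are all hypothetical, and the votes are combined with a simple majority vote, whereas Snorkel learns how much to trust each source.

```python
import re
from collections import Counter
from urllib.parse import urlparse

ABSTAIN, BENIGN, MALICIOUS = -1, 0, 1

# Hypothetical heuristics, chosen only for illustration.
SUSPICIOUS_KEYWORDS = ("verify-account", "free-gift")
KNOWN_GOOD_HOSTS = {"example.com", "example.org"}


def lf_raw_ip_host(url):
    """Vote MALICIOUS when the host is a bare IPv4 address."""
    host = urlparse(url).hostname or ""
    return MALICIOUS if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) else ABSTAIN


def lf_suspicious_keyword(url):
    """Vote MALICIOUS when the URL contains a suspicious keyword."""
    lowered = url.lower()
    return MALICIOUS if any(k in lowered for k in SUSPICIOUS_KEYWORDS) else ABSTAIN


def lf_known_good_host(url):
    """Vote BENIGN when the host is on the allowlist."""
    host = urlparse(url).hostname or ""
    return BENIGN if host in KNOWN_GOOD_HOSTS else ABSTAIN


LABELING_FUNCTIONS = [lf_raw_ip_host, lf_suspicious_keyword, lf_known_good_host]


def weak_label(url):
    """Combine the noisy votes by majority; abstain on ties or no votes."""
    votes = [v for v in (lf(url) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    for url in [
        "http://192.168.10.5/verify-account",
        "https://example.com/docs",
        "https://unknown.test/page",
    ]:
        print(url, weak_label(url))
```

Each heuristic is cheap to write and wrong some of the time, which is the point: the combined labels are noisy, but they can be produced for far more URLs than anyone could label by hand, and a downstream model is then trained on them.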
- Firenze: Model Evaluation Using Weak Signals
- Learning to Rank Relevant Malware Strings Using Weak Supervision [video] [code]
- Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction (weak supervision used for creating training set, but not a primary part of the paper)
The “short links” format was inspired by O’Reilly’s Four Short Links series.