
EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis

Blog post from CrowdStrike

Post Details
Word Count: 3,399
Language: English
Summary

CrowdStrike researchers have developed EMBERSim, a large-scale dataset intended to advance binary code similarity (BCS) research and improve malware detection. EMBERSim builds on the existing EMBER dataset, adding new tags and a co-occurrence algorithm to support similarity detection across both malicious and benign binaries. The work also uses a novel leaf similarity technique based on XGBoost, which outperformed traditional methods such as ssdeep at identifying malware. Together with other research efforts such as Threat AI, the project underscores the company's commitment to innovation in cybersecurity and supports the AI-driven CrowdStrike Falcon® platform.
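The summary does not spell out how leaf similarity is computed. One common formulation, assumed here, scores two samples by the fraction of trees in a trained gradient-boosted ensemble in which both samples land in the same leaf. The sketch below illustrates that idea only; it is not the EMBERSim implementation. The helper names (`leaf_similarity`, `top_k_similar`) and the leaf indices are hypothetical. In practice the per-tree leaf indices would come from a trained model, for example via `XGBClassifier.apply(X)`.

```python
from typing import Dict, List, Sequence, Tuple


def leaf_similarity(leaves_a: Sequence[int], leaves_b: Sequence[int]) -> float:
    """Fraction of trees in which two samples fall into the same leaf.

    Each argument holds one leaf index per tree for a single sample.
    """
    if len(leaves_a) != len(leaves_b) or len(leaves_a) == 0:
        raise ValueError("leaf vectors must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return matches / len(leaves_a)


def top_k_similar(
    query: Sequence[int], corpus: Dict[str, Sequence[int]], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank corpus samples by leaf similarity to the query, highest first."""
    scored = [(leaf_similarity(query, leaves), name) for name, leaves in corpus.items()]
    # Sort by descending score; break ties by name for a stable ordering.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]


# Hypothetical leaf indices for a 4-tree ensemble (illustrative values only).
corpus = {
    "sample_a": [3, 7, 1, 4],
    "sample_b": [3, 7, 2, 4],
    "sample_c": [5, 6, 2, 0],
}
query = [3, 7, 1, 4]
ranked = top_k_similar(query, corpus, k=2)
print(ranked)
```

Unlike a fuzzy hash such as ssdeep, which compares raw bytes, a score of this kind is computed in the space learned by the classifier, so two binaries can score as similar when the model treats them alike even if their byte content differs.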