Darktrace

We consider the rarity of certain strings when analyzing the behavior of devices and users. The strings might be hostnames or domain names in HTTP requests, email addresses, or executable names. We can determine their rarity by observing network traffic, making lists of known entities over time, and comparing new observations to old.

Matryoshka Bloom filters form the basis of a new method to determine rarity by finding out how often a string appears in these lists. A Bloom filter is a method of probabilistically determining if a specific string is a member of a set. A Bloom filter contains an array of bits and computes several hashes for each string to produce a list of indices that point to elements of this array. If these elements are all ones, it is highly probable that the string appears in the set. If a single bit is set to zero, the string does not appear in the set. The Bloom filter uses much less memory than a hash set would use to determine set membership, at the cost of a small but non-zero probability of a false positive. Choosing an appropriate set of filter parameters (such as array size and number of hashes) can reduce this cost.

However, a standard Bloom filter can only give a binary answer to the question of membership. By structuring the popularity data as a sequence of nested sets, each of which is a subset of the next (like Matryoshka dolls), we can construct a series of Bloom filters. These filters are able to give each string a rarity score when queried (between zero and the number of “popularity” sets available). This structure increases the fault tolerance of the filters and reduces the scope for false positives. Additionally, the small size of the filters allows us to distribute the information needed to perform these lookups among devices on a distributed deployment. This reduces bandwidth requirements to keep our rarity data up-to-date.

Researcher

Dr. Matthew Ferguson

Estratti della ricerca

Rilevamento rapido delle anomalie della catena di processo mediante un classificatore multistadio

Ordinamento di lunghi elenchi di nomi di file per rilevanza e contenuto sensibile

Un sistema di analisi dell'attività di rete per rilevare il mining di criptovalute furtivo

Rilevamento autonomo della funzione prevista di una casella di posta elettronica aziendale attraverso il meta-scoring

Utilizzo della teoria epidemiologica per identificare i dispositivi di rete più dannosi

Identificazione automatica degli intervalli IP scansionati

Un classificatore di similarità in tempo reale e autocorrettivo per le e-mail

Uso della teoria dei grafici per identificare i nodi critici delle reti informatiche

Analyzing network activity to detect compromised devices sending spam emails