

Mach Learn (2010) 81:121–148
DOI 10.1007/s10994-010-5188-5

The security of machine learning

Marco Barreno · Blaine Nelson · Anthony D. Joseph · J. D. Tygar

Received: 8 April 2008 / Revised: 15 April 2010 / Accepted: 15 April 2010 / Published online: 20 May 2010
© The Author(s) 2010. This article is published with open access at Springerlink.com

Abstract  Machine learning's ability to rapidly evolve to changing and complex situations has helped it become a fundamental tool for computer security. That adaptability is also a vulnerability: attackers can exploit machine learning systems. We present a taxonomy identifying and analyzing attacks against machine learning systems. We show how these classes influence the costs for the attacker and defender, and we give a formal structure defining their interaction. We use our framework to survey and analyze the literature of attacks against machine learning systems. We also illustrate our taxonomy by showing how it can guide attacks against SpamBayes, a popular statistical spam filter. Finally, we discuss how our taxonomy suggests new lines of defenses.

Keywords  Security · Adversarial learning · Adversarial environments

1 Introduction

If we hope to use machine learning as a general tool for computer applications, it is incumbent on us to investigate how well machine learning performs under adversarial conditions. When a learning algorithm performs well in adversarial conditions, we say it is an algorithm for secure learning. This raises the natural question: how do we evaluate the quality of a learning system and determine whether it satisfies requirements for secure learning?

Editors: Pavel Laskov and Richard Lippmann.

M. Barreno (✉) · B. Nelson · A. D. Joseph · J. D. Tygar
Computer Science Division, University of California, Berkeley, CA 94720-1776, USA
e-mail: barreno@cs.berkeley.edu

B. Nelson
e-mail: nelsonb@cs.berkeley.edu

A. D. Joseph
e-mail: adj@cs.berkeley.edu

J. D. Tygar
e-mail: tygar@cs.berkeley.edu

Machine learning advocates have proposed learning-based systems for a variety of security applications, including spam detection and network intrusion detection. Their vision is that machine learning will allow a system to respond to evolving real-world inputs, both hostile and benign, and learn to reject undesirable behavior. The danger is that an attacker will attempt to exploit the adaptive aspect of a machine learning system to cause it to fail. Failure consists of causing the learning system to produce errors: if it misidentifies hostile input as benign, hostile input is permitted through the security barrier; if it misidentifies benign input as hostile, desired input is rejected. The adversarial opponent has a powerful weapon: the ability to design training data that will cause the learning system to produce rules that misidentify inputs. If users detect the failure, they may lose confidence in the system and abandon it. If users do not detect the failure, then the risks can be even greater.

It is well established in computer security that evaluating a system involves a continual process of first, determining classes of attacks on the system; second, evaluating the resilience of the system against those attacks; and third, strengthening the system against those classes of attacks. Our paper follows exactly this model in evaluating secure learning.

First, we identify different classes of attacks on machine learning systems (Sect. 2). While many researchers have considered particular attacks on machine learning systems, previous research has not presented a comprehensive view of attacks. In particular, we show that there are at least three interesting dimensions to potential attacks against learning systems: (1) they may be Causative in that they alter the training process, or they may be Exploratory and exploit existing weaknesses; (2) they may be attacks on Integrity aimed at false negatives (allowing hostile input into a system) or they may be attacks on Availability aimed at false positives (preventing benign input from entering a system); and (3) they may be Targeted at a particular input or they may be Indiscriminate in which inputs fail. Each of these dimensions operates independently, so we have at least eight distinct classes of attacks on machine learning systems. We can view secure learning as a game between an attacker and a defender; the taxonomy determines the structure of the game and cost model.

Second, we consider how resilient existing systems are against these attacks (Sect. 3). There has been a rich set of work in recent years on secure learning systems, and we evaluate many attacks against machine learning systems and proposals for making systems secure against attacks. Our analysis describes these attacks in terms of our taxonomy and secure learning game, demonstrating that our framework captures the salient aspects of each attack.

Third, we investigate some potential defenses against these attacks (Sect. 4). Here the work is more tentative, and it is clear that much remains to be done, but we discuss a variety of techniques that show promise for defending against different types of attacks.

Finally, we illustrate our different classes of attacks by considering a contemporary machine learning application, the SpamBayes spam detection system (Sect. 5). We construct realistic, effective attacks by considering different aspects of the threat model according to our taxonomy, and we discuss a defense that mitigates some of the attacks.

This paper provides system designers with a framework for evaluating machine learning systems for security applications (illustrated with our evaluation of SpamBayes) and suggests directions for developing highly robust secure learning systems. Our research not only proposes a common language for thinking and writing about secure learning, but goes beyond that to show how our framework works, both in algorithm design and in real system evaluation. We present an essential first step if machine learning is to reach its potential as a tool for use in real systems in potentially adversarial environments.

1.1 Notation

We focus on binary classification for security applications, in which a defender attempts to separate instances of input (data points), some or all of which come from a malicious attacker, into harmful and benign classes. This setting covers many interesting security applications, such as host and network intrusion detection, virus and worm detection, and spam filtering. In detecting malicious activity, the positive class (label 1) indicates malicious intrusion instances while the negative class (label 0) indicates benign normal instances. A classification error is a false positive (FP) if a normal instance is classified as positive and a false negative (FN) if an intrusion instance is classified as negative.

It may be interesting as well to consider cases where a classifier has more than two classes, or even a real-valued output. Indeed, the spam filter SpamBayes, which we consider in our experiments in Sect. 5, uses three labels so it can explicitly label some messages unsure. However, generalizing the analysis of errors to more than two classes is not straightforward, and furthermore most systems in practice make a single fundamental distinction (for example, spam messages that the user will never see vs. non-spam and unsure messages that the user will see). For these reasons, and in keeping with common practice in the literature, we limit our analysis to binary classification and leave extension to the multi-class or real-valued cases as future work.

In the supervised classification problem, the learner trains on a dataset of N instances, X = {(x, y) | x ∈ X, y ∈ Y}^N, given an instance space X and the label space Y = {0, 1}. Given some hypothesis class Ω, the goal is to learn a classification hypothesis (classifier) f* ∈ Ω to minimize errors when predicting labels for new data, or if our model includes a cost function over errors, to minimize the total cost of errors. The cost function assigns a numeric cost to each combination of data instance, true label, and classifier label. The defender chooses a procedure H, or learning algorithm, for selecting hypotheses. The classifier may periodically interleave training steps with the evaluation, retraining on some or all of the accumulated old and new data. In adversarial environments, the attacker controls some of the data, which may be used for training. We assume that the learner has some way to get the true labels for its training data and for the purpose of computing cost; the true label might come from manual classification of a training set or from observing the effect of instances on a test system, for example.

The procedure can be any method of selecting a hypothesis; in statistical machine learning, a common procedure is (regularized) empirical risk minimization. This procedure is an optimization problem where the objective function has an empirical risk term and a regularization term. Since true cost is often not representable precisely and efficiently, we calculate risk as the expected loss given by a loss function ℓ that approximates true cost; the regularization term ρ captures some notion of hypothesis complexity to prevent overfitting the training data, using a weight λ to quantify the trade-off. This procedure finds the hypothesis minimizing:

    f* = argmin_{f∈Ω} Σ_{(x,y)∈X} ℓ(y, f(x)) + λρ(f)    (1)
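As a concrete illustration of (1), the sketch below minimizes a regularized empirical risk for a linear classifier, using logistic loss for ℓ and an L2 penalty for ρ, by plain gradient descent. It is a minimal sketch of one possible procedure H, not the paper's implementation; the feature dimension, step size, iteration count, and synthetic data are arbitrary choices made for the example.

```python
import numpy as np

def train_regularized_erm(X, y, lam=0.1, lr=0.1, iters=500):
    """Minimal regularized ERM (Eq. 1) sketch: logistic loss + L2 penalty.

    X: (N, d) feature matrix, y: (N,) labels in {0, 1}.
    Returns weights w defining the hypothesis f(x) = 1[sigmoid(w.x) > 0.5].
    """
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted P(y = 1 | x)
        # Gradient of the summed logistic losses plus lambda * ||w||^2
        grad = X.T @ (p - y) + 2 * lam * w
        w -= lr * grad / N
    return w

def predict(w, X):
    return (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)

# Tiny synthetic example: two Gaussian clusters standing in for benign/malicious features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w = train_regularized_erm(X, y)
print("training error:", np.mean(predict(w, X) != y))
```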

Many learning methods make a stationarity assumption: training data and evaluation data are drawn from the same distribution. This assumption allows us to minimize the risk on the training set as a surrogate for risk on the evaluation data, since evaluation data are not known at training time. However, real-world sources of data often are not stationary and, even worse, attackers can easily break the stationarity assumption with some control of either training or evaluation instances. Analyzing and strengthening learning methods in the face of a broken stationarity assumption is the crux of the secure learning problem.

Table 1  Notation in this paper

X                                   Space of data instances
Y                                   Space of data labels; for classification Y = {0, 1}
D                                   Space of distributions over (X × Y)
Ω                                   Space of hypotheses f : X → Y
P_T ∈ D                             Training distribution
P_E ∈ D                             Evaluation distribution
P ∈ D                               Distribution for training and evaluation (Sect. 4.2.3)
x ∈ X                               Data instance
y ∈ Y                               Data label
X, E, Z, C, T_i, Q_i ∈ (X × Y)^N    Datasets
H : (X × Y)^N → Ω                   Procedure for selecting hypothesis
A_T, A_E : X^N × Ω → D              Procedures for selecting distribution
ℓ : Y × Y → R_0^+                   Loss function
C : X × Y × Y → R                   Cost function
f : X → Y                           Hypothesis (classifier)
f* : X → Y                          Best hypothesis
N                                   Number of data points
K                                   Number of repetitions of a game
M                                   Number of experts (Sect. 4.2.3)
λ                                   Trade-off parameter for regularized risk minimization

We model attacks on machine learning systems as a game between two players, the attacker and the defender. The game consists of a series of moves, or steps. Each move encapsulates a choice by one of the players: the attacker alters or selects data; the defender chooses a training procedure for selecting the classification hypothesis.

Table 1 summarizes the notation we use in this paper.

2 Framework

2.1 Security analysis

Properly analyzing the security of a system requires identifying security goals and a threat model. Security is concerned with protecting assets from attackers. A security goal is a requirement that, if violated, results in the partial or total compromise of an asset. A threat model is a profile of attackers, describing motivation and capabilities. Here we analyze the security goals and threat model for machine learning systems.

Classifiers are used to make distinctions that advance security goals. For example, a virus detection system has the goal of reducing susceptibility to virus infection, either by detecting the virus in transit prior to infection or by detecting an extant infection to expunge. Another example is an intrusion detection system (IDS), which identifies compromised systems, usually by detecting malicious traffic to and from the system or by detecting suspicious behavior in the system. A closely related concept is the intrusion prevention system (IPS), which detects intrusion attempts and then acts to prevent them from succeeding.

In this section we describe security goals and a threat model that are specific to machine learning systems.

2.1.1 Security goals

In a security context the classifier's purpose is to classify malicious events and prevent them from interfering with system operations. We split this general learning goal into two goals:

– Integrity goal: To prevent attackers from reaching system assets.
– Availability goal: To prevent attackers from interfering with normal operation.

There is a clear connection between false negatives and violation of the integrity goal: malicious instances that pass through the classifier can wreak havoc. Likewise, false positives tend to violate the availability goal because the learner itself denies benign instances.

2.1.2 Threat model

Attacker goal/incentives  In general the attacker wants to access system assets (typically with false negatives) or deny normal operation (usually with false positives). For example, a virus author wants viruses to pass through the filter and take control of the protected system (a false negative). On the other hand, an unscrupulous merchant may want sales traffic to a competitor's web store to be blocked as intrusions (false positives).

We assume that the attacker and defender each have a cost function that assigns a cost to each labeling for any given instance. Cost can be positive or negative; a negative cost is a benefit. It is usually the case that low cost for the attacker parallels high cost for the defender and vice-versa; the attacker and defender would not be adversaries if their goals were aligned. In this paper, unless otherwise stated, for ease of exposition we assume that every cost for the defender corresponds to a similar benefit for the attacker and vice-versa. This assumption is not essential to our arguments, which extend easily to arbitrary cost functions. We take the defender's point of view, so we use "high-cost" to mean high positive cost for the defender.

Attacker capabilities  We assume that the attacker has knowledge of the training algorithm, and in many cases partial or complete information about the training set, such as its distribution. For example, the attacker may have the ability to eavesdrop on all network traffic over the period of time in which the learner gathers training data. The attacker may be able to modify or generate data used in training; we consider cases in which the attacker can and cannot control some of the learner's training data. While we think that this is the most accurate assumption for most cases, if we do err, we wish to follow the common practice in computer security research of erring on the side of overestimating the attacker's capabilities rather than underestimating them.

In general we assume the attacker can generate arbitrary instances; however, many settings do impose significant restrictions on the instances generated by the attacker. For example, when the learner trains on data from the attacker, sometimes it is safe to assume that the attacker cannot choose the label for training, such as when training data is carefully hand labeled. As another example, an attacker may have complete control over data packets being sent from the attack source, but routers in transit may add to or alter the packets as well as affect their timing and arrival order.

When the attacker controls training data, an important limitation to consider is what fraction of the training data the attacker can control and to what extent. If the attacker has arbitrary control over 100% of the training data, it is difficult to see how the learner can learn anything useful; however, even in such cases there are learning strategies that can make the attacker's task more difficult (see Sect. 4.2.3). We primarily examine intermediate cases and explore how much influence is required for the attacker to defeat the learning procedure.

2.2 Taxonomy

We present a taxonomy categorizing attacks against learning systems along three axes:

INFLUENCE
– Causative attacks influence learning with control over training data.
– Exploratory attacks exploit misclassifications but do not affect training.

SECURITY VIOLATION
– Integrity attacks compromise assets via false negatives.
– Availability attacks cause denial of service, usually via false positives.

SPECIFICITY
– Targeted attacks focus on a particular instance.
– Indiscriminate attacks encompass a wide class of instances.

The first axis describes the capability of the attacker: whether (a) the attacker has the ability to influence the training data that is used to construct the classifier (a Causative attack) or (b) the attacker does not influence the learned classifier, but can send new instances to the classifier and possibly observe its decisions on these carefully crafted instances (an Exploratory attack).

The second axis indicates the type of security violation the attacker causes: either (a) allowing harmful instances to slip through the filter as false negatives (an Integrity violation); or (b) creating a denial of service in which benign instances are incorrectly filtered as false positives (an Availability violation).

The third axis refers to how specific the attacker's intention is: whether (a) the attack is highly Targeted to degrade the classifier's performance on one particular instance or (b) the attack aims to cause the classifier to fail in an Indiscriminate fashion on a broad class of instances. Each axis, especially this one, is actually a spectrum of choices.

A preliminary version of this taxonomy appears in previous work (Barreno et al. 2006). Here we extend the framework and show how the taxonomy shapes the game played by the attacker and defender (see Sect. 2.4). The INFLUENCE axis of the taxonomy determines the structure of the game and the move sequence. The SPECIFICITY and SECURITY VIOLATION axes of the taxonomy determine the general shape of the cost function: an Integrity attack benefits the attacker on false negatives, and therefore focuses high cost (to the defender) on false negatives, and an Availability attack focuses high cost on false positives; a Targeted attack focuses high cost only on a small number of instances, while an Indiscriminate attack spreads high cost over a broad range of instances.
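For readers who prefer code to prose, the three axes can be captured as a small data structure. The sketch below is our own illustration, not part of the original paper; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Influence(Enum):
    CAUSATIVE = "alters the training process"
    EXPLORATORY = "exploits an already-trained classifier"

class Violation(Enum):
    INTEGRITY = "false negatives let hostile input through"
    AVAILABILITY = "false positives deny benign input"

class Specificity(Enum):
    TARGETED = "focused on a particular instance"
    INDISCRIMINATE = "spread over a broad class of instances"

@dataclass(frozen=True)
class AttackClass:
    """One of the (at least) eight attack classes in the taxonomy."""
    influence: Influence
    violation: Violation
    specificity: Specificity

# Example: the dictionary attack on SpamBayes from Sect. 5.
dictionary_attack = AttackClass(Influence.CAUSATIVE,
                                Violation.AVAILABILITY,
                                Specificity.INDISCRIMINATE)
print(dictionary_attack)
```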

2.3 Examples

Here we give four hypothetical attack scenarios, each with two variants, against a token-based spam filter that uses machine learning, such as the SpamBayes filter discussed in Sect. 5. Table 2 summarizes the taxonomy and shows where these examples fit within it.

Table 2  Our taxonomy of attacks against machine learning systems, with examples from Sect. 2.3

                          Integrity                                Availability
Causative
  Targeted                The spam foretold: mis-train             The rogue filter: mis-train filter to
                          a particular spam                        block a certain message
  Indiscriminate          The spam foretold: mis-train any of      The rogue filter: mis-train filter to
                          several spams                            broadly block normal email
Exploratory
  Targeted                The shifty spammer: obfuscate            The unwanted reply: flood a particular
                          a chosen spam                            target inbox
  Indiscriminate          The shifty spammer: obfuscate            The unwanted reply: flood any of
                          any spam                                 several target inboxes

This section is meant to give the reader an intuition for how the taxonomy organizes attacks against machine learning systems. There are many practical considerations that may make attacks difficult. In Sect. 3 we discuss some concrete attacks that have been published in the literature, which more fully address relevant practical concerns (see Sect. 3.5 for a summary).

2.3.1 Causative Integrity attack: The spam foretold

In a Causative Integrity attack, the attacker uses control over training to cause spam to slip past the classifier as false negatives.

Example: an attacker wants the defender's spam filter not to flag a novel spam message. The defender re-trains periodically on new messages, so the attacker sends non-intrusion traffic that is carefully chosen to resemble the desired spam, such as by including many of the spam's tokens re-ordered into a benign message, to mis-train the learner to fail to block the eventual spam campaign. As a trivial example, the spam sales pitch "What watch do you want? Really, buy it now!" could be recast as "Watch what you buy now! Do you really want it?"; these excerpts have different semantic meanings but appear identical to a unigram spam filter. The primary limitation of this attack is ensuring that the filter is retrained on the attack messages before the campaign commences. The attack's success would also depend on the particular words in the spam: messages that contain words highly indicative of spam, such as 'Viagra,' may be more difficult to mis-train.

This example might be Targeted if the attacker already has a particular spam to send and needs to cause the learner to miss that particular message. It might be Indiscriminate, on the other hand, if the attacker has a polymorphic spam and could use any of a large number of variants of it for the campaign, in which case the attack need only fool the learner on any one of these variations.

2.3.2 Causative Availability attack: The rogue filter

In a Causative Availability attack, the attacker uses control over training instances to interfere with operation of the mailing system, such as by blocking legitimate email.

Example: an attacker wants normal email to be classified as spam so the intended recipient doesn't receive them. The attacker generates and sends attack messages that resemble benign messages when the defender is collecting training data to train the spam filter. The attack messages include both a spam component and benign words, which subsequently also become erroneously associated with spam. When the learner trains on the attack data, the spam filter will start to filter normal messages as spam. The primary impediment to this attack is for the attacker to ensure their messages are used to train the filter, though in many cases this is realistic since all available messages are used for training.

We more fully explore an attack of this type in Sect. 5.

This attack could be Targeted to block a particular message (see focused attacks in Sect. 5). On the other hand, it might be Indiscriminate and attempt to block a significant portion of all legitimate messages (see dictionary attacks in Sect. 5).

2.3.3 Exploratory Integrity attack: The shifty spammer

In an Exploratory Integrity attack, the attacker crafts spam so as to evade the classifier without direct influence over the classifier itself.

Example: an attacker modifies and obfuscates spam, such as by changing headers or by exchanging common spam words for less common synonyms. If successful, these modifications prevent the filter from recognizing the altered spam as malicious, so they are placed into the user's inbox (see Sect. 3.3 for additional discussion of these attacks that use good words to sanitize spam). A practical consideration for this attack is whether the attacker has a feedback mechanism to observe the success of the evasion. Also, the obfuscated message may have less value to the spammer (it may be less effective at generating sales), so there might be a limit on the extent of obfuscation that is practical.

In the Targeted version of this attack, the attacker has a particular spam to get past the filter. In the Indiscriminate version, the attacker has no particular preference and can search for any spam that succeeds, such as by modifying a number of different spam campaigns to see which modifications evade the filter (of course, in this case the attacker must take care not to appear anomalous simply due to the large number of exploratory instances).

2.3.4 Exploratory Availability attack: The mistaken identity

In an Exploratory Availability attack, the attacker interferes with legitimate operation without influence over training.

Example: an attacker wishes to interfere with the target's ability to read legitimate email messages. The attacker creates a spam campaign in which the target's email address appears as the From: address of the spam messages. The target will receive a flood of spurious, non-spam messages such as bounces for nonexistent addresses, vacation replies, and angry responses demanding the spam to stop. For a large enough spam campaign, these messages will completely overwhelm the target's inbox, making the task of finding the legitimate messages among them so costly as to be impractical. This attack is within the means of most large-scale spammers; however, since it requires a large number of messages to use the same From: address, the number of targets may not be very high. Furthermore, some spam filters now block many bounce messages, so the volume of the spam campaign may need to be high enough to overwhelm the target solely with vacation responses and hand-written replies.

It does not make sense to consider this attack targeted to specific messages, since it necessarily affects the entire inbox; however, the attacker might target certain people or a broader set. In the Targeted version, the attacker has a particular inbox to flood with replies. In the Indiscriminate version, the attacker has a broader set of inboxes to choose among, such as when going after an organization rather than a single individual.

2.4 The adversarial learning game

This section models attacks on learning systems as games where moves represent strategic choices. The choices and computations in a move depend on information produced by previous moves (when a game is repeated, this includes previous iterations). In an Exploratory attack, the attacker chooses a procedure A_E that affects the evaluation data E, and in a Causative attack, the attacker also chooses a procedure A_T to manipulate the training data X. The defender chooses a learning algorithm H. This formulation gives us a theoretical basis for analyzing the interactions between attacker and defender. Refer back to Table 1 for a summary of notation.

The choices made by the attacker and defender offer a variety of different domain-specific strategies for both players. In Sects. 3 and 4, we discuss how attack and defense strategies have been developed in different domains, and we highlight important aspects of the game in those settings. In Sects. 3.1 and 3.2, we discuss practical examples of the attacker's choices in the Causative game, and we discuss the defender's choice in Sect. 4.2. Similarly, for an Exploratory attack, we discuss realistic instances of the attacker's choice for A_E in Sects. 3.3 and 3.4, and we discuss the defender's choice for an algorithm H in Sect. 4.1.

2.4.1 Exploratory game

First we present the formal version of the game for Exploratory attacks, and then we explain it in greater detail:

1. Defender: Choose procedure H for selecting hypothesis
2. Attacker: Choose procedure A_E for selecting distribution
3. Evaluation:
   – Reveal distribution P_T
   – Sample dataset X from P_T
   – Compute f ← H(X)
   – Compute P_E ← A_E(X, f)
   – Sample dataset E from P_E
   – Assess total cost: Σ_{(x,y)∈E} C(x, f(x), y)

The defender's move is to choose a learning algorithm (procedure) H for creating hypotheses from datasets. Many procedures used in machine learning have the form of (1). For example, the defender may choose a support vector machine (SVM) with a particular kernel, loss, regularization, and cross-validation plan. The attacker's move is then to choose a procedure A_E to produce a distribution on which to evaluate the hypothesis that H generates. (The degree of control the attacker has in generating the dataset and the degree of information about X and f that A_E has access to are setting-specific.)

After the defender and attacker have both made their choices, the game is evaluated. A training dataset X is drawn from some fixed and possibly unknown distribution P_T, and training produces f = H(X). The attacker's procedure A_E produces distribution P_E, which is based in general on X and f, and an evaluation dataset E is drawn from P_E. Finally, the attacker and defender incur cost based on the performance of f evaluated on E.

The procedure A_E generally depends on X and f, but the amount of information an attacker actually has is setting specific (in the least restrictive case the attacker knows X and f completely). The attacker may know a subset of X or the family Ω of f. However, the procedure A_E may also involve acquiring information dynamically. For instance, in many cases, the procedure A_E can query the classifier, treating it as an oracle that provides labels for query instances; this is one particular degree of information that A_E can have about f. Attacks that use this technique are probing attacks. Probing can reveal information about the classifier. On the other hand, with sufficient prior knowledge about the training data and algorithm, the attacker may be able to find high-cost instances without probing.
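To make the Exploratory game concrete, the sketch below simulates one round: the defender trains a classifier, the attacker probes it as a label oracle to find malicious instances it labels benign, and the total cost is assessed on the attacker's evaluation set. This is an illustrative toy that uses scikit-learn's LogisticRegression as a stand-in for the defender's procedure H and a simple false-negative cost; it is not the paper's experimental setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Defender's move: choose H and train f on data drawn from P_T.
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])   # benign / malicious clusters
y = np.array([0] * 50 + [1] * 50)
f = LogisticRegression().fit(X, y)

# Attacker's move (A_E): probe f as a label oracle and keep malicious
# candidates that it classifies as benign, concentrating P_E on them.
candidates = rng.normal(1.0, 1.5, size=(200, 2))       # perturbed malicious instances
evasions = candidates[f.predict(candidates) == 0]

# Evaluation: assess total cost on E; here every false negative costs 1.
E = [(x, 1) for x in evasions]
total_cost = sum(1 for x, y_true in E
                 if f.predict(x.reshape(1, -1))[0] == 0 and y_true == 1)
print(f"{len(evasions)} of 200 probes evade the filter; defender cost = {total_cost}")
```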

2.4.2 Causative game

The game for Causative attacks is similar:

1. Defender: Choose procedure H for selecting hypothesis
2. Attacker: Choose procedures A_T and A_E for selecting distributions
3. Evaluation:
   – Compute P_T ← A_T
   – Sample dataset X from P_T
   – Compute f ← H(X)
   – Compute P_E ← A_E(X, f)
   – Sample dataset E from P_E
   – Assess total cost: Σ_{(x,y)∈E} C(x, f(x), y)

This game is very similar to the Exploratory game, but the attacker can choose A_T to affect the training data X. The attacker may have various types of influence over the data, ranging from arbitrary control over some fraction of instances to a small biasing influence on some aspect of data production; details depend on the setting.

Control over data used for training opens up new strategies to the attacker. Cost is based on the interaction of f and E. In the Exploratory game the attacker chooses E while the defender controls f; in the Causative game the attacker also has influence on f. With this influence, the attacker can proactively cause the learner to produce bad classifiers.
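Continuing the toy setup from the previous sketch, the following illustrates the Causative game: the attacker's A_T injects mislabeled points into the training pool before the defender fits f, shifting the decision boundary. Again this is a hypothetical illustration using scikit-learn, not an attack from the literature; the contamination fraction and data are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Clean data drawn from the natural distribution.
X_clean = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y_clean = np.array([0] * 50 + [1] * 50)

# Attacker's move (A_T): contaminate a fraction of the training pool with
# malicious-looking points labeled as benign, biasing P_T.
X_poison = rng.normal(+1, 0.5, (25, 2))
y_poison = np.zeros(25, dtype=int)
X_train = np.vstack([X_clean, X_poison])
y_train = np.concatenate([y_clean, y_poison])

# Defender trains f = H(X) on the contaminated set.
f_clean = LogisticRegression().fit(X_clean, y_clean)
f_poisoned = LogisticRegression().fit(X_train, y_train)

# Evaluation on held-out malicious traffic: poisoning raises the false-negative rate.
X_eval = rng.normal(+1, 1, (200, 2))
print("FN rate, clean training:    ", np.mean(f_clean.predict(X_eval) == 0))
print("FN rate, poisoned training: ", np.mean(f_poisoned.predict(X_eval) == 0))
```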

2.4.3 Iteration

We have analyzed these games as one-shot games, in which players minimize cost when each move happens only once. We can also consider an iterated game, in which the game repeats several times and players minimize total accumulated cost. In this setting, we assume players have access to all information from previous iterations of the game.

3 Attacks: categorizing related work

This section surveys examples of learning in adversarial environments from the literature. Our taxonomy provides a basis for evaluating the resilience of the systems described, analyzing the attacks against them in preparation for constructing defenses.

3.1 Causative Integrity attacks

Contamination in PAC learning  Kearns and Li (1993) extend Valiant's probably approximately correct (PAC) learning framework (Valiant 1984, 1985) to prove bounds for maliciously chosen errors in the training data. In PAC learning, an algorithm succeeds if it can, with probability at least 1 − δ, learn a hypothesis that has at most probability ε of making an incorrect prediction on an example drawn from the same distribution. Kearns and Li examine the case where an attacker has arbitrary control over some fraction β of the training examples (this specifies the form that A_T takes in our Causative game). They prove that in general the attacker can prevent the learner from succeeding if β ≥ ε/(1 + ε), and for some classes of learners they show this bound is tight.

Table 3  Related work in the taxonomy

                   Integrity                      Availability
Causative
  Targeted         Kearns and Li (1993),          Kearns and Li (1993), Newsome et al. (2006),
                   Newsome et al. (2006)          Chung and Mok (2007), Nelson et al. (2008)
  Indiscriminate   Kearns and Li (1993),          Kearns and Li (1993), Newsome et al. (2006),
                   Newsome et al. (2006)          Chung and Mok (2007), Nelson et al. (2008)
Exploratory
  Targeted         Tan et al. (2002),             Moore et al. (2006)
                   Lowd and Meek (2005b),
                   Wittel and Wu (2004),
                   Lowd and Meek (2005a)
  Indiscriminate   Fogla and Lee (2006),          Moore et al. (2006)
                   Lowd and Meek (2005b),
                   Wittel and Wu (2004)

This work provides an interesting and useful bound on the ability to succeed at PAC-learning. The analysis broadly concerns both Integrity and Availability attacks as well as both Targeted and Indiscriminate. However, not all learning systems fall into the PAC-learning model.

Red herring attack  Newsome et al. (2006) present Causative Integrity and Causative Availability attacks against Polygraph (Newsome et al. 2005), a polymorphic virus detector that learns virus signatures using both a conjunction learner and a naive-Bayes-like learner. They present red herring attacks against conjunction learners that exploit certain weaknesses not present in other learning algorithms (these are Causative Integrity attacks, both Targeted and Indiscriminate).

The idea is that the attacker chooses P_T to introduce spurious features into all malicious instances that the defender uses for training. The malicious instances produced by P_E, however, lack the spurious features and therefore bypass the filter, which erroneously generalized that the spurious features were necessary elements of the malicious behavior.

3.2 Causative Availability attacks

Correlated outlier attack  Newsome et al. (2006) also suggest a correlated outlier attack, which attacks a naive-Bayes-like learner by adding spurious features to positive training instances, causing the filter to block benign traffic with those features (an Availability attack).

As with the red herring attacks, these correlated outlier attacks fit neatly into our causative game; this time P_T includes spurious features in malicious instances, causing H to produce an f that classifies many benign instances as malicious.

Allergy attack  Chung and Mok (2006, 2007) present Causative Availability attacks against the Autograph worm signature generation system (Kim and Karp 2004). Autograph operates in two phases. First, it identifies infected nodes based on behavioral patterns, in particular scanning behavior. Second, it observes traffic from the identified nodes and infers blocking rules based on observed patterns. Chung and Mok describe an attack that targets traffic to a particular resource. In the first phase, an attack node convinces Autograph that it is infected by scanning the network. In the second phase, the attack node sends crafted packets mimicking targeted traffic, causing Autograph to learn rules that block legitimate access and create a denial of service.

In the context of our causative game, the attacker's choice of P_T provides the traffic for both phases of Autograph's learning. When Autograph produces a hypothesis f that depends on the carefully crafted traffic from the attacker, it will block access to legitimate traffic from P_E that shares patterns with the malicious traffic.

Attacking SpamBayes  Nelson et al. (2008) demonstrate Causative Availability attacks (both Targeted and Indiscriminate) against the SpamBayes statistical spam classifier. We examine these attacks in depth in Sect. 5.

3.3 Exploratory Integrity attacks

Some Exploratory Integrity attacks mimic statistical properties of the normal traffic to camouflage intrusions. In the Exploratory game, the attacker's move produces instances E that statistically resemble normal traffic in the training data X as measured by the learning procedure H.

Polymorphic blending attack  Polymorphic blending attacks encrypt attack traffic in such a way that it appears statistically identical to normal traffic. Fogla and Lee (2006) present a formalism for reasoning about and generating polymorphic blending attack instances to evade intrusion detection systems.

Attacking a sequence-based IDS  Tan et al. (2002) describe a mimicry attack against the stide anomaly-based intrusion detection system (IDS). They modify exploits of the passwd and traceroute programs to accomplish the same ends using different sequences of system calls: the shortest subsequence in attack traffic that does not appear in normal traffic is longer than the IDS window size, evading detection. In subsequent work Tan et al. (2003) characterize their attacks as part of a larger class of information hiding techniques which they demonstrate can make exploits mimic either normal call sequences or the call sequence of another less severe exploit.

Independently, Wagner and Soto (2002) have also developed mimicry attacks against a sequence-based IPS called pH.¹ Using the machinery of finite automata, they construct a framework for testing whether an IDS is susceptible to mimicry for a particular exploit. In doing so, they develop a tool for validating IDSs on a wide range of variants of a particular attack and suggest that similar tools should be more broadly employed to identify the vulnerabilities of an IDS.

Overall, these mimicry attacks against sequence-based anomaly detection systems underscore critical weaknesses in these systems that allow attackers to obfuscate critical elements of their exploits to avoid detection. Further they highlight how an IDS may appear to perform well against a known exploit but, unless it captures necessary elements of the intrusion, the exploit can easily be adapted to circumvent the detector. See Sect. 4.1.1 for more discussion.

¹ Wagner and Soto (2002) treat pH as an IDS rather than an IPS, though that distinction is not relevant to the attacks they present.

Good word attacks  Several authors demonstrate Exploratory integrity attacks using similar principles against spam filters. Lowd and Meek (2005b) and Wittel and Wu (2004) develop attacks against statistical spam filters that add good words, or words the filter considers indicative of non-spam, to spam emails. This type of modification can make spam emails appear innocuous to the filter, especially if the words are chosen to be ones that appear often in non-spam email and rarely in spam email.
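The effect of a good word attack can be reproduced on a toy naive-Bayes-style score. The sketch below is a hypothetical illustration (the tiny corpus, Laplace smoothing, and word choices are ours, not from the cited papers): appending words that are common in ham and rare in spam drags a spam message's score toward the ham side.

```python
import math
from collections import Counter

ham = ["meeting schedule for project review", "please see the attached report",
       "lunch with the team on friday"]
spam = ["cheap meds buy now", "win money now click here", "buy cheap watches now"]

def token_counts(msgs):
    return Counter(w for m in msgs for w in m.split())

ham_c, spam_c = token_counts(ham), token_counts(spam)
vocab = set(ham_c) | set(spam_c)

def log_likelihood_ratio(message):
    """Sum over tokens of log P(w|spam)/P(w|ham), Laplace-smoothed; > 0 leans 'spam'."""
    score = 0.0
    for w in message.split():
        p_spam = (spam_c[w] + 1) / (sum(spam_c.values()) + len(vocab))
        p_ham = (ham_c[w] + 1) / (sum(ham_c.values()) + len(vocab))
        score += math.log(p_spam / p_ham)
    return score

target = "buy cheap meds now"
good_words = "meeting schedule report team project"   # words indicative of ham
print("original spam score:", round(log_likelihood_ratio(target), 2))
print("with good words:    ", round(log_likelihood_ratio(target + " " + good_words), 2))
```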

Reverse engineering classifiers  Lowd and Meek (2005a) approach the Exploratory Integrity attack problem from a different angle: they give an algorithm for an attacker to reverse engineer a classifier. The attacker seeks the highest cost (lowest cost for the attacker) instance that the classifier labels negative. This work is interesting in its use of a cost function over instances for the attacker rather than simple positive/negative classification. We explore this work in more detail in Sect. 4.1.2.

3.4 Exploratory Availability attacks

Exploratory Availability attacks against non-learning systems abound in the literature: almost any denial of service (DoS) attack falls into this category, such as those described by Moore et al. (2006).

However, Exploratory Availability attacks against the learning components of systems are not common. We suspect this is because if an attacker wishes to cause denial of service but has no control over the learning process, other avenues of attack are more fruitful.

Here is one hypothetical scenario: if a learning IPS has trained on intrusion traffic and has the policy of blocking hosts that originate intrusions, an attacker could send intrusions that appear to originate from a legitimate host, convincing the IPS to block that host. Another possibility is to take advantage of a computationally expensive learning component: for example, spam filters that use image processing to detect advertisements in graphical attachments can take significantly more time than text-based filtering (Dredze et al. 2007; Wang et al. 2007). An attacker could exploit such overhead by sending many emails with images, causing the expensive processing to delay and perhaps even block messages.

3.5 Practical considerations for attacks

The attacks described in this section highlight several practical considerations that must be overcome to design an effective attack against a machine learning system. In many cases, realistic attacks on an IPS need to be valid executables; we discuss several such attacks that create valid sequences of system calls in Sect. 3.3 (e.g. attacks by Tan et al. 2002). Other attacks require the attacker to spoof normal behavior: features such as the address of the target system may be difficult to spoof, but Chung and Mok (2007) demonstrate that certain components of HTTP requests sometimes used as features for detection can easily be spoofed. Mahoney and Chan (2003) and Chung and Mok (2007) suggest that poor choices of features can facilitate attacks on an IPS, which we discuss further in Sect. 4.1.1. And finally, Fogla and Lee (2006) discuss in depth the opportunities and constraints of generating attack traffic that appears statistically identical to benign traffic, as mentioned in Sect. 3.3.

4 Defenses

This section gives an overview of possible defenses against the types of attacks introduced in previous sections. We discuss both defense mechanisms from prior work as well as ideas for new defenses. The game between attacker and defender and the taxonomy that we introduce in Sect. 3 provide a foundation on which to construct defense strategies against broad classes of attacks. We address Exploratory and Causative attacks separately; in Sect. 4.2.3 we discuss the broader setting of an iterated game.

In all cases, defenses present a trade-off: changing the algorithms to make them more robust against (worst-case) attacks will generally make them less effective on average. Analyzing this trade-off is an important part of developing defenses.

4.1 Defending against Exploratory attacks

Exploratory attacks do not corrupt the training data but attempt to find vulnerabilities in the learned hypothesis. When producing the evaluation distribution, the attacker attempts to construct an unfavorable evaluation distribution concentrating probability mass on high-cost instances; in other words, the attacker's procedure A_E tries to construct an evaluation distribution P_E on which the learner predicts poorly (violating stationarity) and the cost computed in the last step of the Exploratory game is high. This section examines defender strategies that make it difficult for the attacker to construct such a distribution.

In the Exploratory game, the defender makes a move before observing contaminated data. The defender can impede the attacker's ability to reverse engineer the classifier by limiting access to information about the training procedure and data. With less information, A_E has difficulty producing an unfavorable evaluation distribution. Nonetheless, even with incomplete information, the attacker may be able to construct an unfavorable evaluation distribution using a combination of prior knowledge and probing.

4.1.1 Defenses against attacks without probing

Part of our security analysis involves identifying aspects of the system that should be kept secret. In securing a learner, we limit information to make it difficult for an attacker to conduct an attack.

Training data  Preventing the attacker from knowing the training data limits the attacker's ability to reconstruct internal states of the classifier. There is a tension between collecting training data that fairly represents the real world instances and keeping all aspects of that data secret. In most situations, it is difficult to use completely secret training data, though the attacker may have only partial information about its distribution.

Feature selection  We can harden classifiers against attack through attention to features in the feature selection and learning steps (which are both internal steps of the defender's hypothesis selection procedure H). Feature selection is the process of choosing a feature map that transforms raw measurements into the feature space used by the learning algorithm. In the learning step, the learning algorithm builds its model or signature using particular features from the map's feature space; this choice of features for the model or signature is also sometimes referred to as feature selection, though we consider it to be part of the learning process, after the feature map has been established. For example, one feature map for email message bodies might transform each token to a Boolean feature indicating its presence; another map might specify a real-valued feature indicating the relative frequency of each word in the message compared to its frequency in natural language; yet another map might count sequences of n characters and specify an integer feature for each character n-gram indicating how many times it appears. In each of these cases, a learner will construct a model or signature that uses certain features (tokens present or absent; relative frequency of words present; character n-gram counts) to decide whether an instance is benign or malicious.
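The three example feature maps described above can be written down directly. The sketch below is a schematic rendering of them (the reference word frequencies, the vocabulary, and the choice of n are placeholders), not code from any of the systems discussed.

```python
from collections import Counter

def boolean_token_map(body, vocabulary):
    """Boolean feature per token: is the token present in the message body?"""
    tokens = set(body.split())
    return {tok: (tok in tokens) for tok in vocabulary}

def relative_frequency_map(body, natural_language_freq):
    """Real-valued feature: word frequency in the message relative to natural language."""
    tokens = body.split()
    counts = Counter(tokens)
    return {w: (counts[w] / len(tokens)) / natural_language_freq.get(w, 1e-6)
            for w in counts}

def char_ngram_map(body, n=3):
    """Integer feature per character n-gram: how many times it appears."""
    return Counter(body[i:i + n] for i in range(len(body) - n + 1))

body = "buy cheap watches now"
print(boolean_token_map(body, vocabulary={"cheap", "meeting", "now"}))
print(char_ngram_map(body, n=3).most_common(3))
```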

Obfuscation of spam-indicating words (an attack on the feature set) is a common Targeted Exploratory Integrity attack. Sculley et al. (2006) use inexact string matching in feature selection to defeat obfuscations of words in spam emails. They choose a feature map based on character subsequences that are robust to character addition, deletion, and substitution.

Globerson and Roweis (2006) present a feature-based learning defense for the feature deletion exploratory attack on the evaluation data E. In feature deletion, features present in the training data, and perhaps highly predictive of an instance's class, are removed from the evaluation data by the attacker. For example, words present in training emails may not occur in evaluation messages, and network packets in training data may contain values for optional fields that are missing from future traffic. Globerson and Roweis formulate a modified support vector machine classifier that is robust in its choice of features against deletion of high-value features.

One particularly important consideration when the learner builds its model or signature is to ensure that the learner uses features related to the intrusion itself. In their study of the DARPA/Lincoln Laboratory IDS evaluation dataset, Mahoney and Chan (2003) demonstrate that spurious artifacts in training data can cause an IDS to learn to distinguish normal from intrusion traffic based on those artifacts rather than relevant features. Ensuring that the learner builds a model from features that describe the fundamental differences between malicious and benign instances should mitigate the effects of mimicry attacks (Sect. 3.3) and red herring attacks (Sect. 3.1).

Using spurious features in constructing a model or signature is especially problematic in cases where any given intrusion attempt may cause harm only probabilistically or depending on some internal state of the victim's system. If the features relevant to the intrusion are consistent for some set of instances but the actual cost of those instances varies widely, then a learner risks attributing the variation to other nonessential features.

Hypothesis space/learning procedures  A complex hypothesis space may make it difficult for the attacker to infer precise information about the learned hypothesis. However, hypothesis complexity must be balanced with capacity to generalize, such as through regularization.

Wang et al. (2006) present Anagram, an anomaly detection system using n-gram models of bytes to detect intrusions. They incorporate two techniques to defeat Exploratory attacks that mimic normal traffic (mimicry attacks): (1) they use high-order n-grams (with n typically between 3 and 7), which capture differences in intrusion traffic even when that traffic has been crafted to mimic normal traffic on the single-byte level; and (2) they randomize feature selection by randomly choosing several (possibly overlapping) subsequences of bytes in the packet and testing them separately, so the attack will fail unless the attacker makes not only the whole packet but also any subsequence mimic normal traffic.
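The sketch below gives a rough, simplified picture of the second idea: test randomly chosen byte subsequences of a payload against a set of n-grams observed in normal traffic, so that a mimicry payload must look normal everywhere, not just in aggregate. It is our own simplification, not Anagram's actual scoring; the threshold, window sizes, and sample traffic are arbitrary.

```python
import random

def ngrams(data: bytes, n: int):
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def randomized_subsequence_test(payload: bytes, normal_ngrams: set, n=5,
                                num_windows=4, window_len=32, threshold=0.2,
                                rng=random.Random(0)):
    """Flag the payload if any randomly chosen window has too many unseen n-grams."""
    for _ in range(num_windows):
        start = rng.randrange(max(1, len(payload) - window_len))
        window = payload[start:start + window_len]
        grams = ngrams(window, n)
        if not grams:
            continue
        unseen = sum(1 for g in grams if g not in normal_ngrams)
        if unseen / len(grams) > threshold:
            return "anomalous"
    return "normal"

normal_traffic = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: text/html\r\n" * 4
normal_model = ngrams(normal_traffic, 5)
attack = b"GET /index.html HTTP/1.1\r\n" + bytes(random.Random(1).randrange(256) for _ in range(64))
print(randomized_subsequence_test(attack, normal_model))
```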

Dalvi et al. (2004) develop a cost-sensitive game-theoretic classification defense to counter Exploratory Integrity attacks. In their model, the attacker can alter natural instance features in A_E but incurs a known cost for each change. The defender can measure each feature at a different known cost. Each has a known cost function over classification/true label pairs. The classifier H is a cost-sensitive naive Bayes learner that classifies instances to minimize its expected cost, while the attacker modifies features to minimize its own expected cost. Their defense constructs an adversary-aware classifier by altering the likelihood function of the learner to anticipate the attacker's changes. They adjust the likelihood that an instance is malicious by considering that the observed instance may be the result of an attacker's optimal transformation of another instance. This defense relies on two assumptions: (1) the defender's strategy is a step ahead of the attacker's strategy (in other words, their game differs from ours in that the attacker's procedure A_E cannot take f into account), and (2) the attacker plays optimally against the original cost-sensitive classifier. It is worth noting that while their approach defends against optimal attacks, it doesn't account for non-optimal attacks. For example, if the attacker doesn't modify any data, the adversary-aware classifier misclassifies some instances that the original classifier correctly classifies.

4.1.2 Defenses against probing attacks

The ability for A_E to query a classifier gives an attacker powerful additional attack options.

Analysis of reverse engineering  Lowd and Meek (2005a) observe that the attacker need not model the classifier explicitly, but only find lowest-attacker-cost instances as in the Dalvi et al. setting. They formalize a notion of reverse engineering as the adversarial classifier reverse engineering (ACRE) problem. Given an attacker cost function, they analyze the complexity of finding a lowest-attacker-cost instance that the classifier labels as negative. They assume no general knowledge of training data, though the attacker does know the feature space and also must have one positive example and one negative example. A classifier is ACRE-learnable if there exists a polynomial-query algorithm that finds a lowest-attacker-cost negative instance. They show that linear classifiers are ACRE-learnable with linear attacker cost functions and some other minor restrictions.

The ACRE-learning problem provides a means of qualifying how difficult it is to use queries to reverse engineer a classifier from a particular hypothesis class using a particular feature space. We now suggest defense techniques that can increase the difficulty of reverse engineering a learner.

Randomization  A randomized hypothesis may decrease the value of feedback to an attacker. Instead of choosing a hypothesis f : X → {0, 1}, we generalize to hypotheses that predict a real value on [0, 1]. This generalized hypothesis returns a probability of classifying x as 1. By randomizing, the expected performance of the hypothesis may decrease on regular data drawn from a non-adversarial distribution, but it also may decrease the value of the queries for the attacker.

Randomization in this fashion does not reduce the information available in principle to the attacker, but merely requires more work from the attacker for the information. It is likely that this defense is appropriate in only a small number of scenarios.
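A minimal sketch of this idea, under our own toy assumptions: wrap a deterministic real-valued score so that each query returns a label sampled from the predicted probability, making individual query responses noisier for a probing attacker.

```python
import random

def randomized_hypothesis(score_fn, rng=random.Random(0)):
    """Turn a real-valued score in [0, 1] into a randomized 0/1 response."""
    def respond(x):
        p = score_fn(x)                  # probability of classifying x as 1
        return 1 if rng.random() < p else 0
    return respond

# Toy score: a smooth function of a single feature (a stand-in for f).
score = lambda x: min(1.0, max(0.0, 0.5 + 0.3 * x))
f_rand = randomized_hypothesis(score)

# The same probe near the decision boundary yields varying answers, so the
# attacker needs repeated queries to estimate the underlying score.
print([f_rand(0.1) for _ in range(10)])
```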

Limiting/misleading feedback  Another potential defense is to limit the feedback given to an attacker. For example, common techniques in the spam domain include eliminating bounce emails, remote image loading, and other potential feedback channels. It is impossible to remove all feedback channels; however, limiting feedback increases work for the attacker. In some settings, it may be possible to mislead the attacker by sending fraudulent feedback.

Actively misleading the attacker by fabricating feedback suggests an interesting battle of information between attacker and defender. In some scenarios the defender may be able to give the attacker no information via feedback, and in others the defender may even be able to return feedback that causes the attacker to come to incorrect conclusions.

4.2 Defending against Causative attacks

In Causative attacks, the attacker has a degree of control over not only the evaluation distribution but also the training distribution. Therefore the learning procedures we consider must be resilient against contaminated training data, as well as to the evaluation considerations discussed in Sect. 4.1.

Two general strategies for defense are to remove malicious data from the training set and to harden the learning algorithm against malicious training data. We first present one method of our own for the former and then describe two approaches to the latter that appear in the literature.

4.2.1 The RONI defense

We propose the Reject On Negative Impact (RONI) defense, a technique that measures the empirical effect of each training instance and eliminates from training those points that have a substantial negative impact on classification accuracy. To determine whether a candidate training instance is malicious or not, we can train a classifier on a base training set, then add the candidate instance to our training set and train a second classifier. We apply both classifiers to a quiz set of instances with known labels, measuring the difference in accuracy between the two. If adding the candidate instance to our training set causes the resulting classifier to produce substantially more classification errors, we reject the instance as detrimental in its effect.

We refine and explore the RONI defense experimentally in Sect. 5.3.
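A direct, if naive, reading of the RONI procedure can be sketched as below; it is not the refined version evaluated in Sect. 5.3. The base/quiz split, the scikit-learn learner, and the rejection threshold are our own illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def roni_filter(X_base, y_base, X_quiz, y_quiz, candidates, threshold=0.02):
    """Reject candidate training points whose inclusion measurably hurts quiz accuracy."""
    base_acc = LogisticRegression().fit(X_base, y_base).score(X_quiz, y_quiz)

    accepted = []
    for x_c, y_c in candidates:
        X_aug = np.vstack([X_base, x_c[None, :]])
        y_aug = np.append(y_base, y_c)
        aug_acc = LogisticRegression().fit(X_aug, y_aug).score(X_quiz, y_quiz)
        if base_acc - aug_acc <= threshold:     # negligible (or positive) impact: keep
            accepted.append((x_c, y_c))
    return accepted

rng = np.random.default_rng(3)
X_base = np.vstack([rng.normal(-1, 1, (40, 2)), rng.normal(+1, 1, (40, 2))])
y_base = np.array([0] * 40 + [1] * 40)
X_quiz = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(+1, 1, (20, 2))])
y_quiz = np.array([0] * 20 + [1] * 20)

# One benign candidate and one poisoned candidate (malicious-looking point labeled benign).
candidates = [(rng.normal(-1, 1, 2), 0), (np.array([2.0, 2.0]), 0)]
kept = roni_filter(X_base, y_base, X_quiz, y_quiz, candidates)
print(f"kept {len(kept)} of {len(candidates)} candidates")
```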

4.2.2 Robustness

The field of Robust Statistics explores procedures that limit the impact of a small fraction of deviant (adversarial) training data. In the setting of Robust Statistics, it is assumed that the bulk of the data is generated from a known model, but a fraction of the data comes from an unknown model; to bound the effect of this unknown source it is assumed to be adversarial. There are a number of measures of a procedure's robustness: the breakdown point is the level of contamination required for the attacker to arbitrarily manipulate the procedure and the influence function measures the impact of contamination on the procedure. Robustness measures can be used to assess the susceptibility of an existing system and to suggest alternatives that reduce or eliminate the vulnerability. Ideally one would like to use a procedure with a high breakdown point and a bounded influence function. In this way, these measures can be used to compare candidate procedures and to design procedures H that are optimally robust against adversarial contamination of the training data. For a full treatment, see the books by Huber (1981), Hampel et al. (1986), and Maronna et al. (2006).

Recent research has highlighted the importance of robust procedures in security and learning tasks. Wagner (2004) observes that common sensor net aggregation procedures, such as computing a mean, are not robust to adversarial point contamination, and he identifies robust replacements. Christmann and Steinwart (2004) study robustness for a general family of learning methods. Their results suggest that certain commonly used loss functions, along with proper regularization, lead to robust procedures with a bounded influence function. These results suggest such procedures have desirable properties for secure learning.
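A quick numerical illustration of these ideas (our own toy, not drawn from the cited books): the sample mean has a breakdown point of 0, so a single adversarial reading can move it arbitrarily, whereas the median tolerates contamination of up to half the sample.

```python
import numpy as np

rng = np.random.default_rng(4)
readings = rng.normal(10.0, 1.0, 99)            # honest sensor readings around 10

# A single adversarial reading is enough to drag the mean anywhere...
poisoned = np.append(readings, 1e6)
print("mean:   clean %.2f  poisoned %.2f" % (readings.mean(), poisoned.mean()))

# ...while the median (breakdown point 0.5) barely moves.
print("median: clean %.2f  poisoned %.2f" % (np.median(readings), np.median(poisoned)))
```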

4.2.3 Online prediction with experts

Another approach to combatting Causative attacks is to dynamically adapt to the data in an online fashion. In this setting, we allow for the attacker to potentially have arbitrary control of the training data, but instead of attempting to learn on this arbitrarily corrupted data, the online learner attempts to optimize with respect to a set of experts. For instance, the experts may be a set of classifiers each designed to provide different security properties. The experts provide advice (predictions) to the defender who forms a composite classifier that weighs the advice based on each expert's past performance and thus produces a composite prediction. We make no assumption about how the experts form their advice or about their performance. However, rather than evaluating the cost of the composite classifier's predictions directly, we instead compare the cost incurred by the composite classifier relative to the cost of the best expert in hindsight; that is, we compute the regret the composite classifier has for not heeding the advice of the best expert. In designing strategies that optimize regret, the composite classifiers for these online games create a moving target and force the attacker to construct attacks that succeed against a set of experts rather than a single one.

In this setting, the learner forms a prediction from the M expert predictions and adapts the hypothesis based on their performance during K repetitions. At each step k of the game, the defender receives a prediction g_m^(k) from each expert; this may be based on the data but we make no assumptions about its behavior. More formally, the k-th round of the expert-based prediction game is:

1. Defender: Update function h^(k) : Y^M → Y
2. Attacker: Choose distribution P^(k)
3. Evaluation:
   – Sample an instance (x^(k), y^(k)) ~ P^(k)
   – Compute expert advice {g_m^(k)}_{m=1}^{M}
   – Predict ŷ^(k) = h^(k)(g_1^(k), ..., g_M^(k))
   – Assess cost C(x^(k), ŷ^(k), y^(k))

This game has a slightly different structure from the games we present in Sect. 2.4: here the defender chooses one strategy at the beginning of the game and then in each iteration updates the function h^(k) according to that strategy. The attacker, however, may select a new strategy at each iteration.

The setting of online expert-based prediction splits risk minimization into two subproblems: (1) minimizing the average loss of each expert and (2) minimizing the average regret, the difference between the loss of our composite learner and the loss of the best overall expert in hindsight. The other defenses we have discussed approach the first problem. Online game theory addresses the second problem: the defender chooses a strategy for updating h^(k) to minimize regret based only on the experts' past performance. For certain variants of the game, there exist composite predictors whose regret is o(K); that is, the average regret approaches 0 as K increases. Thus, the composite learner can perform almost as well as the best expert without knowing ahead of time which expert is best. A full description of this setting and several results appear in Cesa-Bianchi and Lugosi (2006).

Importantly, the online prediction setting allows the defender to adapt to an adversary and forces the adversary to design attack strategies that succeed against an entire set of experts (each of which can have its own security design considerations). Thus, we can incorporate several classifiers with desirable security properties into a composite approach. Moreover, if a particular attack is succeeding, we can design a new expert against the identified vulnerability and add it to our set of experts to patch the exploit. This makes online prediction well-suited to the changing attack landscape.
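One standard composite strategy of this kind is the exponentially weighted average (multiplicative weights) forecaster analyzed by Cesa-Bianchi and Lugosi (2006). The sketch below implements a basic version for 0/1 predictions; the learning rate and the toy experts are our own choices, and it is meant only to show the shape of the update for h^(k), not a hardened defense.

```python
import numpy as np

def exponential_weights(expert_advice, true_labels, eta=0.5):
    """Weighted-majority style composite prediction over K rounds.

    expert_advice: (K, M) array of 0/1 predictions, one column per expert.
    true_labels:   (K,) array of 0/1 outcomes.
    Returns composite predictions, composite loss, expert losses, and regret.
    """
    K, M = expert_advice.shape
    w = np.ones(M)
    composite_loss, expert_loss, preds = 0.0, np.zeros(M), []
    for k in range(K):
        advice = expert_advice[k]
        # h^(k): weighted vote over the experts' advice.
        y_hat = int(np.dot(w, advice) / w.sum() >= 0.5)
        preds.append(y_hat)
        losses = (advice != true_labels[k]).astype(float)
        composite_loss += float(y_hat != true_labels[k])
        expert_loss += losses
        w *= np.exp(-eta * losses)              # downweight experts that erred
    regret = composite_loss - expert_loss.min()
    return np.array(preds), composite_loss, expert_loss, regret

# Toy run: expert 0 is usually right, expert 1 is random, expert 2 is adversarially bad.
rng = np.random.default_rng(5)
y = rng.integers(0, 2, 200)
advice = np.column_stack([np.where(rng.random(200) < 0.9, y, 1 - y),
                          rng.integers(0, 2, 200),
                          1 - y])
_, comp_loss, exp_loss, regret = exponential_weights(advice, y)
print("expert losses:", exp_loss, " composite loss:", comp_loss, " regret:", regret)
```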

5 Case study: attacking SpamBayes

We have put our framework to use studying attacks against the SpamBayes statistical spam filter (Nelson et al. 2008). Here we review that work and demonstrate how our framework informs and structures the analysis.

SpamBayes is a content-based statistical spam filter that classifies email using token counts in a model proposed by Robinson (2003) and inspired by Graham (2002). Meyer and Whateley (2004) describe the system in detail. SpamBayes computes a score for each token in the training corpus; this score is similar to a smoothed estimate of the posterior probability that an email containing that token is spam. Each token's score is an estimator of the conditional likelihood that a message is spam given that it contains the token.² In terms of (1), the loss function ℓ(·,·) takes the form of the logarithm of the posterior probability for a binomial model, the regularization term ρ(·) takes the form of the logarithm of a prior probability using a beta prior to smooth the estimate, and λ = 1 is the regularizer's weight. The filter computes a message's spam score by assuming token scores are independent and applying Fisher's method for combining significance tests (Fisher 1948). The message score is compared against two thresholds to select the label spam, ham (non-spam), or unsure.

² The estimator used by Meyer and Whateley (2004) for conditional likelihood deviates slightly from a traditional maximum likelihood estimator.
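As a rough illustration of this scoring pipeline, the sketch below combines per-token scores with Fisher's method (χ² = −2 Σ ln p, compared against a chi-square distribution with 2n degrees of freedom) and thresholds the result into spam, unsure, or ham. It is a simplified stand-in for SpamBayes, not its actual implementation: the token scores, smoothing, and cutoffs are invented for the example.

```python
import math
from scipy.stats import chi2

def fisher_combine(probabilities):
    """Fisher's method: combine per-token probability-like values into one tail probability."""
    stat = -2.0 * sum(math.log(p) for p in probabilities)
    return chi2.sf(stat, df=2 * len(probabilities))

def classify(token_spam_scores, spam_cutoff=0.9, ham_cutoff=0.2):
    """Toy SpamBayes-like decision: weak combined ham evidence means spam."""
    # Treat (1 - score) as evidence the message is ham; a small combined value is spammy.
    ham_evidence = fisher_combine([1.0 - s for s in token_spam_scores])
    spamminess = 1.0 - ham_evidence
    if spamminess >= spam_cutoff:
        return "spam"
    if spamminess <= ham_cutoff:
        return "ham"
    return "unsure"

# Hypothetical per-token spam scores (probability-like values in (0, 1)).
print(classify([0.99, 0.97, 0.95, 0.9]))   # tokens strongly indicative of spam
print(classify([0.1, 0.2, 0.15, 0.05]))    # tokens indicative of ham
print(classify([0.6, 0.4, 0.55, 0.5]))     # mixed evidence
```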

5.1 Causative Availability attacks on SpamBayes

In analyzing the vulnerabilities of SpamBayes, we are motivated by our taxonomy of attacks. Known attacks that spammers use against deployed spam filters tend to be Exploratory Integrity attacks: either they obfuscate the especially spam-like content of a spam email or they include content not indicative of spam. Both tactics aim to get the modified message into the victim's inbox. This category of attack has been studied in detail in the literature (Lowd and Meek 2005a, 2005b; Wittel and Wu 2004; Dalvi et al. 2004). However, we find the study of Causative attacks more compelling because they are unique to machine learning systems and potentially more harmful.

In particular, a Causative Availability attack can create a powerful denial of service. For example, if a spammer causes enough legitimate messages to be filtered away by the user's spam filter, the user is likely to disable the filter and therefore see the spammer's advertisements. As another example, an unscrupulous business owner may wish to use spam filter denial of service to prevent a competitor from receiving email orders from potential customers. In this section, we present two novel Causative Availability attacks against SpamBayes: the dictionary attack is Indiscriminate and the focused attack is Targeted.

We consider an attacker with the following capabilities. We allow the attacker's procedures A_T and A_E to craft email messages with arbitrary message bodies but realistically limited control over the headers. Because we limit our study to Availability attacks, we assume that all malicious messages generated by A_T are labeled as spam when included in the training set X; if we allow the attacker to create ham-labeled training messages then Integrity attacks become easier, but the advantage for Availability attacks is small.

Dictionary attack  Our first attack is an Indiscriminate attack: the attacker wants to cause a large number of false positives so that the user loses confidence in the filter and must manually sort through spam and ham emails. There are three variants of this attack. In the first, the attacker maximizes the expected spam score of any future message in an optimal
