Hedging predictions in machine learning

Alexander Gammerman and Vladimir Vovk

The practical conclusions of probability theory can be substantiated as consequences of hypotheses about the limiting, under the given constraints, complexity of the phenomena under study.

On-line Compression Modelling Project (New Series)

Working Paper #2

June 9, 2007


Abstract

Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This article describes a new technique for 'hedging' the predictions output by many such algorithms, including support vector machines, kernel ridge regression, kernel nearest neighbours, and by many other state-of-the-art methods. The hedged predictions for the labels of new objects include quantitative measures of their own accuracy and reliability. These measures are provably valid under the assumption of randomness, traditional in machine learning: the objects and their labels are assumed to be generated independently from the same probability distribution. In particular, it becomes possible to control (up to statistical fluctuations) the number of erroneous predictions by selecting a suitable confidence level. Validity being achieved automatically, the remaining goal of hedged prediction is efficiency: taking full account of the new objects' features and other available information to produce as accurate predictions as possible. This can be done successfully using the powerful machinery of modern machine learning.

Contents

1 Introduction
2 Ideal hedged predictions
3 Conformal prediction
4 Bayesian approach to conformal prediction
5 On-line prediction
6 Slow teachers, lazy teachers, and the batch setting
7 Induction and transduction
8 Inductive conformal predictors
9 Conclusion
A Discussion
B Rejoinder
References

1 Introduction

The two main varieties of the problem of prediction, classification and regression, are standard subjects in statistics and machine learning. The classical classification and regression techniques can deal successfully with conventional small-scale, low-dimensional data sets; however, attempts to apply these techniques to modern high-dimensional and high-throughput data sets encounter serious conceptual and computational difficulties. Several new techniques, first of all support vector machines [42, 43] and other kernel methods, have been developed in machine learning recently with the explicit goal of dealing with high-dimensional data sets with large numbers of objects.

A typical drawback of the new techniques is the lack of useful measures of confidence in their predictions. For example, some of the tightest upper bounds of the popular theory of PAC (probably approximately correct) learning on the probability of error exceed 1 even for relatively clean data sets ([51], p. 249). This article describes an efficient way to 'hedge' the predictions produced by the new and traditional machine-learning methods, i.e., to complement them with measures of their accuracy and reliability. Appropriately chosen, not only are these measures valid and informative, but they also take full account of the special features of the object to be predicted.

We call our algorithms for producing hedged predictions conformal predictors; they are formally introduced in Section 3. Their most important property is the automatic validity under the randomness assumption (to be discussed shortly). Informally, validity means that conformal predictors never overrate the accuracy and reliability of their predictions. This property, stated in Sections 3 and 5, is formalized in terms of finite data sequences, without any recourse to asymptotics.

The claim of validity of conformal predictors depends on an assumption that is shared by many other algorithms in machine learning, which we call the assumption of randomness: the objects and their labels are assumed to be generated independently from the same probability distribution. Admittedly, this is a strong assumption, and areas of machine learning are emerging that rely on other assumptions (such as the Markovian assumption of reinforcement learning; see, e.g., [36]) or dispense with any stochastic assumptions altogether (competitive on-line learning; see, e.g., [6, 47]). It is, however, much weaker than assuming a parametric statistical model, sometimes complemented with a prior distribution on the parameter space, which is customary in the statistical theory of prediction. And taking into account the strength of the guarantees that can be proved under this assumption, it does not appear overly restrictive.

So we know that conformal predictors tell the truth. Clearly, this is not enough: truth can be uninformative and so useless. We will refer to various measures of informativeness of conformal predictors as their 'efficiency'. As conformal predictors are provably valid, efficiency is the only thing we need to worry about when designing conformal predictors for solving specific problems. Virtually any classification or regression algorithm can be transformed into a conformal predictor, and so most of the arsenal of methods of modern machine learning can be brought to bear on the design of efficient conformal predictors.

We start the main part of the article, in Section 2, with the description of an idealized predictor based on Kolmogorov's algorithmic theory of randomness. This 'universal predictor' produces the best possible hedged predictions but, unfortunately, is noncomputable. We can, however, set ourselves the task of approximating the universal predictor as well as possible.

In Section 3 we formally introduce the notion of conformal predictors and state a simple result about their validity. In that section we also briefly describe results of computer experiments demonstrating the methodology of conformal prediction.

In Section 4 we consider an example demonstrating how conformal predictors react to the violation of our model of the stochastic mechanism generating the data (within the framework of the randomness assumption). If the model coincides with the actual stochastic mechanism, we can construct an optimal conformal predictor, which turns out to be almost as good as the Bayes-optimal confidence predictor (the formal definitions will be given later). When the stochastic mechanism significantly deviates from the model, conformal predictions remain valid but their efficiency inevitably suffers. The Bayes-optimal predictor starts producing very misleading results which superficially look as good as when the model is correct.

In Section 5 we describe the 'on-line' setting of the problem of prediction, and in Section 6 contrast it with the more standard 'batch' setting. The notion of validity introduced in Section 3 is applicable to both settings, but in the on-line setting it can be strengthened: we can now prove that the percentage of erroneous predictions will be close, with high probability, to the chosen significance level (100% minus the confidence level). For the batch setting, the stronger property of validity for conformal predictors remains an empirical fact. In Section 6 we also discuss limitations of the on-line setting and introduce new settings intermediate between on-line and batch. To a large degree, conformal predictors still enjoy the stronger property of validity for the intermediate settings.

Section 7 is devoted to the discussion of the difference between two kinds of inference from empirical data, induction and transduction (emphasized by Vladimir Vapnik [42, 43]). Conformal predictors belong to transduction, but combining them with elements of induction can lead to a significant improvement in their computational efficiency (Section 8).

We show how some popular methods of machine learning can be used as underlying algorithms for hedged prediction. We do not give the full description of these methods and refer the reader to the existing readily accessible descriptions. This article is, however, self-contained in the sense that we explain all features of the underlying algorithms that are used in hedging their predictions. We hope that the information we provide will enable the reader to apply our hedging techniques to their favourite machine-learning methods.


2 Ideal hedged predictions

The most basic problem of machine learning is perhaps the following. We are given a training set of examples

  (x_1, y_1), ..., (x_l, y_l),    (1)

each example (x_i, y_i), i = 1, ..., l, consisting of an object x_i (typically, a vector of attributes) and its label y_i; the problem is to predict the label y_{l+1} of a new object x_{l+1}. Two important special cases are where the labels are known a priori to belong to a relatively small finite set (the problem of classification) and where the labels are allowed to be any real numbers (the problem of regression).

The usual goal of classification is to produce a prediction ŷ_{l+1} that is likely to coincide with the true label y_{l+1}, and the usual goal of regression is to produce a prediction ŷ_{l+1} that is likely to be close to the true label y_{l+1}. In the case of classification, our goal will be to complement the prediction ŷ_{l+1} with some measure of its reliability. In the case of regression, we would like to have some measure of accuracy and reliability of our prediction. There is a clear trade-off between accuracy and reliability: we can improve the former by relaxing the latter and vice versa. We are looking for algorithms that achieve the best possible trade-off and for a measure that would quantify the achieved trade-off.

Let us start from the case of classification. The idea is to try every possible label Y as a candidate for x_{l+1}'s label and see how well the resulting sequence

  (x_1, y_1), ..., (x_l, y_l), (x_{l+1}, Y)    (2)

conforms to the randomness assumption (if it does conform to this assumption, we will say that it is 'random'; this will be formalized later in this section). The ideal case is where all Ys but one lead to sequences (2) that are not random; we can then use the remaining Y as a confident prediction for y_{l+1}.

In the case of regression, we can output the set of all Ys that lead to a random sequence (2) as our 'prediction set'. An obvious obstacle is that the set of all possible Ys is infinite and so we cannot go through all the Ys explicitly, but we will see in the next section that there are ways to overcome this difficulty.

We can see that the problem of hedged prediction is intimately connected with the problem of testing randomness. Different versions of the universal notion of randomness were defined by Kolmogorov, Martin-Löf and Levin (see, e.g., [24]) based on the existence of universal Turing machines. Adapted to our current setting, Martin-Löf's definition is as follows. Let Z be the set of all possible examples (assumed to be a measurable space); as each example consists of an object and a label, Z = X × Y, where X is the set of all possible objects and Y, |Y| > 1, is the set of all possible labels. We will use Z* as the notation for all finite sequences of examples. A function t : Z* → [0, 1] is a randomness test if

1. for all ε ∈ (0, 1), all n ∈ {1, 2, ...} and all probability distributions P on Z,

  P^n{z ∈ Z^n : t(z) ≤ ε} ≤ ε;    (3)


2. t is upper semicomputable.

The first condition means that the randomness test is required to be valid: if, for example, we observe t(z) ≤ 1% for our data set z, then either the data set was not generated independently from the same probability distribution P or a rare (of probability at most 1%, under any P) event has occurred. The second condition means that we should be able to compute the test, in a weak sense (we cannot require computability in the usual sense, since the universal test can only be upper semicomputable: it can work forever to discover all patterns in the data sequence that make it non-random). Martin-Löf (developing Kolmogorov's earlier ideas) proved that there exists a smallest, to within a constant factor, randomness test.

Let us fix a smallest randomness test, call it the universal test, and call the value it takes on a data sequence the randomness level of this sequence. A random sequence is one whose randomness level is not small; this is rather informal, but it is clear that for finite data sequences we cannot have a clear-cut division of all sequences into random and non-random (like the one defined by Martin-Löf [25] for infinite sequences). If t is a randomness test, not necessarily universal, the value that it takes on a data sequence will be called the randomness level detected by t.

Remark The word 'random' is used in (at least) two different senses in the existing literature. In this article we need both but, luckily, the difference does not matter within our current framework. First, randomness can refer to the assumption that the examples are generated independently from the same distribution; this is the origin of our 'assumption of randomness'. Second, a data sequence is said to be random with respect to a statistical model if the universal test (a generalization of the notion of universal test as defined above) does not detect any lack of conformity between the two. Since the only statistical model we are interested in in this article is the one embodying the assumption of randomness, we have a perfect agreement between the two senses.

Prediction with confidence and credibility

Once we have a randomness test t, universal or not, we can use it for hedged prediction. There are two natural ways to package the results of such predictions: in this subsection we will describe the way that can only be used in classification problems. If the randomness test is not computable, we can imagine an oracle answering questions about its values.

Given the training set (1) and the test object x_{l+1}, we can act as follows:

• consider all possible values Y ∈ Y for the label y_{l+1};

• find the randomness level detected by t for every possible completion (2);

• predict the label Y corresponding to a completion with the largest randomness level detected by t;

• output as the confidence in this prediction one minus the second largest randomness level detected by t;

• output as the credibility of this prediction the randomness level detected by t of the output prediction Y (i.e., the largest randomness level detected by t over all possible labels).

To understand the intuition behind confidence, let us tentatively choose a conventional 'significance level', say 1%. (In the terminology of this article, this corresponds to a 'confidence level' of 99%, i.e., 100% minus 1%.) If the confidence in our prediction is 99% or more and the prediction is wrong, the actual data sequence belongs to an a priori chosen set of probability at most 1% (the set of all data sequences with randomness level detected by t not exceeding 1%).

Intuitively, low credibility means that either the training set is non-random or the test object is not representative of the training set (say, in the training set we have images of digits and the test object is that of a letter).

Confidence predictors

In regression problems, confidence, as defined in the previous subsection, is not a useful quantity: it will typically be equal to 0. A better approach is to choose a range of confidence levels 1 − ε, and for each of them specify a prediction set Γ^ε ⊆ Y, the set of labels deemed possible at the confidence level 1 − ε. We will always consider nested prediction sets: Γ^{ε_1} ⊆ Γ^{ε_2} when ε_1 ≥ ε_2. A confidence predictor is a function that maps each training set, each new object, and each confidence level 1 − ε (formally, we allow ε to take any value in (0, 1)) to the corresponding prediction set Γ^ε. For the confidence predictor to be valid the probability that the true label will fall outside the prediction set Γ^ε should not exceed ε, for each ε.

We might, for example, choose the confidence levels 99%, 95% and 80%, and refer to the 99% prediction set Γ^{1%} as the highly confident prediction, to the 95% prediction set Γ^{5%} as the confident prediction, and to the 80% prediction set Γ^{20%} as the casual prediction. Figure 1 shows how such a family of prediction sets might look in the case of a rectangular label space Y. The casual prediction pinpoints the target quite well, but we know that this kind of prediction can be wrong with probability 20%. The confident prediction is much bigger. If we want to be highly confident (make a mistake only with probability 1%), we must accept an even lower accuracy; there is even a completely different location that we cannot rule out at this level of confidence.

Given a randomness test, again universal or not, we can define the corresponding confidence predictor as follows: for any confidence level 1 − ε, the corresponding prediction set consists of the Ys such that the randomness level of the completion (2) detected by the test is greater than ε. The condition (3) of validity for statistical tests implies that a confidence predictor defined in this way is always valid.

The confidence predictor based on the universal test (the universal confidence predictor) is an interesting object for mathematical investigation (see, e.g., [50], Section 4), but it is not computable and so cannot be used in practice. Our goal in the following sections will be to find computable approximations to it.

Figure 1: An example of a nested family of prediction sets (casual prediction in black, confident prediction in dark grey, and highly confident prediction in light grey).

3 Conformal prediction

In the previous section we explained how randomness tests can be used for prediction. The connection between testing and prediction is, of course, well understood and has been discussed at length by philosophers [32] and statisticians (see, e.g., the textbook [9], Section 7.5). In this section we will see how some popular prediction algorithms can be transformed into randomness tests and, therefore, be used for producing hedged predictions.

Let us start with the most successful recent development in machine learning, support vector machines ([42, 43], with a key idea going back to the generalized portrait method [44]). Suppose the label space is Y = {−1, 1} (we are dealing with the binary classification problem). With each set of examples

  (x_1, y_1), ..., (x_n, y_n)    (4)

one associates an optimization problem whose solution produces nonnegative numbers α_1, ..., α_n ('Lagrange multipliers'). These numbers determine the prediction rule used by the support vector machine (see [43], Chapter 10, for details), but they also are interesting objects in their own right. Each α_i, i = 1, ..., n, tells us how strange an element of the set (4) the corresponding example (x_i, y_i) is. If α_i = 0, (x_i, y_i) fits set (4) very well (in fact so well that such examples are uninformative, and the support vector machine ignores them when making predictions). The elements with α_i > 0 are called support vectors, and the large value of α_i indicates that the corresponding (x_i, y_i) is an outlier.

Applying this procedure to the completion (2) in the role of set (4) (so that n = l + 1), we can find the corresponding α_1, ..., α_{l+1}. If Y is different from the actual label y_{l+1}, we expect (x_{l+1}, Y) to be an outlier in the set (2) and so α_{l+1} to be large as compared with α_1, ..., α_l. A natural way to compare α_{l+1} to the other αs is to look at the ratio

  p_Y := |{i = 1, ..., l + 1 : α_i ≥ α_{l+1}}| / (l + 1),    (5)

which we call the p-value associated with the possible label Y for x_{l+1}. In words, the p-value is the proportion of the αs which are at least as large as the last α.

Table 1: Selected test examples from the USPS data set: the p-values of digits (0–9), true and predicted labels, and confidence and credibility values.

  0      1      2      3      4      5      6      7      8      9      true label  prediction  confidence  credibility
  0.01%  0.11%  0.01%  0.01%  0.07%  0.01%  100%   0.01%  0.01%  0.01%  6           6           99.89%      100%
  0.32%  0.38%  1.07%  0.67%  1.43%  0.67%  0.38%  0.33%  0.73%  0.78%  6           4           98.93%      1.43%
  0.01%  0.27%  0.03%  0.04%  0.18%  0.01%  0.04%  0.01%  0.12%  100%   9           9           99.73%      100%

The methodology of support vector machines (as described in [42, 43]) is directly applicable only to the binary classification problems, but the general case can be reduced to the binary case by the standard 'one-against-one' or 'one-against-the-rest' procedures. This allows us to define the strangeness values α_1, ..., α_{l+1} for general classification problems (see [51], p. 59, for details), which in turn determine the p-values (5).

The function that assigns to each sequence (2) the corresponding p-value, defined by expression (5), is a randomness test (this will follow from Theorem 1 stated in Section 5 below). Therefore, the p-values, which are our approximations to the corresponding randomness levels, can be used for hedged prediction as described in the previous section. For example, in the case of binary classification, if the p-value p_{−1} is small while p_1 is not small, we can predict 1 with confidence 1 − p_{−1} and credibility p_1. Typical credibility will be 1: for most data sets the percentage of support vectors is small ([43], Chapter 12), and so we can expect α_{l+1} = 0 when Y = y_{l+1}.
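The procedure just described is easy to turn into code. The following minimal Python sketch assumes that a function `alpha_scores` is available that returns one strangeness value per example of a completed data set (for instance, the Lagrange multipliers of a support vector machine trained on it); the function and variable names are illustrative and not part of the paper.

```python
import numpy as np

def p_value(alphas):
    """Equation (5): the fraction of nonconformity scores that are at
    least as large as the score of the last (new) example."""
    alphas = np.asarray(alphas)
    return float(np.mean(alphas >= alphas[-1]))

def hedged_classification(train_x, train_y, new_x, labels, alpha_scores):
    """Try every candidate label Y, score the completed data set
    (x_1,y_1),...,(x_l,y_l),(new_x,Y), and summarize the p-values by a
    prediction, its confidence (one minus the second largest p-value)
    and its credibility (the largest p-value)."""
    p = {}
    for Y in labels:
        # alpha_scores is assumed to return one strangeness value per
        # example of the completed data set (e.g., SVM Lagrange multipliers).
        alphas = alpha_scores(list(train_x) + [new_x], list(train_y) + [Y])
        p[Y] = p_value(alphas)
    ranked = sorted(p, key=p.get, reverse=True)
    prediction = ranked[0]
    confidence = 1.0 - p[ranked[1]]   # assumes at least two candidate labels
    credibility = p[ranked[0]]
    return prediction, confidence, credibility, p
```

For the binary case discussed above, `labels` would be {−1, 1} and `alpha_scores` would retrain the support vector machine on each completion.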

Remark When the order of examples is irrelevant, we refer to the data set (4) as a set, although as a mathematical object it is a multiset rather than a set since it can contain several copies of the same example. We will continue to use this informal terminology (to be completely accurate, we would have to say 'data multiset' instead of 'data set'!)

Table 1 illustrates the results of hedged prediction for a popular data set of hand-written digits called the USPS data set [23]. The data set contains 9298 digits represented as a 16×16 matrix of pixels; it is divided into a training set of size 7291 and a test set of size 2007. For several test examples the table shows the p-values for each possible label, the actual label, the predicted label, confidence, and credibility, computed using the support vector method with the polynomial kernel of degree 5. To interpret the numbers in this table, remember that high (i.e., close to 100%) confidence means that all labels except the predicted one are unlikely. If, say, the first example were predicted wrongly, this would mean that a rare event (of probability less than 1%) had occurred; therefore, we expect the prediction to be correct (which it is). In the case of the second example, confidence is also quite high (more than 95%), but we can see that the credibility is low (less than 5%). From the confidence we can conclude that the labels other than 4 are excluded at level 5%, but the label 4 itself is also excluded at the level 5%. This shows that the prediction algorithm was unable to extract from the training set enough information to allow us to confidently classify this example: the strangeness of the labels different from 4 may be due to the fact that the object itself is strange; perhaps the test example is very different from all examples in the training set. Unsurprisingly, the prediction for the second example is wrong.

In general, high confidence shows that all alternatives to the predicted label are unlikely. Low credibility means that the whole situation is suspect; as we have already mentioned, we will obtain a very low credibility if the new example is a letter (whereas all training examples are digits). Credibility will also be low if the new example is a digit written in an unusual way. Notice that typically credibility will not be low provided the data set was generated independently from the same distribution: the probability that credibility will not exceed some threshold ε (such as 1%) is at most ε. In summary, we can trust a prediction if (1) the confidence is close to 100% and (2) the credibility is not low (say, is not less than 5%).

Many other prediction algorithms can be used as underlying algorithms for hedged prediction. For example, we can use the nearest neighbours technique to associate

  α_i := (∑_{j=1}^{k} d^+_{ij}) / (∑_{j=1}^{k} d^−_{ij}),   i = 1, ..., n,    (6)

with the elements (x_i, y_i) of the set (4), where d^+_{ij} is the j-th shortest distance from x_i to other objects labelled in the same way as x_i, and d^−_{ij} is the j-th shortest distance from x_i to the objects labelled differently from x_i; the parameter k ∈ {1, 2, ...} in Equation (6) is the number of nearest neighbours taken into account. The distances can be computed in a feature space (that is, the distance between x ∈ X and x′ ∈ X can be understood as ||F(x) − F(x′)||, F mapping the object space X into a feature, typically Hilbert, space), and so definition (6) can also be used with the kernel nearest neighbours.

The intuition behind Equation (6) is as follows: a typical object x_i labelled by, say, y will tend to be surrounded by other objects labelled by y; and if this is the case, the corresponding α_i will be small. In the untypical case that there are objects whose labels are different from y nearer than objects labelled y, α_i will become larger. Therefore, the αs reflect the strangeness of examples.

The p-values computed from Equation (6) can again be used for hedged prediction. It is a general empirical fact that the accuracy and reliability of the hedged predictions are in line with the error rate of the underlying algorithm. For example, in the case of the USPS data set, the 1-nearest neighbour algorithm (i.e., the one with k = 1) achieves the error rate of 2.2%, and the hedged predictions based on Equation (6) are highly confident (achieve confidence of at least 99%) for more than 95% of the test examples.
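As an illustration, here is a hedged sketch of the nonconformity scores (6) for the plain (non-kernel) nearest neighbours case, using Euclidean distances and brute-force distance computation; it assumes every label occurs often enough that both sums are over k positive distances.

```python
import numpy as np

def knn_nonconformity(xs, ys, k=1):
    """Equation (6): for each example, the sum of its k shortest distances
    to same-labelled objects divided by the sum of its k shortest distances
    to differently-labelled objects."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys)
    n = len(xs)
    # pairwise Euclidean distances (brute force; a feature map or kernel
    # could be substituted here)
    dist = np.linalg.norm(xs[:, None, :] - xs[None, :, :], axis=-1)
    alphas = np.empty(n)
    for i in range(n):
        same = np.sort(dist[i, (ys == ys[i]) & (np.arange(n) != i)])[:k]
        other = np.sort(dist[i, ys != ys[i]])[:k]
        alphas[i] = same.sum() / other.sum()
    return alphas
```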


General definition

The general notion of conformal predictor can be defined as follows. A nonconformity measure is a function that assigns to every data sequence (4) a sequence of numbers α_1, ..., α_n, called nonconformity scores, in such a way that interchanging any two examples (x_i, y_i) and (x_j, y_j) leads to the interchange of the corresponding nonconformity scores α_i and α_j (with all other nonconformity scores unaffected). The corresponding conformal predictor maps each data set (1), l = 0, 1, ..., each new object x_{l+1}, and each confidence level 1 − ε ∈ (0, 1) to the prediction set

  Γ^ε(x_1, y_1, ..., x_l, y_l, x_{l+1}) := {Y ∈ Y : p_Y > ε},    (7)

where p_Y are defined by Equation (5) with α_1, ..., α_{l+1} being the nonconformity scores corresponding to the data sequence (2).

We have already remarked that associating with each completion (2) the p-value (5) gives a randomness test; this is true in general. This implies that for each l the probability of the event

  y_{l+1} ∈ Γ^ε(x_1, y_1, ..., x_l, y_l, x_{l+1})

is at least 1 − ε.

This definition works for both classification and regression, but in the case of classification we can summarize the prediction sets (7) by two numbers: the confidence

  sup{1 − ε : |Γ^ε| ≤ 1}    (8)

and the credibility

  inf{ε : |Γ^ε| = 0}.    (9)
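In code, once the p-values of all candidate labels are available (e.g., from a routine like the one sketched earlier), the prediction set (7) and the summaries (8) and (9) are one-liners. The sketch below is illustrative only and assumes `p_values` is a dictionary mapping each candidate label to its p-value, with at least two labels.

```python
def prediction_set(p_values, epsilon):
    """Equation (7): the labels whose p-value exceeds the significance
    level epsilon (i.e., the confidence level is 1 - epsilon)."""
    return {Y for Y, p in p_values.items() if p > epsilon}

def confidence_and_credibility(p_values):
    """Equations (8) and (9) for classification: the confidence is one
    minus the second largest p-value (the largest epsilon at which the
    set is still a singleton), and the credibility is the largest
    p-value (the smallest epsilon at which the set becomes empty)."""
    ps = sorted(p_values.values(), reverse=True)
    return 1.0 - ps[1], ps[0]
```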

Computationally efficient regression

As we have already mentioned, the algorithms described so far cannot be applied directly in the case of regression, even if the randomness test is efficiently computable: now we cannot consider all possible values Y for y_{l+1} since there are infinitely many of them. However, there might still be computationally efficient ways to find the prediction sets Γ^ε. The idea is that if α_i are defined as the residuals

  α_i := |y_i − f_Y(x_i)|,    (10)

where f_Y : X → R is a regression function fitted to the completed data set (2), then α_i may have a simple expression in terms of Y, leading to an efficient way of computing the prediction sets (via Equations (5) and (7)). This idea was implemented in [28] in the case where f_Y is found from the ridge regression, or kernel ridge regression, procedure, with the resulting algorithm of hedged prediction called the ridge regression confidence machine. For a much fuller description of the ridge regression confidence machine (and its modifications in the case where the simple residuals (10) are replaced by the fancier 'deleted' or 'studentized' residuals) see [51], Section 2.3.
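The ridge regression confidence machine computes the prediction set exactly by exploiting the simple dependence of the residuals (10) on Y; the following sketch instead illustrates the underlying idea in the crudest possible way, by scanning a finite grid of candidate labels. It is a hedged illustration only: the grid, the helper names and the plain (non-kernel) ridge fit are assumptions, not the algorithm of [28].

```python
import numpy as np

def ridge_fit_predict(X, y, X_eval, a=1.0):
    """Plain ridge regression with parameter a (a hypothetical helper;
    kernel ridge regression or any other regression fit could be used)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.linalg.solve(X.T @ X + a * np.eye(X.shape[1]), X.T @ y)
    return np.asarray(X_eval, dtype=float) @ w

def conformal_regression_grid(X, y, x_new, y_grid, epsilon, a=1.0):
    """Brute-force conformal prediction for regression: for each candidate
    label Y on a grid, refit on the completed data set, use the absolute
    residuals (10) as nonconformity scores, and keep Y if its p-value (5)
    exceeds epsilon; the convex hull of the kept labels is reported."""
    X_full = np.vstack([X, x_new])
    kept = []
    for Y in y_grid:
        y_full = np.append(y, Y)
        fitted = ridge_fit_predict(X_full, y_full, X_full, a)
        alphas = np.abs(y_full - fitted)             # residuals (10)
        if np.mean(alphas >= alphas[-1]) > epsilon:  # p-value (5)
            kept.append(Y)
    return (min(kept), max(kept)) if kept else None
```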


4 Bayesian approach to conformal prediction

Bayesian methods have become very popular in both machine learning and statistics thanks to their power and versatility, and in this section we will see how Bayesian ideas can be used for designing efficient conformal predictors. We will only describe results of computer experiments (following [26]) with artificial data sets, since for real-world data sets there is no way to make sure that the Bayesian assumption is satisfied.

Suppose X = R^p (each object is a vector of p real-valued attributes) and our model of the data-generating mechanism is

  y_i = w · x_i + ξ_i,   i = 1, 2, ...,    (11)

where ξ_i are independent standard Gaussian random variables and the weight vector w ∈ R^p is distributed as N(0, (1/a) I_p) (we use the notation I_p for the unit p×p matrix and N(0, A) for the p-dimensional Gaussian distribution with mean 0 and covariance matrix A); a is a positive constant. The actual data-generating mechanism used in our experiments will correspond to this model with a set to 1.

Under the model (11) the best (in the mean-square sense) fit to a data set (4) is provided by the ridge regression procedure with parameter a (for details, see, e.g., [51], Section 10.3). Using the residuals (10) with f_Y found by ridge regression with parameter a leads to an efficient conformal predictor which will be referred to as the ridge regression confidence machine with parameter a. Each prediction set output by the ridge regression confidence machine will be replaced by its convex hull, the corresponding prediction interval.

To test the validity and efficiency of the ridge regression confidence machine the following procedure was used. Ten times a vector w ∈ R^5 was independently generated from the distribution N(0, I_5). For each of the 10 values of w, 100 training objects and 100 test objects were independently generated from the uniform distribution on [−10, 10]^5 and for each object x its label y was generated as w · x + ξ, with all the ξ standard Gaussian and independent. For each of the 1000 test objects and each confidence level 1 − ε the prediction set Γ^ε for its label was found from the corresponding training set using the ridge regression confidence machine with parameter a = 1. The solid line in Figure 2 shows the confidence level against the percentage of test examples whose labels were not covered by the corresponding prediction intervals at that confidence level. Since conformal predictors are always valid, the percentage outside the prediction interval should never exceed 100 minus the confidence level, up to statistical fluctuations, and this is confirmed by the picture.

A natural measure of efficiency of confidence predictors is the mean width of their prediction intervals, at different confidence levels: the algorithm is the more efficient the narrower prediction intervals it produces. The solid line in Figure 3 shows the confidence level against the mean (over all test examples) width of the prediction intervals at that confidence level.

Figure 2: Validity for the ridge regression confidence machine (confidence level, %, against the percentage of test labels outside the prediction intervals).

Figure 3: Efficiency for the ridge regression confidence machine (confidence level, %, against the mean prediction interval width).

Since we know the data-generating mechanism, the approach via conformal prediction appears somewhat roundabout: for each test object we could instead find the conditional probability distribution of its label, which is Gaussian, and output as the prediction set Γ^ε the shortest (i.e., centred at the mean of the conditional distribution) interval of conditional probability 1 − ε. Figures 4 and 5 are the analogues of Figures 2 and 3 for this Bayes-optimal confidence predictor. The solid line in Figure 4 demonstrates the validity of the Bayes-optimal confidence predictor.

Figure 4: Validity for the Bayes-optimal confidence predictor (confidence level, %, against the percentage of test labels outside the prediction intervals).

Figure 5: Efficiency for the Bayes-optimal confidence predictor (confidence level, %, against the mean prediction interval width).
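For comparison, a minimal sketch of the Bayes-optimal confidence predictor under model (11): with prior w ~ N(0, (1/a) I_p) and unit-variance Gaussian noise, the predictive distribution of a new label is Gaussian, so the shortest 1 − ε interval is centred at the predictive mean. SciPy is used here only for the Gaussian quantile; the helper name is illustrative.

```python
import numpy as np
from scipy.stats import norm

def bayes_optimal_interval(X, y, x_new, epsilon, a=1.0):
    """Shortest 1 - epsilon predictive interval under model (11):
    prior w ~ N(0, (1/a) I_p) and unit-variance Gaussian noise, so the
    posterior predictive distribution of the new label is Gaussian."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    A = a * np.eye(X.shape[1]) + X.T @ X            # posterior precision of w
    mu = np.linalg.solve(A, X.T @ y)                # posterior mean of w
    mean = x_new @ mu                               # predictive mean
    var = 1.0 + x_new @ np.linalg.solve(A, x_new)   # predictive variance
    half = norm.ppf(1.0 - epsilon / 2.0) * np.sqrt(var)
    return mean - half, mean + half
```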

What is interesting is that the solid lines in Figures 5 and 3 look exactly the same, taking account of the different scales of the vertical axes. The ridge regression confidence machine appears as good as the Bayes-optimal predictor. (This is a general phenomenon; it is also illustrated, in the case of classification, by the construction in Section 3.3 of [51] of a conformal predictor that is asymptotically as good as the Bayes-optimal confidence predictor.)

The similarity between the two algorithms disappears when they are given wrong values for a. For example, let us see what happens if we tell the algorithms that the expected value of w is just 1% of what it really is (this corresponds to taking a = 10000). The ridge regression confidence machine stays valid (see the dashed line in Figure 2), but its efficiency deteriorates (the dashed line in Figure 3). The efficiency of the Bayes-optimal confidence predictor (the dashed line in Figure 5) is hardly affected, but its predictions become invalid (the dashed line in Figure 4 deviates significantly from the diagonal, especially for the most important large confidence levels: e.g., only about 15% of labels fall within the 90% prediction intervals). The worst that can happen to the ridge regression confidence machine is that its predictions will become useless (but at least harmless), whereas the Bayes-optimal predictions can become misleading.

Figures 2–5 also show the graphs for the intermediate value a = 1000. Similar results but for different data sets are also given in [51], Section 10.3. A general scheme of Bayes-type conformal prediction is described in [51], pp. 102–103.

5 On-line prediction

We know from Section 3 that conformal predictors are valid in the sense that the probability of error

  y_{l+1} ∉ Γ^ε(x_1, y_1, ..., x_l, y_l, x_{l+1})    (12)

at confidence level 1 − ε never exceeds ε. The word 'probability' means 'unconditional probability' here: the frequentist meaning of the statement that the probability of event (12) does not exceed ε is that, if we repeatedly generate many sequences

  x_1, y_1, ..., x_l, y_l, x_{l+1}, y_{l+1},

the fraction of them satisfying Equation (12) will be at most ε, to within statistical fluctuations. To say that we are controlling the number of errors would be an exaggeration because of the artificial character of this scheme of repeatedly generating a new training set and a new test example. Can we say that the confidence level 1 − ε translates into a bound on the number of errors for a natural learning protocol? In this section we show that the answer is 'yes' for the popular on-line learning protocol, and in the next section we will see to what degree this carries over to other protocols.

In on-line learning the examples are presented one by one. Each time, we observe the object and predict its label. Then we observe the label and go on to the next example. We start by observing the first object x_1 and predicting its label y_1. Then we observe y_1 and the second object x_2, and predict its label y_2. And so on. At the n-th step, we have observed the previous examples (x_1, y_1), ..., (x_{n−1}, y_{n−1}) and the new object x_n, and our task is to predict y_n. The quality of our predictions should improve as we accumulate more and more old examples. This is the sense in which we are learning.

Our prediction for y_n is a nested family of prediction sets Γ^ε_n ⊆ Y, ε ∈ (0, 1). The process of prediction can be summarized by the following protocol:

On-line prediction protocol

Err^ε_0 := 0, ε ∈ (0, 1);
Mult^ε_0 := 0, ε ∈ (0, 1);
Emp^ε_0 := 0, ε ∈ (0, 1);
FOR n = 1, 2, ...:
  Reality outputs x_n ∈ X;
  Predictor outputs Γ^ε_n ⊆ Y for all ε ∈ (0, 1);
  Reality outputs y_n ∈ Y;
  err^ε_n := 1 if y_n ∉ Γ^ε_n, 0 otherwise, ε ∈ (0, 1);
  Err^ε_n := Err^ε_{n−1} + err^ε_n, ε ∈ (0, 1);
  mult^ε_n := 1 if |Γ^ε_n| > 1, 0 otherwise, ε ∈ (0, 1);
  Mult^ε_n := Mult^ε_{n−1} + mult^ε_n, ε ∈ (0, 1);
  emp^ε_n := 1 if |Γ^ε_n| = 0, 0 otherwise, ε ∈ (0, 1);
  Emp^ε_n := Emp^ε_{n−1} + emp^ε_n, ε ∈ (0, 1)
END FOR.

As we said, the family Γ^ε_n is assumed nested: Γ^{ε_1}_n ⊆ Γ^{ε_2}_n when ε_1 ≥ ε_2. In this protocol we also record the cumulative numbers Err^ε_n of erroneous prediction sets, Mult^ε_n of multiple prediction sets (i.e., prediction sets containing more than one label) and Emp^ε_n of empty prediction sets at each confidence level 1 − ε. We will discuss the significance of each of these numbers in turn.
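For a single confidence level the protocol translates directly into a loop. In the sketch below `predict_set(xs, ys, x_new, epsilon)` stands for any conformal predictor returning a prediction set; this interface is an assumption made for illustration only.

```python
def run_online(stream, predict_set, epsilon):
    """On-line prediction protocol for one confidence level 1 - epsilon.
    `stream` yields (x_n, y_n) pairs in order of arrival."""
    xs, ys = [], []
    err = mult = emp = 0
    for x_n, y_n in stream:
        gamma = predict_set(xs, ys, x_n, epsilon)
        err += int(y_n not in gamma)      # Err: erroneous prediction sets
        mult += int(len(gamma) > 1)       # Mult: multiple prediction sets
        emp += int(len(gamma) == 0)       # Emp: empty prediction sets
        xs.append(x_n)                    # the true label is revealed and
        ys.append(y_n)                    # the example joins the training set
    return err, mult, emp
```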

The number of erroneous predictions is a measure of validity of our confidence predictors: we would like to have Err^ε_n ≤ εn, up to statistical fluctuations. In Figure 6 we can see the lines n ↦ Err^ε_n for one particular conformal predictor and for three confidence levels 1 − ε: the solid line for 99%, the dash-dot line for 95%, and the dotted line for 80%. The number of errors made grows linearly, and the slope is approximately 20% for the confidence level 80%, 5% for the confidence level 95%, and 1% for the confidence level 99%. We will see below that this is not accidental.

Figure 6: Cumulative numbers of errors for a conformal predictor (the 1-nearest neighbour conformal predictor) run in the on-line mode on the USPS data set (9298 hand-written digits, randomly permuted) at the confidence levels 80%, 95% and 99%.

The number of multiple predictions Mult^ε_n is a useful measure of efficiency in the case of classification: we would like as many as possible of our predictions to be singletons. Figure 7 shows the cumulative numbers of errors n ↦ Err^{2.5%}_n (solid line) and multiple predictions n ↦ Mult^{2.5%}_n (dotted line) at the fixed confidence level 97.5%. We can see that out of approximately 10,000 predictions about 250 (approximately 2.5%) were errors and about 300 (approximately 3%) were multiple predictions.

We can see that by choosing ε we are able to control the number of errors. For small ε (relative to the difficulty of the data set) this might lead to the need sometimes to give multiple predictions. On the other hand, for larger ε this might lead to empty predictions at some steps, as can be seen from the bottom right corner of Figure 7: when the predictor ceases to make multiple predictions it starts making occasional empty predictions (the dash-dot line). An empty prediction is a warning that the object to be predicted is unusual (the credibility, as defined in Section 2, is ε or less).

It would be a mistake to concentrate exclusively on one confidence level 1 − ε. If the prediction Γ^ε_n is empty, this does not mean that we cannot make any prediction at all: we should just shift our attention to other confidence levels (perhaps look at the range of ε for which Γ^ε_n is a singleton). Likewise, Γ^ε_n being multiple does not mean that all labels in Γ^ε_n are equally likely: slightly increasing ε might lead to the removal of some labels. Of course, taking in the continuum of prediction sets, for all ε ∈ (0, 1), might be too difficult or tiresome for a human mind, and concentrating on a few conventional levels, as in Figure 1, might be a reasonable compromise.

Figure 7: The on-line performance of the 1-nearest neighbour conformal predictor at the confidence level 97.5% on the USPS data set (randomly permuted).

Table 2: A selected test example from a data set of hospital records of patients who suffered acute abdominal pain [15]: the p-values for the nine possible diagnostic groups (appendicitis APP, diverticulitis DIV, perforated peptic ulcer PPU, non-specific abdominal pain NAP, cholecystitis CHO, intestinal obstruction INO, pancreatitis PAN, renal colic RCO, dyspepsia DYS) and the true label.

  APP    DIV    PPU    NAP    CHO    INO    PAN    RCO    DYS     true label
  1.23%  0.36%  0.16%  2.83%  5.72%  0.89%  1.37%  0.48%  80.56%  DYS

For example, Table 2 gives the p-values for different kinds of abdominal pain obtained for a specific patient based on his symptoms. We can see that at the confidence level 95% the prediction set is multiple, {cholecystitis, dyspepsia}. When we relax the confidence level to 90%, the prediction set narrows down to {dyspepsia} (the singleton containing only the true label); on the other hand, at the confidence level 99% the prediction set widens to {appendicitis, non-specific abdominal pain, cholecystitis, pancreatitis, dyspepsia}. Such detailed confidence information, in combination with the property of validity, is especially valuable in medicine (and some of the first applications of conformal predictors have been to the fields of medicine and bioinformatics: see, e.g., [3, 35]).

In the case of regression, we will usually have Mult^ε_n = n and Emp^ε_n = 0, and so these are not useful measures of efficiency. Better measures, such as the ones used in the previous section, would, for example, take into account the widths of the prediction intervals.

Theoretical analysis

Looking at Figures 6 and 7 we might be tempted to guess that the probability of error at each step of the on-line protocol is ε and that errors are made independently at different steps. This is not literally true, as a closer examination of the bottom left corner of Figure 7 reveals. It, however, becomes true (as noticed in [48]) if the p-values (5) are redefined as

  p_Y := (|{i : α_i > α_{l+1}}| + η |{i : α_i = α_{l+1}}|) / (l + 1),    (13)

where i ranges over {1, ..., l + 1} and η ∈ [0, 1] is generated randomly from the uniform distribution on [0, 1] (the ηs should be independent between themselves and of everything else; in practice they are produced by pseudo-random number generators). The only difference between Equations (5) and (13) is that the expression (13) takes more care in breaking the ties α_i = α_{l+1}. Replacing Equation (5) by Equation (13) in the definition of conformal predictor we obtain the notion of smoothed conformal predictor.
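A minimal sketch of the smoothed p-value (13); the only change with respect to (5) is the random tie-breaking factor η.

```python
import numpy as np

def smoothed_p_value(alphas, rng=np.random):
    """Equation (13): ties with the last nonconformity score are broken by
    eta drawn from the uniform distribution on [0, 1]."""
    alphas = np.asarray(alphas)
    eta = rng.uniform()
    greater = np.sum(alphas > alphas[-1])
    ties = np.sum(alphas == alphas[-1])   # includes alpha_{l+1} itself
    return (greater + eta * ties) / len(alphas)
```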

The validity property for smoothed conformal predictors can now be stated as follows.

Theorem 1. Suppose the examples

  (x_1, y_1), (x_2, y_2), ...

are generated independently from the same probability distribution. For any smoothed conformal predictor working in the on-line prediction protocol and any confidence level 1 − ε, the random variables err^ε_1, err^ε_2, ... are independent and take value 1 with probability ε.

Combining Theorem 1 with the strong law of large numbers we can see that

  lim_{n→∞} Err^ε_n / n = ε

holds with probability one for smoothed conformal predictors. (They are 'well calibrated'.) Since the number of errors made by a conformal predictor never exceeds the number of errors made by the corresponding smoothed conformal predictor,

  lim sup_{n→∞} Err^ε_n / n ≤ ε

holds with probability one for conformal predictors. (They are 'conservatively well calibrated'.)


6 Slow teachers, lazy teachers, and the batch setting

In the pure on-line setting, considered in the previous section, we get an immediate feedback (the true label) for every example that we predict. This makes practical applications of this scenario questionable. Imagine, for example, a mail sorting centre using an on-line prediction algorithm for zip code recognition; suppose the feedback about the true label comes from a human 'teacher'. If the feedback is given for every object x_i, there is no point in having the prediction algorithm: we can just as well use the label provided by the teacher. It would help if the prediction algorithm could still work well, in particular be valid, if only every, say, tenth object were classified by a human teacher (the scenario of 'lazy' teachers). Alternatively, even if the prediction algorithm requires the knowledge of all labels, it might still be useful if the labels were allowed to be given not immediately but with a delay ('slow' teachers). In our mail sorting example, such a delay might make sure that we hear from local post offices about any mistakes made before giving a feedback to the algorithm.

In the pure on-line protocol we had validity in the strongest possible sense: at each confidence level 1 − ε each smoothed conformal predictor made errors independently with probability ε. In the case of weaker teachers (as usual, we are using the word 'teacher' in the general sense of the entity providing the feedback, called Reality in the previous section), we have to accept a weaker notion of validity. Suppose the predictor receives a feedback from the teacher at the end of steps n_1, n_2, ..., where n_1 < n_2 < ···; then the weaker validity property

  ∀ε ∈ (0, 1): Err^ε_n / n → ε (as n → ∞) in probability

holds if and only if n_k / n_{k−1} → 1 as k → ∞. In other words, the validity in the sense of convergence in probability holds if and only if the growth rate of n_k is subexponential. (This condition is amply satisfied for our example of a teacher giving feedback for every tenth object.)

The most standard batch setting of the problem of prediction is in one respect even more demanding than our scenarios of weak teachers. In this setting we are given a training set (1) and our goal is to predict the labels given the objects in the test set

  (x_{l+1}, y_{l+1}), ..., (x_{l+k}, y_{l+k}).    (14)

This can be interpreted as a finite-horizon version of the lazy-teacher setting: no labels are returned after step l. Computer experiments (see, e.g., Figure 8) show that approximate validity still holds; for related theoretical results, see [51], Section 4.4.

