Large-scale Incremental Processing Using Distributed Transactions and Notifications

Daniel Peng and Frank Dabek

dpeng@google.com, fdabek@google.com

Google, Inc.

Abstract

Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google's indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.

We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.

1 Introduction

Consider the task of building an index of the web that can be used to answer search queries. The indexing system starts by crawling every page on the web and processing them while maintaining a set of invariants on the index. For example, if the same content is crawled under multiple URLs, only the URL with the highest PageRank [28] appears in the index. Each link is also inverted so that the anchor text from each outgoing link is attached to the page the link points to. Link inversion must work across duplicates: links to a duplicate of a page should be forwarded to the highest-PageRank duplicate if necessary.

This is a bulk-processing task that can be expressed as a series of MapReduce [13] operations: one for clustering duplicates, one for link inversion, etc. It's easy to maintain invariants since MapReduce limits the parallelism of the computation; all documents finish one processing step before starting the next. For example, when the indexing system is writing inverted links to the current highest-PageRank URL, we need not worry about its PageRank concurrently changing; a previous MapReduce step has already determined its PageRank.

Now, consider how to update that index after recrawling some small portion of the web. It's not sufficient to run the MapReduces over just the new pages since, for example, there are links between the new pages and the rest of the web. The MapReduces must be run again over the entire repository, that is, over both the new pages and the old pages. Given enough computing resources, MapReduce's scalability makes this approach feasible, and, in fact, Google's web search index was produced in this way prior to the work described here. However, reprocessing the entire web discards the work done in earlier runs and makes latency proportional to the size of the repository, rather than the size of an update.

The indexing system could store the repository in a DBMS and update individual documents while using transactions to maintain invariants. However, existing DBMSs can't handle the sheer volume of data: Google's indexing system stores tens of petabytes across thousands of machines [30]. Distributed storage systems like Bigtable [9] can scale to the size of our repository but don't provide tools to help programmers maintain data invariants in the face of concurrent updates.

An ideal data processing system for the task of maintaining the web search index would be optimized for incremental processing; that is, it would allow us to maintain a very large repository of documents and update it efficiently as each new document was crawled. Given that the system will be processing many small updates concurrently, an ideal system would also provide mechanisms for maintaining invariants despite concurrent updates and for keeping track of which updates have been processed.

The remainder of this paper describes a particular incremental processing system: Percolator. Percolator provides the user with random access to a multi-PB repository. Random access allows us to process documents individually, avoiding the global scans of the repository that MapReduce requires. To achieve high throughput, many threads on many machines need to transform the repository concurrently, so Percolator provides ACID-compliant transactions to make it easier for programmers to reason about the state of the repository; we currently implement snapshot isolation semantics [5].

Figure 1: Percolator and its dependencies

In addition to reasoning about concurrency, programmers of an incremental system need to keep track of the state of the incremental computation. To assist them in this task, Percolator provides observers: pieces of code that are invoked by the system whenever a user-specified column changes. Percolator applications are structured as a series of observers; each observer completes a task and creates more work for "downstream" observers by writing to the table. An external process triggers the first observer in the chain by writing initial data into the table.

Percolator was built specifically for incremental processing and is not intended to supplant existing solutions for most data processing tasks. Computations where the result can't be broken down into small updates (sorting a file, for example) are better handled by MapReduce. Also, the computation should have strong consistency requirements; otherwise, Bigtable is sufficient. Finally, the computation should be very large in some dimension (total data size, CPU required for transformation, etc.); smaller computations not suited to MapReduce or Bigtable can be handled by traditional DBMSs.

Within Google, the primary application of Percolator is preparing web pages for inclusion in the live web search index. By converting the indexing system to an incremental system, we are able to process individual documents as they are crawled. This reduced the average document processing latency by a factor of 100, and the average age of a document appearing in a search result dropped by nearly 50 percent (the age of a search result includes delays other than indexing such as the time between a document being changed and being crawled). The system has also been used to render pages into images; Percolator tracks the relationship between web pages and the resources they depend on, so pages can be reprocessed when any depended-upon resources change.

2 Design

Percolator provides two main abstractions for performing incremental processing at large scale: ACID transactions over a random-access repository and observers, a way to organize an incremental computation.

A Percolator system consists of three binaries that run on every machine in the cluster: a Percolator worker, a Bigtable [9] tablet server, and a GFS [20] chunkserver. All observers are linked into the Percolator worker, which scans the Bigtable for changed columns ("notifications") and invokes the corresponding observers as a function call in the worker process. The observers perform transactions by sending read/write RPCs to Bigtable tablet servers, which in turn send read/write RPCs to GFS chunkservers. The system also depends on two small services: the timestamp oracle and the lightweight lock service. The timestamp oracle provides strictly increasing timestamps: a property required for correct operation of the snapshot isolation protocol. Workers use the lightweight lock service to make the search for dirty notifications more efficient.

From the programmer's perspective, a Percolator repository consists of a small number of tables. Each table is a collection of "cells" indexed by row and column. Each cell contains a value: an uninterpreted array of bytes. (Internally, to support snapshot isolation, we represent each cell as a series of values indexed by timestamp.)
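As a minimal illustration of this data model (not Percolator's actual API; all names below are invented for the sketch), a cell can be viewed as a series of timestamped values, and a snapshot read returns the newest value at or below a chosen timestamp:

  #include <cstdint>
  #include <map>
  #include <optional>
  #include <string>
  #include <tuple>

  // Illustrative only: a cell holds an uninterpreted byte string per
  // timestamp; a table maps (row, column) to such a cell.
  using Timestamp = uint64_t;
  using Cell = std::map<Timestamp, std::string>;  // timestamp -> value

  struct TableKey {
    std::string row, column;
    bool operator<(const TableKey& o) const {
      return std::tie(row, column) < std::tie(o.row, o.column);
    }
  };
  using Table = std::map<TableKey, Cell>;

  // Return the latest value written at or below snapshot_ts, as a
  // snapshot-isolation read would see it.
  std::optional<std::string> ReadAtSnapshot(const Table& t, const TableKey& k,
                                            Timestamp snapshot_ts) {
    auto it = t.find(k);
    if (it == t.end()) return std::nullopt;
    const Cell& cell = it->second;
    auto v = cell.upper_bound(snapshot_ts);  // first version > snapshot_ts
    if (v == cell.begin()) return std::nullopt;
    return std::prev(v)->second;             // newest version <= snapshot_ts
  }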

The design of Percolator was influenced by the requirement to run at massive scales and the lack of a requirement for extremely low latency. Relaxed latency requirements let us take, for example, a lazy approach to cleaning up locks left behind by transactions running on failed machines. This lazy, simple-to-implement approach potentially delays transaction commit by tens of seconds. This delay would not be acceptable in a DBMS running OLTP tasks, but it is tolerable in an incremental processing system building an index of the web. Percolator has no central location for transaction management; in particular, it lacks a global deadlock detector. This increases the latency of conflicting transactions but allows the system to scale to thousands of machines.

2.1 Bigtable overview

Percolator is built on top of the Bigtable distributed storage system. Bigtable presents a multi-dimensional sorted map to users: keys are (row, column, timestamp) tuples. Bigtable provides lookup and update operations on each row, and Bigtable row transactions enable atomic read-modify-write operations on individual rows. Bigtable handles petabytes of data and runs reliably on large numbers of (unreliable) machines.

A running Bigtable consists of a collection of tablet servers, each of which is responsible for serving several tablets (contiguous regions of the key space). A master coordinates the operation of tablet servers by, for example, directing them to load or unload tablets. A tablet is stored as a collection of read-only files in the Google SSTable format. SSTables are stored in GFS; Bigtable relies on GFS to preserve data in the event of disk loss. Bigtable allows users to control the performance characteristics of the table by grouping a set of columns into a locality group. The columns in each locality group are stored in their own set of SSTables, which makes scanning them less expensive since the data in other columns need not be scanned.

The decision to build on Bigtable defined the overall shape of Percolator. Percolator maintains the gist of Bigtable's interface: data is organized into Bigtable rows and columns, with Percolator metadata stored alongside in special columns (see Figure 5). Percolator's API closely resembles Bigtable's API: the Percolator library largely consists of Bigtable operations wrapped in Percolator-specific computation. The challenge, then, in implementing Percolator is providing the features that Bigtable does not: multirow transactions and the observer framework.

2.2 Transactions

Percolator provides cross-row, cross-table transactions with ACID snapshot-isolation semantics. Percolator users write their transaction code in an imperative language (currently C++) and mix calls to the Percolator API with their code. Figure 2 shows a simplified version of clustering documents by a hash of their contents. In this example, if Commit() returns false, the transaction has conflicted (in this case, because two URLs with the same content hash were processed simultaneously) and should be retried after a backoff. Calls to Get() and Commit() are blocking; parallelism is achieved by running many transactions simultaneously in a thread pool.

While it is possible to incrementally process data without the benefit of strong transactions, transactions make it more tractable for the user to reason about the state of the system and to avoid the introduction of errors into a long-lived repository. For example, in a transactional web-indexing system the programmer can make assumptions like: the hash of the contents of a document is always consistent with the table that indexes duplicates. Without transactions, an ill-timed crash could result in a permanent error: an entry in the document table that corresponds to no URL in the duplicates table. Transactions also make it easy to build index tables that are always up to date and consistent. Note that both of these examples require transactions that span rows, rather than the single-row transactions that Bigtable already provides.

Percolator stores multiple versions of each data item using Bigtable's timestamp dimension. Multiple versions are required to provide snapshot isolation [5], which presents each transaction with the appearance of reading from a stable snapshot at some timestamp. Writes appear at a different, later timestamp.

bool UpdateDocument(Document doc) {
  Transaction t(&cluster);
  t.Set(doc.url(), "contents", "document", doc.contents());
  int hash = Hash(doc.contents());
  // dups table maps hash → canonical URL
  string canonical;
  if (!t.Get(hash, "canonical-url", "dups", &canonical)) {
    // No canonical yet; write myself in
    t.Set(hash, "canonical-url", "dups", doc.url());
  }  // else this document already exists, ignore new copy
  return t.Commit();
}

Figure 2: Example usage of the Percolator API to perform basic checksum clustering and eliminate documents with the same content.
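Because Commit() returns false on conflict, callers retry the whole transaction after a backoff. A minimal sketch of such a retry loop around Figure 2's UpdateDocument; the backoff parameters and attempt limit are illustrative, not Percolator's:

  #include <chrono>
  #include <cstdint>
  #include <random>
  #include <thread>

  // Illustrative retry loop: UpdateDocument() (Figure 2) returns false when
  // Commit() detects a conflict; retry after randomized exponential backoff.
  bool UpdateDocumentWithRetry(Document doc, int max_attempts = 10) {
    std::mt19937_64 rng(std::random_device{}());
    auto backoff = std::chrono::milliseconds(10);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
      if (UpdateDocument(doc)) return true;  // committed
      // Conflict: sleep for a random fraction of the backoff, then double it.
      std::uniform_int_distribution<int64_t> jitter(0, backoff.count());
      std::this_thread::sleep_for(std::chrono::milliseconds(jitter(rng)));
      backoff *= 2;
    }
    return false;  // give up after repeated conflicts
  }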

Figure 3: Transactions under snapshot isolation perform reads at a start timestamp (represented here by an open square) and writes at a commit timestamp (closed circle). In this example, transaction 2 would not see writes from transaction 1 since transaction 2's start timestamp is before transaction 1's commit timestamp. Transaction 3, however, will see writes from both 1 and 2. Transactions 1 and 2 are running concurrently: if they both write the same cell, at least one will abort.

Snapshot isolation protects against write-write conflicts: if transactions A and B, running concurrently, write to the same cell, at most one will commit. Snapshot isolation does not provide serializability; in particular, transactions running under snapshot isolation are subject to write skew [5]. The main advantage of snapshot isolation over a serializable protocol is more efficient reads. Because any timestamp represents a consistent snapshot, reading a cell requires only performing a Bigtable lookup at the given timestamp; acquiring locks is not necessary. Figure 3 illustrates the relationship between transactions under snapshot isolation.

Because it is built as a client library accessing Bigtable, rather than controlling access to storage itself, Percolator faces a different set of challenges implementing distributed transactions than traditional PDBMSs. Other parallel databases integrate locking into the system component that manages access to the disk: since each node already mediates access to data on the disk it can grant locks on requests and deny accesses that violate locking requirements.

By contrast, any node in Percolator can (and does) issue requests to directly modify state in Bigtable: there is no convenient place to intercept traffic and assign locks. As a result, Percolator must explicitly maintain locks. Locks must persist in the face of machine failure; if a lock could disappear between the two phases of commit, the system could mistakenly commit two transactions that should have conflicted.

key   bal:data   bal:lock              bal:write

Bob   6:         6:                    6: data @ 5
      5: $10     5:                    5:
Joe   6:         6:                    6: data @ 5
      5: $2      5:                    5:

1. Initial state: Joe's account contains $2 dollars, Bob's $10.

Bob   7: $3      7: I am primary       7:
      6:         6:                    6: data @ 5
      5: $10     5:                    5:
Joe   6:         6:                    6: data @ 5
      5: $2      5:                    5:

2. The transfer transaction begins by locking Bob's account balance by writing the lock column. This lock is the primary for the transaction. The transaction also writes data at its start timestamp, 7.

Bob   7: $3      7: I am primary       7:
      6:         6:                    6: data @ 5
      5: $10     5:                    5:
Joe   7: $9      7: primary @ Bob.bal  7:
      6:         6:                    6: data @ 5
      5: $2      5:                    5:

3. The transaction now locks Joe's account and writes Joe's new balance (again, at the start timestamp). The lock is a secondary for the transaction and contains a reference to the primary lock (stored in row "Bob," column "bal"); in case this lock is stranded due to a crash, a transaction that wishes to clean up the lock needs the location of the primary to synchronize the cleanup.

Bob   8:         8:                    8: data @ 7
      7: $3      7:                    7:
      6:         6:                    6: data @ 5
      5: $10     5:                    5:
Joe   7: $9      7: primary @ Bob.bal  7:
      6:         6:                    6: data @ 5
      5: $2      5:                    5:

4. The transaction has now reached the commit point: it erases the primary lock and replaces it with a write record at a new timestamp (called the commit timestamp): 8. The write record contains a pointer to the timestamp where the data is stored. Future readers of the column "bal" in row "Bob" will now see the value $3.

Bob   8:         8:                    8: data @ 7
      7: $3      7:                    7:
      6:         6:                    6: data @ 5
      5: $10     5:                    5:
Joe   8:         8:                    8: data @ 7
      7: $9      7:                    7:
      6:         6:                    6: data @ 5
      5: $2      5:                    5:

5. The transaction completes by adding write records and deleting locks at the secondary cells. In this case, there is only one secondary: Joe.

Figure 4: This figure shows the Bigtable writes performed by a Percolator transaction that mutates two rows. The transaction transfers 7 dollars from Bob to Joe. Each Percolator column is stored as 3 Bigtable columns: data, write metadata, and lock metadata. Bigtable's timestamp dimension is shown within each cell; 12: "data" indicates that "data" has been written at Bigtable timestamp 12. Newly written data is shown in boldface.

Column      Use
c:lock      An uncommitted transaction is writing this cell; contains the location of the primary lock
c:write     Committed data present; stores the Bigtable timestamp of the data
c:data      Stores the data itself
c:notify    Hint: observers may need to run
c:ack_O     Observer "O" has run; stores the start timestamp of the last successful run

Figure 5: The columns in the Bigtable representation of a Percolator column named "c."

The lock service must provide high throughput; thousands of machines will be requesting locks simultaneously. The lock service should also be low-latency; each Get() operation requires reading locks in addition to data, and we prefer to minimize this latency. Given these requirements, the lock server will need to be replicated (to survive failure), distributed and balanced (to handle load), and write to a persistent data store. Bigtable itself satisfies all of our requirements, and so Percolator stores its locks in special in-memory columns in the same Bigtable that stores data, and reads or modifies the locks in a Bigtable row transaction when accessing data in that row.

We'll now consider the transaction protocol in more detail. Figure 6 shows the pseudocode for Percolator transactions, and Figure 4 shows the layout of Percolator data and metadata during the execution of a transaction. These various metadata columns used by the system are described in Figure 5. The transaction's constructor asks the timestamp oracle for a start timestamp (line 6), which determines the consistent snapshot seen by Get(). Calls to Set() are buffered (line 7) until commit time. The basic approach for committing buffered writes is two-phase commit, which is coordinated by the client. Transactions on different machines interact through row transactions on Bigtable tablet servers.

In the first phase of commit ("prewrite"), we try to lock all the cells being written. (To handle client failure, we designate one lock arbitrarily as the primary; we'll discuss this mechanism below.) The transaction reads metadata to check for conflicts in each cell being written. There are two kinds of conflicting metadata: if the transaction sees another write record after its start timestamp, it aborts (line 32); this is the write-write conflict that snapshot isolation guards against. If the transaction sees another lock at any timestamp, it also aborts (line 34). It's possible that the other transaction is just being slow to release its lock after having already committed below our start timestamp, but we consider this unlikely, so we abort. If there is no conflict, we write the lock and the data to each cell at the start timestamp (lines 36-38).

 1  class Transaction {
 2    struct Write { Row row; Column col; string value; };
 3    vector<Write> writes;
 4    int start_ts;
 5
 6    Transaction() : start_ts(oracle.GetTimestamp()) {}
 7    void Set(Write w) { writes.push_back(w); }
 8    bool Get(Row row, Column c, string* value) {
 9      while (true) {
10        bigtable::Txn T = bigtable::StartRowTransaction(row);
11        // Check for locks that signal concurrent writes.
12        if (T.Read(row, c+"lock", [0, start_ts])) {
13          // There is a pending lock; try to clean it and wait
14          BackoffAndMaybeCleanupLock(row, c);
15          continue;
16        }
17
18        // Find the latest write below our start timestamp.
19        latest_write = T.Read(row, c+"write", [0, start_ts]);
20        if (!latest_write.found()) return false;  // no data
21        int data_ts = latest_write.start_timestamp();
22        *value = T.Read(row, c+"data", [data_ts, data_ts]);
23        return true;
24      }
25    }
26    // Prewrite tries to lock cell w, returning false in case of conflict.
27    bool Prewrite(Write w, Write primary) {
28      Column c = w.col;
29      bigtable::Txn T = bigtable::StartRowTransaction(w.row);
30
31      // Abort on writes after our start timestamp ...
32      if (T.Read(w.row, c+"write", [start_ts, ∞])) return false;
33      // ... or locks at any timestamp.
34      if (T.Read(w.row, c+"lock", [0, ∞])) return false;
35
36      T.Write(w.row, c+"data", start_ts, w.value);
37      T.Write(w.row, c+"lock", start_ts,
38              {primary.row, primary.col});  // The primary's location.
39      return T.Commit();
40    }
41    bool Commit() {
42      Write primary = writes[0];
43      vector<Write> secondaries(writes.begin()+1, writes.end());
44      if (!Prewrite(primary, primary)) return false;
45      for (Write w : secondaries)
46        if (!Prewrite(w, primary)) return false;
47
48      int commit_ts = oracle.GetTimestamp();
49
50      // Commit primary first.
51      Write p = primary;
52      bigtable::Txn T = bigtable::StartRowTransaction(p.row);
53      if (!T.Read(p.row, p.col+"lock", [start_ts, start_ts]))
54        return false;  // aborted while working
55      T.Write(p.row, p.col+"write", commit_ts,
56              start_ts);  // Pointer to data written at start_ts.
57      T.Erase(p.row, p.col+"lock", commit_ts);
58      if (!T.Commit()) return false;  // commit point
59
60      // Second phase: write out write records for secondary cells.
61      for (Write w : secondaries) {
62        bigtable::Write(w.row, w.col+"write", commit_ts, start_ts);
63        bigtable::Erase(w.row, w.col+"lock", commit_ts);
64      }
65      return true;
66    }
67  }  // class Transaction

Figure 6: Pseudocode for the Percolator transaction protocol.

If no cells conflict, the transaction may commit and proceeds to the second phase. At the beginning of the second phase, the client obtains the commit timestamp from the timestamp oracle (line 48). Then, at each cell (starting with the primary), the client releases its lock and makes its write visible to readers by replacing the lock with a write record. The write record indicates to readers that committed data exists in this cell; it contains a pointer to the start timestamp where readers can find the actual data. Once the primary's write is visible (line 58), the transaction must commit since it has made a write visible to readers.

A Get() operation first checks for a lock in the timestamp range [0, start_timestamp], which is the range of timestamps visible in the transaction's snapshot (line 12). If a lock is present, another transaction is concurrently writing this cell, so the reading transaction must wait until the lock is released. If no conflicting lock is found, Get() reads the latest write record in that timestamp range (line 19) and returns the data item corresponding to that write record (line 22).

Transaction processing is complicated by the possibility of client failure (tablet server failure does not affect the system since Bigtable guarantees that written locks persist across tablet server failures). If a client fails while a transaction is being committed, locks will be left behind. Percolator must clean up those locks or they will cause future transactions to hang indefinitely. Percolator takes a lazy approach to cleanup: when a transaction A encounters a conflicting lock left behind by transaction B, A may determine that B has failed and erase its locks. It is very difficult for A to be perfectly confident in its judgment that B has failed; as a result we must avoid a race between A cleaning up B's transaction and a not-actually-failed B committing the same transaction. Percolator handles this by designating one cell in every transaction as a synchronizing point for any commit or cleanup operations. This cell's lock is called the primary lock. Both A and B agree on which lock is primary (the location of the primary is written into the locks at all other cells). Performing either a cleanup or commit operation requires modifying the primary lock; since this modification is performed under a Bigtable row transaction, only one of the cleanup or commit operations will succeed. Specifically: before B commits, it must check that it still holds the primary lock and replace it with a write record. Before A erases B's lock, A must check the primary to ensure that B has not committed; if the primary lock is still present, then it can safely erase the lock.

When a client crashes during the second phase of commit, a transaction will be past the commit point (it has written at least one write record) but will still have locks outstanding. We must perform roll-forward on these transactions. A transaction that encounters a lock can distinguish between the two cases by inspecting the primary lock: if the primary lock has been replaced by a write record, the transaction that wrote the lock must have committed and the lock must be rolled forward; otherwise it should be rolled back (since we always commit the primary first, we can be sure that it is safe to roll back if the primary is not committed). To roll forward, the transaction performing the cleanup replaces the stranded lock with a write record as the original transaction would have done.

Since cleanup is synchronized on the primary lock, it is safe to clean up locks held by live clients; however, this incurs a performance penalty since rollback forces the transaction to abort. So, a transaction will not clean up a lock unless it suspects that the lock belongs to a dead or stuck worker. Percolator uses simple mechanisms to determine the liveness of another transaction. Running workers write a token into the Chubby lock service [8] to indicate they belong to the system; other workers can use the existence of this token as a sign that the worker is alive (the token is automatically deleted when the process exits). To handle a worker that is live, but not working, we additionally write the wall time into the lock; a lock that contains a too-old wall time will be cleaned up even if the worker's liveness token is valid. To handle long-running commit operations, workers periodically update this wall time while committing.
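Figure 6 calls BackoffAndMaybeCleanupLock without defining it. The sketch below is pseudocode in the style of Figure 6 for the cleanup decision just described; ReadLock, SuspectOwnerDead, PrimaryCommitted, and Backoff are illustrative helpers, not Percolator's actual API, and the exact metadata layout is an assumption.

// Pseudocode sketch, in the style of Figure 6.
void BackoffAndMaybeCleanupLock(Row row, Column c) {
  Lock lock = ReadLock(row, c);       // records the primary's location and start_ts
  if (!SuspectOwnerDead(lock)) {      // Chubby token present and wall time fresh?
    Backoff();
    return;
  }

  // Roll forward if the primary lock was replaced by a write record;
  // otherwise roll back. All decisions synchronize on the primary cell.
  int commit_ts;
  if (PrimaryCommitted(lock, &commit_ts)) {  // write record at primary pointing to start_ts?
    // Owner committed: give this cell its write record and drop the lock.
    bigtable::Write(row, c+"write", commit_ts, lock.start_ts);
    bigtable::Erase(row, c+"lock", commit_ts);
  } else {
    // Owner did not commit: erase the primary lock under a row transaction,
    // so a slow-but-alive owner cannot commit after we roll it back.
    bigtable::Txn T = bigtable::StartRowTransaction(lock.primary_row);
    if (!T.Read(lock.primary_row, lock.primary_col+"lock",
                [lock.start_ts, lock.start_ts]))
      return;  // someone else cleaned up or committed meanwhile; retry later
    T.Erase(lock.primary_row, lock.primary_col+"lock", lock.start_ts);
    if (T.Commit()) {
      bigtable::Erase(row, c+"lock", lock.start_ts);   // now safe to drop the
      bigtable::Erase(row, c+"data", lock.start_ts);   // stranded secondary state
    }
  }
}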

2.3 Timestamps

The timestamp oracle is a server that hands out timestamps in strictly increasing order. Since every transaction requires contacting the timestamp oracle twice, this service must scale well. The oracle periodically allocates a range of timestamps by writing the highest allocated timestamp to stable storage; given an allocated range of timestamps, the oracle can satisfy future requests strictly from memory. If the oracle restarts, the timestamps will jump forward to the maximum allocated timestamp (but will never go backwards). To save RPC overhead (at the cost of increasing transaction latency), each Percolator worker batches timestamp requests across transactions by maintaining only one pending RPC to the oracle. As the oracle becomes more loaded, the batching naturally increases to compensate. Batching increases the scalability of the oracle but does not affect the timestamp guarantees. Our oracle serves around 2 million timestamps per second from a single machine.
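A minimal sketch of the range-allocation idea described above, assuming a durable-write helper (PersistHighestAllocated) that is not part of any real interface; Percolator's actual oracle is not shown in the paper.

  #include <cstdint>
  #include <mutex>

  // Illustrative timestamp oracle: hands out strictly increasing timestamps,
  // persisting only the upper bound of each allocated range so that most
  // requests are served from memory. On restart, handing out resumes at the
  // last persisted bound, so timestamps may jump forward but never backward.
  class TimestampOracle {
   public:
    explicit TimestampOracle(uint64_t persisted_max)
        : next_(persisted_max), allocated_max_(persisted_max) {}

    uint64_t GetTimestamp() {
      std::lock_guard<std::mutex> l(mu_);
      if (next_ >= allocated_max_) {
        // Durably record the new range's upper bound before handing out
        // any timestamp from it.
        allocated_max_ += kRangeSize;
        PersistHighestAllocated(allocated_max_);
      }
      return next_++;
    }

   private:
    static constexpr uint64_t kRangeSize = 1 << 20;
    void PersistHighestAllocated(uint64_t ts) {
      // Assumed stable-storage write, e.g. a synced file or replicated store.
    }
    std::mutex mu_;
    uint64_t next_;
    uint64_t allocated_max_;
  };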

The transaction protocol uses strictly increasing timestamps to guarantee that Get() returns all committed writes before the transaction's start timestamp. To see how it provides this guarantee, consider a transaction R reading at timestamp T_R and a transaction W that committed at timestamp T_W < T_R; we will show that R sees W's writes. Since T_W < T_R, the timestamp oracle gave out T_W before or in the same batch as T_R; hence, W requested its commit timestamp T_W before R received its start timestamp T_R. We know that R can't perform reads before receiving its start timestamp T_R, and that W wrote its locks before requesting its commit timestamp T_W. Therefore, W must have written all of its locks before R did any reads; R's Get() will either see the fully-committed write record or it will see the lock, in which case it waits until the lock is released. Either way, W's write is visible to R's Get().

2.4 Notifications

Transactions let the user mutate the table while maintaining invariants, but users also need a way to trigger and run the transactions. In Percolator, the user writes code ("observers") to be triggered by changes to the table, and we link all the observers into a binary running alongside every tablet server in the system. Each observer registers a function and a set of columns with Percolator, and Percolator invokes the function after data is written to one of those columns in any row.

Percolator applications are structured as a series of observers; each observer completes a task and creates more work for "downstream" observers by writing to the table. In our indexing system, a MapReduce loads crawled documents into Percolator by running loader transactions, which trigger the document processor transaction to index the document (parse, extract links, etc.). The document processor transaction triggers further transactions like clustering. The clustering transaction, in turn, triggers transactions to export changed document clusters to the serving system.
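The observer API itself is not shown in the paper; the following is a hypothetical sketch of how registration might look, with every class and function name (Observer, ObservedColumn, PercolatorWorker, RegisterObserver, IndexDocument) invented for illustration. Only the Get() call mirrors the style of Figure 2.

// Hypothetical observer registration; Percolator's real API may differ.
class DocumentProcessor : public Observer {
 public:
  // Watch the "contents" column of the "document" table (cf. Figure 2).
  std::vector<ObservedColumn> ObservedColumns() const override {
    return {{"document", "contents"}};
  }
  // Invoked by the worker in a fresh transaction when an observed column is
  // written; writes made through t notify downstream observers.
  void OnNotify(Transaction* t, const std::string& row) override {
    std::string contents;
    if (!t->Get(row, "contents", "document", &contents)) return;
    IndexDocument(t, row, contents);  // parse, extract links, etc. (assumed helper)
  }
};

int main(int argc, char** argv) {
  PercolatorWorker worker(argc, argv);
  worker.RegisterObserver(new DocumentProcessor);  // explicit, so active observers are clear
  worker.Run();  // scan for notifications and invoke observers until killed
  return 0;
}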

Notifications are similar to database triggers or events in active databases [29], but unlike database triggers, they cannot be used to maintain database invariants. In particular, the triggered observer runs in a separate transaction from the triggering write, so the triggering write and the triggered observer's writes are not atomic. Notifications are intended to help structure an incremental computation rather than to help maintain data integrity. This difference in semantics and intent makes observer behavior much easier to understand than the complex semantics of overlapping triggers. Percolator applications consist of very few observers: the Google indexing system has roughly 10 observers. Each observer is explicitly constructed in the main() of the worker binary, so it is clear what observers are active. It is possible for several observers to observe the same column, but we avoid this feature so it is clear what observer will run when a particular column is written. Users do need to be wary about infinite cycles of notifications, but Percolator does nothing to prevent this; the user typically constructs a series of observers to avoid infinite cycles.

We do provide one guarantee: at most one observer's transaction will commit for each change of an observed column. The converse is not true, however: multiple writes to an observed column may cause the corresponding observer to be invoked only once. We call this feature message collapsing, since it helps avoid computation by amortizing the cost of responding to many notifications. For example, it is sufficient for google.com to be reprocessed periodically rather than every time we discover a new link pointing to it.

To provide these semantics for notifications, each observed column has an accompanying "acknowledgment" column for each observer, containing the latest start timestamp at which the observer ran. When the observed column is written, Percolator starts a transaction to process the notification. The transaction reads the observed column and its corresponding acknowledgment column. If the observed column was written after its last acknowledgment, then we run the observer and set the acknowledgment column to our start timestamp. Otherwise, the observer has already been run, so we do not run it again. Note that if Percolator accidentally starts two transactions concurrently for a particular notification, they will both see the dirty notification and run the observer, but one will abort because they will conflict on the acknowledgment column. We promise that at most one observer will commit for each notification.
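A sketch of that acknowledgment check, written as pseudocode in the style of Figure 6; the column naming follows Figure 5, while LatestWriteTimestamp, ReadAck, WriteAck, and ClearNotify are illustrative helpers rather than Percolator's API.

// Pseudocode sketch, in the style of Figure 6.
bool MaybeRunObserver(Observer& o, Row row, Column observed) {
  Transaction t;                                     // picks a start timestamp
  int last_write_ts = LatestWriteTimestamp(t, row, observed);
  int last_ack_ts   = ReadAck(t, row, observed, o);  // the c:ack_O column of Figure 5
  if (last_write_ts <= last_ack_ts)
    return true;                                     // already acknowledged; nothing to do

  o.OnNotify(&t, row);                               // observer's reads/writes join t
  WriteAck(t, row, observed, o, t.start_ts);         // record this run
  if (!t.Commit())
    return false;  // a concurrent attempt won the conflict on the ack column
  ClearNotify(row, observed);                        // drop the non-transactional hint
  return true;
}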

To implement notifications, Percolator needs to efficiently find dirty cells with observers that need to be run. This search is complicated by the fact that notifications are rare: our table has trillions of cells, but, if the system is keeping up with applied load, there will only be millions of notifications. Additionally, observer code is run on a large number of client processes distributed across a collection of machines, meaning that this search for dirty cells must be distributed.

To identify dirty cells, Percolator maintains a special "notify" Bigtable column, containing an entry for each dirty cell. When a transaction writes an observed cell, it also sets the corresponding notify cell. The workers perform a distributed scan over the notify column to find dirty cells. After the observer is triggered and the transaction commits, we remove the notify cell. Since the notify column is just a Bigtable column, not a Percolator column, it has no transactional properties and serves only as a hint to the scanner to check the acknowledgment column to determine if the observer should be run.

To make this scan efficient, Percolator stores the notify column in a separate Bigtable locality group so that scanning over the column requires reading only the millions of dirty cells rather than the trillions of total data cells. Each Percolator worker dedicates several threads to the scan. For each thread, the worker chooses a portion of the table to scan by first picking a random Bigtable tablet, then picking a random key in the tablet, and finally scanning the table from that position. Since each worker is scanning a random region of the table, we worry about two workers running observers on the same row concurrently. While this behavior will not cause correctness problems due to the transactional nature of notifications, it is inefficient. To avoid this, each worker acquires a lock from a lightweight lock service before scanning the row. This lock server need not persist state since it is advisory and thus is very scalable.

The random-scanning approach requires one additional tweak: when it was first deployed we noticed that scanning threads would tend to clump together in a few regions of the table, effectively reducing the parallelism of the scan. This phenomenon is commonly seen in public transportation systems, where it is known as "platooning" or "bus clumping" and occurs when a bus is slowed down (perhaps by traffic or slow loading). Since the number of passengers at each stop grows with time, loading delays become even worse, further slowing the bus. Simultaneously, any bus behind the slow bus speeds up as it needs to load fewer passengers at each stop. The result is a clump of buses arriving simultaneously at a stop [19]. Our scanning threads behaved analogously: a thread that was running observers slowed down while threads "behind" it quickly skipped past the now-clean rows to clump with the lead thread and failed to pass the lead thread because the clump of threads overloaded tablet servers. To solve this problem, we modified our system in a way that public transportation systems cannot: when a scanning thread discovers that it is scanning the same row as another thread, it chooses a new random location in the table to scan. To further the transportation analogy, the buses (scanner threads) in our city avoid clumping by teleporting themselves to a random stop (location in the table) if they get too close to the bus in front of them.
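A sketch of one scanning thread with the advisory row lock and the random restart on collision; the NotifyScanner and LockService interfaces here are invented for illustration and do not correspond to any published API.

// Illustrative scanner thread. Each thread starts at a random position in
// the notify locality group and jumps to a new random position when it
// collides with another worker, which prevents threads from "platooning."
void ScanForNotifications(LockService& locks, NotifyScanner& scanner) {
  while (true) {
    Tablet tablet = scanner.RandomTablet();
    Key pos = scanner.RandomKeyIn(tablet);
    while (scanner.NextDirtyCell(&pos)) {       // advance pos along the notify column
      if (!locks.TryLockRow(pos.row)) break;    // another worker is here: teleport away
      RunObserversForRow(pos.row);              // transactional; checks the ack columns
      locks.UnlockRow(pos.row);
    }
  }
}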

Finally, experience with notifications led us to introduce a lighter-weight but semantically weaker notification mechanism. We found that when many duplicates of the same page were processed concurrently, each transaction would conflict trying to trigger reprocessing of the same duplicate cluster. This led us to devise a way to notify a cell without the possibility of transactional conflict. We implement this weak notification by writing only to the Bigtable "notify" column. To preserve the transactional semantics of the rest of Percolator, we restrict these weak notifications to a special type of column that cannot be written, only notified. The weaker semantics also mean that multiple observers may run and commit as a result of a single weak notification (though the system tries to minimize this occurrence). This has become an important feature for managing conflicts; if an observer frequently conflicts on a hotspot, it often helps to break it into two observers connected by a non-transactional notification on the hotspot.

2.5 Discussion

One of the inefficiencies of Percolator relative to a MapReduce-based system is the number of RPCs sent per work-unit. While MapReduce does a single large read to GFS and obtains all of the data for 10s or 100s of web pages, Percolator performs around 50 individual Bigtable operations to process a single document.

One source of additional RPCs occurs during commit. When writing a lock, we must do a read-modify-write operation requiring two Bigtable RPCs: one to read for conflicting locks or writes and another to write the new lock. To reduce this overhead, we modified the Bigtable API by adding conditional mutations, which implement the read-modify-write step in a single RPC. Many conditional mutations destined for the same tablet server can also be batched together into a single RPC to further reduce the total number of RPCs we send. We create batches by delaying lock operations for several seconds to collect them into batches. Because locks are acquired in parallel, this adds only a few seconds to the latency of each transaction; we compensate for the additional latency with greater parallelism. Batching also increases the time window in which conflicts may occur, but in our low-contention environment this has not proved to be a problem.
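The sketch below shows what the single-RPC prewrite step might look like; the ConditionalMutation interface is an invented stand-in for the (unpublished) Bigtable extension described above, and kInfinity and Encode are assumed helpers.

// Hypothetical single-RPC lock acquisition via a conditional mutation:
// the checks and writes are applied atomically on the tablet server,
// replacing the separate read RPC and write RPC.
bool TryLockCell(bigtable::Client& bt, const Write& w, const Write& primary,
                 int start_ts) {
  bigtable::ConditionalMutation m(w.row);
  m.FailIfAnyPresent(w.col+"write", /*timestamps=*/{start_ts, kInfinity});  // no newer write record
  m.FailIfAnyPresent(w.col+"lock",  /*timestamps=*/{0, kInfinity});         // no lock at any timestamp
  m.Write(w.col+"data", start_ts, w.value);
  m.Write(w.col+"lock", start_ts, Encode(primary.row, primary.col));
  return bt.Apply(m);  // one RPC; mutations bound for one tablet server can be batched
}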

We also perform the same batching when reading from the table: every read operation is delayed to give it a chance to form a batch with other reads to the same tablet server. This delays each read, potentially greatly increasing transaction latency. A final optimization mitigates this effect, however: prefetching. Prefetching takes advantage of the fact that reading two or more values in the same row is essentially the same cost as reading one value. In either case, Bigtable must read the entire SSTable block from the file system and decompress it. Percolator attempts to predict, each time a column is read, what other columns in a row will be read later in the transaction. This prediction is made based on past behavior. Prefetching, combined with a cache of items that have already been read, reduces the number of Bigtable reads the system would otherwise do by a factor of 10.

Early in the implementation of Percolator, we decided to make all API calls blocking and rely on running thousands of threads per machine to provide enough parallelism to maintain good CPU utilization. We chose this thread-per-request model mainly to make application code easier to write, compared to the event-driven model. Forcing users to bundle up their state each of the (many) times they fetched a data item from the table would have made application development much more difficult. Our experience with thread-per-request was, on the whole, positive: application code is simple, we achieve good utilization on many-core machines, and crash debugging is simplified by meaningful and complete stack traces. We encountered fewer race conditions in application code than we feared. The biggest drawbacks of the approach were scalability issues in the Linux kernel and Google infrastructure related to high thread counts. Our in-house kernel development team was able to deploy fixes to address the kernel issues.

3 Evaluation

Percolator lies somewhere in the performance space between MapReduce and DBMSs. For example, because Percolator is a distributed system, it uses far more resources to process a fixed amount of data than a traditional DBMS would; this is the cost of its scalability. Compared to MapReduce, Percolator can process data with far lower latency, but again, at the cost of the additional resources required to support random lookups. These are engineering tradeoffs which are difficult to quantify: how much of an efficiency loss is too much to pay for the ability to add capacity endlessly simply by purchasing more machines? Or: how does one trade off the reduction in development time provided by a layered system against the corresponding decrease in efficiency?

In this section we attempt to answer some of these questions by first comparing Percolator to batch processing systems via our experiences with converting a MapReduce-based indexing pipeline to use Percolator. We'll also evaluate Percolator with microbenchmarks and a synthetic workload based on the well-known TPC-E benchmark [1]; this test will give us a chance to evaluate the scalability and efficiency of Percolator relative to Bigtable and DBMSs.

All of the experiments in this section are run on a subset of the servers in a Google data center. The servers run the Linux operating system on x86 processors; each machine is connected to several commodity SATA drives.

3.1 Converting from MapReduce

We built Percolator to create Google's large "base" index, a task previously performed by MapReduce. In our previous system, each day we crawled several billion documents and fed them along with a repository of existing documents through a series of 100 MapReduces. The result was an index which answered user queries. Though not all 100 MapReduces were on the critical path for every document, the organization of the system as a series of MapReduces meant that each document spent 2-3 days being indexed before it could be returned as a search result.

The Percolator-based indexing system (known as Caffeine [25]) crawls the same number of documents, but we feed each document through Percolator as it is crawled. The immediate advantage, and main design goal, of Caffeine is a reduction in latency: the median document moves through Caffeine over 100x faster than the previous system. This latency improvement grows as the system becomes more complex: adding a new clustering phase to the Percolator-based system requires an extra lookup for each document rather than an extra scan over the repository. Additional clustering phases can also be implemented in the same transaction rather than in another MapReduce; this simplification is one reason the number of observers in Caffeine (10) is far smaller than the number of MapReduces in the previous system (100). This organization also allows for the possibility of performing additional processing on only a subset of the repository without rescanning the entire repository.

Adding additional clustering phases isn't free in an incremental system: more resources are required to make sure the system keeps up with the input, but this is still an improvement over batch processing systems where no amount of resources can overcome delays introduced by stragglers in an additional pass over the repository. Caffeine is essentially immune to stragglers that were a serious problem in our batch-based indexing system because the bulk of the processing does not get held up by a few very slow operations. The radically lower latency of the new system also enables us to remove the rigid distinctions between large, slow-to-update indexes and smaller, more rapidly updated indexes. Because Percolator frees us from needing to process the repository each time we index documents, we can also make it larger: Caffeine's document collection is currently 3x larger than the previous system's and is limited only by available disk space.

Compared to the system it replaced, Caffeine uses roughly twice as many resources to process the same crawl rate. However, Caffeine makes good use of the extra resources. If we were to run the old indexing system with twice as many resources, we could either increase the index size or reduce latency by at most a factor of two (but not do both). On the other hand, if Caffeine were run with half the resources, it would not be able to process as many documents per day as the old system (but the documents it did produce would have much lower latency).

The new system is also easier to operate. Caffeine has far fewer moving parts: we run tablet servers, Percolator workers, and chunkservers. In the old system, each of a hundred different MapReduces needed to be individually configured and could independently fail. Also, the "peaky" nature of the MapReduce workload made it hard to fully utilize the resources of a datacenter compared to Percolator's much smoother resource usage.

The simplicity of writing straight-line code and the ability to do random lookups into the repository makes developing new features for Percolator easy.

Figure 7: Median document clustering delay for Percolator (dashed line) and MapReduce (solid line), plotted against crawl rate (percentage of the repository updated per hour). For MapReduce, all documents finish processing at the same time and error bars represent the min, median, and max of three runs of the clustering MapReduce. For Percolator, we are able to measure the delay of individual documents, so the error bars represent the 5th- and 95th-percentile delay on a per-document level.

Under MapReduce, random lookups are awkward and costly. On the other hand, Caffeine developers need to reason about concurrency where it did not exist in the MapReduce paradigm. Transactions help deal with this concurrency, but can't fully eliminate the added complexity.

To quantify the benefits of moving from MapReduce to Percolator, we created a synthetic benchmark that clusters newly crawled documents against a billion-document repository to remove duplicates in much the same way Google's indexing pipeline operates. Documents are clustered by three clustering keys. In a real system, the clustering keys would be properties of the document like redirect target or content hash, but in this experiment we selected them uniformly at random from a collection of 750M possible keys. The average cluster in our synthetic repository contains 3.3 documents, and 93% of the documents are in a non-singleton cluster. This distribution of keys exercises the clustering logic, but does not expose it to the few extremely large clusters we have seen in practice. These clusters only affect the latency tail and not the results we present here. In the Percolator clustering implementation, each crawled document is immediately written to the repository to be clustered by an observer. The observer maintains an index table for each clustering key and compares the document against each index to determine if it is a duplicate (an elaboration of Figure 2). MapReduce implements clustering of continually arriving documents by repeatedly running a sequence of three clustering MapReduces (one for each clustering key). The sequence of three MapReduces processes the entire repository and any crawled documents that accumulated while the previous three were running.

This experiment simulates clustering documents crawled at a uniform rate. Whether MapReduce or Percolator performs better under this metric is a function of how frequently documents are crawled (the crawl rate) and the repository size.

We explore this space by fixing the size of the repository and varying the rate at which new documents arrive, expressed as a percentage of the repository crawled per hour. In a practical system, a very small percentage of the repository would be crawled per hour: there are over 1 trillion web pages on the web (and ideally in an indexing system's repository), far too many to crawl a reasonable fraction of in a single day. When the new input is a small fraction of the repository (low crawl rate), we expect Percolator to outperform MapReduce since MapReduce must map over the (large) repository to cluster the (small) batch of new documents while Percolator does work proportional only to the small batch of newly arrived documents (a lookup in up to three index tables per document). At very large crawl rates where the number of newly crawled documents approaches the size of the repository, MapReduce will perform better than Percolator. This cross-over occurs because streaming data from disk is much cheaper, per byte, than performing random lookups. At the cross-over the total cost of the lookups required to cluster the new documents under Percolator equals the cost to stream the documents and the repository through MapReduce. At crawl rates higher than that, one is better off using MapReduce.

We ran this benchmark on 240 machines and measured the median delay between when a document is crawled and when it is clustered. Figure 7 plots the median latency of document processing for both implementations as a function of crawl rate. When the crawl rate is low, Percolator clusters documents faster than MapReduce as expected; this scenario is illustrated by the leftmost pair of points which correspond to crawling 1 percent of documents per hour. MapReduce requires approximately 20 minutes to cluster the documents because it takes 20 minutes just to process the repository through the three MapReduces (the effect of the few newly crawled documents on the runtime is negligible). This results in an average delay between crawling a document and clustering of around 30 minutes: a random document waits 10 minutes after being crawled for the previous sequence of MapReduces to finish and then spends 20 minutes being processed by the three MapReduces. Percolator, on the other hand, finds a newly loaded document and processes it in two seconds on average, or about 1000x faster than MapReduce. The two seconds includes the time to find the dirty notification and run the transaction that performs the clustering. Note that this 1000x latency improvement could be made arbitrarily large by increasing the size of the repository.

As the crawl rate increases, MapReduce's processing time grows correspondingly. Ideally, it would be proportional to the combined size of the repository and the input, which grows with the crawl rate. In practice, the running time of a small MapReduce like this is limited by stragglers, so the growth in processing time (and thus clustering latency) is only weakly correlated to crawl rate at low crawl rates. The 6 percent crawl rate, for example, only adds 150GB to a 1TB data set; the extra time to process 150GB is in the noise. The latency of Percolator is relatively unchanged as the crawl rate grows until it suddenly increases to effectively infinity at a crawl rate of 40% per hour. At this point, Percolator saturates the resources of the test cluster, is no longer able to keep up with the crawl rate, and begins building an unbounded queue of unprocessed documents. The dotted asymptote at 40% is an extrapolation of Percolator's performance beyond this breaking point. MapReduce is subject to the same effect: eventually crawled documents accumulate faster than MapReduce is able to cluster them, and the batch size will grow without bound in subsequent runs. In this particular configuration, however, MapReduce can sustain crawl rates in excess of 100% (the dotted line, again, extrapolates performance).

           Bigtable   Percolator   Relative
Read/s     15513      14590        0.94
Write/s    31003      7232         0.23

Figure 8: The overhead of Percolator operations relative to Bigtable. Write overhead is due to the additional operations Percolator needs to check for conflicts.

These results show that Percolator can process documents at orders of magnitude better latency than MapReduce in the regime where we expect real systems to operate (single-digit crawl rates).

3.2 Microbenchmarks

In this section, we determine the cost of the transactional semantics provided by Percolator. In these experiments, we compare Percolator to a "raw" Bigtable. We are only interested in the relative performance of Bigtable and Percolator since any improvement in Bigtable performance will translate directly into an improvement in Percolator performance. Figure 8 shows the performance of Percolator and raw Bigtable running against a single tablet server. All data was in the tablet server's cache during the experiments and Percolator's batching optimizations were disabled.

As expected, Percolator introduces overhead relative to Bigtable. We first measure the number of random writes that the two systems can perform. In the case of Percolator, we execute transactions that write a single cell and then commit; this represents the worst case for Percolator overhead. When doing a write, Percolator incurs roughly a factor of four overhead on this benchmark. This is the result of the extra operations Percolator requires for commit beyond the single write that Bigtable issues: a read to check for locks, a write to add the lock, and a second write to remove the lock record. The read, in particular, is more expensive than a write and accounts for most of the overhead. In this test, the limiting factor was the performance of the tablet server, so the additional overhead of fetching timestamps is not measured. We also tested random reads: Percolator performs a single Bigtable operation per read, but that read operation is somewhat more complex than the raw Bigtable operation (the Percolator read looks at metadata columns in addition to data columns).

3.3 Synthetic Workload

To evaluate Percolator on a more realistic workload, we implemented a synthetic benchmark based on TPC-E [1]. This isn't the ideal benchmark for Percolator since TPC-E is designed for OLTP systems, and a number of Percolator's tradeoffs impact desirable properties of OLTP systems (the latency of conflicting transactions, for example). TPC-E is a widely recognized and understood benchmark, however, and it allows us to understand the cost of our system against more traditional databases.

TPC-E simulates a brokerage firm with customers who perform trades, market search, and account inquiries. The brokerage submits trade orders to a market exchange, which executes the trade and updates broker and customer state. The benchmark measures the number of trades executed. On average, each customer performs a trade once every 500 seconds, so the benchmark scales by adding customers and associated data.

TPC-E traditionally has three components: a customer emulator, a market emulator, and a DBMS running stored SQL procedures. Since Percolator is a client library running against Bigtable, our implementation is a combined customer/market emulator that calls into the Percolator library to perform operations against Bigtable. Percolator provides a low-level Get/Set/iterator API rather than a high-level SQL interface, so we created indexes and did all the 'query planning' by hand.

Since Percolator is an incremental processing system rather than an OLTP system, we don't attempt to meet the TPC-E latency targets. Our average transaction latency is 2 to 5 seconds, but outliers can take several minutes. Outliers are caused by, for example, exponential backoff on conflicts and Bigtable tablet unavailability. Finally, we made a small modification to the TPC-E transactions. In TPC-E, each trade result increases the broker's commission and increments his trade count. Each broker services a hundred customers, so the average broker must be updated once every 5 seconds, which causes repeated write conflicts in Percolator. In Percolator, we would implement this feature by writing the increment to a side table and periodically aggregating each broker's increments; for the benchmark, we chose to simply omit this write.

Figure 9 shows how the resource usage of Percolator scales as demand increases.

Figure 9: Transaction rate (TPS) on a TPC-E-like benchmark as a function of cores used. The dotted line shows linear scaling.

We will measure resource usage in CPU cores since that is the limiting resource in our experimental environment. We were able to procure a small number of machines for testing, but our test Bigtable cell shares the disk resources of a much larger production cluster. As a result, disk bandwidth is not a factor in the system's performance. In this experiment, we configured the benchmark with increasing numbers of customers and measured both the achieved performance and the number of cores used by all parts of the system, including cores used for background maintenance such as Bigtable compactions. The relationship between performance and resource usage is essentially linear across several orders of magnitude, from 11 cores to 15,000 cores.

This experiment also provides an opportunity to measure the overheads in Percolator relative to a DBMS. The fastest commercial TPC-E system today performs 3,183 tpsE using a single large shared-memory machine with 64 Intel Nehalem cores with 2 hyperthreads per core [33]. Our synthetic benchmark based on TPC-E performs 11,200 tps using 15,000 cores. This comparison is very rough: the Nehalem cores in the comparison machine are significantly faster than the cores in our test cell (small-scale testing on Nehalem processors shows that they are 20-30% faster per-thread compared to the cores in the test cluster). However, we estimate that Percolator uses roughly 30 times more CPU per transaction than the benchmark system. On a cost-per-transaction basis, the gap is likely much less than 30 since our test cluster uses cheaper, commodity hardware compared to the enterprise-class hardware in the reference machine.

The conventional wisdom on implementing databases is to "get close to the iron" and use hardware as directly as possible, since even operating system structures like disk caches and schedulers make it hard to implement an efficient database [32]. In Percolator we not only interposed an operating system between our database and the hardware, but also several layers of software and network links. The conventional wisdom is correct: this arrangement has a cost.

The conventional wisdom on implementing databases is to "get close to the iron" and use hardware as directly as possible, since even operating system structures like disk caches and schedulers make it hard to implement an efficient database [32]. In Percolator we not only interposed an operating system between our database and the hardware, but also several layers of software and network links. The conventional wisdom is correct: this arrangement has a cost. There are substantial overheads in preparing requests to go on the wire, sending them, and processing them on a remote machine. To illustrate these overheads in Percolator, consider the act of mutating the database. In a DBMS, this incurs a function call to store the data in memory and a system call to force the log to a hardware-controlled RAID array. In Percolator, a client performing a transaction commit sends multiple RPCs to Bigtable, which commits the mutation by logging it to 3 chunkservers, which make system calls to actually flush the data to disk. Later, that same data will be compacted into minor and major sstables, each of which will be again replicated to multiple chunkservers.

Figure 10: Recovery of tps after 33% tablet server mortality

The CPU inflation factor is the cost of our layering. In exchange, we get scalability (our fastest result, though not directly comparable to TPC-E, is more than 3x the current official record [33]), and we inherit the useful features of the systems we build upon, like resilience to failures. To demonstrate the latter, we ran the benchmark with 15 tablet servers and allowed the performance to stabilize. Figure 10 shows the performance of the system over time. The dip in performance at 17:09 corresponds to a failure event: we killed a third of the tablet servers. Performance drops immediately after the failure event but recovers as the tablets are reloaded by other tablet servers. We allowed the killed tablet servers to restart, so performance eventually returns to the original level.

4 Related Work

Batch processing systems like MapReduce [13, 22, 24] are well suited for efficiently transforming or analyzing an entire corpus: these systems can simultaneously use a large number of machines to process huge amounts of data quickly. Despite this scalability, re-running a MapReduce pipeline on each small batch of updates results in unacceptable latency and wasted work. Overlapping or pipelining the adjacent stages can reduce latency [10], but straggler shards still set the minimum time to complete the pipeline. Percolator avoids the expense of repeated scans by, essentially, creating indexes on the keys used to cluster documents; one of the criticisms leveled by Stonebraker and DeWitt in their initial critique of MapReduce [16] was that MapReduce did not support such indexes.

Several proposed modifications to MapReduce [18, 26, 35] reduce the cost of processing changes to a repository by allowing workers to randomly read a base repository while mapping over only newly arrived work. To implement clustering in these systems, we would likely maintain a repository per clustering phase. Avoiding the need to re-map the entire repository would allow us to make batches smaller, reducing latency. DryadInc [31] attacks the same problem by reusing identical portions of the computation from previous runs and allowing the user to specify a merge function that combines new input with previous iterations' outputs. These systems represent a middle ground between mapping over the entire repository using MapReduce and processing a single document at a time with Percolator.

Databases satisfy many of the requirements of an incremental system: an RDBMS can make many independent and concurrent changes to a large corpus and provides a flexible language for expressing computation (SQL). In fact, Percolator presents the user with a database-like interface: it supports transactions, iterators, and secondary indexes. While Percolator provides distributed transactions, it is by no means a full-fledged DBMS: it lacks a query language, for example, as well as full relational operations such as join. Percolator is also designed to operate at much larger scales than existing parallel databases and to deal better with failed machines. Unlike Percolator, database systems tend to emphasize latency over throughput, since a human is often waiting for the results of a database query.

The organization of data in Percolator mirrors that of shared-nothing parallel databases [7, 15, 4]. Data is distributed across a number of commodity machines in shared-nothing fashion: the machines communicate only via explicit RPCs; no shared memory or shared disks are used. Data stored by Percolator is partitioned by Bigtable into tablets of contiguous rows which are distributed among machines; this mirrors the declustering performed by parallel databases.

The transaction management of Percolator builds on a long line of work on distributed transactions for database systems. Percolator implements snapshot isolation [5] by extending multi-version timestamp ordering [6] across a distributed system using two-phase commit.
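As a concrete, heavily simplified illustration of those two ingredients, the sketch below models snapshot-isolation reads and the write-write conflict check over multi-versioned cells on a single machine. It is not Percolator's implementation, which layers locking and two-phase commit on top of Bigtable; the types and function names here are ours.

#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>

// For one cell: commit timestamp -> value.
using VersionedCell = std::map<uint64_t, std::string>;

// A transaction reading at start_ts sees the latest version committed at or
// before its start timestamp.
std::optional<std::string> SnapshotRead(const VersionedCell& cell,
                                        uint64_t start_ts) {
  auto it = cell.upper_bound(start_ts);  // first version with commit_ts > start_ts
  if (it == cell.begin()) return std::nullopt;
  return std::prev(it)->second;
}

// At commit time, a writer must abort if some other transaction has committed
// a write to the same cell after the writer's start timestamp.
bool HasWriteConflict(const VersionedCell& cell, uint64_t start_ts) {
  return cell.upper_bound(start_ts) != cell.end();
}

The visibility rule lets readers proceed without blocking writers, while the conflict check preserves snapshot isolation's first-committer-wins behavior for concurrent writes to the same cell.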

An analogy can be drawn between the role of observers in Percolator to incrementally move the system towards a "clean" state and the incremental maintenance of materialized views in traditional databases (see Gupta and Mumick [21] for a survey of the field). In practice, while some indexing tasks like clustering documents by contents could be expressed in a form appropriate for incremental view maintenance, it would likely be hard to express the transformation of a raw document into an indexed document in such a form.

The utility of parallel databases and, by extension, a system like Percolator, has been questioned several times [17] over their history. Hardware trends have, in the past, worked against parallel databases. CPUs have become so much faster than disks that a few CPUs in a shared-memory machine can drive enough disk heads to service required loads without the complexity of distributed transactions: the top TPC-E benchmark results today are achieved on large shared-memory machines connected to a SAN. This trend is beginning to reverse itself, however, as enormous datasets like those Percolator is intended to process become far too large for a single shared-memory machine to handle. These datasets require a distributed solution that can scale to 1000s of machines, while existing parallel databases can utilize only 100s of machines [30]. Percolator provides a system that is scalable enough for Internet-sized datasets by sacrificing some (but not all) of the flexibility and low latency of parallel databases.

Distributed storage systems like Bigtable have the scalability and fault-tolerance properties of MapReduce but provide a more natural abstraction for storing a repository. Using a distributed storage system allows for low-latency updates since the system can change state by mutating the repository rather than rewriting it. However, Percolator is a data transformation system, not only a data storage system: it provides a way to structure computation to transform that data. In contrast, systems like Dynamo [14], Bigtable, and PNUTS [11] provide highly available data storage without the attendant mechanisms of transformation. These systems can also be grouped with the NoSQL databases (MongoDB [27], to name one of many): both offer higher performance and scale better than traditional databases, but provide weaker semantics.

Percolator extends Bigtable with multi-row, distributed transactions, and it provides the observer interface to allow applications to be structured around notifications of changed data. We considered building the new indexing system directly on Bigtable, but the complexity of reasoning about concurrent state modification without the aid of strong consistency was daunting. Percolator does not inherit all of Bigtable's features: it has limited support for replication of tables across data centers, for example. Since Bigtable's cross-data-center replication strategy is consistent only on a per-tablet basis, replication is likely to break invariants between writes in a distributed transaction. Unlike Dynamo and PNUTS, which serve responses to users, Percolator is willing to accept the lower availability of a single data center in return for stricter consistency.

Several research systems have, like Percolator, extended distributed storage systems to include strong consistency. Sinfonia [3] provides a transactional interface to a distributed repository. Earlier published versions of Sinfonia [2] also offered a notification mechanism similar to Percolator's observer model. Sinfonia and Percolator differ in their intended use: Sinfonia is designed to build distributed infrastructure, while Percolator is intended to be used directly by applications (this probably explains why Sinfonia's authors dropped its notification mechanism). Additionally, Sinfonia's mini-transactions have limited semantics compared to the transactions provided by RDBMSs or Percolator: the user must specify a list of items to compare, read, and write prior to issuing the transaction. The mini-transactions are sufficient to create a wide variety of infrastructure but could be limiting for application builders.
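For comparison, a Sinfonia-style mini-transaction can be pictured as the schematic structure below. This is our own rendering for illustration; the field and type names do not come from the Sinfonia papers.

#include <cstdint>
#include <string>
#include <vector>

struct Item {
  uint32_t memory_node;  // which storage node holds the data
  uint64_t address;      // location within that node
  std::string data;      // expected bytes (compare) or bytes to apply (write)
};

struct MiniTransaction {
  std::vector<Item> compare_items;  // all must match, or the transaction aborts
  std::vector<Item> read_items;     // returned to the caller on success
  std::vector<Item> write_items;    // applied atomically on success
};

Because the compare, read, and write sets must be declared before the transaction is issued, a mini-transaction cannot read a value and then decide what to write within the same transaction; that is the restriction on application builders referred to above.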

CloudTPS [34], like Percolator, builds an ACID-compliant datastore on top of a distributed storage system (HBase [23] or Bigtable). Percolator and CloudTPS differ in design, however: the transaction management layer of CloudTPS is handled by an intermediate layer of servers called local transaction managers that cache mutations before they are persisted to the underlying distributed storage system. By contrast, Percolator uses clients, communicating directly with Bigtable, to coordinate transaction management. The focus of the systems is also different: CloudTPS is intended to be a backend for a website and, as such, has a stronger focus on latency and partition tolerance than Percolator.

ElasTraS [12], a transactional data store, is architecturally similar to Percolator; the Owning Transaction Managers in ElasTraS are essentially tablet servers. Unlike Percolator, ElasTraS offers limited transactional semantics (Sinfonia-like mini-transactions) when dynamically partitioning the dataset and has no support for structuring computation.

5 Conclusion and Future Work

We have built and deployed Percolator, and it has been used to produce Google's web search index since April 2010. The system achieved the goals we set for reducing the latency of indexing a single document, with an acceptable increase in resource usage compared to the previous indexing system.

The TPC-E results suggest a promising direction for future investigation. We chose an architecture that scales linearly over many orders of magnitude on commodity machines, but we've seen that this costs a significant 30-fold overhead compared to traditional database architectures. We are very interested in exploring this tradeoff and characterizing the nature of this overhead: how much is fundamental to distributed storage systems, and how much can be optimized away?

Acknowledgments

Percolator could not have been built without the assistance of many individuals and teams. We are especially grateful to the members of the indexing team, our primary users, and the developers of the many pieces of infrastructure who never failed to improve their services to meet our increasingly large demands.

References

[1] TPC Benchmark E standard specification version 1.9.0. Tech. rep., Transaction Processing Performance Council, September 2009.
[2] Aguilera, M. K., Karamanolis, C., Merchant, A., Shah, M., and Veitch, A. Building distributed applications using Sinfonia. Tech. rep., Hewlett-Packard Labs, 2006.
[3] Aguilera, M. K., Merchant, A., Shah, M., Veitch, A., and Karamanolis, C. Sinfonia: a new paradigm for building scalable distributed systems. In SOSP '07 (2007), ACM, pp. 159-174.
[4] Baru, C., Fecteau, G., Goyal, A., Hsiao, H.-I., Jhingran, A., Padmanabhan, S., Wilson, W., and I. Hsiao, A. G. H. DB2 parallel edition, 1995.
[5] Berenson, H., Bernstein, P., Gray, J., Melton, J., O'Neil, E., and O'Neil, P. A critique of ANSI SQL isolation levels. In SIGMOD (New York, NY, USA, 1995), ACM, pp. 1-10.
[6] Bernstein, P. A., and Goodman, N. Concurrency control in distributed database systems. ACM Computing Surveys 13, 2 (1981), 185-221.
[7] Boral, H., Alexander, W., Clay, L., Copeland, G., Danforth, S., Franklin, M., Hart, B., Smith, M., and Valduriez, P. Prototyping Bubba, a highly parallel database system. IEEE Transactions on Knowledge and Data Engineering 2, 1 (1990), 4-24.
[8] Burrows, M. The Chubby lock service for loosely-coupled distributed systems. In 7th OSDI (Nov. 2006).
[9] Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A distributed storage system for structured data. In 7th OSDI (Nov. 2006), pp. 205-218.
[10] Condie, T., Conway, N., Alvaro, P., and Hellerstein, J. M. MapReduce online. In 7th NSDI (2010).
[11] Cooper, B. F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., Jacobsen, H.-A., Puz, N., Weaver, D., and Yerneni, R. PNUTS: Yahoo!'s hosted data serving platform. In Proceedings of VLDB (2008).
[12] Das, S., Agrawal, D., and Abbadi, A. E. ElasTraS: An elastic transactional data store in the cloud. In USENIX HotCloud (June 2009).
[13] Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on large clusters. In 6th OSDI (Dec. 2004), pp. 137-150.
[14] DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's highly available key-value store. In SOSP '07 (2007), pp. 205-220.
[15] DeWitt, D., Ghandeharizadeh, S., Schneider, D., Bricker, A., Hsiao, H.-I., and Rasmussen, R. The Gamma database machine project. IEEE Transactions on Knowledge and Data Engineering 2 (1990), 44-62.
[16] DeWitt, D., and Stonebraker, M. MapReduce: A major step backwards. http://databasecolumn.vertica.com/database-innovation/mapreduce-a-major-step-backwards/.
[17] DeWitt, D. J., and Gray, J. Parallel database systems: the future of database processing or a passing fad? SIGMOD Rec. 19, 4 (1990), 104-112.
[18] Ekanayake, J., Li, H., Zhang, B., Gunarathne, T., Bae, S.-H., Qiu, J., and Fox, G. Twister: A runtime for iterative MapReduce. In The First International Workshop on MapReduce and its Applications (2010).
[19] Gershenson, C., and Pineda, L. A. Why does public transport not arrive on time? The pervasiveness of equal headway instability. PLoS ONE 4, 10 (10 2009).
[20] Ghemawat, S., Gobioff, H., and Leung, S.-T. The Google file system. vol. 37, pp. 29-43.
[21] Gupta, A., and Mumick, I. S. Maintenance of materialized views: Problems, techniques, and applications, 1995.
[22] Hadoop. http://hadoop.apache.org/.
[23] HBase. http://hbase.apache.org/.
[24] Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys '07 (New York, NY, USA, 2007), ACM, pp. 59-72.
[25] Iyer, S., and Cutts, M. Help test some next-generation infrastructure. http://googlewebmastercentral.blogspot.com/2009/08/help-test-some-next-generation.html, August 2009.
[26] Logothetis, D., Olston, C., Reed, B., Webb, K. C., and Yocum, K. Stateful bulk processing for incremental analytics. In SoCC '10: Proceedings of the 1st ACM Symposium on Cloud Computing (2010), pp. 51-62.
[27] MongoDB. http://www.mongodb.org/.
[28] Page, L., Brin, S., Motwani, R., and Winograd, T. The PageRank citation ranking: Bringing order to the web. Tech. rep., Stanford Digital Library Technologies Project, 1998.
[29] Paton, N. W., and Díaz, O. Active database systems. ACM Computing Surveys 31, 1 (1999), 63-103.
[30] Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S., and Stonebraker, M. A comparison of approaches to large-scale data analysis. In SIGMOD '09 (June 2009), ACM.
[31] Popa, L., Budiu, M., Yu, Y., and Isard, M. DryadInc: Reusing work in large-scale computations. In USENIX Workshop on Hot Topics in Cloud Computing (2009).
[32] Stonebraker, M. Operating system support for database management. Communications of the ACM 24, 7 (1981), 412-418.
[33] NEC Express5800/A1080a-E TPC-E results. http://www.tpc.org/tpce/results/tpce_result_detail.asp?id=110033001, Mar. 2010.
[34] Wei, Z., Pierre, G., and Chi, C.-H. CloudTPS: Scalable transactions for Web applications in the cloud. Tech. rep. IR-CS-053, Vrije Universiteit, Amsterdam, The Netherlands, Feb. 2010. http://www.globule.org/publi/CSTWAC_ircs53.html.
[35] Zaharia, M., Chowdhury, M., Franklin, M., Shenker, S., and Stoica, I. Spark: Cluster computing with working sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing (2010).
