HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

Azza Abouzeid¹, Kamil Bajda-Pawlikowski¹,

Daniel Abadi¹, Avi Silberschatz¹, Alexander Rasin²

¹Yale University, ²Brown University

{azza, kbajda, dna, avi}@cs.yale.edu; alexr@cs.brown.edu

ABSTRACT

The production environment for analytical data management applications is rapidly changing. Many enterprises are shifting away from deploying their analytical databases on high-end proprietary machines, and moving towards cheaper, lower-end, commodity hardware, typically arranged in a shared-nothing MPP architecture, often in a virtualized environment inside public or private "clouds". At the same time, the amount of data that needs to be analyzed is exploding, requiring hundreds to thousands of machines to work in parallel to perform the analysis.

There tend to be two schools of thought regarding what technology to use for data analysis in such an environment. Proponents of parallel databases argue that the strong emphasis on performance and efficiency of parallel databases makes them well-suited to perform such analysis. On the other hand, others argue that MapReduce-based systems are better suited due to their superior scalability, fault tolerance, and flexibility to handle unstructured data. In this paper, we explore the feasibility of building a hybrid system that takes the best features from both technologies; the prototype we built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.

1. INTRODUCTION

The analytical database market currently consists of $3.98 billion [25] of the $14.6 billion database software market [21] (27%) and is growing at a rate of 10.3% annually [25]. As business "best-practices" trend increasingly towards basing decisions off data and hard facts rather than instinct and theory, the corporate thirst for systems that can manage, process, and granularly analyze data is becoming insatiable. Venture capitalists are very much aware of this trend, and have funded no fewer than a dozen new companies in recent years that build specialized analytical data management software (e.g., Netezza, Vertica, DATAllegro, Greenplum, Aster Data, Infobright, Kickfire, Dataupia, ParAccel, and Exasol), and continue to fund them, even in pressing economic times [18].

At the same time, the amount of data that needs to be stored and processed by analytical database systems is exploding. This is partly due to the increased automation with which data can be produced (more business processes are becoming digitized), the proliferation of sensors and data-producing devices, Web-scale interactions with customers, and government compliance demands along with strategic corporate initiatives requiring more historical data to be kept online for analysis. It is no longer uncommon to hear of companies claiming to load more than a terabyte of structured data per day into their analytical database system and claiming data warehouses of size more than a petabyte [19].

Given the exploding data problem, all but three of the above mentioned analytical database start-ups deploy their DBMS on a shared-nothing architecture (a collection of independent, possibly virtual, machines, each with local disk and local main memory, connected together on a high-speed network). This architecture is widely believed to scale the best [17], especially if one takes hardware cost into account. Furthermore, data analysis workloads tend to consist of many large scan operations, multidimensional aggregations, and star schema joins, all of which are fairly easy to parallelize across nodes in a shared-nothing network. The analytical DBMS vendor leader, Teradata, uses a shared-nothing architecture. Oracle and Microsoft have recently announced shared-nothing analytical DBMS products in their Exadata (to be precise, Exadata is only shared-nothing in the storage layer) and Madison projects, respectively. For the purposes of this paper, we will call analytical DBMS systems that deploy on a shared-nothing architecture parallel databases (this is slightly different from textbook definitions of parallel databases, which sometimes include shared-memory and shared-disk architectures as well).

Parallel databases have been proven to scale really well into the tens of nodes (near linear scalability is not uncommon). However, there are very few known parallel database deployments consisting of more than one hundred nodes, and, to the best of our knowledge, there exists no published deployment of a parallel database with nodes numbering into the thousands. There are a variety of reasons why parallel databases generally do not scale well into the hundreds of nodes. First, failures become increasingly common as one adds more nodes to a system, yet parallel databases tend to be designed with the assumption that failures are a rare event. Second, parallel databases generally assume a homogeneous array of machines, yet it is nearly impossible to achieve pure homogeneity at scale. Third, until recently, there have only been a handful of applications that required deployment on more than a few dozen nodes for reasonable performance, so parallel databases have not been tested at larger scales, and unforeseen engineering hurdles await.

As the data that needs to be analyzed continues to grow, the number of applications that require more than one hundred nodes is beginning to multiply. Some argue that MapReduce-based systems [8] are best suited for performing analysis at this scale since they were designed from the beginning to scale to thousands of nodes in a shared-nothing architecture, and have had proven success in Google's internal operations and on the TeraSort benchmark [7]. Despite being originally designed for a largely different application (unstructured text data processing), MapReduce (or one of its publicly available incarnations such as open source Hadoop [1]) can nonetheless be used to process structured data, and can do so at tremendous scale. For example, Hadoop is being used to manage Facebook's 2.5 petabyte data warehouse [20].

Unfortunately, as pointed out by DeWitt and Stonebraker [9], MapReduce lacks many of the features that have proven invaluable for structured data analysis workloads (largely due to the fact that MapReduce was not originally designed to perform structured data analysis), and its immediate gratification paradigm precludes some of the long term benefits of first modeling and loading data before processing. These shortcomings can cause an order of magnitude slower performance than parallel databases [23].

Ideally, the scalability advantages of MapReduce could be combined with the performance and efficiency advantages of parallel databases to achieve a hybrid system that is well suited for the analytical DBMS market and can handle the future demands of data intensive applications. In this paper, we describe our implementation of and experience with HadoopDB, whose goal is to serve as exactly such a hybrid system. The basic idea behind HadoopDB is to use MapReduce as the communication layer above multiple nodes running single-node DBMS instances. Queries are expressed in SQL, translated into MapReduce by extending existing tools, and as much work as possible is pushed into the higher performing single-node databases.

One of the advantages of MapReduce relative to parallel databases not mentioned above is cost. There exists an open source version of MapReduce (Hadoop) that can be obtained and used without cost. Yet all of the parallel databases mentioned above have a nontrivial cost, often coming with seven-figure price tags for large installations. Since it is our goal to combine all of the advantages of both data analysis approaches in our hybrid system, we decided to build our prototype completely out of open source components in order to achieve the cost advantage as well. Hence, we use PostgreSQL as the database layer, Hadoop as the communication layer, and Hive as the translation layer, and all code we add we release as open source [2].

One side effect of such a design is a shared-nothing version of PostgreSQL. We are optimistic that our approach has the potential to help transform any single-node DBMS into a shared-nothing parallel database.

Given our focus on cheap, large scale data analysis, our target platform is virtualized public or private "cloud computing" deployments, such as Amazon's Elastic Compute Cloud (EC2) or VMware's private VDC-OS offering. Such deployments significantly reduce up-front capital costs, in addition to lowering operational, facilities, and hardware costs (through maximizing current hardware utilization). Public cloud offerings such as EC2 also yield tremendous economies of scale [14], and pass on some of these savings to the customer. All experiments we run in this paper are on Amazon's EC2 cloud offering; however, our techniques are applicable to non-virtualized cluster computing grid deployments as well.

In summary, the primary contributions of our work include:

• We extend previous work [23] that showed the superior performance of parallel databases relative to Hadoop. While this previous work focused only on performance in an ideal setting, we add fault tolerance and heterogeneous node experiments to demonstrate some of the issues with scaling parallel databases.

• We describe the design of a hybrid system that is designed to yield the advantages of both parallel databases and MapReduce. This system can also be used to allow single-node databases to run in a shared-nothing environment.

• We evaluate this hybrid system on a previously published benchmark to determine how close it comes to parallel DBMSs in performance and Hadoop in scalability.

2. RELATED WORK

There has been some recent work on bringing together ideas from MapReduce and database systems; however, this work focuses mainly on language and interface issues. The Pig project at Yahoo [22], the SCOPE project at Microsoft [6], and the open source Hive project [11] aim to integrate declarative query constructs from the database community into MapReduce-like software to allow greater data independence, code reusability, and automatic query optimization. Greenplum and Aster Data have added the ability to write MapReduce functions (instead of, or in addition to, SQL) over data stored in their parallel database products [16].

Although these five projects are without question an important step in the hybrid direction, there remains a need for a hybrid solution at the systems level in addition to at the language and interface levels. This paper focuses on such a systems-level hybrid.

3. DESIRED PROPERTIES

In this section we describe the desired properties of a system designed for performing data analysis at the (soon to be more common) petabyte scale. In the following section, we discuss how parallel database systems and MapReduce-based systems do not meet some subset of these desired properties.

Performance. Performance is the primary characteristic that commercial database systems use to distinguish themselves from other solutions, with marketing literature often filled with claims that a particular solution is many times faster than the competition. A factor of ten can make a big difference in the amount, quality, and depth of analysis a system can do.

High performance systems can also sometimes result in cost savings. Upgrading to a faster software product can allow a corporation to delay a costly hardware upgrade, or avoid buying additional compute nodes as an application continues to scale. On public cloud computing platforms, pricing is structured in a way such that one pays only for what one uses, so the vendor price increases linearly with the requisite storage, network bandwidth, and compute power. Hence, if data analysis software product A requires an order of magnitude more compute units than data analysis software product B to perform the same task, then product A will cost (approximately) an order of magnitude more than B. Efficient software has a direct effect on the bottom line.

Fault Tolerance. Fault tolerance in the context of analytical data workloads is measured differently than fault tolerance in the context of transactional workloads. For transactional workloads, a fault tolerant DBMS can recover from a failure without losing any data or updates from recently committed transactions, and, in the context of distributed databases, can successfully commit transactions and make progress on a workload even in the face of worker node failures. For read-only queries in analytical workloads, there are neither write transactions to commit, nor updates to lose upon node failure. Hence, a fault tolerant analytical DBMS is simply one that does not have to restart a query if one of the nodes involved in query processing fails.

Given the proven operational benefits and resource consumption savings of using cheap, unreliable commodity hardware to build a shared-nothing cluster of machines, and the trend towards extremely low-end hardware in data centers [14], the probability of a node failure occurring during query processing is increasing rapidly. This problem only gets worse at scale: the larger the amount of data that needs to be accessed for analytical queries, the more nodes are required to participate in query processing. This further increases the probability of at least one node failing during query execution. Google, for example, reports an average of 1.2 failures per analysis job [8]. If a query must restart each time a node fails, then long, complex queries are difficult to complete.

Ability to run in a heterogeneous environment. As described above, there is a strong trend towards increasing the number of nodes that participate in query execution. It is nearly impossible to get homogeneous performance across hundreds or thousands of compute nodes, even if each node runs on identical hardware or on an identical virtual machine. Part failures that do not cause complete node failure, but result in degraded hardware performance, become more common at scale. Individual node disk fragmentation and software configuration errors can also cause degraded performance on some nodes. Concurrent queries (or, in some cases, concurrent processes) further reduce the homogeneity of cluster performance. On virtualized machines, concurrent activities performed by different virtual machines located on the same physical machine can cause a 2-4% variation in performance [5].

If the amount of work needed to execute a query is equally divided among the nodes in a shared-nothing cluster, then there is a danger that the time to complete the query will be approximately equal to the time for the slowest compute node to complete its assigned task. A node with degraded performance would thus have a disproportionate effect on total query time. A system designed to run in a heterogeneous environment must take appropriate measures to prevent this from occurring.

Flexible query interface. There are a variety of customer-facing business intelligence tools that work with database software and aid in visualization, query generation, result dashboarding, and advanced data analysis. These tools are an important part of the analytical data management picture since business analysts are often not technically advanced and do not feel comfortable interfacing with the database software directly. Business intelligence tools typically connect to databases using ODBC or JDBC, so databases that want to work with these tools must accept SQL queries through these interfaces.

Ideally, the data analysis system should also have a robust mechanism for allowing the user to write user defined functions (UDFs), and queries that utilize UDFs should automatically be parallelized across the processing nodes in the shared-nothing cluster. Thus, both SQL and non-SQL interface languages are desirable.

4. BACKGROUND AND SHORTFALLS OF AVAILABLE APPROACHES

In this section, we give an overview of the parallel database and MapReduce approaches to performing data analysis, and list the properties described in Section 3 that each approach meets.

4.1 Parallel DBMSs

Parallel database systems stem from research performed in the late 1980s, and most current systems are designed similarly to the early Gamma [10] and Grace [12] parallel DBMS research projects. These systems all support standard relational tables and SQL, and implement many of the performance enhancing techniques developed by the research community over the past few decades, including indexing, compression (and direct operation on compressed data), materialized views, result caching, and I/O sharing. Most (or even all) tables are partitioned over multiple nodes in a shared-nothing cluster; however, the mechanism by which data is partitioned is transparent to the end-user. Parallel databases use an optimizer tailored for distributed workloads that turns SQL commands into a query plan whose execution is divided equally among multiple nodes.

Of the desired properties of large scale data analysis workloads described in Section 3, parallel databases best meet the "performance property" due to the performance push required to compete on the open market, and the ability to incorporate decades worth of performance tricks published in the database research community. Parallel databases can achieve especially high performance when administered by a highly skilled DBA who can carefully design, deploy, tune, and maintain the system, but recent advances in automating these tasks and bundling the software into appliance (pre-tuned and pre-configured) offerings have given many parallel databases high performance out of the box.

Parallel databases also score well on the flexible query interface property. Implementation of SQL and ODBC is generally a given, and many parallel databases allow UDFs (although the ability for the query planner and optimizer to parallelize UDFs well over a shared-nothing cluster varies across different implementations). However, parallel databases generally do not score well on the fault tolerance and ability to operate in a heterogeneous environment properties. Although particular details of parallel database implementations vary, their historical assumptions that failures are rare events and "large" clusters mean dozens of nodes (instead of hundreds or thousands) have resulted in engineering decisions that make it difficult to achieve these properties.

Furthermore, in some cases, there is a clear tradeoff between fault tolerance and performance, and parallel databases tend to choose the performance extreme of these tradeoffs. For example, frequent check-pointing of completed sub-tasks increases the fault tolerance of long-running read queries, yet this check-pointing reduces performance. In addition, pipelining intermediate results between query operators can improve performance, but can result in a large amount of work being lost upon a failure.

4.2 MapReduce

MapReduce was introduced by Dean et al. in 2004 [8]. Understanding the complete details of how MapReduce works is not a necessary prerequisite for understanding this paper. In short, MapReduce processes data distributed (and replicated) across many nodes in a shared-nothing cluster via three basic operations. First, a set of Map tasks are processed in parallel by each node in the cluster without communicating with other nodes. Next, data is repartitioned across all nodes of the cluster. Finally, a set of Reduce tasks are executed in parallel by each node on the partition it receives. This can be followed by an arbitrary number of additional Map-repartition-Reduce cycles as necessary. MapReduce does not create a detailed query execution plan that specifies which nodes will run which tasks in advance; instead, this is determined at runtime. This allows MapReduce to adjust to node failures and slow nodes on the fly by assigning more tasks to faster nodes and reassigning tasks from failed nodes. MapReduce also checkpoints the output of each Map task to local disk in order to minimize the amount of work that has to be redone upon a failure.

Of the desired properties of large scale data analysis workloads, MapReduce best meets the fault tolerance and ability to operate in a heterogeneous environment properties. It achieves fault tolerance by detecting and reassigning Map tasks of failed nodes to other nodes in the cluster (preferably nodes with replicas of the input Map data). It achieves the ability to operate in a heterogeneous environment via redundant task execution. Tasks that are taking a long time to complete on slow nodes get redundantly executed on other nodes that have completed their assigned tasks. The time to complete the task becomes equal to the time for the fastest node to complete the redundantly executed task. By breaking tasks into small, granular tasks, the effect of faults and "straggler" nodes can be minimized.

MapReduce has a flexible query interface; Map and Reduce functions are just arbitrary computations written in a general-purpose language. Therefore, it is possible for each task to do anything on its input, just as long as its output follows the conventions defined by the model. In general, most MapReduce-based systems (such as Hadoop, which directly implements the systems-level details of the MapReduce paper) do not accept declarative SQL. However, there are some exceptions (such as Hive).

As shown in previous work, the biggest issue with MapReduce is performance [23]. By not requiring the user to first model and load data before processing, many of the performance enhancing tools listed above that are used by database systems are not possible. Traditional business data analytical processing, which has standard reports and many repeated queries, is particularly poorly suited for the one-time query processing model of MapReduce.

Ideally, the fault tolerance and ability to operate in heterogeneous environment properties of MapReduce could be combined with the performance of parallel database systems. In the following sections, we will describe our attempt to build such a hybrid system.

5. HADOOPDB

In this section, we describe the design of HadoopDB. The goal of this design is to achieve all of the properties described in Section 3. The basic idea behind HadoopDB is to connect multiple single-node database systems using Hadoop as the task coordinator and network communication layer. Queries are parallelized across nodes using the MapReduce framework; however, as much of the single node query work as possible is pushed inside of the corresponding node databases. HadoopDB achieves fault tolerance and the ability to operate in heterogeneous environments by inheriting the scheduling and job tracking implementation from Hadoop, yet it achieves the performance of parallel databases by doing much of the query processing inside of the database engine.

5.1 Hadoop Implementation Background

At the heart of HadoopDB is the Hadoop framework. Hadoop consists of two layers: (i) a data storage layer, the Hadoop Distributed File System (HDFS), and (ii) a data processing layer, the MapReduce Framework.

HDFS is a block-structured file system managed by a central NameNode. Individual files are broken into blocks of a fixed size and distributed across multiple DataNodes in the cluster. The NameNode maintains metadata about the size and location of blocks and their replicas.

The MapReduce Framework follows a simple master-slave architecture. The master is a single JobTracker and the slaves or worker nodes are TaskTrackers. The JobTracker handles the runtime scheduling of MapReduce jobs and maintains information on each TaskTracker's load and available resources. Each job is broken down into Map tasks, based on the number of data blocks that require processing, and Reduce tasks. The JobTracker assigns tasks to TaskTrackers based on locality and load balancing. It achieves locality by matching a TaskTracker to Map tasks that process data local to it. It load-balances by ensuring all available TaskTrackers are assigned tasks. TaskTrackers regularly update the JobTracker with their status through heartbeat messages.

Figure 1: The Architecture of HadoopDB

The InputFormat library represents the interface between the storage and processing layers. InputFormat implementations parse text/binary files (or connect to arbitrary data sources) and transform the data into key-value pairs that Map tasks can process. Hadoop provides several InputFormat implementations, including one that allows a single JDBC-compliant database to be accessed by all tasks in one job in a given cluster.

5.2 HadoopDB's Components

HadoopDB extends the Hadoop framework (see Fig. 1) by providing the following four components:

5.2.1 Database Connector

The Database Connector is the interface between independent database systems residing on nodes in the cluster and TaskTrackers. It extends Hadoop's InputFormat class and is part of the InputFormat Implementations library. Each MapReduce job supplies the Connector with an SQL query and connection parameters such as which JDBC driver to use, the query fetch size, and other query tuning parameters. The Connector connects to the database, executes the SQL query and returns results as key-value pairs. The Connector could theoretically connect to any JDBC-compliant database that resides in the cluster. However, different databases require different read query optimizations. We implemented connectors for MySQL and PostgreSQL. In the future we plan to integrate other databases, including open-source column-store databases such as MonetDB and InfoBright. By extending Hadoop's InputFormat, we integrate seamlessly with Hadoop's MapReduce Framework. To the framework, the databases are data sources similar to data blocks in HDFS.
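To make the Connector's role concrete, the following is a minimal, hypothetical sketch of a JDBC-backed RecordReader in the style described above (Hadoop 0.19 "mapred" API). The class name, value encoding, and delimiter are illustrative assumptions, not HadoopDB's actual code.

import java.io.IOException;
import java.sql.*;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.RecordReader;

public class DBRecordReaderSketch implements RecordReader<LongWritable, Text> {
  private final Connection conn;
  private final ResultSet rs;
  private final int numCols;
  private long pos = 0;

  public DBRecordReaderSketch(String jdbcUrl, String user, String pass,
                              String sql, int fetchSize) throws SQLException {
    conn = DriverManager.getConnection(jdbcUrl, user, pass); // connection parameters from the job
    Statement st = conn.createStatement();
    st.setFetchSize(fetchSize);                              // query tuning parameter
    rs = st.executeQuery(sql);                               // SQL query supplied by the MapReduce job
    numCols = rs.getMetaData().getColumnCount();
  }

  public boolean next(LongWritable key, Text value) throws IOException {
    try {
      if (!rs.next()) return false;
      StringBuilder row = new StringBuilder();
      for (int i = 1; i <= numCols; i++) {                   // serialize the tuple as a delimited string
        if (i > 1) row.append('|');
        row.append(rs.getString(i));
      }
      key.set(pos++);
      value.set(row.toString());
      return true;
    } catch (SQLException e) { throw new IOException(e); }
  }

  public LongWritable createKey() { return new LongWritable(); }
  public Text createValue() { return new Text(); }
  public long getPos() { return pos; }
  public float getProgress() { return 0.0f; }                // unknown for a streaming ResultSet
  public void close() throws IOException {
    try { rs.close(); conn.close(); } catch (SQLException e) { throw new IOException(e); }
  }
}

An InputFormat subclass would construct one such reader per database chunk, so that to the MapReduce framework the chunk looks like any other input split.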

5.2.2 Catalog

The catalog maintains metainformation about the databases. This includes the following: (i) connection parameters such as database location, driver class and credentials, and (ii) metadata such as data sets contained in the cluster, replica locations, and data partitioning properties.

The current implementation of the HadoopDB catalog stores its metainformation as an XML file in HDFS. This file is accessed by the JobTracker and TaskTrackers to retrieve information necessary to schedule tasks and process data needed by a query. In the future, we plan to deploy the catalog as a separate service that would work in a way similar to Hadoop's NameNode.
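As a rough illustration of how such a catalog file might be consumed, the sketch below reads an XML file from HDFS and lists per-node connection parameters. The file path and XML layout are assumptions for illustration only; the actual HadoopDB catalog schema may differ.

import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CatalogReaderSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Assumed location of the catalog file in HDFS.
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(fs.open(new Path("/hadoopdb/catalog.xml")));
    // Assumed layout: one <node> element per worker holding connection parameters.
    NodeList nodes = doc.getElementsByTagName("node");
    for (int i = 0; i < nodes.getLength(); i++) {
      Element n = (Element) nodes.item(i);
      System.out.printf("host=%s url=%s driver=%s%n",
          n.getAttribute("host"), n.getAttribute("url"), n.getAttribute("driver"));
    }
  }
}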

5.2.3 Data Loader

The Data Loader is responsible for (i) globally repartitioning data on a given partition key upon loading, (ii) breaking apart single node data into multiple smaller partitions or chunks, and (iii) finally bulk-loading the single-node databases with the chunks.

The Data Loader consists of two main components: the Global Hasher and the Local Hasher. The Global Hasher executes a custom-made MapReduce job over Hadoop that reads in raw data files stored in HDFS and repartitions them into as many parts as the number of nodes in the cluster. The repartitioning job does not incur the sorting overhead of typical MapReduce jobs.

The Local Hasher then copies a partition from HDFS into the local file system of each node and secondarily partitions the file into smaller chunks based on the maximum chunk size setting. The hashing functions used by the Global Hasher and the Local Hasher differ to ensure chunks are of a uniform size. They also differ from Hadoop's default hash-partitioning function to ensure better load balancing when executing MapReduce jobs over the data.
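The two-level partitioning idea can be sketched as follows; the specific hash functions and constants here are illustrative assumptions, chosen only to show that the global and local hashers must use different functions so that the secondary split stays uniform.

public class HasherSketch {
  // Global Hasher: assigns each record to one of the cluster nodes by its partition key.
  static int globalPartition(String key, int numNodes) {
    int h = 0;
    for (int i = 0; i < key.length(); i++) h = 31 * h + key.charAt(i);
    return ((h % numNodes) + numNodes) % numNodes;
  }

  // Local Hasher: splits a node's partition into fixed-size chunks using a *different*
  // hash function, so chunk boundaries do not correlate with the global partitioning.
  static int localChunk(String key, int numChunks) {
    int h = 0;
    for (int i = 0; i < key.length(); i++) h = 17 * h + key.charAt(i) + 0x1f3d;
    return ((h % numChunks) + numChunks) % numChunks;
  }

  public static void main(String[] args) {
    String url = "http://example.com/page";
    System.out.println("node  = " + globalPartition(url, 100)); // e.g., a 100-node cluster
    System.out.println("chunk = " + localChunk(url, 20));       // e.g., 20 chunks of 1GB per node
  }
}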

5.2.4 SQL to MapReduce to SQL (SMS) Planner

HadoopDB provides a parallel database front-end to data analysts, enabling them to process SQL queries.

The SMS planner extends Hive [11]. Hive transforms HiveQL, a variant of SQL, into MapReduce jobs that connect to tables stored as files in HDFS. The MapReduce jobs consist of DAGs of relational operators (such as filter, select (project), join, and aggregation) that operate as iterators: each operator forwards a data tuple to the next operator after processing it. Since each table is stored as a separate file in HDFS, Hive assumes no collocation of tables on nodes. Therefore, operations that involve multiple tables usually require most of the processing to occur in the Reduce phase of a MapReduce job. This assumption does not completely hold in HadoopDB, as some tables are collocated and, if partitioned on the same attribute, the join operation can be pushed entirely into the database layer.

To understand how we extended Hive for SMS, as well as the differences between Hive and SMS, we first describe how Hive creates an executable MapReduce job for a simple GroupBy-Aggregation query. Then, we describe how we modify the execution plan for HadoopDB by pushing most of the query processing logic into the database layer.

Consider the following query:

SELECT YEAR(saleDate), SUM(revenue)
FROM sales GROUP BY YEAR(saleDate);

Hive processes the above SQL query in a series of phases:

(1) The parser transforms the query into an Abstract Syntax Tree.

(2) The Semantic Analyzer connects to Hive's internal catalog, the MetaStore, to retrieve the schema of the sales table. It also populates different data structures with meta information such as the Deserializer and InputFormat classes required to scan the table and extract the necessary fields.

(3) The logical plan generator then creates a DAG of relational operators, the query plan.

(4) The optimizer restructures the query plan to create a more optimized plan. For example, it pushes filter operators closer to the table scan operators. A key function of the optimizer is to break up the plan into Map or Reduce phases. In particular, it adds a Repartition operator, also known as a Reduce Sink operator, before Join or GroupBy operators. These operators mark the Map and Reduce phases of a query plan. The Hive optimizer is a simple, naïve, rule-based optimizer. It does not use cost-based optimization techniques. Therefore, it does not always generate efficient query plans. This is another advantage of pushing as much as possible of the query processing logic into DBMSs that have more sophisticated, adaptive or cost-based optimizers.

Figure 2: (a) MapReduce job generated by Hive; (b) MapReduce job generated by SMS assuming sales is partitioned by YEAR(saleDate) (this feature is still unsupported); (c) MapReduce job generated by SMS assuming no partitioning of sales

(5) Finally, the physical plan generator converts the logical query plan into a physical plan executable by one or more MapReduce jobs. The first and every other Reduce Sink operator marks a transition from a Map phase to a Reduce phase of a MapReduce job, and the remaining Reduce Sink operators mark the start of new MapReduce jobs. The above SQL query results in a single MapReduce job with the physical query plan illustrated in Fig. 2(a). The boxes stand for the operators and the arrows represent the flow of data.

(6) Each DAG enclosed within a MapReduce job is serialized into an XML plan. The Hive driver then executes a Hadoop job. The job reads the XML plan and creates all the necessary operator objects that scan data from a table in HDFS, and parse and process one tuple at a time.

The SMS planner modifies Hive. In particular, we intercept the normal Hive flow in two main areas:

(i) Before any query execution, we update the MetaStore with references to our database tables. Hive allows tables to exist externally, outside HDFS. The HadoopDB catalog (Section 5.2.2) provides information about the table schemas and required Deserializer and InputFormat classes to the MetaStore. We implemented these specialized classes.

(ii) After the physical query plan generation and before the execution of the MapReduce jobs, we perform two passes over the physical plan. In the first pass, we retrieve the data fields that are actually processed by the plan and determine the partitioning keys used by the Reduce Sink (Repartition) operators. In the second pass, we traverse the DAG bottom-up from the table scan operators to the output or File Sink operator. All operators until the first repartition operator with a partitioning key different from the database's key are converted into one or more SQL queries and pushed into the database layer. SMS uses a rule-based SQL generator to recreate SQL from the relational operators. The query processing logic that can be pushed into the database layer ranges from none (each table is scanned independently and tuples are pushed one at a time into the DAG of operators) to all (only a Map task is required to output the results into an HDFS file).

Given the above GroupBy query, SMS produces one of two different plans. If the sales table is partitioned by YEAR(saleDate), it produces the query plan in Fig. 2(b): this plan pushes the entire query processing logic into the database layer. Only a Map task is required to output results into an HDFS file. Otherwise, SMS produces the query plan in Fig. 2(c), in which the database layer partially aggregates data and eliminates the selection and group-by operators used in the Map phase of the Hive-generated query plan (Fig. 2(a)). The final aggregation step in the Reduce phase of the MapReduce job, however, is still required in order to merge partial results from each node.

For join queries, Hive assumes that tables are not collocated. Therefore, the Hive-generated plan scans each table independently and computes the join after repartitioning data by the join key. In contrast, if the join key matches the database partitioning key, SMS pushes the entire join sub-tree into the database layer.

So far, we only support filter, select (project) and aggregation operators. Currently, the partitioning features supported by Hive are extremely naïve and do not support expression-based partitioning. Therefore, we cannot detect whether the sales table is partitioned by YEAR(saleDate) or not, and we have to make the pessimistic assumption that the data is not partitioned by this attribute. The Hive build [15] we extended is a little buggy; as explained in Section 6.2.5, it fails to execute the join task used in our benchmark, even when running over HDFS tables (the Hive team resolved these issues in June, after we completed the experiments; we plan to integrate the latest Hive with the SMS planner). However, we use the SMS planner to automatically push SQL queries into HadoopDB's DBMS layer for all other benchmark queries presented in our experiments for this paper.

5.3 Summary

HadoopDB does not replace Hadoop. Both systems coexist, enabling the analyst to choose the appropriate tools for a given dataset and task. Through the performance benchmarks in the following sections, we show that using an efficient database storage layer cuts down on data processing time, especially on tasks that require complex query processing over structured data such as joins. We also show that HadoopDB is able to take advantage of the fault tolerance and the ability to run in heterogeneous environments that comes naturally with Hadoop-style systems.

6. BENCHMARKS

In this section we evaluate HadoopDB, comparing it with a MapReduce implementation and two parallel database implementations, using a benchmark first presented in [23] (we are aware of the writing convention that references should not be used as nouns; to save space, we use [23] not as a reference but as shorthand for "the SIGMOD 2009 paper by Pavlo et al."). This benchmark consists of five tasks. The first task is taken directly from the original MapReduce paper [8], whose authors claim it is representative of common MR tasks. The next four tasks are analytical queries designed to be representative of traditional structured data analysis workloads that HadoopDB targets.

We ran our experiments on Amazon EC2 "large" instances (zone: us-east-1b). Each instance has 7.5GB memory, 4 EC2 Compute Units (2 virtual cores), 850GB instance storage (2 x 420GB plus a 10GB root partition), and runs a 64-bit Linux Fedora 8 OS.

We observed that disk I/O performance on EC2 nodes was initially quite slow (25 MB/s). Consequently, we initialized some additional space on each node so that intermediate files and the output of the tasks did not suffer from this initial write slow-down. Once disk space is initialized, subsequent writes are much faster (86 MB/s). Network speed is approximately 100-110 MB/s. We execute each task three times and report the average of the trials. The final results from all parallel database queries are piped from the shell command into a file. Hadoop and HadoopDB store results in Hadoop's distributed file system (HDFS). In this section, we only report results using trials where all nodes are available, operating correctly, and have no concurrent tasks during benchmark execution (we drop these requirements in Section 7). For each task, we benchmark performance on cluster sizes of 10, 50, and 100 nodes.

6.1 Benchmarked Systems

Our experiments compare the performance of Hadoop, HadoopDB (with PostgreSQL as the underlying database; initially we experimented with MySQL's MyISAM storage layer, but found that while simple table scans are up to 30% faster, more complicated SQL queries are much slower due to the lack of clustered indices and poor join algorithms), and two commercial parallel DBMSs.

6.1.1 Hadoop

Hadoop is an open-source version of the MapReduce framework, implemented by directly following the ideas described in the original MapReduce paper, and is used today by dozens of businesses to perform data analysis [1]. For our experiments in this paper, we use Hadoop version 0.19.1 running on Java 1.6.0. We deployed the system with several changes to the default configuration settings. Data in HDFS is stored using 256MB data blocks instead of the default 64MB. Each MR executor ran with a maximum heap size of 1024MB. We allowed two Map instances and a single Reduce instance to execute concurrently on each node. We also allowed more buffer space for file read/write operations (132MB) and increased the sort buffer to 200MB with 100 concurrent streams for merging. Additionally, we modified the number of parallel transfers run by Reduce during the shuffle phase and the number of worker threads for each TaskTracker's HTTP server to be 50. These adjustments follow the guidelines on high-performance Hadoop clusters [13]. Moreover, we enabled task JVMs to be reused.
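For reference, the non-default settings above can be expressed programmatically as in the following sketch. The property names are the Hadoop 0.19-era names as best we can recall them and are given here only as an assumption-laden illustration; in practice these values would normally live in the cluster's hadoop-site.xml.

import org.apache.hadoop.mapred.JobConf;

public class ClusterConfigSketch {
  public static JobConf configure(JobConf conf) {
    conf.setLong("dfs.block.size", 256L * 1024 * 1024);        // 256MB HDFS blocks instead of 64MB
    conf.set("mapred.child.java.opts", "-Xmx1024m");           // 1024MB max heap per MR executor
    conf.setInt("mapred.tasktracker.map.tasks.maximum", 2);    // two concurrent Map instances per node
    conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 1); // one concurrent Reduce instance per node
    conf.setInt("io.file.buffer.size", 132 * 1024 * 1024);     // larger file read/write buffer (132MB)
    conf.setInt("io.sort.mb", 200);                            // 200MB sort buffer
    conf.setInt("io.sort.factor", 100);                        // 100 concurrent streams for merging
    conf.setInt("mapred.reduce.parallel.copies", 50);          // parallel transfers during the shuffle phase
    conf.setInt("tasktracker.http.threads", 50);               // worker threads for the TaskTracker HTTP server
    conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);         // reuse task JVMs without limit
    return conf;
  }
}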

For each benchmark trial, we stored all input and output data in HDFS with no replication (we add replication in Section 7). After benchmarking a particular cluster size, we deleted the data directories on each node, and reformatted and reloaded HDFS to ensure uniform data distribution across all nodes.

We present results of both hand-coded Hadoop and Hive-coded Hadoop (i.e., Hadoop plans generated automatically via Hive's SQL interface). These separate results for Hadoop are displayed as split bars in the graphs. The bottom, colored segment of each bar represents the time taken by Hadoop when hand-coded, and the rest of the bar indicates the additional overhead resulting from the automatic plan generation by Hive, and from operator function calls and dynamic data type resolution through Java's Reflection API for each tuple processed in Hive-coded jobs.

6.1.2 HadoopDB

The Hadoop part of HadoopDB was configured identically to the description above except for the number of concurrent Map tasks, which we set to one. Additionally, on each worker node, PostgreSQL version 8.2.5 was installed. We increased the memory used by the PostgreSQL shared buffers to 512MB and the working memory size to 1GB. We did not compress data in PostgreSQL.

Analogous to what we did for Hadoop, we present results of both hand-coded HadoopDB and SMS-coded HadoopDB (i.e., entire query plans created by HadoopDB's SMS planner). These separate results for HadoopDB are displayed as split bars in the graphs. The bottom, colored segment of each bar represents the time taken by HadoopDB when hand-coded, and the rest of the bar indicates the additional overhead of the SMS planner (e.g., SMS jobs need to serialize tuples retrieved from the underlying database and deserialize them before further processing in Hadoop).

6.1.3 Vertica

Vertica is a relatively new parallel database system (founded in 2005) [3] based on the C-Store research project [24]. Vertica is a column-store, which means that each attribute of each table is stored (and accessed) separately, a technique that has proven to improve performance for read-mostly workloads.

Vertica offers a "cloud" edition, which we used for the experiments in this paper. Vertica was also used in the performance study of previous work [23] on the same benchmark, so we configured Vertica identically to the previous experiments (in fact, we asked the same person who ran the queries for that previous work to run the same queries on EC2 for our paper). The Vertica configuration is therefore as follows: all data is compressed. Vertica operates on compressed data directly. Vertica implements primary indexes by sorting the table by the indexed attribute. None of Vertica's default configuration parameters were changed.

6.1.4 DBMS-X

DBMS-X is the same commercial parallel row-oriented database as was used for the benchmark in [23]. Since at the time of our VLDB submission this DBMS did not offer a cloud edition, we did not run experiments for it on EC2. However, since our Vertica numbers were consistently 10-15% slower on EC2 than on the Wisconsin cluster presented in [23] (we used a later version of Vertica in these experiments than [23]; using the identical version, the slowdown was 10-15% on EC2), a result that is expected since the virtualization layer is known to introduce a performance overhead, we reproduce the DBMS-X numbers from [23] in our figures as a best-case performance estimate for DBMS-X if it were to be run on EC2.

6.2 Performance and Scalability Benchmarks

The first benchmark task (the "Grep task") requires each system to scan through a data set of 100-byte records looking for a three-character pattern. This is the only task that requires processing largely unstructured data, and was originally included in the benchmark by the authors of [23] since the same task was included in the original MapReduce paper [8].

To explore more complex uses of the benchmarked systems, the benchmark includes four more analytical tasks related to log-file analysis and HTML document processing. Three of these tasks operate on structured data; the final task operates on both structured and unstructured data.

The datasets used by these four tasks include a UserVisits table meant to model log files of HTTP server traffic, a Documents table containing 600,000 randomly generated HTML documents, and a Rankings table that contains some metadata calculated over the data in the Documents table. The schema of the tables in the benchmark data set is described in detail in [23]. In summary, the UserVisits table contains 9 attributes, the largest of which is destinationURL, which is of type VARCHAR(100). Each tuple is on the order of 150 bytes wide. The Documents table contains two attributes: a URL (VARCHAR(100)) and contents (arbitrary text). Finally, the Rankings table contains three attributes: pageURL (VARCHAR(100)), pageRank (INT), and avgDuration (INT).

The data generator yields 155 million UserVisits records (20GB) and 18 million Rankings records (1GB) per node. Since the data generator does not ensure that Rankings and UserVisits tuples with the same value for the URL attribute are stored on the same node, a repartitioning is done during the data load, as described later. Records for both the UserVisits and Rankings data sets are stored in HDFS as plain text, one record per line with fields separated by a delimiting character. In order to access the different attributes at run time, the Map and Reduce functions split the record by the delimiter into an array of strings.

6.2.1 Data Loading

We report load times for two data sets, Grep and UserVisits, in Fig. 3 and Fig. 4. While Grep data is randomly generated and requires no preprocessing, UserVisits needs to be repartitioned by destinationURL and indexed by visitDate for all databases during the load in order to achieve better performance on analytical queries (Hadoop would not benefit from such repartitioning). We describe, briefly, the loading procedures for all systems:

Hadoop: We loaded each node with an unaltered UserVisits data file. HDFS automatically breaks the file into 256MB blocks and stores the blocks on a local DataNode. Since all nodes load their data in parallel, we report the maximum node load time from each cluster. Load time is greatly affected by stragglers. This effect is especially visible when loading UserVisits, where a single slow node pushed the overall load time to 4355 seconds on the 100-node cluster and to 2600 seconds on the 10-node cluster, despite an average load time of only 1100 seconds per node.

HadoopDB: We set the maximum chunk size to 1GB. Each chunk is located in a separate PostgreSQL database within a node, and processes SQL queries independently of other chunks. We report the maximum node load time as the entire load time for both Grep and UserVisits.

Since the Grep dataset does not require any preprocessing and is only 535MB of data per node, the entire data set was loaded using the standard SQL COPY command into a single chunk on each node. The Global Hasher partitions the entire UserVisits dataset across all nodes in the cluster. Next, the Local Hasher on each node retrieves a 20GB partition from HDFS and hash-partitions it into 20 smaller chunks, 1GB each. Each chunk is then bulk-loaded using COPY. Finally, a clustered index on visitDate is created for each chunk.

The load time for UserVisits is broken down into several phases. The first repartitioning, carried out by the Global Hasher, is the most expensive step in the process: it takes nearly half the total load time, 14,000s. Of the remaining 16,000s, locally partitioning the data into 20 chunks takes 2500s (15.6%), the bulk copy into tables takes 5200s (32.5%), creating clustered indices, which includes sorting, takes 7100s (44.4%), and finally vacuuming the databases takes 1200s (7.5%). All the steps after global repartitioning are executed in parallel on all nodes. We observed individual variance in load times. Some nodes required as little as 10,000s to completely load UserVisits after global repartitioning was completed.

Vertica: The loading procedure for Vertica is analogous to the one described in [23]. The loading time improved since then because a newer version of Vertica (3.0) was used for these experiments. The key difference is that the bulk load COPY command now runs on all nodes in the cluster completely in parallel.

DBMS-X: We report the total load time, including data compression and indexing, from [23].

Figure 3: Load Grep (0.5GB/node)

Figure 4: Load UserVisits (20GB/node)

Figure 5: Grep Task

In contrast to DBMS-X, the parallel load features of Hadoop, HadoopDB and Vertica ensure all systems scale as the number of nodes increases. Since the speed of loading is limited by the slowest disk-write speed in the cluster, loading is the only process that cannot benefit from Hadoop's and HadoopDB's inherent tolerance of heterogeneous environments (see Section 7). (EC2 disks are slow on initial writes. Since the performance benchmarks are not write-limited, they are not affected by disk-write speeds; also, we initialized disks before the experiments, as described in Section 6.)

6.2.2 Grep Task

Each record consists of a unique key in the first 10 bytes, followed by a 90-byte character string. The pattern "XYZ" is searched for in the 90-byte field, and is found once in every 10,000 records. Each node contains 5.6 million such 100-byte records, or roughly 535MB of data. The total number of records processed for each cluster size is 5.6 million times the number of nodes.

Vertica, DBMS-X, HadoopDB, and Hadoop (Hive) all executed the identical SQL:

SELECT * FROM Data WHERE field LIKE '%XYZ%';

None of the benchmarked systems contained an index on the field attribute. Hence, for all systems, this query requires a full table scan and is mostly limited by disk speed.

Hadoop (hand-coded) was executed identically to [23] (a simple Map function that performs a sub-string match on "XYZ"). No Reduce function is needed for this task, so the output of the Map function is written directly to HDFS.
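A minimal sketch of such a hand-coded Grep Map function, written against the Hadoop 0.19 "mapred" API used in these experiments, is shown below; it is an illustration consistent with the description above, not the exact code used.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class GrepMapSketch extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {
  public void map(LongWritable offset, Text record,
                  OutputCollector<Text, NullWritable> out, Reporter reporter)
      throws IOException {
    // Emit the record only if it contains the pattern; with no Reduce function,
    // the matching records are written directly to HDFS.
    if (record.toString().contains("XYZ")) {
      out.collect(record, NullWritable.get());
    }
  }
}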

HadoopDB’s SMS planner pushes the WHERE clause into the PostgreSQL instances.

Fig. 5 displays the results (note, the split bars were explained in Section 6.1). HadoopDB slightly outperforms Hadoop as it handles I/O more efficiently than Hadoop due to the lack of runtime parsing of data. However, both systems are outperformed by the parallel database systems. This difference is due to the fact that both Vertica and DBMS-X compress their data, which significantly reduces I/O cost ([23] note that compression speeds up DBMS-X by about 50% on all experiments).

6.2.3 Selection Task

The first structured data task evaluates a simple selection predicate on the pageRank attribute from the Rankings table. There are approximately 36,000 tuples on each node that pass this predicate. Vertica, DBMS-X, HadoopDB, and Hadoop (Hive) all executed the identical SQL:

SELECT pageURL, pageRank FROM Rankings WHERE pageRank > 10;

Hadoop (hand-coded) was executed identically to [23]: a Map function parses Rankings tuples using the field delimiter, applies the predicate on pageRank, and outputs the tuple's pageURL and pageRank as a new key/value pair if the predicate succeeds. This task does not require a Reduce function.
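A hedged sketch of that Map function follows; the delimiter and field positions are assumptions about the generated data layout rather than a description of the actual benchmark code.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SelectionMapSketch extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, IntWritable> out, Reporter reporter)
      throws IOException {
    String[] fields = line.toString().split("\\|");   // assumed field delimiter
    String pageURL = fields[0];                        // assumed field order: pageURL, pageRank, ...
    int pageRank = Integer.parseInt(fields[1]);
    if (pageRank > 10) {                               // apply the selection predicate
      out.collect(new Text(pageURL), new IntWritable(pageRank));
    }
  }
}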

HadoopDB’s SMS planner pushes the selection and projection clauses into the PostgreSQL instances.

The performance of each system is presented in Fig. 6. Hadoop (with and without Hive) performs a brute-force, complete scan of all data in a file. The other systems, however, benefit from using clustered indices on the pageRank column. Hence, in general, HadoopDB and the parallel DBMSs are able to outperform Hadoop. Since data is partitioned by UserVisits destinationURL, the foreign key relationship between Rankings pageURL and UserVisits destinationURL causes the Global and Local Hashers to repartition Rankings by pageURL. Each Rankings chunk is only 50MB (collocated with the corresponding 1GB UserVisits chunk). The overhead of scheduling twenty Map tasks to process only 1GB of data per node significantly decreases HadoopDB's performance.

We therefore maintain an additional, non-chunked copy of the Rankings table containing the entire 1GB. HadoopDB on this data set outperforms Hadoop because the use of a clustered index on pageRank eliminates the need to sequentially scan the entire data set. HadoopDB scales better relative to DBMS-X and Vertica mainly due to the increased network costs of these systems, which dominate when query time is otherwise very low.

6.2.4 Aggregation Task

The next task involves computing the total adRevenue generated from each sourceIP in the UserVisits table, grouped by either the seven-character prefix of the sourceIP column or the entire sourceIP column. Unlike the previous tasks, this task requires intermediate results to be exchanged between different nodes in the cluster (so that the final aggregate can be calculated). When grouping on the seven-character prefix, there are 2,000 unique groups. When grouping on the entire sourceIP, there are 2,500,000 unique groups.

Vertica, DBMS-X, HadoopDB, and Hadoop (Hive) all executed the identical SQL:

Smaller query:

SELECT SUBSTR(sourceIP, 1, 7), SUM(adRevenue) FROM UserVisits GROUP BY SUBSTR(sourceIP, 1, 7);

Larger query:

SELECT sourceIP, SUM(adRevenue) FROM UserVisits GROUP BY sourceIP;

Hadoop (hand-coded) was executed identically to [23]: a Map function outputs the adRevenue and the first seven characters of the sourceIP field (or the whole field in the larger query), which get sent to a Reduce function that performs the sum aggregation for each prefix (or sourceIP).
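The following is a hedged sketch of that hand-coded "small" aggregation job; field positions and the delimiter are assumptions, and the summing Reducer shown here could also serve as a Combiner (the sort-based strategy discussed below).

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class AggregationSketch {
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, DoubleWritable> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, DoubleWritable> out, Reporter r) throws IOException {
      String[] f = line.toString().split("\\|");        // assumed delimiter
      String sourceIP = f[0];                            // assumed field positions
      double adRevenue = Double.parseDouble(f[3]);
      String prefix = sourceIP.substring(0, Math.min(7, sourceIP.length()));
      out.collect(new Text(prefix), new DoubleWritable(adRevenue));
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    public void reduce(Text key, Iterator<DoubleWritable> vals,
                       OutputCollector<Text, DoubleWritable> out, Reporter r) throws IOException {
      double sum = 0;
      while (vals.hasNext()) sum += vals.next().get();   // sum adRevenue per prefix (or per sourceIP)
      out.collect(key, new DoubleWritable(sum));
    }
  }
}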

The SMS planner for HadoopDB pushes the entire SQL query into the PostgreSQL instances. The output is then sent to Reduce jobs inside of Hadoop that perform the final aggregation (after collecting all pre-aggregated sums from each PostgreSQL instance).

Figure 6: Selection Task

Figure 7: Large Aggregation Task

Figure 8: Small Aggregation Task

The performance numbers for each benchmarked system are displayed in Fig. 7 and 8. Similar to the Grep task, this query is limited by reading data off disk. Thus, both commercial systems benefit from compression and outperform HadoopDB and Hadoop. We observe a reversal of the general rule that Hive adds an overhead cost to hand-coded Hadoop in the "small" (substring) aggregation task (the time taken by Hive is represented by the lower part of the Hadoop bar in Fig. 8). Hive performs much better than Hadoop because it uses a hash aggregation execution strategy (it maintains an internal hash-aggregate map in the Map phase of the job), which proves to be optimal when there is a small number of groups. In the large aggregation task, Hive switches to sort-based aggregation upon detecting that the number of groups is more than half the number of input rows per block. In contrast, in our hand-coded Hadoop plan we (and the authors of [23]) failed to take advantage of hash aggregation for the smaller query because sort-based aggregation (using Combiners) is a MapReduce standard practice.

These results illustrate the benefit of exploiting optimizers present in database systems and relational query systems like Hive, which can use statistics from the system catalog or simple optimization rules to choose between hash aggregation and sort aggregation.

Unlike Hadoop's Combiner, Hive serializes partial aggregates into strings instead of maintaining them in their natural binary representation. Hence, Hive performs much worse than Hadoop on the larger query.

PostgreSQL chooses to use hash aggregation for both tasks as it can easily fit the entire hash aggregate table for each 1GB chunk in memory. Hence, HadoopDB outperforms Hadoop on both tasks due to its efficient aggregation implementation.

This query is well-suited for systems that use column-oriented storage, since the two attributes accessed in this query (sourceIP and adRevenue) consist of only 20 out of the more than 200 bytes in each UserVisits tuple. Vertica is thus able to significantly outperform the other systems due to the commensurate I/O savings.

6.2.5 Join Task

The join task involves finding the average pageRank of the set of pages visited from the sourceIP that generated the most revenue during the week of January 15-22, 2000. The key difference between this task and the previous tasks is that it must read in two different data sets and join them together (pageRank information is found in the Rankings table and revenue information is found in the UserVisits table). There are approximately 134,000 records in the UserVisits table that have a visitDate value inside the requisite date range.

Unlike the previous three tasks, we were unable to use the same SQL for the parallel databases and for the Hadoop-based systems. This is because the Hive build we extended was unable to execute this query. Although this build accepts a SQL query that joins, filters and aggregates tuples from two tables, such a query fails during execution. Additionally, we noticed that the query plan for joins of this type uses a highly inefficient execution strategy. In particular, the filtering operation is planned after joining the tables. Hence, we are only able to present hand-coded results for HadoopDB and Hadoop for this query.

In HadoopDB, we push the selection, join, and partial aggregation into the PostgreSQL instances with the following SQL:

SELECT sourceIP, COUNT(pageRank), SUM(pageRank), SUM(adRevenue)
FROM Rankings AS R, UserVisits AS UV
WHERE R.pageURL = UV.destURL
AND UV.visitDate BETWEEN '2000-01-15' AND '2000-01-22'
GROUP BY UV.sourceIP;

We then use a single Reduce task in Hadoop that gathers all of the partial aggregates from each PostgreSQL instance to perform the final aggregation.
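A speculative sketch of that final Reduce task is shown below. It assumes each PostgreSQL instance emits, per sourceIP, a value encoded as "count|sumPageRank|sumAdRevenue"; the reducer totals the partials per key, tracks the highest-revenue sourceIP across keys, and emits its average pageRank on close(). Both the value encoding and the tracking-in-close() pattern are illustrative assumptions, not the benchmark's actual code.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class JoinFinalReduceSketch extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  private OutputCollector<Text, Text> out;
  private String bestIP;
  private double bestRevenue = Double.NEGATIVE_INFINITY;
  private double bestAvgPageRank;

  public void reduce(Text sourceIP, Iterator<Text> partials,
                     OutputCollector<Text, Text> collector, Reporter r) throws IOException {
    out = collector;                                  // remember the collector so close() can emit
    long count = 0; double sumRank = 0, sumRevenue = 0;
    while (partials.hasNext()) {
      String[] p = partials.next().toString().split("\\|");  // "count|sumPageRank|sumAdRevenue"
      count += Long.parseLong(p[0]);
      sumRank += Double.parseDouble(p[1]);
      sumRevenue += Double.parseDouble(p[2]);
    }
    if (sumRevenue > bestRevenue) {                   // keep the sourceIP with the highest total revenue
      bestRevenue = sumRevenue;
      bestIP = sourceIP.toString();
      bestAvgPageRank = sumRank / count;
    }
  }

  public void close() throws IOException {
    if (out != null && bestIP != null) {
      out.collect(new Text(bestIP), new Text(bestAvgPageRank + "\t" + bestRevenue));
    }
  }
}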

The parallel databases execute the SQL query specified in [23]. Although Hadoop has support for a join operator, this operator requires that both input datasets be sorted on the join key. Such a requirement limits the utility of the join operator since in many cases, including the query above, the data is not already sorted, and performing a sort before the join adds significant overhead. We found that even if we sorted the input data (and did not include the sort time in the total query time), query performance using the Hadoop join was lower than query performance using the three-phase MR program used in [23] that used standard 'Map' and 'Reduce' operators. Hence, for the numbers we report below, we use an identical MR program as was used (and described in detail) in [23].

Fig. 9 summarizes the results of this benchmark task. For Hadoop, we observed similar results as found in [23]: its performance is limited by completely scanning the UserVisits dataset on each node in order to evaluate the selection predicate.

HadoopDB, DBMS-X, and Vertica all achieve higher performance by using an index to accelerate the selection predicate and having native support for joins. These systems see slight performance degradation with a larger number of nodes due to the final single-node aggregation of, and sorting by, adRevenue.

6.2.6 UDF Aggregation Task

The final task computes, for each document, the number of inward links from other documents in the Documents table. URL links that appear in each document are extracted and aggregated. HTML documents are concatenated into large files for Hadoop (256MB each) and Vertica (56MB each) at load time. HadoopDB was able to store each document separately in the Documents table using the TEXT data type. DBMS-X processed each HTML document file separately, as described below.

Figure 9: Join Task

Figure 10: UDF Aggregation task

The parallel databases should theoretically be able to use a user-defined function, F, to parse the contents of each document and emit a list of all URLs found in the document. A temporary table would then be populated with this list of URLs and then a simple count/group-by query would be executed that finds the number of instances of each unique URL.

Unfortunately, [23] found that in practice, it was difficult to implement such a UDF inside the parallel databases. In DBMS-X, it was impossible to store each document as a character BLOB inside the DBMS and have the UDF operate on it directly, due to “a known bug in [the] version of the system”. Hence, the UDF was implemented inside the DBMS, but the data was stored in separate HTML documents on the raw file system and the UDF made external calls accordingly.

Vertica does not currently support UDFs, so a simple document parser had to be written in Java externally to the DBMS. This parser is executed on each node in parallel, parsing the concatenated documents file and writing the found URLs into a file on the local disk. This file is then loaded into a temporary table using Vertica’s bulk-loading tools and a second query is executed that counts, for each URL, the number of inward links.
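A minimal sketch of such an external parser follows; the regular expression, command-line arguments, and class name are illustrative assumptions rather than the parser used in [23].

// Illustrative standalone parser: scans a concatenated documents file and writes
// one extracted URL per line, ready for bulk loading into a temporary table.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlExtractor {
  private static final Pattern URL = Pattern.compile("href=\"(http[^\"]+)\"");

  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(new FileReader(args[0]));  // concatenated HTML file
    PrintWriter urls = new PrintWriter(args[1]);                      // one URL per line
    String line;
    while ((line = in.readLine()) != null) {
      Matcher m = URL.matcher(line);
      while (m.find()) {
        urls.println(m.group(1));
      }
    }
    in.close();
    urls.close();
  }
}

The output file is then bulk-loaded into the temporary table, and a count/group-by query over that table produces the inward-link counts.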

In Hadoop, we employed standard TextInputFormat and parsed each document inside a Map task, outputting a list of URLs found in each document. Both a Combine and a Reduce function sum the number of instances of each unique URL.

In HadoopDB, since text processing is more easily expressed in MapReduce, we decided to take advantage of HadoopDB’s ability to accept queries in either SQL or MapReduce and we used the latter option in this case. The complete contents of the Documents table on each PostgreSQL node are passed into Hadoop with the following SQL:

SELECT url, contents FROM Documents;

Next, we process the data using a MR job. In fact, we used identical MR code for both Hadoop and HadoopDB.
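The following sketch shows the shape of such a job (illustrative rather than the exact code used in the experiments): the Map function extracts link targets from its input and emits (url, 1), and the same summing function serves as both Combiner and Reducer. The input key is ignored, so the same classes work whether the records come from TextInputFormat or from a database record reader.

// Illustrative inlink-counting MR job; uses the same href pattern as the parser sketch above.
import java.io.IOException;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class InlinkCount {

  public static class ParseMap extends MapReduceBase
      implements Mapper<Writable, Text, Text, IntWritable> {
    private static final Pattern URL = Pattern.compile("href=\"(http[^\"]+)\"");
    private final IntWritable one = new IntWritable(1);

    public void map(Writable ignoredKey, Text doc,
                    OutputCollector<Text, IntWritable> out, Reporter reporter)
        throws IOException {
      Matcher m = URL.matcher(doc.toString());
      while (m.find()) {
        out.collect(new Text(m.group(1)), one);      // one record per extracted link
      }
    }
  }

  // Registered as both Combiner and Reducer: sums the counts for each unique URL.
  public static class SumReduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text url, Iterator<IntWritable> counts,
                       OutputCollector<Text, IntWritable> out, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (counts.hasNext()) {
        sum += counts.next().get();
      }
      out.collect(url, new IntWritable(sum));        // number of inward links to this URL
    }
  }
}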

Fig. 10 illustrates the power of using a hybrid system like HadoopDB. The database layer provides an efficient storage layer for HTML text documents and the MapReduce framework provides the expressive power needed for arbitrary processing.

Hadoop outperforms HadoopDB as it processes merged files of multiple HTML documents. HadoopDB, however, does not lose the original structure of the data by merging many small files into larger ones. Note that the total merge time was about 6000 seconds per node. This overhead is not included in Fig. 10.

DBMS-X and Vertica perform worse than the Hadoop-based systems since the input files are stored outside of the database. Moreover, for this task neither commercial database scales linearly with the size of the cluster.

6.3 Summary of Results Thus Far

In the absence of failures or background processes, HadoopDB is able to approach the performance of the parallel database systems. The reason the performance is not equal is due to the following facts: (1) PostgreSQL is not a column-store, (2) DBMS-X results are overly optimistic by approximately a factor of 15%, (3) we did not use data compression in PostgreSQL, and (4) there is some overhead in the interaction between Hadoop and PostgreSQL which gets proportionally larger as the number of chunks increases. We believe some of this overhead can be removed with additional engineering time.

HadoopDB consistently outperforms Hadoop (except for the UDF aggregation task since we did not count the data merging time against Hadoop).

While HadoopDB’s load time is about 10 times longer than Hadoop’s, this cost is amortized across the higher performance of all queries that process this data. For certain tasks, such as the Join task, the factor of 10 load cost is immediately translated into a factor of 10 performance benefit.

7. FAULT TOLERANCE AND HETEROGENEOUS ENVIRONMENT

As described in Section 3, in large deployments of shared-nothing machines, individual nodes may experience high rates of failure or slowdown. While running our experiments for this research paper on EC2, we frequently experienced both node failure and node slowdown (e.g., some notifications we received: “4:12 PM PDT: We are investigating a localized issue in a single US-EAST Availability Zone. As a result, a small number of instances are unreachable. We are working to restore the instances.”, and “Starting at 11:30 PM PDT today, we will be performing maintenance on parts of the Amazon EC2 network. This maintenance has been planned to minimize the probability of impact to Amazon EC2 instances, but it is possible that some customers may experience a short period of elevated packet loss as the change takes effect.”)

For parallel databases, query processing time is usually determined by the time it takes for the slowest node to complete its task. In contrast, in MapReduce, each task can be scheduled on any node as long as input data is transferred to or already exists on a free node. Also, Hadoop speculatively executes redundant copies of tasks that are being performed on a straggler node to reduce the slow node’s effect on query time.

Figure 11: Fault tolerance and heterogeneity experiments on 10 nodes

Hadoop achieves fault tolerance by restarting tasks of failed nodes on other nodes. The JobTracker receives heartbeats from TaskTrackers. If a TaskTracker fails to communicate with the JobTracker for a preset period of time, the TaskTracker expiry interval, the JobTracker assumes failure and schedules all map/reduce tasks of the failed node on other TaskTrackers. This approach is different from most parallel databases, which abort unfinished queries upon a node failure and restart the entire query processing (using a replica node instead of the failed node).

By inheriting the scheduling and job tracking features of Hadoop, HadoopDB yields fault-tolerance and straggler-handling properties similar to those of Hadoop.

To test the effectiveness of HadoopDB in failure-prone and heterogeneous environments in comparison to Hadoop and Vertica, we executed the aggregation query with 2000 groups (see Section 6.2.4) on a 10-node cluster and set the replication factor to two for all systems. For Hadoop and HadoopDB we set the TaskTracker expiry interval to 60 seconds. The following lists system-specific settings for the experiments.

Hadoop (Hive): HDFS managed the replication of data. HDFS replicated each block of data on a different node selected uniformly at random.

HadoopDB (SMS): As described in Section 6, each node contains twenty 1GB chunks of the UserVisits table. Each of these 20 chunks was replicated on a different node selected at random.

Vertica: In Vertica, replication is achieved by keeping an extra copy of every table segment. Each table is hash partitioned across the nodes and a backup copy is assigned to another node based on a replication rule. On node failure, this backup copy is used until the lost segment is rebuilt.
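For reference, the following sketch spells out the Hadoop-side settings just mentioned (two-way replication and a 60-second TaskTracker expiry) through Hadoop's configuration API. The property names are the ones used by Hadoop releases of that era and may differ in later versions; in practice these are cluster-wide settings placed in the site configuration files rather than set per job.

// Illustrative only: these properties are normally set in hadoop-site.xml for the
// whole cluster; the Configuration object below just spells out names and units.
import org.apache.hadoop.conf.Configuration;

public class FaultToleranceSettings {
  public static Configuration settings() {
    Configuration conf = new Configuration();
    conf.setInt("dfs.replication", 2);                              // two replicas of each HDFS block
    conf.setLong("mapred.tasktracker.expiry.interval", 60 * 1000L); // 60 seconds, in milliseconds
    return conf;
  }
}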

For fault-tolerance tests, we terminated a node at 50% query completion. For Hadoop and HadoopDB, this is equivalent to failing a node when 50% of the scheduled Map tasks are done. For Vertica, this is equivalent to failing a node after 50% of the average query completion time for the given query.

To measure the percentage increase in query time in heterogeneous environments, we slow down a node by running an I/O-intensive background job that randomly seeks values from a large file and frequently clears OS caches. This file is located on the same disk where data for each system is stored.
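A minimal sketch of such a background job follows; the read size, cache-dropping interval, and the use of /proc/sys/vm/drop_caches (Linux-specific, requires root) are illustrative choices rather than exact experimental parameters.

// Illustrative I/O-intensive background load: random reads over a large file on the
// data disk, periodically dropping the OS page cache so reads hit the disk.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

public class DiskHog {
  public static void main(String[] args) throws IOException, InterruptedException {
    RandomAccessFile file = new RandomAccessFile(args[0], "r");   // large file on the data disk
    long length = file.length();
    byte[] buf = new byte[64 * 1024];
    Random rnd = new Random();
    long reads = 0;
    while (true) {
      file.seek((long) (rnd.nextDouble() * (length - buf.length)));
      file.readFully(buf);                                        // force a random-access read
      if (++reads % 1000 == 0) {
        // Clear cached pages (Linux-specific, requires root).
        Runtime.getRuntime()
               .exec(new String[] {"sh", "-c", "sync; echo 3 > /proc/sys/vm/drop_caches"})
               .waitFor();
      }
    }
  }
}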

We observed no differences in percentage slowdown between HadoopDB with or without SMS and between Hadoop with or without Hive. Therefore, we only report results of HadoopDB with SMS and Hadoop with Hive and refer to both systems as HadoopDB and Hadoop from now on.

The results of the experiments are shown in Fig. 11. Node failure caused HadoopDB and Hadoop to have smaller slowdowns than Vertica. Vertica’s increase in total query execution time is due to the overhead associated with query abortion and complete restart. In both HadoopDB and Hadoop, the tasks of the failed node are distributed over the remaining available nodes that contain replicas of the data. HadoopDB slightly outperforms Hadoop. In Hadoop, TaskTrackers assigned blocks not local to them will copy the data first (from a replica) before processing. In HadoopDB, however, processing is pushed into the (replica) database. Since the number of records returned after query processing is less than the raw size of data, HadoopDB does not experience Hadoop’s network overhead on node failure.

In an environment where one node is extremely slow, HadoopDB and Hadoop experience less than a 30% increase in total query execution time, while Vertica experiences more than a 170% increase in query running time. Vertica waits for the straggler node to complete processing. HadoopDB and Hadoop run speculative tasks on TaskTrackers that completed their tasks. Since the data is chunked (HadoopDB has 1GB chunks, Hadoop has 256MB blocks), multiple TaskTrackers concurrently process different replicas of unprocessed blocks assigned to the straggler. Thus, the delay due to processing those blocks is distributed across the cluster.

In our experiments, we discovered an assumption made by Hadoop’s task scheduler that contradicts the HadoopDB model. In Hadoop, TaskTrackers will copy data not local to them from the straggler or the replica. HadoopDB, however, does not move PostgreSQL chunks to new nodes. Instead, the TaskTracker of the redundant task connects to either the straggler’s database or the replica’s database. If the TaskTracker connects to the straggler’s database, the straggler needs to concurrently process an additional query, leading to further slowdown. Therefore, the same feature that causes HadoopDB to have slightly better fault tolerance than Hadoop causes a slightly higher percentage slowdown in heterogeneous environments for HadoopDB. We plan to modify the current task scheduler implementation to provide hints to speculative TaskTrackers to avoid connecting to a straggler node and to connect to replicas instead.

7.1 Discussion

It should be pointed out that although Vertica’s percentage slowdown was larger than that of Hadoop and HadoopDB, its total query time (even with the failure or the slow node) was still lower than that of Hadoop or HadoopDB. Furthermore, Vertica’s performance in the absence of failures is an order of magnitude faster than Hadoop and HadoopDB (mostly because its column-oriented layout of data is a big win for the small aggregation query). This order-of-magnitude performance advantage could be traded for the same performance as Hadoop and HadoopDB using an order of magnitude fewer nodes. Hence, failures and slow nodes become less likely for Vertica than for Hadoop and HadoopDB. Furthermore, eBay’s 6.5 petabyte database (perhaps the largest known data warehouse worldwide as of June 2009) [4] uses only 96 nodes in a shared-nothing cluster. Failures are still reasonably rare at fewer than 100 nodes.

We argue that in the future, 1000-node clusters will be commonplace for production database deployments, and 10,000-node clusters will not be unusual. There are three trends that support this prediction. First, data production continues to grow faster than Moore’s law (see Section 1). Second, it is becoming clear that from both a price/performance and (an increasingly important) power/performance perspective, many low-cost, low-power servers are far better than fewer heavy-weight servers [14]. Third, there is now, more than ever, a requirement to perform data analysis inside of the DBMS, rather than pushing data to external systems for analysis. Disk-heavy architectures such as the eBay 96-node DBMS do not have the necessary CPU horsepower for analytical workloads [4].

Hence, awaiting us in the future are heavy-weight analytic database jobs, requiring more time and more nodes. The probability of failure in these next-generation applications will be far larger than it is today, and restarting entire jobs upon a failure will be unacceptable (failures might be common enough that long-running jobs never finish!). Thus, although Hadoop and HadoopDB pay a performance penalty for runtime scheduling, block-level restart, and frequent checkpointing, such an overhead to achieve robust fault tolerance will become necessary in the future. One feature of HadoopDB is that it can elegantly transition between both ends of the spectrum. Since one chunk is the basic unit of work, it can play in the high-performance/low-fault-tolerance space of today’s workloads (like Vertica) by setting the chunk size to be infinite, or in the high-fault-tolerance space by using more granular chunks (like Hadoop). In future work, we plan to explore the fault-tolerance/performance tradeoff in more detail.

8. CONCLUSION

Our experiments show that HadoopDB is able to approach the performance of parallel database systems while achieving similar scores to Hadoop on fault tolerance, the ability to operate in heterogeneous environments, and software license cost. Although the performance of HadoopDB does not in general match the performance of parallel database systems, much of this is due to the fact that PostgreSQL is not a column-store and that we did not use data compression in PostgreSQL. Moreover, Hadoop and Hive are relatively young open-source projects. We expect future releases to enhance performance. As a result, HadoopDB will automatically benefit from these improvements.

HadoopDB is therefore a hybrid of the parallel DBMS and Hadoop approaches to data analysis, achieving the performance and efficiency of parallel databases, yet still yielding the scalability, fault tolerance, and flexibility of MapReduce-based systems. The ability of HadoopDB to directly incorporate Hadoop and open source DBMS software (without code modification) makes HadoopDB particularly flexible and extensible for performing data analysis at the large scales expected of future workloads.

9. ACKNOWLEDGMENTS

We’d like to thank Sergey Melnik and the three anonymous reviewers for their extremely insightful feedback on an earlier version of this paper, which we incorporated into the final version. We’d also like to thank Eric McCall for helping us get Vertica running on EC2. This work was sponsored by the NSF under grants IIS-0845643 and IIS-0844480.

Disclosure: Authors Daniel Abadi and Alexander Rasin have a small financial stake in Vertica due to their involvement in the predecessor C-Store project.

10. REFERENCES

[1] Hadoop. Web page. http://hadoop.apache.org/core/.
[2] HadoopDB Project. Web page. http://db.cs.yale.edu/hadoopdb/hadoopdb.html.
[3] Vertica. http://www.vertica.com/.
[4] D. Abadi. What is the right way to measure scale? DBMS Musings Blog. http://dbmsmusings.blogspot.com/2009/06/what-is-right-way-to-measure-scale.html.
[5] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In Proc. of SOSP, 2003.
[6] R. Chaiken, B. Jenkins, P.-A. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. SCOPE: Easy and efficient parallel processing of massive data sets. In Proc. of VLDB, 2008.
[7] G. Czajkowski. Sorting 1PB with MapReduce. http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html.
[8] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, 2004.
[9] D. DeWitt and M. Stonebraker. MapReduce: A major step backwards. DatabaseColumn Blog. http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html.
[10] D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Muralikrishna. GAMMA - A High Performance Dataflow Database Machine. In VLDB '86, 1986.
[11] Facebook. Hive. Web page. https://issues.apache.org/jira/browse/HADOOP-3601.
[12] S. Fushimi, M. Kitsuregawa, and H. Tanaka. An Overview of The System Software of A Parallel Relational Database Machine. In VLDB '86, 1986.
[13] Hadoop Project. Hadoop Cluster Setup. Web page. http://hadoop.apache.org/core/docs/current/cluster_setup.html.
[14] J. Hamilton. Cooperative expendable micro-slice servers (CEMS): Low cost, low power servers for internet-scale services. In Proc. of CIDR, 2009.
[15] Hive Project. Hive SVN Repository. Accessed May 19th 2009. http://svn.apache.org/viewvc/hadoop/hive/.
[16] J. N. Hoover. Start-Ups Bring Google's Parallel Processing To Data Warehousing. InformationWeek, August 29th, 2008.
[17] S. Madden, D. DeWitt, and M. Stonebraker. Database parallelism choices greatly impact scalability. DatabaseColumn Blog. http://www.databasecolumn.com/2007/10/database-parallelism-choices.html.
[18] Mayank Bawa. A $5.1M Addendum to our Series B. http://www.asterdata.com/blog/index.php/2009/02/25/a-51m-addendum-to-our-series-b/.
[19] C. Monash. The 1-petabyte barrier is crumbling. http://www.networkworld.com/community/node/31439.
[20] C. Monash. Cloudera presents the MapReduce bull case. http://www.dbms2.com/2009/04/15/cloudera-presents-the-mapreduce-bull-case/.
[21] C. Olofson. Worldwide RDBMS 2005 vendor shares. Technical Report 201692, IDC, May 2006.
[22] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In Proc. of SIGMOD, 2008.
[23] A. Pavlo, A. Rasin, S. Madden, M. Stonebraker, D. DeWitt, E. Paulson, L. Shrinivas, and D. J. Abadi. A Comparison of Approaches to Large Scale Data Analysis. In Proc. of SIGMOD, 2009.
[24] M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-Store: A column-oriented DBMS. In VLDB, 2005.
[25] D. Vesset. Worldwide data warehousing tools 2005 vendor shares. Technical Report 203229, IDC, August 2006.
