
IRIS at TREC-7

Kiduk Yang, Kelly Maglaughlin, Lokman Meho, and Robert G. Sumner, Jr.

School of Information and Library Science

University of North Carolina

Chapel Hill, NC 27599-3360 USA

{yangk, maglk, mehol, sumnr}@…

0 Submitted Runs

unc7aal1, unc7aal2 – Category A, automatic ad-hoc task runs (long query)

unc7ias, unc7iap – interactive track runs

1 Introduction

In our TREC-5 ad-hoc experiment, we tested two relevance feedback models, an adaptive linear model and a probabilistic model, using massive feedback query expansion (Sumner & Shaw, 1997). For our TREC-6 interactive experiment, we developed an interactive retrieval system called IRIS (Information Retrieval Interactive System1), which implemented modified versions of the feedback models with a three-valued scale of relevance and reduced feedback query expansion (Sumner, Yang, Akers & Shaw, 1998). The goal of the IRIS design was to provide users with ample opportunities to interact with the system throughout the search process. For example, users could supplement the initial query by choosing from a list of statistically significant, two-word collocations, or add and delete query terms as well as change their weights at each search iteration. Unfortunately, it was difficult to tell how much effect each IRIS feature had on the retrieval outcome due to such factors as strong searcher effect and major differences between the experimental and control systems.

In our TREC-7 interactive experiment, we attempted to isolate the effect of a given system feature by making the experimental and control systems identical, save for the feature we were studying. In one interactive experiment, the difference between the experimental and control systems was the display and modification capability of term weights. In another experiment, the difference was relevance feedback by passage versus document.

For the TREC-7 ad-hoc task, we wanted to examine the effectiveness of relevance feedback using a subcollection in order to lay the groundwork for future participation in the Very Large Corpus experiment. Though the pre-test results showed the retrieval effectiveness of a subcollection approach to be competitive with a whole collection approach, we were not able to execute the subcollection retrieval in the actual ad-hoc experiment due to hardware problems. Instead, our ad-hoc experiment consisted of a simple initial retrieval run and a pseudo-relevance feedback run using the top 5 documents as relevant and the 100th document as non-relevant.

Though the precision was high in the top few documents, the ad-hoc results were below average by TREC measures as expected. In the interactive experiment, the passage feedback results were better than the document feedback results, and the results of the simple interface system that did not display query term weights were better than that of the more complex interface system that displayed query term weights and allowed users to change these weights. Overall interactive results were about average among participants.

1A prior version of IRIS was developed by Kiduk Yang, Kristin Chaffin, Sean Semone, and Lisa Wilcox at the School of Information and Library Science (SILS) at the University of North Carolina. They worked under the supervision of William Shaw and Robert Losee.

2 Key Components of IRIS

2.1 Text Processing

IRIS processes the text first by removing punctuation, and then excluding the 390 high-frequency terms listed in the WAIS default stopwords list as well as “IRIS stopwords,” which are defined as all numeric words, words that start with a special character, words consisting of more than 25 non-special characters, and words with embedded special characters other than a period, apostrophe, hyphen, underline, or forward or backward slash. The IRIS stopwords definition was arrived at by examining the inverted index and identifying low frequency terms that appeared meaningless. The removal of IRIS stopwords reduced the number of unique terms by over 25% (401,423 to 295,257 in the Financial Times collection), which can effect considerable savings in machine resources. Such savings can be a significant factor when dealing with massive collections.
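As a rough illustration of the filtering rules above, the following Python sketch (our paraphrase, not the IRIS code; the function names are invented) applies a WAIS-style stoplist together with the IRIS stopword definition to a token stream.

def is_iris_stopword(word):
    """Return True if the token matches the IRIS stopword definition given above."""
    allowed_specials = set(".'-_/\\")
    if word == "" or word.isdigit():
        return True                                   # numeric word
    if not word[0].isalnum():
        return True                                   # starts with a special character
    if sum(c.isalnum() for c in word) > 25:
        return True                                   # more than 25 non-special characters
    embedded = {c for c in word if not c.isalnum()}
    return not embedded.issubset(allowed_specials)    # disallowed embedded special character

def index_terms(tokens, wais_stoplist):
    """Keep only terms that survive both the WAIS stoplist and the IRIS filter."""
    return [t for t in tokens
            if t.lower() not in wais_stoplist and not is_iris_stopword(t)]

# Example: "1997" and "#tag" are dropped; "o'connor" and "tcp/ip" are kept.
print(index_terms(["1997", "#tag", "o'connor", "tcp/ip"], wais_stoplist={"the", "of"}))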

After the initial processing step described above, IRIS conflates each word by applying one of the four stemmers implemented in the IRIS Nice Stemmer module,2 which consists of a simple plural remover (Frakes & Baeza-Yates, 1992, chap. 8), the Porter stemmer (Porter, 1980), the modified Krovetz inflectional stemmer, and the Combo stemmer. The modified Krovetz inflectional stemmer implements a modified version of Krovetz’s inflectional stemmer algorithm (Krovetz, 1993) and restores the root form of plural (“-s,” “-es,” “-ies”), past tense (“-ed”), and present participle (“-ing”) words, provided this root form is in our online dictionary. Though this stemmer’s conservative conflation approach can be advantageous over suffix-removal stemmers that can adversely affect precision by overstemming, it can also cause lower recall by understemming, since the morphological variations targeted for conflation are few. The Combo stemmer attempts to minimize the disadvantages of both understemming and overstemming by taking as the final result the shortest whole word (i.e., word that appears in a dictionary) returned by the three stemmers. For example, the Krovetz stemmer does not conflate “disappointment” and “goodness,” and the Porter Stemmer overconflates “ponies,” “agreed” and “troubling” to “poni,” “agre,” and “troubl,” but the Combo stemmer correctly stems these words to “disappoint,” “good,” “pony,” “agree,” and “trouble.”3

Unfortunately, the Combo stemmer’s computational cost is very high due to its multiple dictionary lookup per word. Given the resource limitations at SILS relative to the size of the TREC-7 collection and the fact that the effectiveness of the Combo stemmer has not yet been fully tested, we opted for the modified Krovetz stemmer as the default stemmer for the TREC-7 experiments.
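A minimal sketch of the Combo stemmer's selection rule, assuming the three component stemmers (a plural remover, the modified Krovetz stemmer, and the Porter stemmer) and an online dictionary are supplied by the caller; the fallback to the unstemmed form when no candidate is a whole word is our assumption, not a documented IRIS behavior.

def combo_stem(word, dictionary, stemmers):
    """Return the shortest whole word (i.e., dictionary word) produced by the stemmers."""
    candidates = [stem(word) for stem in stemmers]
    whole_words = [c for c in candidates if c in dictionary]
    if not whole_words:
        return word                      # fallback: keep the unstemmed form (our choice)
    return min(whole_words, key=len)     # shortest whole word wins

# e.g., combo_stem("ponies", dictionary, [plural_remover, krovetz_stem, porter_stem])
# should return "pony" provided "pony" is in the dictionary.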

2.2 Phrase Construction

In our TREC-6 experiments, we constructed a statistically significant, two-word collocation index by extracting co-occurring word pairs within a window of 4 words (Haas & Losee, 1994; Losee, 1994; Martin, Al, & van Sterkenburg, 1983; Phillips, 1985) and selecting those that co-occur with statistically significant frequency (Berry-Rogghe, 1974). Though this collocation index worked very well in some cases, its overall effect on retrieval effectiveness did not appear to be significant (Sumner et al., 1998). Furthermore, the computational cost of constructing the collocation index was quite high.

Consequently, we tried another approach to constructing a phrase index in TREC-7. Using the online dictionary and the clause recognition algorithm built into the Nice Stemmer, we constructed a two-word noun-noun phrase index by first extracting adjacent word pairs of noun and proper noun combinations within a clause,4 and then discarding the phrases occurring 20 or fewer times in the collection to reduce indexing time and to conserve computer resources. The phrase occurrence frequency threshold of 20 was arrived at by selecting the number that produced the phrase index whose size was most comparable to that of the collocation index. To augment the proper nouns in the online dictionary, all capitalized words not occurring at the beginning of a sentence were considered to be proper nouns. Since the Krovetz stemmer does not conflate hyphenated words, hyphenated words were broken up and stemmed by the simple plural remover before the noun-noun phrase construction module was applied. Hyphenated words in their raw form (i.e., as they appear in documents sans punctuation) were added to the index as well.

2 The Nice Stemmer was implemented by Kiduk Yang, Danqi Song, Woo-Seob Jeong, and Rong Tang at SILS at UNC.

3 For an interactive comparison of these stemmers, please visit …/iris/nstem.htm.

4 IRIS identifies a clause boundary by the presence of appropriate punctuation marks such as a comma, period, semicolon, question mark, or exclamation mark.
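The following Python sketch illustrates the pairing-and-thresholding step of the phrase index construction described above, under the assumption that clause segmentation and noun identification (the is_noun test standing in for the dictionary lookup) have already been done; it is not the IRIS indexer itself.

from collections import Counter

def clause_phrases(clause_tokens, is_noun):
    """Yield adjacent noun-noun pairs from a single clause."""
    for w1, w2 in zip(clause_tokens, clause_tokens[1:]):
        if is_noun(w1) and is_noun(w2):
            yield (w1, w2)

def build_phrase_index(clauses, is_noun, min_count=21):
    """Count noun-noun pairs over all clauses and discard rare ones (20 or fewer)."""
    counts = Counter(p for clause in clauses for p in clause_phrases(clause, is_noun))
    return {phrase: n for phrase, n in counts.items() if n >= min_count}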

2.3 Ranking Function and Term Weights

IRIS ranks the retrieved documents in decreasing order of the inner product of document and query vectors,

⟨q, d_i⟩ = Σ_{k=1}^t q_k d_ik ,  (1)

where q_k is the weight of term k in the query, d_ik is the weight of term k in document i, and t is the number of terms in the index. We used SMART Lnu weights for document terms (Buckley, Singhal, Mitra, & Salton, 1996; Buckley, Singhal, & Mitra, 1997), and SMART ltc weights (Buckley, Salton, Allan, & Singhal, 1995) for query terms. Lnu weights attempt to match the probability of retrieval given a document length with the probability of relevance given that length (Singhal, Buckley, & Mitra, 1996). Our implementation of Lnu weights was the same as that of Buckley et al. (1996, 1997) except for the value of the slope in the formula, which is an adjustable parameter whose optimal value may depend, in part, on the properties of the document collection.

According to the pre-test experiments, an Lnu slope of 0.5 performed best with feedback, especially when using both single term and phrase indexes. In initial retrieval without any feedback, however, a slope of 0.2 or 0.3 showed best results. Based on these findings, we used a slope of 0.3 in the ad-hoc experiment to optimize the initial retrieval results, but used a slope of 0.5 in the interactive experiment to optimize performance with feedback.
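To make the weighting concrete, here is a minimal Python sketch of Lnu document weights, ltc query weights, and the Equation (1) inner product. It follows the published SMART definitions (Singhal, Buckley, & Mitra, 1996; Buckley et al., 1995) rather than the IRIS source; the use of natural logarithms and the parameter names are our assumptions.

import math

def lnu_weight(tf, mean_tf, n_unique, pivot, slope):
    """Lnu document-term weight: log tf, no idf, pivoted unique-term normalization."""
    l = (1.0 + math.log(tf)) / (1.0 + math.log(mean_tf))
    return l / ((1.0 - slope) * pivot + slope * n_unique)

def ltc_weights(query_tf, df, n_docs):
    """ltc query-term weights: log tf, idf, cosine normalization."""
    raw = {t: (1.0 + math.log(tf)) * math.log(n_docs / df[t]) for t, tf in query_tf.items()}
    norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
    return {t: w / norm for t, w in raw.items()}

def score(query_weights, doc_weights):
    """Equation (1): inner product of the query and document vectors."""
    return sum(qw * doc_weights.get(t, 0.0) for t, qw in query_weights.items())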

2.4 Feedback Models

2.4.1 Adaptive Linear Model

Currently, the default relevance feedback model of IRIS is the adaptive linear model (Bollmann & Wong, 1987;Wong & Yao, 1990; Wong, Yao, & Bollmann, 1988; Wong, Yao, Salton, & Buckley, 1991). The basic approach of the adaptive linear model, which is based on the concept of the preference relation from decision theory (Fishburn,1970), is to find a solution vector that, given any two documents in the collection, will rank a more-preferred document before a less-preferred one (Wong et al., 1988).

The goal of the adaptive linear model, in essence, is to construct a query vector that ranks the entire document collection according to the user’s preferences. Since the user’s preferences are not usually known for the whole collection, however, we can only create a solution vector for the training set T , which is the cumulative set of documents retrieved and evaluated by the user. As knowledge of the user’s preferences accumulates with relevance feedback iterations, one can expect the solution vector for T to more accurately rank the entire collection (Wong &Yao, 1990).

The error-correction procedure we used to find a solution vector for T in our TREC experiments is based on the procedure used by Wong et al. (1991). The error-correction procedure begins with a starting vector q (0) and repeats the cycle of “error-correction” until a solution vector is found. The error-correction cycle i is defined by

q^(i+1) = q^(i) + αb ,  (2)

where α is a constant, and b is the difference vector resulting from subtracting a less-preferred document vector from a more-preferred one. (For details about how this difference vector is chosen, see Sumner et al., 1998.) The choices for the constant α and the starting vector q^(0) are very important since they can influence not only the composition of the solution vector but also the number of error-correction cycles needed to arrive at it. Different choices have been made for α and q^(0) in our TREC-5, TREC-6, and TREC-7 experiments (Sumner & Shaw, 1997; Sumner et al., 1998).
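An illustrative Python sketch of the Equation (2) error-correction cycle (not the IRIS implementation): while some less-preferred document in the training set still scores at least as high as a more-preferred one, add α times their difference vector to the query. How IRIS selects the violating pair is described in Sumner et al. (1998); taking the first violation found, as below, is a simplification, and the cycle cap is our safeguard.

import numpy as np

def error_correction(q0, preference_pairs, alpha=0.5, max_cycles=1000):
    """preference_pairs: list of (d_more, d_less) vectors, d_more preferred over d_less."""
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(max_cycles):
        violation = next(((dm, dl) for dm, dl in preference_pairs
                          if np.dot(q, dm) <= np.dot(q, dl)), None)
        if violation is None:
            return q                     # every pair ranked correctly: a solution vector
        d_more, d_less = violation
        q += alpha * (np.asarray(d_more, dtype=float) - np.asarray(d_less, dtype=float))  # Equation (2)
    return q                             # give up after max_cycles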

In the relevance feedback interface of IRIS, users can evaluate documents as “relevant,” “marginally relevant,” or “nonrelevant.” By adapting the concept of the user preference relation to extend the relevance scale from a binary to a three-valued scale, we constructed the following formula for the starting vector. Note that this formula can be adjusted for any multivalued relevance scale:

q^(0) = c_0 q_rk + (c_1 / N_new^rel) Σ_{new rel} d + (c_2 / N_new^mrel) Σ_{new mrel} d - (c_3 / N_new^nonrel) Σ_{new nonrel} d ,  (3)

where q_rk is the query vector that produced the current ranking of documents; c_0, c_1, c_2, and c_3 are constants; N_new^rel, N_new^mrel, and N_new^nonrel are the numbers of new relevant, new marginally relevant, and new nonrelevant documents, respectively, in the current iteration; and the summations are over the appropriate new documents. This formula is similar to the relevance feedback formulas used by Rocchio (1971) and Salton and Buckley (1990). A “new” document during a given search iteration is one that was not retrieved and evaluated during a previous iteration. Alternatively, it may also be a document that was retrieved and evaluated in a previous iteration, but whose relevance judgement was changed in the current iteration.

Because every new document vector already contributes to the starting vector (Equation 3), we used a value of α = 0.5 (Equation 2) to reduce the influence of any one new document. The value of c_2 = 0.6 was chosen so that a marginally relevant document could still contribute to the final query vector even after being subtracted in the error-correction procedure (c_2 - α = 0.1). We set c_1 = 1.2 so that the influence of relevant documents would be twice that of marginally relevant ones, and set c_3 = 0.6 for internal consistency. Though we used c_0 = 1.0 in the TREC-6 experiments, we adjusted it to 0.2 in the TREC-7 interactive experiment to reduce the influence of the initial query. We noticed in our post-TREC-6 experiments that the influence of the initial query tended to overshadow the user feedback, and consequently set c_0 = 0.2, which seemed to make the system more responsive to user feedback. In the ad-hoc experiment, however, we used c_0 = 1.0 since the importance of pseudo-feedback for that task was viewed as minimal.
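A minimal sketch of the Equation (3) starting vector for the three-valued relevance scale, with the constants defaulting to the TREC-7 interactive settings quoted above (c_0 = 0.2, c_1 = 1.2, c_2 = 0.6, c_3 = 0.6); the inputs are lists of vectors for documents judged (or re-judged) in the current iteration.

import numpy as np

def starting_vector(q_rk, new_rel, new_mrel, new_nonrel,
                    c0=0.2, c1=1.2, c2=0.6, c3=0.6):
    """Equation (3): combine the ranking query with averaged new document vectors."""
    def mean(docs):
        return np.mean(docs, axis=0) if docs else np.zeros_like(q_rk, dtype=float)
    return (c0 * np.asarray(q_rk, dtype=float)
            + c1 * mean(new_rel)         # new relevant documents
            + c2 * mean(new_mrel)        # new marginally relevant documents
            - c3 * mean(new_nonrel))     # new nonrelevant documents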

2.4.2 Probabilistic Model

In addition to the adaptive linear model, a variation of the binary probabilistic feedback model that accommodates three levels of relevance judgments is implemented in IRIS. As is the case with the adaptive linear model, this probabilistic model with the graded relevance formula (Yang & Yang, 1997) can be adjusted for any multivalued relevance scale including the binary relevance scale.

According to the TREC-7 pre-tests as well as our past TREC results, our implementation of the adaptive linear model performed consistently better than that of the probabilistic model when using binary relevance feedback. The findings of the TREC-6 interactive experiment regarding the comparative performances of the two feedback models using the three-valued relevance scale are inconclusive due to other factors such as searcher effect. Given these considerations and our resource limitations, we decided to exclude the probabilistic model from the actual TREC-7 experiments. A detailed description of the probabilistic model can be found in Sumner et al. (1998).

2.4.3 Passage Feedback Model

The conventional relevance feedback models assume the user’s relevance judgement, whether binary or multi-valued, to be about an entire document. The unit of a document, however, can sometimes be arbitrary, as in the case of web documents whose boundaries are often determined for reasons of convenience or efficiency rather than content, or can contain subsections of various information content as in Congressional Record and Federal Register documents. The findings from passage feedback research (Melucci, 1998) as well as from comments made by IRIS users at large indicate that determination of relevance is sometimes based on certain portions of a document rather than the entirety of it.

To test out this theory in the TREC-7 experiments, we implemented in IRIS a third relevance feedback model called the “passage feedback model.” The formula for feedback vector creation in the passage feedback model is almost identical to the “Ide regular” formula (Ide, 1971; Ide & Salton, 1971; Salton & Buckley, 1990), except that the document vector d is replaced by p, the passage vector.

q_new = q_old + Σ_{rel} p - Σ_{nonrel} p ,  (4)

Since the normalization factor of the Lnu weight is based on document length, an inverse document frequency (Sparck Jones, 1972) weight was used for the passage vector p. The inverse document frequency weight is computed as the log of N/d_k, where N is the number of documents in the collection and d_k is the number of documents in which term k appears.
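A sketch of the Equation (4) passage feedback update with idf-weighted passage vectors, as described above; vocabulary handling and tokenization are simplified, and the helper names are ours.

import math
import numpy as np

def idf_passage_vector(passage_tokens, term_index, doc_freq, n_docs):
    """Build a passage vector p with w_k = log(N / d_k) for each distinct passage term."""
    p = np.zeros(len(term_index))
    for term in set(passage_tokens):
        if term in term_index:
            p[term_index[term]] = math.log(n_docs / doc_freq[term])
    return p

def passage_feedback(q_old, rel_passages, nonrel_passages):
    """Equation (4): add relevant passage vectors and subtract nonrelevant ones."""
    q_new = np.asarray(q_old, dtype=float).copy()
    for p in rel_passages:
        q_new += p
    for p in nonrel_passages:
        q_new -= p
    return q_new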

In the interactive setting of the IRIS passage feedback interface, the unit of passage is determined by the user, who can simply highlight the relevant and nonrelevant portions of documents with a mouse. Passage feedback can also be implemented by automatically selecting passages with high frequencies of matching query terms, though the automatic determination of nonrelevant passages is not possible with this approach. Such an automatic passage feedback approach may be useful if activated after the initial feedback so as to expand the initial query with related terms.

The passage feedback approach differs fundamentally from the philosophy of the adaptive linear model and the probabilistic model. Regardless of whether a document or passage is used as the unit of feedback, the passage feedback model does not attempt, in principle, to rank a document collection in the preference or relevance order defined by a training set. Instead, the passage feedback model, similar to conventional vector space models, simply expands the query vector to make it more “similar” to relevant passages and “dissimilar” to nonrelevant passages.

3 Pre-test Experiments

In our TREC-7 pre-test experiments, we chose to examine the effects of four main system components: representation (single term vs. phrases), term weighting (normalization slope), feedback model (adaptive linear vs. probabilistic), and feedback query expansion size (full expansion, 300 terms, 30 terms). In preparation for potential future efforts to scale up to massive document collections, we also examined the effectiveness of relevance feedback using a subcollection. The FT collection with TREC-6 queries and relevance judgements was used in these experiments, and many system design decisions in both the ad-hoc and interactive experiments were based on the findings from the pre-test results.

3.1 System Component Tests

3.1.1 Experiment Design

Prior experiments, both in and outside of TREC, have shown the use of syntactic phrases to be only marginally effective (Salton, 1968; Dillon & Gray, 1983; Lewis, Croft & Bhandaru, 1989). However, most of the findings were based on the performance of initial retrieval only and did not investigate the effect of automatically expanding the feedback query with phrase index terms.5

Though Lnu weights with a slope of 0.2 proved effective in both TREC-4 and TREC-5 (Buckley et al., 1996; Buckley, Singhal, & Mitra, 1997), we found a slope of 0.3 to be more effective with respect to initial retrieval in our TREC-6 experiments (Sumner et al., 1998). As is the case with phrases, the Lnu weight experiments did not investigate effects on retrieval beyond the first feedback iteration.

In our past TREC experiments, we compared the performances of the adaptive linear model (ALM) and the probabilistic model (PM) in relevance feedback, and though ALM generally outperformed PM, we noticed distinctly different retrieval patterns between the two models, which warranted further investigation (Sumner & Shaw, 1997; Sumner et al., 1998).

In TREC-6, we also compared the performance of a fully expanded feedback vector with that of a shorter feedback vector, namely one with the top 250 positive-weighted terms and the lowest 50 negative-weighted terms. Previous routing and ad-hoc pseudo-feedback experiments in TREC have shown that effectiveness improves linearly with the log of the number of added terms, with the point of diminishing improvement at 300 terms (Buckley et al., 1995). This was in direct contrast to our results in TREC-6, which, though somewhat suspect due to a system bug, indicated superior performance of the fully expanded feedback vector over the 300-term feedback vector. However, the advantage gained by full expansion of the feedback vector was marginal, and the shorter feedback vector performed reasonably well given its size, which was about one tenth that of the full feedback vector. In addition to reconfirming our previous findings regarding feedback query size, we wanted to investigate the retrieval performance level of an even shorter feedback query that consisted of the top 25 positive-weighted terms and the lowest 5 negative-weighted terms, just to see how much gain in efficiency can be achieved without sacrificing too much in effectiveness.

5 Here, routing and filtering experiments are not considered due to the different nature of those tasks and the ad-hoc task.

In order to identify the optimum retrieval component combinations of representation, normalization weight, feedback model, and feedback query size, one of us (Yang) conducted an experiment using 5 retrieval iterations (4 feedback iterations with a feedback window of 20 documents) with 48 retrieval model combinations as outlined below. A feedback window of 20 documents means that the top 20 previously unretrieved documents of the current ranking are added to the training set. A feedback window of 20 documents and 5 retrieval iterations were chosen to simulate the capacity of a human searcher based on the data from TREC-6 experiments.

2 (representation) * 4 (normalization slope) * 2 (feedback model) * 3 (feedback query size) = 48 retrieval model combinations
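A trivial sketch of how the 48-model grid could be enumerated; only three of the four normalization slopes (0.2, 0.3, and 0.5) are named in the text, so the fourth is left as a placeholder rather than guessed.

from itertools import product

representations = ["single term", "single term + phrases"]
slopes = [0.2, 0.3, 0.5, None]            # fourth tested slope not specified in the text
feedback_models = ["ALM", "PM"]
expansion_sizes = ["full", 300, 30]

combinations = list(product(representations, slopes, feedback_models, expansion_sizes))
assert len(combinations) == 48            # 2 * 4 * 2 * 3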

The retrieval results of all the model combinations were then compared using optimum effectiveness (F) in the top 20 documents retrieved as well as using standard TREC evaluation metrics. We chose these evaluation measures because optimum F, which represents the optimum performance level of the top 20 retrieved documents, and TREC metrics, which signify the overall performance level of the top 1000 retrieved documents, tend to complement each other.

TREC evaluation measures used were average precision across all relevant documents, R-precision, and the total number of relevant documents retrieved in the top 1000 documents. Optimum F is the highest F value in all retrieval iterations, where F is computed from recall and precision (Shaw, 1986; van Rijsbergen, 1979) by the formula,

F = 2 / (1/R + 1/P) .  (5)
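As a small illustration, the optimum F of a run can be computed by evaluating Equation (5) on the recall and precision of the top 20 documents at each iteration and keeping the best value; the sample numbers below are arbitrary.

def f_measure(recall, precision):
    """Equation (5): 2 / (1/R + 1/P); taken as 0 when either value is 0."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2.0 / (1.0 / recall + 1.0 / precision)

def optimum_f(per_iteration_rp):
    """per_iteration_rp: list of (recall, precision) pairs, one per retrieval iteration."""
    return max(f_measure(r, p) for r, p in per_iteration_rp)

# e.g., optimum_f([(0.10, 0.40), (0.25, 0.55), (0.30, 0.50)]) returns the F of the best iteration.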

3.1.2 Results

The analysis of retrieval results by all evaluation measures used showed a consistent pattern of improved retrieval performance with the larger feedback query. The difference in performance between the 30 term feedback vector and the 300 term feedback vector, however, was much greater than that between the 300 term feedback vector and the full feedback vector. As a matter of fact, the reduction in performance by limiting the feedback vector to 300 terms was almost negligible, whereas significant loss of performance occurred by reducing the feedback vector to 30 terms.

Though there were slight variations across evaluation methods, an Lnu slope of 0.5 seemed to be most advantageous for ALM and an Lnu slope of 0.2 seemed to perform best with PM. Using phrases in feedback as well as single terms resulted in slightly improved retrieval performances by both feedback models, which suggested that using phrases in feedback provides some utility though less than one might hope for.

As was the case in our previous TREC experiments, ALM consistently outperformed PM across all evaluation measures. The difference between the two models was most prominent in the number of relevant documents retrieved in the top 1000 documents, where ALM retrieved hundreds more relevant documents than PM. Upon closer inspection of ALM and PM, we discovered a pattern of “failure” by PM, where PM’s feedback query formulation strategy of selecting terms from relevant documents would stagnate the performance of feedback when no more relevant documents could be found. ALM, on the other hand, would continue expanding the feedback vector in its attempt to find the solution vector (Nilsson, 1965, Ch. 4; Wong et al., 1988; Wong et al., 1991).

At this point, we devised the “Adaptive Probabilistic Model” (APM), which keeps adding terms from top-ranked non-relevant documents until it finds a solution vector that ranks a more-preferred document before a less-preferred one (Wong et al., 1988). Due to time constraints, however, we did not engage in full-scale retrieval experiments with APM. Instead, we tested APM in a limited fashion, which resulted in only a marginal improvement in retrieval performance.

3.2 Subcollection Tests

3.2.1 Experiment Design

One of the immediate challenges in the field of Information Retrieval is effective and efficient handling of massive document collections. When dealing with massive document collections, the conventional IR approach of ranking the entire document collection by document-query similarity scores becomes extremely resource-intensive, especially with relevance feedback, where retrieval cycles have to be repeated with expanded query vectors.

One way to deal with massive data may be to create a subcollection, which is small enough to be efficient and yet large enough to contain most of the relevant documents. Once such a subcollection has been created, we can not only refine the search with relevance feedback at relatively small cost, but can also continue to refine and/or update the subcollection by periodically resubmitting to the entire collection the reformulated query created using the subcollection. The main question in subcollection IR is twofold: first, how do we create a subcollection with high enough recall and small enough size? Second, is the retrieval performance of an optimum subcollection competitive with that of the whole collection? In order to investigate these questions, we explored various subcollection creation methods to identify the optimum subcollection creation method, after which we compared its retrieval performance with that of using the whole collection.

The first objective of subcollection creation is to maximize recall at some optimum document rank N, so that the subcollection is small enough to be efficient while containing enough relevant data to be effective. In addition to applying the optimum retrieval component combinations identified in the system component tests, we implemented combinations of document-reranking methods to retrieve relevant documents that may not necessarily contain any initial query terms. Initial retrieval, being essentially a Boolean OR retrieval, will only retrieve documents that contain at least one query term. Thus, a poorly formulated initial query will tend to not retrieve many relevant documents that may include only synonyms or related concept terms.

One way to overcome this problem is to expand the query vector with synonyms or related concept terms as well as by using word stems to conflate morphological variations. Relevance feedback also expands the query vector indirectly with synonyms and related concept terms often contained in the body of relevant documents, though the effect may not be as precise as using a thesaurus or other such natural language processing methods. Consequently, we experimented with expanding the initial query with noun-noun phrases as well as expanding it by applying the “pseudo-relevance feedback” method of assuming that the top 5 documents are relevant and the 100th document is non-relevant. Variations on this method of expanding the initial query by pseudo-relevance feedback (using terms from the top n documents) have been used by top-performing participants in past TREC ad-hoc experiments (Buckley et al., 1995; Voorhees & Harman, 1997; Voorhees & Harman, 1998).

In addition to query expansion by phrases and automatic feedback, we also tested query expansion methods by using passages6 with matching initial query terms. Three variations of query expansion by passage feedback were tested by selecting terms from only the “relevant” passages, terms from relevant passages and top-ranked “non-relevant” passages, and terms from all passages in the top 100 documents retrieved. These subcollection creation methods along with the baseline method of initial retrieval with the unexpanded original query were then investigated by comparing recall at various document ranks up to the rank of 21,000 (10 % of FT collection).

After identifying the optimum subcollection creation method and cutoff, we created 47 subcollections, one for each query,7 and recomputed their collection statistics, namely Lnu weights for documents and ltc weights for queries. We then performed 5 retrieval iterations with a feedback window of 20 documents using selected retrieval models from the system component tests, and compared the performance of subcollection retrieval with that of whole collection retrieval. The same evaluation metrics used in the analysis of system component test results were applied to evaluate the performance of subcollection retrieval.

6 IRIS identifies a passage boundary by SGML tags or a clause break followed by a carriage return.

7 Three TREC-6 queries that did not have any relevant FT documents were dropped from pre-test experiments.

3.2.2 Results

According to recall values at fixed document ranks, the top-performing subcollection creation method was pseudo-relevance feedback by ALM, though average recall (recall averaged over queries) at document rank 5000 was the same for the top 4 methods. Interestingly enough, the baseline method was one of the top 4 methods, performing only slightly below the methods of initial query expansion by phrases and feedback by ALM. The results of passage feedback methods (PFM) were disappointing. However, poor performance of PFM could be due to an improper threshold of “relevant” passage identification (e.g. passages with n or more query terms). The hypothesis that a relevant passage would include related terms and concepts is critically dependent on the correct identification of relevant passages. Since the top 4 subcollection creation methods all achieved average recall of 0.87 at 5000 documents, which is only 2.4% of the total FT collection, we chose the optimum cutoff at 5000 and decided on the simplest method (i.e. baseline initial retrieval with single term queries) to create the subcollections.

The comparison of the subcollection retrieval results with the whole collection retrieval results showed an interesting difference between ALM and PM. The performance of PM using a subcollection was better than that of PM using the whole collection, whereas ALM’s performance deteriorated slightly with subcollection retrieval. Overall performance of ALM, however, was again superior to that of PM, though the gap in performance between ALM and PM was much narrower in subcollection retrieval than whole collection retrieval.

Different behaviors of ALM and PM may be attributed to fundamental differences in feedback query formulation between the two models. ALM starts out with the initial query vector and keeps adding and subtracting terms to find the solution vector, which is a radically different approach from PM’s strategy of estimating the probability of term occurrence in all relevant/nonrelevant documents from its occurrence characteristics in a training set. ALM’s feedback vector is firmly anchored with the initial query terms and is more resilient to improper and/or insufficient feedback evaluations, whereas PM’s feedback vector can be affected severely by bad relevance judgements and small training sets. It is therefore reasonable to assume that PM will perform better as the ratio of the training set size to the document collection size increases, as in the case of the subcollection retrieval.

The overall results of relevance feedback using subcollections have shown it to be almost as effective as using the whole collection while being much more efficient. However, by virtue of the fact that only the top 100 documents of various TREC runs are evaluated for a given set of topics, the TREC test collection may be inherently put together to show high recall for the top N documents, provided N is sufficiently large. Accordingly, it is difficult to tell how much of these good results using subcollections is due to TREC bias, and whether the optimum subcollection creation method of using simple initial retrieval or ALM pseudo-relevance feedback at a 2 or 3% of total collection cutoff will still be applicable in other instances. Though a high recall value at such a low rank (under 3% of the total document collection) is somewhat suspect due to the potential bias introduced by the TREC pooling method of relevant document identification (Voorhees & Harman, 1997), it is still reasonable to think that subcollection IR can be an effective as well as efficient way to deal with the problem of massive document collections.

4 Ad-hoc Experiment

4.1 Research Question

As a natural consequence of our belief that the user is an integral component of a truly effective information retrieval system, our approach to information retrieval centers on various ways to involve the user and then to effectively incorporate the user contribution into the search process. Thus, our main goal of the ad-hoc experiment was to explore methods of preparing the system for such an eventuality. More specifically, we wanted to examine a strategy for creating a subset of a document collection to be used in relevance feedback.

One obvious advantage of using a subcollection is the reduced computational cost. If it can be shown that the retrieval effectiveness of using a subcollection is competitive with that of using a whole collection, then subcollection retrieval may be a desirable strategy when iteratively querying (e.g., relevance feedback, query refinement) a document collection (Sumner & Shaw, 1997) that is massive or composed of distributed collections, where collection statistics for the whole collection are not known. A less obvious advantage of a subcollection might be its increased homogeneity. Being more densely populated with relevant documents that are likely to be topically similar, a subcollection may be more responsive to a refined query than a whole collection with diverse subject matter. For example, “court rulings on the use of peyote” queried against the entire Web document collection may retrieve documents about either courts or peyote, whereas the same query submitted to a subcollection of legal documents may boost those documents specifically about court rulings on the use of peyote to the top of the document ranking (Sumner, Yang & Dempsey, 1998).

For these potential advantages to be realized, a subcollection has to be small enough to incur savings in computational cost while at the same time containing enough relevant documents to be effective. Thus, we are interested in finding answers to the following questions regarding a subcollection retrieval strategy.

• What is the best way to create a subcollection?

• How effective is subcollection retrieval compared to whole collection retrieval?

The focus of our ad-hoc experiment, therefore, was on achieving high recall at some reasonable document rank in order to create an effective and efficient subcollection for relevance feedback.

4.2 Research Design

The constitution of the ad-hoc collection compounds the subcollection question. Since the ad-hoc collection is made up of four document collections, subcollection creation methods can be applied to the document collections separately or as a whole. If subcollection creation methods are applied to individual collections, then the question of how the results should be merged must be addressed. In previous research on this “collection fusion” problem, various strategies were employed to compensate for the potential incomparability of query-document similarity scores across collections (Callan, Lu, & Croft, 1995; Voorhees, Gupta, & Johnson-Laird, 1995; Savoy, Calve, & Vrajitoru, 1997).

Though the “raw score” merging method can be problematic when collection-dependent term weights (i.e., idf weights) cause the retrieval scores of similar documents to vary across collections (Dumais, 1993; Voorhees et al., 1995), we thought the longer queries and the massive retrieval window used for subcollection creation might mute its adverse effects. Any advantages gained by more complex retrieval strategies are likely to have less impact on subcollection creation, whose goal is to retrieve the bulk of relevant documents at an acceptable document rank. Ideally, these assumptions should be empirically tested by experimenting with exhaustive combinations of subcollection creation, collection fusion, and various ad-hoc retrieval strategies, but we decided to test only a few subcollection creation methods for reasons of practicality and simplicity. One of the overriding factors that influenced our research design in TREC-7 was the machine resource limitations that restricted large-scale experimentation. Besides, we figured that if a simple method could create an effective enough subcollection, we could forgo complex methods in favor of a simple one.

Therefore, we chose the two simplest yet effective subcollection creation methods from the pre-test and applied them to the individual collections separately. The subcollection creation methods tested were:

• unc7aal1: Collection fusion by raw score merging of the simple initial retrieval results without any feedback.

• unc7aal2: Collection fusion by raw score merging of the pseudo-relevance feedback results with the adaptive linear model, using the top 5 retrieved documents as relevant and the 100th document as non-relevant.

Both unc7aal1 and unc7aal2 were produced by first retrieving 10% of documents in each collection and merging the results by their raw query-document similarity scores.
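A sketch of the raw-score merging step for unc7aal1 and unc7aal2: each collection is searched separately, and the per-collection runs are merged purely by their raw query-document similarity scores, with no cross-collection normalization. The per-collection retrieval itself is assumed to exist elsewhere.

import heapq

def merge_by_raw_score(per_collection_runs, keep=1000):
    """per_collection_runs: dict mapping collection name to a list of (doc_id, score)."""
    scored = ((score, coll, doc_id)
              for coll, run in per_collection_runs.items()
              for doc_id, score in run)
    top = heapq.nlargest(keep, scored)    # raw scores compared directly across collections
    return [(coll, doc_id, score) for score, coll, doc_id in top]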

The second phase of our ad-hoc experiment, which we planned to do in a post-study, involved performing relevance feedback on the subcollections using the official TREC relevant documents in the top 20 retrieved documents. The results of relevance feedback on the subcollections would then be compared with those on the whole collection to determine the relative effectiveness of subcollection retrieval.

The system construct for the ad-hoc experiment was based on the system component pre-test results. We used the document term weight of Lnu 0.3 to optimize the initial retrieval results and allowed for full feedback query expansion to maximize the feedback effect. We also heavily weighted the initial query in the starting vector formulation of the adaptive linear model (i.e., c_0 = 1.0 in Equation 3) to reduce the adverse effect of the pseudo-relevance feedback. We did not create a phrase index for the ad-hoc experiment since we thought its creation cost in time and machine resources far outweighed any potential benefit gained by using it.

4.3 Results

The TREC-7 ad-hoc collection consists of 130,471 FBIS, 19,842 Federal Register,8 210,158 Financial Times, and 131,896 LA Times documents. Each document collection was first processed individually to generate single-word indexes of 243,778 terms for FBIS, 117,743 terms for Federal Register, 295,257 terms for Financial Times, and 222,155 terms for the LA Times collection.

Unfortunately, we experienced a hard disk problem that corrupted the entirety of the TREC-7 data and disabled our main research computer soon after we completed the first phase of our ad-hoc experiment. We turned in the top 1000 retrieved documents produced by the subcollection creation runs of initial retrieval (unc7aal1) and pseudo-feedback with ALM (unc7aal2). We are still in the process of restoring the data; consequently, we were not able to conduct the second phase of our experiment to test the effectiveness of relevance feedback using a subcollection.

According to TREC evaluation measures, which indicate the retrieval performance of the top 1000 documents only, the pseudo-relevance feedback with the adaptive linear model did slightly better than the initial retrieval without feedback, though both runs performed slightly below the median level of all the ad-hoc participants (Table 1). As for the subcollection creation results, the pattern observed with the smaller, homogeneous FT collection still held true for the larger, heterogeneous ad-hoc collection. As can be seen in Table 2, there was very little difference in average recall between the two runs. In both runs, the subcollections consisting of only two percent (document rank 10,000) of the whole document collection contained over 80% of the relevant documents on average.

Table 1. Performance Statistics of top 1000 documents

Table 2. Recall at Document Ranks averaged over 50 Queries

Closer examination of individual query results revealed some outlier queries with many relevant documents (e.g., queries 370, 389) or with possibly dissimilar relevant documents (e.g., query 373) that produced recall much below the average. Though it is very conceivable that more complex methods of collection fusion and/or ad-hoc retrieval may produce a subcollection with higher recall at a smaller size, whether those methods can push the results of outlier queries beyond the “recall block” remains to be seen.

8 Using the corrected document tag (instead of the tag used initially) reduced the number of Federal Register documents from 55,630 to 19,842.

5 Interactive Experiment

5.1 Research Question

Feedback from IRIS users at large, as well as those in TREC-6, includes mixed responses regarding the complexity of its user interface. Some like the interactive nature of the interface throughout the search process, while others are taken aback by its complexity. One of the most often mentioned IRIS interface components is its ability to display and modify term weights. Most novice searchers are confused by it, though some like “seeing how the system works” and the opportunity to intervene in the system process.

In addition to users’ ambivalence, there is also the question of how the users’ term weight modifications will affect the retrieval result. Certainly, if the user does not understand the significance of term weights and modifies them inappropriately, the search result will be adversely affected. If the system’s search direction is amiss and needs to be adjusted, however, user intervention by term weight modification might be beneficial.

Another often discussed IRIS component is the relevance feedback interface, especially its three levels of relevance (i.e., “yes,” “maybe,” and “no,” representing relevant, marginally relevant, and nonrelevant). Users like the option of judging a document beyond the dichotomous “relevant” or “nonrelevant,” but they are not quite sure what a “marginally relevant” document should be. The question of what makes a document relevant is fertile ground for research (Schamber, 1991; Cool et al., 1992; Barry, 1993 and 1994; Wang, 1994; Spink & Greisdorf, 1997). In prior research, we investigated the relationship between proportionality and degree of relevance and found that the number of relevant passages in a document corresponded directly with the degree of relevance awarded to the document by the user (Maglaughlin, Meho, Yang and Tang, 1998). If the proportionality of relevance is an important factor in determining the relevance of a document, then the relevance levels used by the system should be finely graded to better reflect the user’s evaluation of relevance. One method of addressing this aspect of relevance may be to use “passage feedback,” where passages instead of documents are used as the unit of relevance feedback.

Based on these observations, we asked the following questions in our TREC-7 interactive experiment:

• Does the display and modification option of term weights in an interactive retrieval system help or hinder the retrieval result?

• Is passage feedback an effective alternative to the conventional “document” feedback?

5.2 Methodology

We learned from our TREC-6 interactive experience that it is difficult enough to gauge the effects of various contributing factors in an interactive retrieval experiment without compounding the analysis by introducing numerous system features. Consequently, we attempted to isolate the effect of system features by keeping the experimental and the control system identical except in one aspect.

In one interactive experiment (unc7ias), the only difference between the experimental and the control system was the display and modification capability of term weights. In another experiment (unc7iap), the difference was relevance feedback by passage versus by document. Both experiments used the same control system, called “iriss,” which did not have the term weight display. The experimental system in unc7ias, “irisa,” is essentially the standard IRIS with term weight display used since TREC-6, whereas the experimental system in unc7iap, “irisp,” implements the passage feedback based on the “simple” interface (i.e., without term weight display).

All three systems use the same initial interface (Figure 1), but the initial query modification interface differs in the display of query terms. Though all three systems offer the “suggested phrases” with which the user can supplement the initial query, iriss and irisp (Figure 2.1) do not have the term weight display where the user can change the term weights, as in irisa (Figure 2.2). Instead, the term display of the simple interface offers check boxes users can click to include or exclude terms. The feedback query modification interface is structured in the same fashion (Figures 4.1 and 4.2). In addition to the modification of existing terms, the feedback query modification interface allows the user to add terms with emphasis, indicated by a plus or minus symbol (simple interface, Figure 4.1) or by term weights (advanced interface, Figure 4.2).

Another major difference between systems occurs in the relevance feedback interface. Both iriss and irisa employ the conventional feedback mechanism of judging the relevance of a document as a whole (Figure 3.1), but this document feedback mechanism is replaced by passage feedback in irisp (Figures 3.2, 3.3, and 3.4). Instead of checking each document as yes, maybe, or no for relevance, the user can simply copy and paste relevant and nonrelevant portions of documents into the appropriate passage feedback box in irisp.

The system construct for all three systems was essentially the same except for the passage feedback mechanism, which is described in Section 2.4.3. Based on the system component pre-test results, we used the document term weight of Lnu 0.5 to maximize the relevance feedback influence and restricted the feedback query expansion to 250 positively weighted terms and 50 negatively weighted terms in order to optimize the system for efficiency. We also reduced the contribution of the initial query in the starting vector formulation of the adaptive linear model (i.e., c_0 = 0.2 in Equation 3) to allow the actions taken during the feedback process to have a stronger influence on the direction of the search. We also created a phrase index of adjacent noun-noun pairs to use in suggesting potentially useful phrases for the initial query as well as in expanding the feedback vectors.

5.3 Results

The performance of IRIS measured by the mean instance recall (MIR) measure was slightly below the median of all interactive track runs (Table 3). The passage feedback system irisp had the highest MIR of the three IRIS systems tested, though the conventional document feedback system (iriss) showed more improvement when compared pairwise within each experimental run. Though it is difficult to be certain without an in-depth analysis of the results, the MIR statistics seem to indicate that the term weight display and modification option in IRIS hinders rather than helps the retrieval process. Also, the high MIR of irisp suggests that passage feedback is an effective alternative to conventional document feedback.

Table 3. Interactive Experiment Result Statistics

Tables 4.1 and 4.2 show the information about each searcher's background and search experience gathered by pre-study questionnaires. All searchers had received a bachelor's degree and were enrolled in the School of Information and Library Science. Three searchers had previous graduate degrees. The searchers had been searching between 1 and 15 years, with 5 being the average. Four of the 16 searchers were male.

In addition to the pre-search questionnaire, the searchers also completed a psychometric evaluation in an attempt to assess their query formulation skills. The psychometric evaluation scores, which ranged from 11 to 70, were computed by comparing each searcher’s synonyms to a list of “correct” synonyms and scoring a point for each correct synonym recorded. When the searchers’ psychometric scores were compared to their average precision and recall values, little correlation was found between search results and psychometric results (Table 4.3).

Table 4.1 Response Frequency of unc7iap Searchers on Pre-Study Questionnaire

Table 4.2 Response Frequency of unc7ias Searchers on Pre-Study Questionnaire

* OCLC   ** Military

Table 4.3 Searchers’ Average Psychometric Score, Precision and Recall

References

Barry, C. L. (1993). A preliminary examination of clues to relevance criteria within document representations.

Proceedings of the American Society for Information Science, Columbus, Ohio, (pp. 81-86). Medford, N.J.: Learned Information, Inc.

Barry, C. L. (1994). User-defined relevance criteria: an exploratory study. Journal of the American Society for Information Science, 45(3), 149- 159.

Berry-Rogghe, G. (1974). The computation of collocations and their relevance in lexical studies. In A. J. Aitken, R. W.

Bailey, & N. Hamilton-Smith (Eds.), The Computer and Literary Studies (pp. 103-112). Edinburgh: Edinburgh University Press.

Bollmann, P., & Wong, S. K. M. (1987). Adaptive linear information retrieval models. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 157-163.

Buckley, C., Salton, G., Allan, J., & Singhal, A. (1995). Automatic query expansion using SMART: TREC 3. In

D. K. Harman (Ed.), Overview of the Third Text REtrieval Conference (TREC-3) (NIST Spec. Publ. 500-

225, pp. 69-80). Washington, DC: U.S. Government Printing Office.

Buckley, C., Singhal, A., & Mitra, M. (1997). Using query zoning and correlation within SMART: TREC 5. In E.

M. Voorhees & D. K. Harman (Eds.), The Fifth Text REtrieval Conference (TREC-5).

Buckley, C., Singhal, A., Mitra, M., & Salton, G. (1996). New retrieval approaches using SMART: TREC 4. In D.

K. Harman (Ed.), The Fourth Text REtrieval Conference (TREC-4) (NIST Spec. Publ. 500-236, pp. 25-48).

Washington, DC: U.S. Government Printing Office.

Callan, J. P., Lu, Z., & Croft, W. B. (1995). Searching distributed collections with inference networks.

Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 21-

28.

Cool, C., Belkin, N. J., Kantor, P. B., & Frieder, O. (1993). Characteristics of texts affecting relevance judgments.

In M. E. Williams (Ed.), Proceedings of the 14th National Online Meeting, (pp. 77-84). Medford, N.J.:

Learned Information, Inc.

Dumais, S. T. (1993). LSI meets TREC. In D. K. Harman (Ed.), Proceedings of the First Text REtrieval Conference (TREC-1), 137-152.

Dillon, M., & Gray, A. S. (1983). FASIT: A fully automatic syntactically based indexing system. Journal of the American Society for Information Science, 34, 99-108.

Fishburn, P. C. (1970). Utility theory for decision making. New York: John Wiley & Sons.

Frakes, W. B., & Baeza-Yates, R. (Eds.). (1992). Information retrieval: Data structures & algorithms. Englewood Cliffs, NJ: Prentice Hall.

Haas, S. W., & Losee, R. M. (1994). Looking into text windows: Their size and composition. Information Processing and Management, 30, 619-629.

Harman, D. (1996). Overview of the Fourth Text REtrieval Conference (TREC-4). In D. K. Harman (Ed.), The Fourth Text REtrieval Conference (TREC-4) (NIST Spec. Publ. 500-236, pp. 25-48). Washington, DC: U.S.

Government Printing Office.

Ide, E. (1971). New experiments in relevance feedback. In G. Salton (Ed.), The Smart System-- experments in automatic document processing (pp. 337-354).Englewood Cliffs, NJ: Prentice-Hall, Inc.

Ide, E. Y Salton, G. (1971). Interactive search strategies and dynamic file organization in information retrieval. In

G. Salton (Ed.), The Smart System-- experments in automatic document processing (pp. 373-393).

Englewood Cliffs, NJ: Prentice-Hall, Inc.

Krovetz, R. (1993). Viewing morphology as an inference process. Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 191-203.

Losee, R. M. (1994). Term dependence: Truncating the Bahadur Lazarsfeld expansion. Information Processing and Management, 30, 293-303.

Lewis, D., Croft, W. B., & Bhandaru, N. (1989). Language-oriented information retrieval. International Journal of Intelligent Systems, 4, 285-318.

Maglaughlin, K.L., Meho L., Yang, K. and Tang, R. (1998). Utilizing Users’ Relevance Criteria in Relevance Feedback. Unpublished manuscript.

Martin, W., Al, B., & van Sterkenburg, P. (1983). On the processing of a text corpus. In R. Hartmann (Ed.), Lexicography: Principles and practice (pp. 77-87). London: Academic Press, Inc.

Melucci M. (1998). Passage retrieval: A probabilistic technique. Information Processing & Management. 34, 43-

68.

Nilsson, N. J. (1965). Learning machines: Foundations of trainable pattern-classifying systems. New York: McGraw-Hill.

Phillips, M. (1985). Aspects of text structure. Amsterdam: Elsevier Science Publishers.

Porter, M. (1980). An algorithm for suffix stripping. Program, 14, 130-137.

Robertson, S. E., & Sparck Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27, 129-146.

Rocchio, J. J., Jr. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART Retrieval System: Experiments in Automatic Document Processing (pp. 313-323). Englewood Cliffs, NJ: Prentice-Hall.

Salton, G. (1968). Automatic Information Organization and Retrieval. McGraw-Hill.

Salton, G., & Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41, 288-297.

Savoy, J., Calve, A., & Vrajitoru, D. (1997). Report on the TREC-5 experiment: Data fusion and collection fusion.

In E. M. Voorhees & D. K. Harman (Eds.), The Fifth Text REtrieval Conference (TREC-5). Schamber, L. (1991a). Users’ criteria for evaluation in a multimedia environment. Proceedings of the American Society for Information Science, Washington, DC, (pp. 126-133). Medford. N.J.: Learned Information, Inc. Shaw, W. M., Jr. (1986). On the foundatin of evaluation. Journal of the American Society for Information Science, 37, 346-348.

Singhal, A., Buckley, C., & Mitra, M. (1996). Pivoted document length normalization. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 21-

29.

Sparck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28, 11-21.

Spink, A. and Greisdorf, H. (1997). Users’ Partial Relevance Judgements During Online Searching. Online and CDROM Review, 21, (5) 271-279.

Sumner, R. G., Jr., & Shaw, W. M., Jr. (1997). An investigation of relevance feedback using adaptive linear and probabilistic models. In E. M. Voorhees & D. K. Harman (Eds.), The Fifth Text REtrieval Conference

(TREC-5).

Sumner, R. G., Jr., Yang, K., Akers, R., & Shaw, W. M., Jr. (1998). Interactive retrieval using IRIS: TREC-6 experiments. In E. M. Voorhees & D. K. Harman (Eds.), The Sixth Text REtrieval Conference (TREC-6). Sumner, R. G., Yang, K., & Dempsey, B. (1998). Interactive WWW search engine for user-defined collections.

Digital 98 Libraries: The Third ACM Conference on Digital Libraries. 307-308.

van Rijsbergen, C. J. (1979). Information retrieval (2nd ed.). London: Butterworths.

Voorhees, E., Gupta, N. K., & Johnson-Laird, B. (1995). The Collection fusion problem. In E. M. Voorhees & D.

K. Harman (Eds.), Overview of the Third Text REtrieval Conference (TREC-3).

Voorhees, E., & Harman, D. (1997). Overview of the Fifth Text Retrieval Conference. In E. M. Voorhees & D. K.

Harman (Eds.), The Fifth Text REtrieval Conference (TREC-5).

Voorhees, E., & Harman, D. (1998). Overview of the Sixth Text REtrieval Conference. In E. M. Voorhees & D. K. Harman (Eds.), The Sixth Text REtrieval Conference (TREC-6).

Wang, P. (1994). A cognitive model of document selection of real users of information retrieval systems (Doctoral dissertation). University of Maryland, College Park, MD. (University Microfilms No. AAI9514595).

Wong, S. K. M., & Yao, Y. Y. (1990). Query formulation in linear retrieval models. Journal of the American Society for Information Science, 41, 334-341.

Wong, S. K. M., Yao, Y. Y., & Bollmann, P. (1988). Linear structure in information retrieval. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 219-232.

Wong, S. K. M., Yao, Y. Y., Salton, G., & Buckley, C. (1991). Evaluation of an adaptive linear model. Journal of the American Society for Information Science, 42, 723-730.

Yang, K., & Yang, K. (1997). Graded relevance in information retrieval. Unpublished manuscript.

Yu, C. T., Buckley, C., Lam, K., & Salton, G. (1983). A generalized term dependence model in information retrieval. Information Technology: Research and Development, 2, 129-154.

Figures

Figure 1 IRIS Initial Search Screen

Figure 2.1 Initial Query Modification Interface for iriss and irisp

Figure 2.2 Initial Query Modification Interface for irisa

Figure 3.1 Relevance Feedback Interface for iriss and irisa

Figure 3.2 Relevance Feedback Interface (phase 1) for irisp

Figure 3.3 Relevance Feedback Interface (phase 2) for irisp

Figure 3.4 Relevance Feedback Interface (phase 3) for irisp

Figure 4.1 Feedback Query Modification Interface for iriss and irisp

Figure 4.2 Feedback Query Modification Interface for irisa
