To install the Summarizer, drag the "Summarize" button below and drop it where you want it on the Favorites bar or Bookmarks bar of your browser.
A bookmark to access the Summarizer will be added to your browser, similar to the example below.
To completely uninstall the Summarizer, right-click its bookmark and select the Delete option.
In this segment, we walk through a typical use of the Summarizer as part of a Google search. In this example, we searched Google for "Information Overload" and opened one of the articles in the search results. While the article was displayed, we clicked the Summarize bookmark and received the following summary.
|View article, press Summarize||Summary of the entire article is returned|
|View article, select a segment, press Summarize||Summary of the selected segment is returned|
"Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document."  Opait Summarizer was developed to study the effects of multiple techniques from Natural Language Processing (NLP) on automatic summarization of text documents. The summarizer assigns relevancy scores to various text elements found within the source document and uses the elements with highest scores to generate the summary.
The following features are currently used to assign relevancy scores to sentences:
|Feature||Multiplier||Description|
|Skimming||1.0||The first sentence in the document, the sentences within the first paragraph, and the first sentence of each paragraph are weighted higher than other sentences. Subsequent sentences in a paragraph are assigned progressively smaller weights.|
|Term Frequency||2.0||A logarithmic form of Term Frequency * Inverse Document Frequency (TF*IDF) is used to assign the highest weights to terms that appear often in the document, while discounting terms that are either too rare or too common. Stop words are removed, and a stemming algorithm is applied if one is defined. The current version supports only the Porter 2 stemming algorithm for English text.|
|Title Overlap||1.5||If a title is detected within the document, the number of terms that each sentence shares with the title is used to assign a weight to the sentence.|
|Description Overlap||1.5||If a tagged description is detected within the document, the number of terms that each sentence shares with the description is used to assign a weight to the sentence.|
|Length||0.5||A function is used to demote sentences that are significantly shorter or longer than the mean length of all sentences in the article.|
|Readability||0.8||This feature applies only to English documents. It assigns a weight to a sentence based on the proportion of so-called "Spache" words it contains, which is a measure of how readable the sentence is.|
|Graph Similarity||2.0||For smaller documents, a fully connected graph is constructed with sentences as nodes and the cosine similarity between sentences as edge weights. This graph exists in an N-dimensional space, where N is the number of unique terms in the source document. The mean similarity of a sentence to all other sentences is used to assign a weight to the sentence. This algorithm can become expensive for larger documents. If the number of sentences exceeds a threshold (currently set at 100), Graph Similarity is disabled in favor of a similarity measure between each sentence and the centroid of the document (see next feature).|
|Centroid Similarity||1.0||A centroid vector of the document is calculated as the mean of term frequencies (more specifically, TF*IDF values). The cosine similarity of each sentence to the centroid vector is used as the weight of the sentence.|
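The two similarity features can be sketched as follows. This is a minimal illustration rather than the Summarizer's actual code: it assumes whitespace tokenization and raw term counts in place of stop-word removal, stemming, and TF*IDF weighting, and all function names are hypothetical. Only the 100-sentence threshold is taken from the table.

```python
import math
from collections import Counter

GRAPH_SENTENCE_LIMIT = 100  # threshold quoted in the feature table


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def graph_similarity(vectors):
    """Mean cosine similarity of each sentence to all other sentences."""
    n = len(vectors)
    if n < 2:
        return [0.0] * n
    scores = []
    for i, v in enumerate(vectors):
        total = sum(cosine(v, u) for j, u in enumerate(vectors) if j != i)
        scores.append(total / (n - 1))
    return scores


def centroid_similarity(vectors):
    """Cosine similarity of each sentence to the document's mean vector."""
    centroid = Counter()
    for v in vectors:
        for t, w in v.items():
            centroid[t] += w / len(vectors)
    return [cosine(v, centroid) for v in vectors]


def similarity_scores(sentences):
    # Assumption: lowercase whitespace tokens and raw counts stand in for
    # the Summarizer's own preprocessing and TF*IDF weights.
    vectors = [Counter(s.lower().split()) for s in sentences]
    if len(vectors) > GRAPH_SENTENCE_LIMIT:
        return centroid_similarity(vectors)  # cheaper: one comparison per sentence
    return graph_similarity(vectors)  # pairwise: cost grows quadratically
```

The pairwise version compares every sentence with every other sentence, which is why a cutoff is useful: the centroid version needs only one comparison per sentence.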
The overall score is a linear combination of the weighted individual features, normalized to fall within the [0, 100] range. The Multiplier column above shows the empirically chosen default weight for each feature. Any feature may be disabled by setting its Multiplier to zero.
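As a rough sketch of this scoring step, the example below multiplies each feature value by its Multiplier, sums the results per sentence, and rescales to 0-100. The default multipliers are copied from the feature table; the function name, the dictionary keys, and the normalization method (scaling so that the top sentence scores 100, with non-negative feature values) are assumptions, since the text does not say how the normalization is done.

```python
# Default multipliers copied from the feature table.
DEFAULT_MULTIPLIERS = {
    "skimming": 1.0,
    "term_frequency": 2.0,
    "title_overlap": 1.5,
    "description_overlap": 1.5,
    "length": 0.5,
    "readability": 0.8,
    "graph_similarity": 2.0,
    "centroid_similarity": 1.0,
}


def combine_scores(feature_scores, multipliers=DEFAULT_MULTIPLIERS):
    """Return one 0-100 score per sentence.

    feature_scores maps a feature name to a list of per-sentence values.
    """
    n = len(next(iter(feature_scores.values()), []))
    raw = [0.0] * n
    for name, values in feature_scores.items():
        weight = multipliers.get(name, 0.0)  # a zero multiplier disables the feature
        for i, value in enumerate(values):
            raw[i] += weight * value
    top = max(raw, default=0.0)
    if top <= 0.0:
        return [0.0] * n
    # Assumption: rescale so that the best sentence scores exactly 100.
    return [100.0 * r / top for r in raw]
```

For example, passing a copy of the defaults with `"term_frequency"` set to `0.0` ranks sentences without the TF*IDF feature.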
If you hold down the Alt key while clicking on the Summarize button, a detailed view of all the features that contributed to the ranking of the sentences in the summary will be displayed.
|SourceUrl||String||Address of the source document.|
|SourceText||String||Actual text of the source document.|
|Language||String||Language code of the document.|
|SummaryPercent||Integer||Summary length as percentage of the original.|
|MinSentences||Integer||Minimum number of sentences in the summary.|
|MaxKeywords||Integer||Maximum number of keywords to return.|
|Highlight||Boolean||Highlight summary sentences in context.|
|ShowScores||Boolean||Include relevancy scores.|
|ShowDebug||Boolean||Include raw scores attached to features that are used to rank sentences.|
|AutoRun||Boolean||Generate summary immediately (as in the bookmarklet).|
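One way to keep these parameters together in client code is a small typed container. The sketch below is hypothetical: the field names and types come from the table above, but the class, the `to_params` helper, and the choice to leave every field unset by default are assumptions, as the text does not state default values or how the parameters are transmitted.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SummarizerOptions:
    """Hypothetical container mirroring the parameter table."""

    SourceUrl: Optional[str] = None        # address of the source document
    SourceText: Optional[str] = None       # actual text of the source document
    Language: Optional[str] = None         # language code of the document
    SummaryPercent: Optional[int] = None   # summary length as a percentage
    MinSentences: Optional[int] = None     # minimum sentences in the summary
    MaxKeywords: Optional[int] = None      # maximum keywords to return
    Highlight: Optional[bool] = None       # highlight summary sentences in context
    ShowScores: Optional[bool] = None      # include relevancy scores
    ShowDebug: Optional[bool] = None       # include raw per-feature scores
    AutoRun: Optional[bool] = None         # generate the summary immediately

    def to_params(self):
        """Return only the parameters that were explicitly set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


options = SummarizerOptions(
    SourceUrl="http://example.com/article",  # placeholder address
    SummaryPercent=20,
    ShowScores=True,
)
params = options.to_params()
```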