Similar Articles |
D-Lib Nov/Dec 2014 Tkaczyk et al. |
GROTOAP2 -- The Methodology of Creating a Large Ground Truth Dataset of Scientific Articles In this paper we present GROTOAP2 -- a large dataset of ground truth files containing labelled fragments of scientific articles in PDF format, useful for training and evaluation of document content analysis-related solutions. |
D-Lib Sep/Oct 2013 Kern & Klampfl |
Extraction of References Using Layout and Formatting Information from Scientific Articles The automatic extraction of reference meta-data is an important requirement for the efficient management of collections of scientific literature. |
D-Lib Jul/Aug 2012 Kern et al. |
TeamBeam - Meta-Data Extraction from Scientific Literature The TeamBeam algorithm analyses a scientific article and extracts structured meta-data, such as the title, journal name and abstract, as well as information about the article's authors (e.g. names, e-mail addresses, affiliations). |
D-Lib Nov/Dec 2014 Klampfl et al. |
A Comparison of Two Unsupervised Table Recognition Methods from Digital Scientific Articles In this paper we present two table recognition methods based on unsupervised learning techniques and heuristics which automatically detect both the location and the structure of tables within an article stored as PDF. |
D-Lib Nov/Dec 2015 Frey & Kern |
Efficient Table Annotation for Digital Articles Table recognition and table extraction are important tasks in information extraction, especially in the domain of scholarly communication. |
D-Lib Jul/Aug 2012 Bertin & Atanassova |
Semantic Enrichment of Scientific Publications and Metadata Our aim is to bring new value to scientific publications by automatic extraction and semantic analysis. |
D-Lib Nov/Dec 2015 Vetle I. Torvik |
MapAffil: A Bibliographic Tool for Mapping Author Affiliation Strings to Cities and Their Geocodes Worldwide The problem addressed in this paper is as follows: given a free-form text string representing an author affiliation, output the name of the corresponding city (or similar locality) and its physical location. |
D-Lib Sep/Oct 2013 Erbs et al. |
Bringing Order to Digital Libraries: From Keyphrase Extraction to Index Term Assignment Collections of topically related documents held by digital libraries are valuable resources for users; however, as collections grow, it becomes more difficult to search them for specific information. Structure needs to be introduced to facilitate searching. |
D-Lib Nov/Dec 2015 Francopoulo et al. |
NLP4NLP: The Cobbler's Children Won't Go Unshod Understanding current trends is a challenging and attractive text mining task, especially when suitable tools are recursively applied to publications from the very domain they come from. |
D-Lib May/Jun 2014 Bergamaschi et al. |
The Odysci Academic Search System This paper describes the Odysci Academic Search System including all steps necessary from acquiring a document to making it available for user search. |
D-Lib November 2004 Senserini et al. |
Archiving and Accessing Web Pages: The Goddard Library Web Capture Project To ensure continued availability of these knowledge assets to the Goddard Space Flight Center (GSFC) community, the GSFC Library is working closely with others in the area of preservation to determine how to preserve the captured web sites once they are no longer maintained by the current owners or curators. |
D-Lib Jul/Aug 2015 Lorang et al. |
Developing an Image-Based Classifier for Detecting Poetic Content in Historic Newspaper Collections The Image Analysis for Archival Discovery (Aida) project team is investigating the use of image analysis to identify poetic content in historic newspapers. |
D-Lib Nov/Dec 2014 Giannakopoulos et al. |
Discovering and Visualizing Interdisciplinary Content Classes in Scientific Publications In this paper, we focus on visualizing funding-specific scientific corpora in a supervised context and discovering interclass similarities which indicate the existence of inter-disciplinary research. |
D-Lib Jan/Feb 2015 Dragan et al. |
A-posteriori Provenance-enabled Linking of Publications and Datasets via Crowdsourcing In this paper we present opportunities to leverage crowdsourcing for a-posteriori capturing dataset citation graphs. |
D-Lib Jul/Aug 2012 Bhatia et al. |
Specialized Research Datasets in the CiteSeerX Digital Library These datasets are not those usually available from CiteSeerX and awareness of these datasets may further advance state-of-the-art research in academic digital library data management and analysis. |
D-Lib Sep/Oct 2013 Knoth et al. |
Scientific Publications: Gathering Data, Extracting Information, and Following Trends Digital libraries that store scientific publications continue to be increasingly important in research. They are used not only for the traditional tasks of finding and storing research outputs, but also as data sources for mass automated processing. |
D-Lib Nov/Dec 2014 Kroll et al. |
Towards a Marketplace for the Scientific Community: Accessing Knowledge from the Computer Science Domain As scientific output is constantly growing, it is getting more and more important to keep track not only for researchers but also for other scientific stakeholders such as funding agencies or research companies |
D-Lib Sep/Oct 2013 Imran et al. |
A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Digital Libraries This paper addresses the problem of name disambiguation in the context of digital libraries that administer bibliographic citations. |
D-Lib Nov/Dec 2012 Knoth & Zdrahal |
CORE: Three Access Levels to Underpin Open Access We present the CORE (COnnecting REpositories) system, a large-scale Open Access aggregation, outlining its existing functionality and discussing the future technical development. |
D-Lib March 2006 Choudhury et al. |
Document Recognition for a Million Books Transcription represents only one component of document recognition. The presence of a large-scale book image corpus significantly raises the possibilities for document recognition capabilities, especially given the potential for statistical inferences or analyses. |
D-Lib October 2001 Ian H. Witten |
Greenstone: Open-Source Digital Library Software The Greenstone digital library software is an open-source system for the construction and presentation of information collections. It builds collections with effective full-text searching and metadata-based browsing facilities that are attractive and easy to use... |
D-Lib Jul/Aug 2012 Patton et al. |
Identification of User Facility Related Publications One metric for evaluating the scientific value or impact of a facility is the number of publications by users as a direct result of using that facility. |
D-Lib January 2000 Dan Huttenlocher & Angela Moll |
On DigiPaper and the Dissemination of Electronic Documents Proposal for a new image-based document representation, called DigiPaper, which is designed to easily disseminate electronic documents with a guaranteed appearance. DigiPaper's compression performance is analyzed. |
Information Today December 11, 2008 Avi Rappoport |
CiteSeerX and SeerSuite--Adding to the Semantic Web CiteSeer could be called a vertical research portal, a niche search engine, or a specialized digital library. |
D-Lib Jan/Feb 2011 Hense & Quadt |
Acquiring High Quality Research Data We discuss the differences between an electronic text publication and a data publication and the challenges that result from these differences for the data publication process. |
D-Lib Jul/Aug 2014 DeRidder & Matheny |
What Do Researchers Need? Feedback On Use of Online Primary Source Materials A qualitative study of 11 humanities faculty researchers at the University of Alabama describes and rates the importance of various issues encountered when using 29 participant-selected online databases. |
D-Lib Nov/Dec 2014 Holub et al. |
Annota: Towards Enriching Scientific Publications with Semantics and User Annotations In this paper we present Annota -- a collaborative tool enabling researchers to annotate and organize scientific publications on the Web and to share them with their colleagues. |
D-Lib Nov/Dec 2014 Volske et al. |
A Keyquery-Based Classification System for CORE We apply keyquery-based taxonomy composition to compute a classification system for the CORE dataset, a shared crawl of about 850,000 scientific papers |
D-Lib Nov/Dec 2015 Herrmannova & Knoth |
Semantometrics in Coauthorship Networks: Fulltext-based Approach for Analyzing Patterns of Research Collaboration We explore how Semantometrics can help to characterize the types of research collaboration in scholarly publication networks and the nature of the cross-community ties, and how this information can be utilized in aiding research evaluation. |
D-Lib January 2006 Linden & Green |
Don't Leave the Data in the Dark: Issues in Digitizing Print Statistical Publications Statistical digitization projects must make investments in adequate metadata and object-oriented design at the point of digitization - otherwise, the data are in danger of losing their context |
D-Lib Jan/Feb 2012 David Shotton |
The Five Stars of Online Journal Articles -- a Framework for Article Evaluation I propose five factors -- peer review, open access, enriched content, available datasets and machine-readable metadata -- as the Five Stars of Online Journal Articles. |
ONLINE March 2001 Katherine C. Adams |
The Web as Database New Extraction Technologies and Content Management... |
D-Lib February 2000 Atkins, Lyons, Ratner, Risher, et al. |
Reference Linking with DOIs: A Case Study Digital Object Identifiers enable readers to find content on the Internet with a persistent and reliable identifier. Hyperlinking between article bibliographies and the cited articles is a natural application of DOIs. |
D-Lib February 2005 |
The eXtensible Past: The Relevance of the XML Data Format for Access to Historical Datasets and a Strategy for Digital Preservation Reports on investigations carried out by the Netherlands Historical Data Archive into the relevance of the XML data format and the "Open Archives" paradigm on the long-term preservation and dissemination of historical datasets. |
D-Lib April 2002 Erik Duval |
Metadata Principles and Practicalities There is much confusion about how metadata should be integrated into information systems. How is it to be created or extended? Who will manage it? How can it be used and exchanged? Whence comes its authority? Can different metadata standards be used together in a given environment? |
D-Lib Sep/Oct 2011 Gauthereau-Bryson et al. |
Digitization Practices for Translations: Lessons Learned from the Our Americas Archive Partnership Project This paper discusses the complexities involved in digitizing multilingual historical documents, including practices for creating "born-digital" translations and unique metadata to best describe these rare, primary documents. |
D-Lib June 2001 Linda L. Hill |
A Content Standard for Computational Models There are no generally accepted procedures for describing computational models in ways that support cataloging, search, selection, and use. In this paper, we propose a content standard for describing computational models... |
D-Lib May/Jun 2012 Westbrook et al. |
Metadata Clean Sweep: A Digital Library Audit Project This paper discusses the pilot of an ongoing digital library metadata audit project that was collaboratively launched by library school interns and full-time staff to alleviate poor recall, poor precision and metadata inconsistencies across digital collections. |
PC Magazine November 25, 2003 |
Turn Reader Files into Writers Sometimes you don't want a huge program for a simple task. Here's a simple plug-in program that converts PDFs to Word documents. |
D-Lib March 2006 Schibel & Rydberg-Cox |
Early Modern Culture in a Comprehensive Digital Library Digital libraries have the potential to transform fields such as early modern studies, where problems of physical access to sources and intellectual access to their contents have hampered our ability to contemplate major topics. |
D-Lib Jul/Aug 2004 Coleman, Bracke & Karthik |
Integration of Non-OAI Resources for Federated Searching in DLIST, an Eprints Repository Highlights of some of the limitations of proposed solutions to distributed archives as well as the added benefits for digital repository development that non-OAI (Open Archives Initiative) integration offers. |
ONLINE November 2000 Winfred Ark & Sue Park |
A Plain Text Metamorphosis: Converting Search Results to HTML Rather than static, ASCII-based text, the Web provides the ability to deliver more dynamic, interactive information... |
D-Lib January 2000 Gail M. Hodge |
Best Practices for Digital Archiving: An Information Life Cycle Approach Digital information is fragile in ways that differ from traditional technologies, such as paper or microfilm. It is more easily corrupted or altered without recognition... |
D-Lib August 2003 |
In Brief Building a More Meaningful Web: From Traditional Knowledge Organization Systems to New Semantic Tools... Information Visualization Interfaces for Retrieval and Analysis (IVIRA) Workshop Summary... Report on the "OAI Metadata Harvesting Workshop"... etc. |
D-Lib Jan/Feb 2015 Sarah Callaghan |
Data without Peer: Examples of Data Peer Review in the Earth Sciences This paper takes an experimental view, and selects seven datasets, all from the Earth Sciences and with DOIs from DataCite, and attempts to review them, with varying levels of success. |
D-Lib Jan/Feb 2011 Starr & Gastl |
isCitedBy: A Metadata Scheme for DataCite The knitting together of published research articles and the research data that substantiate their findings is of increasing importance as more disciplines take advantage of data-driven approaches to knowledge acquisition. |
D-Lib Sep/Oct 2008 Dappert & Enders |
Using METS, PREMIS and MODS for Archiving eJournals As institutions turn towards developing archival digital repositories, many decisions on the use of metadata have to be made. |
D-Lib February 2000 Van de Sompel, Krichel, Nelson, Hochstenbach, et al. |
The UPS Prototype: An Experimental End-User Service across E-Print Archives A description of the Universal Preprint Service (UPS) Prototype developed as a proof-of-concept of a multi-discipline digital library of publicly available scholarly material |
Macworld April 18, 2005 Ross Tibbits |
PDF2Office 2.1 Professional If you have to open simple PDF files in Microsoft Word on a regular basis, Recosoft's PDF2Office 2.1 Professional is a very useful tool. Unfortunately, it isn't always accurate. |
D-Lib December 2004 Jia Liu |
Metadata Development in China: Research and Practice Chinese researchers and practitioners have now reached the point where metadata development and use have matured and become stable throughout the country's institutions. |