Existing transcription text files need to be converted to TEI XML format before they are uploaded into the Nyingarn Workspace. Several tools can do this, including pandoc (https://pandoc.org/) and transpect (https://transpect.github.io/). Please reach out to our team if you need any guidance. To convert Microsoft Word documents, please follow the instructions below.
The Nyingarn Workspace will accept Microsoft Word transcriptions that have been prepared for TEI and transformed into TEI XML through TEIgarage, an online conversion service. Microsoft Word documents may contain headings and page breaks that obstruct the conversion process. These unnecessary elements need to be removed, and the page numbers need to be styled so that the code in the workspace can recognise them.
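For sources that are not Word documents, a pandoc conversion could be scripted along the following lines. This is only a sketch: it assumes pandoc is installed and on the PATH, the function names and the file names in the usage note are illustrative, and the TEI that pandoc writes may still need adjusting before the workspace will accept it.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_tei_command(source: Path, dest: Path) -> list[str]:
    """Build a pandoc command line that writes a standalone TEI document.

    pandoc guesses the input format from the source file extension.
    """
    return [
        "pandoc",
        str(source),
        "--standalone",  # emit a complete document, not just a fragment
        "--to", "tei",
        "--output", str(dest),
    ]


def convert_with_pandoc(source: Path, dest: Path) -> None:
    """Run the conversion, failing loudly if pandoc is missing or errors."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_tei_command(source, dest), check=True)
```

For example, convert_with_pandoc(Path("Bates34.txt"), Path("Bates34-tei.xml")) would attempt to write a TEI file next to the source.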
Before starting, decide on the manuscript item name. In Nyingarn, an item name must be a unique identifier and can contain letters and numbers. Examples include Bates34 and SLNSW_FL814, which relate to the collector’s name or the holding institution’s naming scheme. Page numbers are then formed by adding a sequence number to the item name, for example Bates34-001, Bates34-002, Bates34-003. For more naming instructions, see our support page.
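The naming pattern can be sketched in a few lines of Python, which may help when checking a long transcription. The function names are illustrative. The sketch assumes that underscores are allowed (because the SLNSW_FL814 example uses one) and that page numbers are always three digits, as in the examples; it is not the workspace's own validation.

```python
import re

# Assumption: letters, digits and underscores only (underscore inferred
# from the SLNSW_FL814 example, not from a published rule).
ITEM_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_item_name(name: str) -> bool:
    """Return True if the name uses only the assumed permitted characters."""
    return ITEM_NAME.fullmatch(name) is not None


def page_identifiers(item_name: str, page_count: int) -> list[str]:
    """Build the page sequence, e.g. Bates34-001, Bates34-002, ..."""
    if not is_valid_item_name(item_name):
        raise ValueError(f"unexpected characters in item name: {item_name!r}")
    # Assumption: three-digit, zero-padded page numbers starting at 001.
    return [f"{item_name}-{n:03d}" for n in range(1, page_count + 1)]
```

Calling page_identifiers("Bates34", 3) gives the three page names used in the example above.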
WORD PREPARATION STEPS
Open the Microsoft Word document (.docx)
Creating a Style of Page Numbers
Step 1 Click Styles in the Home toolbar
Step 2 Click the A+ at the bottom of the tool window
Create a new style in formatting
Step 3 Name the style Page
Step 4 For style type choose Character in the dropdown menu
Step 5 Choose a font colour for the style
Choosing a colour other than black will help you to recognise where the style has been applied in your document.
Page is now a standard style in your documents.
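Apply the Page style to every page name in the document. The Find and Replace function is helpful for bulk changes.
Step 6 Type Bates34-??? into the Find what: box. Each ? is a wildcard that stands for a single character, so ??? will match the whole page number. Word treats ? as a wildcard only when the Use wildcards option is ticked in the expanded Find and Replace options.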
Step 7 Click the cursor in the Replace with: field, then click More to expand the Find and Replace options if they are not already displayed.
Step 8 Click the Format button
Step 9 Click the Style option, choose the Page style, then replace all matches
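As a cross-check outside Word, the same search can be approximated with a regular expression over the plain text of the transcription. This is a sketch, not part of the Nyingarn workflow: the function name is illustrative, and it assumes page numbers are exactly three digits, which is narrower than the ??? wildcard (which matches any three characters).

```python
import re


def find_page_identifiers(text: str, item_name: str) -> list[str]:
    """Return every page identifier such as Bates34-001 found in the text.

    Assumption: the page number is exactly three digits.
    """
    pattern = re.compile(rf"\b{re.escape(item_name)}-\d{{3}}\b")
    return pattern.findall(text)


def missing_pages(found: list[str], item_name: str) -> list[str]:
    """List identifiers absent from an unbroken 001..N sequence."""
    numbers = {int(page.rsplit("-", 1)[1]) for page in found}
    if not numbers:
        return []
    return [
        f"{item_name}-{n:03d}"
        for n in range(1, max(numbers) + 1)
        if n not in numbers
    ]
```

Running find_page_identifiers on the document text and passing the result to missing_pages gives a quick way to spot a page name that was mistyped or skipped before the style is applied.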
Now that the page numbers are correctly labelled with the Page style, the Word page breaks should be removed.
Remove Page Breaks
Use the Find and Replace function.
Find what: click Special (Step 10) and choose Manual Page Break (Step 11).
This will add the symbol ^m (see screenshot below). Nothing is needed in the Replace with: section, so leave it blank and replace all matches.
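To confirm that no manual page breaks remain after saving, the .docx file can be inspected directly, since it is a zip archive whose main text is stored in word/document.xml. The sketch below counts the <w:br w:type="page"/> elements that represent manual page breaks. The function name is illustrative, and the check deliberately ignores other ways a page can start, such as section breaks or a "page break before" paragraph setting.

```python
import re
import zipfile

# Manual page breaks appear in word/document.xml as <w:br w:type="page"/>.
PAGE_BREAK = re.compile(rb'<w:br\b[^>]*\bw:type="page"[^>]*/>')


def count_manual_page_breaks(docx_file) -> int:
    """Count manual page breaks in a .docx given a path or file-like object."""
    with zipfile.ZipFile(docx_file) as archive:
        body = archive.read("word/document.xml")
    return len(PAGE_BREAK.findall(body))
```

A result of 0 for the saved file suggests the Find and Replace step removed every manual page break.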
Final Steps – Save and convert the document
Save the Microsoft Word .docx file using the naming convention for the Nyingarn Workspace, e.g. Bates34-tei.docx.
Convert the .docx file to TEI XML using Oxgarage.
Oxgarage will ask you to select the type of document you want to convert. Choose Documents.
Choose Convert from: Microsoft (.docx), Convert to: TEI P5 XML Document.
The next window will ask you to select the file for conversion. Click the Choose File button to browse, find and upload your file.
The converted file will download automatically, ready to be ingested into the Nyingarn Workspace. The document should be named Bates34-tei.xml.
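Before ingesting, it can be worth confirming that the download has the expected name and really is a TEI document. The sketch below checks that the content parses as XML and that its root element is TEI in the TEI namespace (http://www.tei-c.org/ns/1.0). The function names are illustrative and this is not the validation the workspace itself performs.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def expected_tei_filename(item_name: str) -> str:
    """Return the file name the convention above implies, e.g. Bates34-tei.xml."""
    return f"{item_name}-tei.xml"


def is_tei_document(xml_bytes: bytes) -> bool:
    """Return True if the bytes are well-formed XML with a TEI root element."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return root.tag == f"{{{TEI_NS}}}TEI"
```

For a downloaded file, the bytes can be read with pathlib.Path("Bates34-tei.xml").read_bytes() and passed to is_tei_document.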
See the further instructions for ingestion into Nyingarn.