Converting transcription text files for Nyingarn

Existing transcription text files must be converted to TEI XML before they are uploaded to the Nyingarn Workspace. The Workspace accepts Microsoft Word transcriptions that have been prepared for TEI and converted to TEI XML with TEIgarage, an online conversion service. Word documents often contain headings and page breaks that obstruct the conversion. These elements need to be removed, and the page numbers need to be styled so that the code in the Workspace recognises them.

Before starting, decide on the manuscript item name. In Nyingarn, an item name must be a unique identifier and can contain letters and numbers. Examples include Bates34 and SLNSW_FL814, which follow the collector’s name or the holding institution’s naming scheme. Page numbers then follow the item name in sequence, for example Bates34-001, Bates34-002, Bates34-003. For more naming instructions, see our support page.
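
The naming pattern can be sketched in a few lines of Python. This is only an illustration: the function name page_names is hypothetical, and the rule that item names contain only letters, digits and underscores is an assumption inferred from the examples Bates34 and SLNSW_FL814, not a documented Nyingarn rule.

```python
import re

# Assumption: item names use only letters, digits and underscores,
# inferred from the examples Bates34 and SLNSW_FL814.
ITEM_NAME = re.compile(r"[A-Za-z0-9_]+")


def page_names(item_name: str, page_count: int) -> list[str]:
    """Return page identifiers such as Bates34-001 for pages 1..page_count."""
    if not ITEM_NAME.fullmatch(item_name):
        raise ValueError(f"unexpected characters in item name: {item_name!r}")
    # Three-digit, zero-padded sequence numbers, as in Bates34-001.
    return [f"{item_name}-{n:03d}" for n in range(1, page_count + 1)]


print(page_names("Bates34", 3))
# ['Bates34-001', 'Bates34-002', 'Bates34-003']
```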

WORD PREPARATION STEPS 

Naming your pages in sequence
Name each page of your manuscript according to the Nyingarn item name and sequence number. For example, on the first page of the transcription you would type Bates34-001 at the top or bottom of the page.
Note: Each page of the transcription should equal one page of the manuscript.

Creating a style for page numbers
Step 1 Click the Styles group in the Home toolbar
Step 2 Click the A+ at the bottom of the tool window

Create New Style from Formatting
Step 3 Name the style Page
Step 4 For style type choose Character in the dropdown menu
Step 5 Choose a Style colour. Choosing a colour other than black will help you to recognise the change in your document.

Page is now a standard style in your documents.

Apply Page style to every page name (e.g. Bates34-001) in the document
Step 6 The Find and Replace function is helpful for bulk changes. Type Bates34-??? into the Find what: box. Each ? is a wildcard that matches any single character, so the pattern finds the entire page name. Note: Make sure ‘Use wildcards’ is ticked.
Step 7 Click the cursor in the Replace with: field, then click More to expand the Find and Replace options if they are not already displayed.
Step 8 Next click the Format button
Step 9 Click the Style option and choose the Page style, then click Replace All
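
Before styling, it can help to confirm that every page name is present and in sequence. The sketch below assumes the transcription has also been saved as plain text; the function names find_page_ids and missing_numbers are hypothetical. Word's Bates34-??? matches any three characters, whereas this pattern is deliberately stricter and accepts three digits only.

```python
import re


def find_page_ids(text: str, item_name: str) -> list[str]:
    """Find page identifiers such as Bates34-001 in plain transcription text."""
    # Stricter than Word's Bates34-???: exactly three digits after the hyphen.
    pattern = re.compile(rf"\b{re.escape(item_name)}-\d{{3}}\b")
    return pattern.findall(text)


def missing_numbers(page_ids: list[str]) -> list[int]:
    """Return sequence numbers absent between 1 and the highest number found."""
    found = {int(page_id.rsplit("-", 1)[1]) for page_id in page_ids}
    return [n for n in range(1, max(found, default=0) + 1) if n not in found]


sample = "Bates34-001\nfirst page\nBates34-002\nsecond page\nBates34-004\nfourth page"
ids = find_page_ids(sample, "Bates34")
print(ids)                   # ['Bates34-001', 'Bates34-002', 'Bates34-004']
print(missing_numbers(ids))  # [3]
```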

Now that the page naming/numbering is correct and styled with Page style, page breaks should be removed. 

Remove Page Breaks

Step 10 Using the Find and Replace function, in the Find what: field click Special
Step 11 Choose Manual Page Break. This will add the symbol ^m (see screenshot below). Nothing is needed in the Replace with: field, so leave it blank, then click Replace All.
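
To confirm that no manual page breaks remain, the saved .docx can be inspected directly. A .docx file is a zip archive whose main text is stored in word/document.xml, where a manual page break appears as a w:br element with w:type="page". The following is a minimal sketch using only the Python standard library; the function name count_manual_page_breaks is hypothetical, and other kinds of break (for example section breaks) are not counted.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_manual_page_breaks(docx) -> int:
    """Count <w:br w:type="page"/> elements in a .docx (path or file object)."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return sum(
        1
        for br in root.iter(f"{{{W}}}br")
        if br.get(f"{{{W}}}type") == "page"
    )


# Example call (assumes the file exists in the current folder):
# count_manual_page_breaks("Bates34-tei.docx")
```

A result of 0 suggests the replacement in Steps 10 and 11 removed every manual page break.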

Final Steps – Save and convert the document

Step 12 Save the Microsoft Word .docx file using the naming convention for the Nyingarn Workspace, e.g. Bates34-tei.docx
Step 13 Convert the .docx file to TEI XML using TEIgarage. TEIgarage will ask you to select the type of document you want to convert.
Step 14 Choose Documents. Choose Convert from: Microsoft (.docx), Convert to: TEI P5 XML Document
Step 15 The next window will ask you to select the file for conversion. Click the Choose File button to browse for and upload your file.
Step 16 The converted file will download automatically, ready to be ingested into the Nyingarn Workspace. The document should have a name such as Bates34-tei.xml.
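
Before uploading, the downloaded file can be given a quick sanity check. The sketch below is an illustration rather than part of the Nyingarn tooling: it parses the XML, checks that the root element is TEI in the standard TEI namespace, and lists the page names found in the text. The function name page_ids_in_tei is hypothetical, and how TEIgarage marks up the Page style is not assumed here, since only the text content is searched.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def page_ids_in_tei(source, item_name: str) -> list[str]:
    """Parse a TEI XML file (path or file object) and list its page names."""
    root = ET.parse(source).getroot()  # raises ParseError if not well formed
    if root.tag != f"{{{TEI_NS}}}TEI":
        raise ValueError(f"root element is {root.tag!r}, expected a TEI document")
    # Join all text content so markup around a page name does not hide it.
    text = "".join(root.itertext())
    pattern = re.compile(rf"{re.escape(item_name)}-\d{{3}}(?!\d)")
    return pattern.findall(text)


# Example call (assumes the file exists in the current folder):
# page_ids_in_tei("Bates34-tei.xml", "Bates34")
```

If the list is shorter than the number of manuscript pages, a page name was probably missed or mistyped in the Word document.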