LINGUIST List 9.782

Tue May 26 1998

Sum: Interlinear Translation Tools

Editor for this issue: Brett Churchill <brettlinguistlist.org>


Directory

  1. Vern M. Lindblad, Interlinear Translation Tools

Message 1: Interlinear Translation Tools

Date: Mon, 25 May 1998 05:52:13 -0700 (PDT)
From: Vern M. Lindblad <vernmlu.washington.edu>
Subject: Interlinear Translation Tools

Hi everyone,

In February I posted a query to Linguist List regarding tools to
facilitate producing interlinear translations in Word for Windows. It
appeared as LL: Vol-9-262, dated Sun Feb 22 1998.

Many thanks to everyone who responded for your helpful suggestions. I
have received several inquiries (from Spain, Germany, Denmark, etc.) 
regarding whether I had gotten any responses, so it seems that there is a
need for such products, and for me to finally get around to posting a
summary. (I wanted to wait till I had worked out all the bugs and made
sure that this solution really would do the job; it did.) 

This posting consists of three parts:

1. My original query
2. Texts of the 9 substantive responses (very lightly edited)
3. Some tips on doing IT using SIL's Shoebox (the one I chose to use) 


My original query read as follows:

Subject: Interlinear Translation Tools

I will soon need to do interlinear translations of several pages of text,
and do not savor the thought of trying to do it in Word for Windows. This
seems like the sort of low-level programming problem that many people must
have solved long ago, since linguists put examples into this format in
their papers all the time. Do any of you know of such a program
(preferably available free) that you would recommend, and how to get it?

More specifically, what I need:

a Word for Windows (or at least PC) -compatible editing program that will
help keep things neat and tidy as I take a page of continuous text in
another language and add to it word-by-word English translations aligned
directly below each word on the next line, and then add a smoother
sentence-by-sentence translation on the line below that. Thus, the
original text should appear on lines 1, 4, 7, 10 ..., the word-by-word
translation should appear on lines 2, 5, 8, 11 ..., and the smooth
translation should appear on lines 3, 6, 9, 12 .... (I'm sure I can
eventually accomplish this using ordinary WfW, but I don't relish the
thought of wrestling with all the problems of keeping everything lined up,
going back to make changes on previous lines that affect alignment, etc.)

What I do NOT need: a program to translate words from one lg. to another.

It seems like I remember seeing something like this mentioned on LL not
too long ago, but I searched back through all the LL files I have for the
last couple years and didn't recognize it from the titles. I also tried
searching for _interlinear translation_ on the WWW and got over 100 hits,
but none of them seem to be what I want. (I did find a similar query five
years ago in LL 4.399 titled "LaTeX style file for interlinear
translations required", but I am not using LaTeX. I also found that SIL
has a program called IT that both does interlinear translation and creates
a lexical database, and it is available for USD 60, but that seems a bit
high for my rather simple needs.)

That's why I'm asking for your advice.

Naturally, I will post a summary of any substantive responses.

Thanks, Vern


Vern M. Lindblad
VERNMLU.WASHINGTON.EDU

-------------------------------------------------------------------------
-------------------------------------------------------------------------

Summary of responses:

The suggestions I received were primarily endorsements of various SIL
products (5), with several mentions of tabs and/or tables in Word for
Windows (3), and one description of IBM Translator's Workbench (price not
given). Since some of you may benefit from knowing about options other
than the one I chose, lightly edited versions of those 9 messages are
copied below in the order given above. 

I tried briefly and unsuccessfully to use tabs before I decided to invest
the time to learn how to use Shoebox. Although nobody suggested using
Excel, it seems to me that that would be easier than trying to use tabs or
tables in Word. If you want to attempt to employ that solution, some of
my comments near the end of this posting about editing in Excel and
copying from Excel to Word may be of some help to you. 

(I have listed the option I chose to pursue as #1; more comments about
Shoebox follow message #9.) 



1. Evan L. Antworth:


>...I also found that SIL
>has a program called IT that both does interlinear translation and creates
>a lexical database, and it is available for USD 60, but that seems a bit
>high for my rather simple needs.)

You can get IT version 1.2 for MS-DOS here:

 http://www.sil.org/ftp/software/dos/it12.zip

This is free, but does not include the full documentation. The $60 is for a
rather expensive-to-print loose-leaf manual!

There is no Windows version of IT, nor will there ever be. (There is a
Macintosh version, which is a great improvement over the DOS version.) IT
for DOS is very much a 1980s-vintage program, with all the limitations that
implies.

Probably better is Shoebox for Windows (and Macintosh). While primarily a
lexical database program, it also does semiautomated interlinear glossing
in a manner very similar to IT. You can obtain it (for free) here:

 http://www.sil.org/computing/shoebox.html


- Evan Antworth
SIL Webmaster at www.sil.org
<Evan.Antworthsil.org>


2. John P. Boyle <jpboylemidway.uchicago.edu>:

A good program that I have been working with is called 'IT' (interlinear
transcription). It can be downloaded from SIL at <www.sil.org>. The only
problem is that the instructions are somewhat cumbersome. However, once you
get it working and become familiar with it, it is wonderful.
Good luck.
 John P. Boyle


3. John E. Koontz <John.KoontzColorado.EDU>:

...
The closest to what you want that I know of is another SIL program called
Shoebox, which will handle morpheme by morpheme translations and alignment
with monospaced fonts. I believe it is OK with word to phrase. 

You should be able to find it under SIL's stuff at http://www.sil.org/

It is freeware, and the documentation is online, too, though I think it
might be useful to have the last pre-Windows versions' documentation which
was available as a book, if that book is still available. The Windows doc
is up to date wrt the interface, but a bit slapdash on what you're doing.

SIL has another tool, available only from their JAARS group by mail, as
far as I know, that can import a Shoebox format file into WfW, applying
WfW styles. The name of this tool is escaping me at the moment!


4. Claire Bowern <C.BowernStudent.anu.edu.au>:

You've probably got enough information by now, but I thought I'd write anyway!

I use Shoebox (you can download it for free from SIL's web site) - the
instruction doc that comes with the programme tells you how to do
interlinear things (or if you have any questions I might be able to help)
and you can export Shoebox docs into Word.

Happy interlinearising,

Claire Bowern.

Centre for Linguistic Typology
Australian National University,
ACT, 0200, AUSTRALIA
Ph: +61 2 6249 2222


5. Jim Bauman <jjbaumanix18.ix.netcom.com>:

I had only the SIL tools to suggest. If, however, you get any other 
responses to your query, I would appreciate it if you would notify me also.
Many thanks.

Jim Bauman


6. Peter T. Daniels:

The aligning of words with glosses is super-simple (this is how I did it
in *The World's Writing Systems* hundreds of times): don't use word
spaces between the words, use a tab character; use end-of-line not
end-of-paragraph between the lines of each group; and put a tab marker
in the ruler line for each word. Then you can adjust the spacing within
the line to fit as long or as short a word or gloss as you need. (On a
Mac, end-of-line is usually shift-return. I don't remember how I did it
in Word for Windows 2.0 many years ago.)
- 
Peter T. Daniels					
grammatimworldnet.att.net


7. Mai Kuha:

Regarding your query on Linguist List, in case the optimal software does
not surface, I have found it feasible to do this in Word (with shorter
stretches of text) using tables. That is, each word in the original
language goes in one cell in a table, its gloss in the cell below, and the
translation on a separate line just under this table consisting of two
rows. At least it's a little easier than tabs.

-Mai

.......................................................
Mai Kuha				"Que me quiten
mkuhaindiana.edu			lo bailao"
http://php.indiana.edu/~mkuha/home.html
.......................................................


8. Retta Whinnery <R2Whinneryaol.com>:

Hello, Vern

I saw your posting on the LinguistList for a way to align text. If 
I understand what you want (and it's possible that I do not), it can 
be easily done in Word by using tables. I'm sending you an attachment 
in Rich Text Format (because it seems to transfer better) and you 
should be able to open it in Word (I made it in Word 97 and saved it 
as Word 95, so either program should be able to open it).

If you're already intimately acquainted with the power of using
tables in Word and it doesn't suit your needs, then I apologize
for wasting your time. However, I work with many people (very
smart, very educated people) who have used Word for quite a while 
yet who are not familiar with tables. So I thought it was worth a try.

I am a professional technical writer currently working in the software
development field, but my educational background is in linguistics.
Although I have not published any interlinear text, I would use a
Word table for the task, if I understand correctly what you want.

I hope you can read the attached example. If it will meet your needs,
but you need a few tips on tables, please let me know and I can
provide a few. Or you may work with someone who knows how
to make tables do what you want them to, as well.

Good luck,

Retta Whinnery
Kansas City, MO


[17 k attachment deleted --VML]


9. Deborah D. Kela Ruuskanen:

Hei Vern,
IBM Translator's Workbench might be able to do this for you, but I'm not
sure it is worth the investment. It will definitely align things side by
side, in two columns (or three), but I don't know if it will put things
underneath. The EU texts have to be exactly aligned when you translate
directives etc., but again, they are side-by-side, in two vertical
windows. 
If I am lucky enough to have a text that is scannable, I scan it in,
triple-space it in Word for Windows, turn off the automatic carriage
returns, (check to see if you have an automatic hyphenation program
which is an abomination anyway and turn it off, too), and go back and do
the rough draft translation underneath each line, working sentence by
sentence. Then I go back and type in the final translation under that. 
If you set your document margins at wider than those of the original
text (remembering to turn off the automatic formatting), and use hard
carriage returns, this works rather well. Of course, you cannot possibly
send the finished text on diskette to anyone with a different computer
set-up than yours, or add it to an email, because the formatting goes
all to hell, but you can print out your own text quite nicely. And if
you use a different type face for each of the three (original,
intertext, translation) they show up very well and are easy to follow. 
If your original is not scannable, then when you type it in, again, use 
hard carriage returns and leave lots of space at the end of the line.
You have to use the 'insert' key if your program won't let you write in
between lines when the spacing is set at triple or double. 
Good luck, and I'm looking forward to the summary of your responses.
Cheers, DK Ruuskanen
- 
Deborah D. Kela Ruuskanen \ You cannot teach a Man anything,
Leankuja 1, FIN-01420 Vantaa \ you can only help him find it
druuskancc.helsinki.fi \ within himself. Galileo

-------------------------------------------------------------------------
-------------------------------------------------------------------------


Finally, some comments about my experiences with Shoebox:

My original query envisioned only two lines of vertically aligned
words/morphemes (original text and word-by-word English gloss) plus one
line of free translation into English. However, Shoebox enticed me into
having four lines of vertically aligned words/morphemes plus the free
translation. The additional lines showed the separate morphemes of the
original directly below the words in which they occurred (line 2), and
parts of speech (line 4).
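
To make the layout concrete, here is a minimal Python sketch of the
alignment itself: every column is padded to the width of its longest
entry, so that the four lines stay lined up in a monospaced font. The
Latin example data and the function name align_rows are invented for
illustration; this is not how Shoebox itself is implemented.

```python
def align_rows(rows, gap=1):
    """Pad every column to the width of its longest entry so rows line up."""
    n_cols = max(len(row) for row in rows)
    # Fill short rows with empty strings so every row has the same length.
    padded = [list(row) + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    return [
        (" " * gap).join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in padded
    ]


# Illustrative data only (not from any real project).
words = ["puella", "rosas", "amat"]
morphemes = ["puell-a", "ros-as", "ama-t"]
glosses = ["girl-NOM", "rose-ACC.PL", "love-3SG"]
parts_of_speech = ["n-sfx", "n-sfx", "v-sfx"]

aligned = align_rows([words, morphemes, glosses, parts_of_speech])
print("\n".join(aligned))
print("The girl loves roses.")  # free translation goes on its own line
```

The alignment only holds in a monospaced font; with proportional fonts
you are back to tabs, tables, or spreadsheet cells.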

- versions

The versions of Shoebox currently available at http://www.sil.org/ are 1.2
and 3 (= 3.02). (Apparently a version 2 also exists.) However, on the
web page for Shoebox 3 there is a note saying "Shoebox version 4,
scheduled for release in February 1998, is delayed. Check back here
occasionally for a release announcement." This suggests the possibility
of a *better* version if you wait a little while (but how long?). I don't
know whether or not version 3 will remain available via SIL's homepage
after version 4 is released. I had no particular complaints about version
3, but kept worrying that it might disappear before I finished my project
(it didn't!). 

- speed of download

If you work on your own computer at home, this won't affect you. But if
you work on public computers at your university like I do and have to
download Shoebox from <www.sil.org> at the start of each work session, my
experience has been that it took only 3 or 4 minutes from the time I first
activated Netscape Navigator till I had Shoebox installed on the PC. As
soon as Shoebox opened my project file from drive A (i.e. my floppy), it
reconfigured itself to the layout at the end of the previous session. 

- memory required

In the process of installation, one of the screens may look something like
this (I copied these numbers down during one of my recent sessions):

	Components to Install ...

		Disk Space Required 	 6854 k 
		Free Space Remaining	-3140 k

	_X_ Shoebox 3.02 for Windows 3.1	1985 k
	_X_ Shoebox 3.02 for Windows 95/NT	2788 k
	_X_ Documentation			1479 k
	_X_ Samples				 602 k

Obviously, that PC didn't have enough free disk space to install everything. 
If you have that problem, the first time you could just install the
documentation, and then print it out to get a hard-copy version to use as
a reference; subsequent times you won't need to install it. The
documentation runs close to 200 pages, but is very helpful while you are
learning to use Shoebox. I skimmed through it very quickly before I tried
to follow the tutorial. I agree with their recommendation that you should
install the samples and one of the versions of Shoebox 3.02 so that you
can work your way through the tutorial (which they refer to as the
walkthrough) before you start your own project. Once you have done the
walkthrough, you will only need to install one version of Shoebox 3.02
each time (I used the Windows 95 version). In my experience, their
instructions and explanations were quite good, and I had only a few minor
problems working my way through the tutorial, and then getting my own
project going. 

- learning curve

I didn't keep track of the number of hours spent on each part. If
sessions were 3 hours long, my rough guesstimate is that you might be able
to read the documentation in one session, do the walkthrough in two
sessions, and then get your own project actually started in one more
session. From there, it all depends on how much material you have to deal
with. Since you are constructing a lexicon as you go, things start to go
faster as you get more of the common morphemes entered into the lexicon
and don't have to repeat them each time they occur. 
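
The speed-up from a growing lexicon can be illustrated with a small
Python sketch. It is only a toy model of lexicon lookup, not Shoebox's
actual procedure; the placeholder "***", the function names, and the
Latin data are all assumptions made for the example.

```python
UNKNOWN = "***"  # placeholder chosen for this sketch


def gloss_morphemes(morphemes, lexicon):
    """Look up each morpheme; anything not yet in the lexicon is flagged."""
    return [lexicon.get(m, UNKNOWN) for m in morphemes]


def unknown_morphemes(morphemes, lexicon):
    """List, in order and without repeats, the morphemes still to be entered."""
    seen = []
    for m in morphemes:
        if m not in lexicon and m not in seen:
            seen.append(m)
    return seen


# Illustrative data only.
lexicon = {"puell": "girl", "-a": "NOM"}
sentence = ["puell", "-a", "ros", "-as", "ama", "-t"]

first_pass = gloss_morphemes(sentence, lexicon)
lexicon.update({"ros": "rose", "-as": "ACC.PL"})  # entries added by hand
second_pass = gloss_morphemes(sentence, lexicon)

print(first_pass)
print(second_pass)
print(unknown_morphemes(sentence, lexicon))
```

Each morpheme entered once is glossed automatically wherever it recurs,
which is why later sentences need less manual work than early ones.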

- exporting to Word-for-Windows

Although some of the messages I received said that you can export files
from Shoebox to Word, that was the one area where I felt that Shoebox's
documentation was inadequate (I don't know what I would have done without
some clever thinking by one of the guys at our computer center's Help
Desk; he suggested going via Excel, and showed me how to get from Shoebox
to Excel). I will try to go into sufficient detail here to allow others to
benefit from my experience. First I will give an overview of the whole
strategy I employed in moving texts through the interface between programs
in the sequence Word-Shoebox-Excel-Word, and then I will give some
potentially helpful details on each of its parts. 

Overview:

Word	1. Keep one copy of original text in paragraph format. (Doc1)
	2. Create a new file by making a copy of the original text, and
		reformat it with a separate paragraph for each sentence. 
		(Doc2) 
	3. Insert a free translation of each sentence as a separate
		paragraph following its original. (still Doc2)
	4. Create a new file, in which you gather copies of the
		translation sentences (from step 3.) into paragraphs as 
		in the original (i.e. same format as step 1.) (Doc3)
W & Sh	5. Copy each S from Doc2 (original + translation) into Shoebox
		as a separate text file entry. (Shoebox will sort text
		files alphabetically by the alphabetical order you assign 
		for that language.)
Shoebox	6. Work out the interlinear translation for each line separately,
		till all are complete.
W & Sh	7. Copy an entire paragraph from Doc1 (i.e. without translation)
		into Shoebox as a separate text file entry (in Shoebox it 
		will all appear as one super-long line).
Shoebox	8. Interlinearize the paragraph, resolving any remaining problems. 
Sh & XL	9. Export each paragraph from Shoebox to Excel as a file, and edit
		it there.
XL & W	10. Copy the paragraph from Excel to Word (Doc3), in units that are
		just enough to fill one line in Word each time, and put 
		them above their free translation. Put the cursor in the 
		free translation text at the point corresponding to the 
		end of the line imported from Excel, and hit 'Enter' three 
		times to leave room for the next line to be copied from 
		Excel. Then move the cursor up one line to poise it at
		the next insertion point.


Some detailed comments about moving text between programs:

(In some cases, you will need to have two programs running
simultaneously, and I sometimes found it easiest to run all three at
once.)


Word to Shoebox:

No problemo; just *Select* the text you want to transfer, then use Copy
(Ctrl + C) and Paste (Ctrl + V).


Export from Shoebox to ... (e.g. floppy in drive A):

File >> Export >> [dialog box:]	Save in: ___ (e.g. 3-1/2 Floppy (A:))
				File Name: ___ (e.g. data1)
				Save as type: _Standard Format_(*.sfm)
>> Save >> [dialog box:] Records to Include: _x_ Current record only >> OK
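
An exported Standard Format file is plain text in which each field
starts with a backslash marker. If you would rather inspect such a file
with a script than in Excel, a minimal Python sketch follows. The
markers used in the sample (\ref, \tx, \ge, \ft) and the sample record
are assumptions for illustration; use whatever markers your own project
defines.

```python
def parse_sfm(text):
    """Return a list of (marker, content) pairs from backslash-coded lines."""
    fields = []
    for line in text.splitlines():
        if line.startswith("\\"):
            marker, _, content = line[1:].partition(" ")
            # Keep the internal spacing: it carries the column alignment.
            fields.append((marker, content))
        elif fields and line.strip():
            # A line without a marker continues the previous field.
            marker, content = fields[-1]
            fields[-1] = (marker, content + "\n" + line)
    return fields


# Hypothetical record, standing in for the contents of a file like data1.
SAMPLE = r"""\ref s001
\tx puella rosas amat
\ge girl   roses loves
\ft The girl loves roses.
"""

fields = parse_sfm(SAMPLE)
for marker, content in fields:
    print(marker, "|", content)
```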


Import to Excel from ... (e.g. floppy in drive A):

File >> Open >> [dialog box:] Look in: _ 3-1/2 Floppy (A:)_ >>
Files of type: _All Files (*.*)_ >> data1 >> Open >>
Text Import Wizard [dialog box 1 of 3]	
		_x_ Fixed width
		Start import at row: _1_
		File Origin: _Windows (ANSI)_
>> Next >> Next >> [dialog box 3 of 3:] _x_ Text >> Finish

n.b. Just like Shoebox, Excel will put your paragraph in one long
horizontal line, with the various types of glosses running parallel to it
below the original. I don't know what the limits are in Shoebox, but Excel
will only allow a maximum of 282 cells. (This was more than enough for
all of my paragraphs.)
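
The idea behind a fixed-width import can be sketched in a few lines of
Python: a new column starts wherever some line has text and every line
has a blank just before it. This is only an illustration of the
principle under that assumption, not a description of what Excel's
wizard actually does; the function names and data are invented.

```python
def column_starts(lines):
    """Find offsets where a column begins in a set of space-aligned lines."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    starts = []
    for i in range(width):
        blank_before = i == 0 or all(p[i - 1] == " " for p in padded)
        text_here = any(p[i] != " " for p in padded)
        if blank_before and text_here:
            starts.append(i)
    return starts


def split_columns(lines):
    """Cut every line at the shared column offsets, one list of cells per line."""
    starts = column_starts(lines)
    ends = starts[1:] + [None]
    return [[line[s:e].strip() for s, e in zip(starts, ends)] for line in lines]


# Illustrative data only.
lines = [
    "puella rosas amat",
    "girl   roses loves",
]
print(column_starts(lines))
print(split_columns(lines))
```

Because a break requires a blank in every line, a two-word gloss sitting
under a single long word stays together in one cell.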

One problem for the interface between Shoebox and Excel is that Shoebox
requires all suffixes to begin with - (hyphen), whereas Excel regards all
instances of - (hyphen) as an operator (minus sign). Therefore, many
times when I opened newly imported files in Excel, I found a number of
cells filled with the notation: #NAME? 

The only way I found to remedy this is to immediately use the Replace
function to delete all occurrences of the character = (equals sign),
which Excel apparently puts in front of the hyphen when it treats a
cell as a formula. To accomplish this, follow the sequence:
Edit >> Replace >> [dialog box:]	Find what: = (equals sign)
					Replace with: (leave blank)
						>> Replace all

Excel also has some other peculiarities. One of my cells contained the
word _true_, which Excel insisted on converting to _TRUE_ until I inserted
a single quote mark before it, i.e. _'true_. Some unwanted changes can be
averted by clicking: Tools >> AutoCorrect >> ___ Replace text as you type
(i.e. by removing the check mark in front of _Replace text ..._), but this
will not control Excel's _TRUE_ compulsion!
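
Both peculiarities (leading hyphens and the word _true_) can be spotted
before the file ever reaches Excel. The Python sketch below simply lists
the cells worth checking after import; the set of prefixes and words it
tests for is an assumption based on the problems described above, and
the names are invented for the example.

```python
FORMULA_PREFIXES = ("-", "+", "=")
BOOLEAN_WORDS = {"true", "false"}


def risky_cells(cells):
    """Return (index, cell, reason) for cells a spreadsheet may reinterpret."""
    found = []
    for i, cell in enumerate(cells):
        text = cell.strip()
        if text.startswith(FORMULA_PREFIXES):
            found.append((i, cell, "may be read as a formula"))
        elif text.lower() in BOOLEAN_WORDS:
            found.append((i, cell, "may be converted to a logical value"))
    return found


# Illustrative data only.
cells = ["puell", "-a", "true", "ros", "-as"]
for index, cell, reason in risky_cells(cells):
    print(index, repr(cell), reason)
```

The sketch changes nothing in the data; it only tells you where to look
once the file is open in Excel.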

If you wish to apply different fonts or typefaces to different lines of
your text, this is the time to do that. For instance, you can select a
line by clicking on the number to its left, and then make it Bold by
clicking on _B_ in the toolbar. The example in the walkthrough applies
Bold to the first line, leaves the second line plain, applies Bold +
Italic to the third line, and applies Italic to the fourth line. This may
be more appealing visually, but exporting this type of formatting from
Shoebox is not recommended. 

At this point go through the entire paragraph (= line) to clean up any
residual problems that you can find. 

Then adjust the size of the cells to the length of the words/morphemes: 

*Select All* (e.g. by clicking on the button at the top left corner) 
>> Format >> Column >> AutoFit Selection.

While *Select All* is still applied, make sure that the invisibility
option for cell walls is in effect, to avoid showing the outlines of the
cell walls after you import the text into Word: Format >> Cells >> Border
>> None >> OK (or employ the Borders icon in the toolbar).



Export from Excel:

*Select* the number of cells you want for one line of text in Word
(approximately six inches; I used a pen to gauge the proper horizontal
length) and include all (e.g. 4) rows of your Excel layout >> Edit >> Copy
(or Ctrl + C).
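
Deciding how many cells fit on one line can also be done mechanically
rather than with a pen. The Python sketch below groups columns greedily
so that each group stays within a maximum width. It counts characters,
which corresponds to inches only in a monospaced font; that assumption,
the function name, and the data are mine.

```python
def chunk_columns(rows, max_width, gap=1):
    """Greedily group column indices so each group fits within max_width."""
    n_cols = len(rows[0])
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    chunks, current, used = [], [], 0
    for i, w in enumerate(widths):
        needed = w if not current else used + gap + w
        if current and needed > max_width:
            # This column would overflow the line: start a new group.
            chunks.append(current)
            current, needed = [], w
        current.append(i)
        used = needed
    if current:
        chunks.append(current)
    return chunks


# Illustrative data only: three columns, three aligned rows.
rows = [
    ["puella", "rosas", "amat"],
    ["puell-a", "ros-as", "ama-t"],
    ["girl-NOM", "rose-ACC.PL", "love-3SG"],
]
print(chunk_columns(rows, max_width=20))
print(chunk_columns(rows, max_width=19))
```

A column wider than the limit still gets a line of its own, so nothing
is dropped.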


Import into Word:

(It will be easier to accurately position the cursor to insert at the
desired spot if you click on the paragraph sign in the toolbar (similar to
the mirror image of _TP_) so that you can see all paragraph demarcators.)

Edit >> Paste Special >> [Dialog Box:] 
	_x_ Paste:
		_Microsoft Excel Worksheet Object_
			_x_ Float over text
>> OK

You can easily move this object (i.e. line) that you imported from Excel
around on the page in Word by clicking on it once, and then using the
mouse to position it in the desired location. However, do not edit the
text after you import it into Word. If you click on it twice to edit its
cells, the cell walls will appear and cannot be erased, as far as I can
figure out. If you discover that you need to make changes to the contents
of the cells, it seems best to delete that line, go back to Excel to make
your changes, and then import that line into Word again. 

The results look nice, in my opinion, though it did take a long time to
learn the tricks and work out all the wrinkles in this.

Good luck, and have fun. 

Vern M. Lindblad
VERNMLU.WASHINGTON.EDU