LINGUIST List 3.85

Tue 28 Jan 1992

Disc: Fonts, Unix, Machine Transcription

Editor for this issue: <>


Directory

  1. Eric Schiller, Re: 3.69 FYI: Nameserver, IPA, Reverse Lists, WP
  2. Susanna Cumming, IPA for Microsoft Windows
  3. Tim F O'Donoghue, Re: 3.69 FYI: Nameserver, IPA, Reverse Lists, WP
  4. Martin Wynne, re: Unix wordsort
  5. Richard Sproat, Voice Transcription

Message 1: Re: 3.69 FYI: Nameserver, IPA, Reverse Lists, WP

Date: Fri, 24 Jan 92 14:16:15 CS
From: Eric Schiller <schillersapir.uchicago.edu>
Subject: Re: 3.69 FYI: Nameserver, IPA, Reverse Lists, WP

Re: Fonts
Adobe has finally released two sets of phonetic fonts, featured in
the latest issue of Font and Function: Times phonetic and Stone
phonetic, the latter including a variety of fonts. I am too poor
to check these out at present, but would love to hear reactions from
those with the resources to buy them. The samples in F&F look pretty
good, the unpleasant character spacing due no doubt to the need for
floating diacritics, which, alas, are not in the sample.

Eric Schiller

Message 2: IPA for Microsoft Windows

Date: Sun, 26 Jan 92 11:32:42 -0
From: Susanna Cumming <scummingclipr.colorado.edu>
Subject: IPA for Microsoft Windows

I've just installed Atech Software's "Publisher's Powerpak" -- a
font manager for Windows, DOS WordPerfect and a few other programs.
They have an IPA font set which I got at the same time. I'm using
it with Word for Windows 2.0, but of course you can use the Windows
version with any Windows application. I'm pretty happy so far --
it's got all the characters I've needed to date, though the ones I
use most aren't necessarily the easiest ones to type! It includes
a screen driver so you can see what you're typing, the IPA comes in
both Roman and sans serif, and you get about 10 rather frivolous
other fonts with the basic package (all fully scalable). It works
with both my LaserJet III and my 24-pin dot-matrix printer (though
the latter is painfully slow). The basic package lists at around
$80 (though you can find it cheaper by mail order) and includes
an import utility that lets you convert any Adobe Type 1 fonts you
have around. The IPA font set is another $80. Atech's phone #
is 800-786-FONT.

Susanna Cumming

Message 3: Re: 3.69 FYI: Nameserver, IPA, Reverse Lists, WP

Date: Sun, 26 Jan 92 14:30:50 GM
From: Tim F O'Donoghue <timcanon.co.uk>
Subject: Re: 3.69 FYI: Nameserver, IPA, Reverse Lists, WP


>Date: Mon, 20 Jan 1992 10:06:13 CST
>From: Chris Culy <cculyvaxa.weeg.uiowa.edu>
>Subject: For the Listserv--announcement follows

>It is sometimes desirable to generate a reverse word list from a given list.
>Below is a description of one fairly easy way to do this. I hope people find
>it useful.
>
>The following is a simple awk program that reverses a line.
>
>Program 1:
>
>{ for(i=length; i>0; i--) {
> printf("%s",substr($0,i,1))
> }
> printf("\n")
>}
>
>This program can be used to create reverse word lists as follows. In the
>original file, put each word on a separate line. Run the awk program, sort
>the result, and then run the awk program again. This can be done simply in
>UNIX or DOS by using pipes.
>
>Example (UNIX):
>awk -f reverse.prog mylist | sort | awk -f reverse.prog

A simpler method is to use rev(1), i.e.: rev | sort | rev
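
As a rough illustration of the rev approach, the sketch below wraps the
pipeline in a shell function and runs it on five made-up words. The
function name reverse_sort and the sample words are assumptions for the
example, and LC_ALL=C is set only so that the sort order is the same on
every system. It also assumes rev(1) is installed, which is usual but not
guaranteed.

```shell
# Hypothetical helper: sort words by their endings (a reverse word list).
# Reads one word per line on stdin, writes the sorted list to stdout.
reverse_sort() {
    rev | LC_ALL=C sort | rev
}

# Sample run on made-up words.
printf '%s\n' walking talked sing walked ring | reverse_sort
```

Words that share an ending come out next to each other: here the order
is talked, walked, walking, ring, sing.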

>The reversing program can be modified to reverse only part of the input. For
>example, Program 2 reverses only those lines ending in a lowercase letter from
>s to z, while Program 3 reverses only those lines beginning with a lowercase
>letter from a to k. For further information on awk, see your system
>documentation.
>
>Program 2:
>
>/[s-z]$/{ for(i=length; i>0; i--) {
> printf("%s",substr($0,i,1))
> }
> printf("\n")
> }

This can also be done without resorting to awk by using grep(1), ie:

grep '[s-z]$'|rev|sort|rev
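
A small sketch of the grep variant, again as a shell function run on
made-up words. The name reverse_sort_s_to_z and the sample input are
assumptions for the example; LC_ALL=C is set so that the range [s-z] and
the sort order mean the same thing on every system.

```shell
# Hypothetical helper: reverse word list restricted to words ending in s-z.
# Lines that do not match the pattern are dropped by grep before sorting.
reverse_sort_s_to_z() {
    LC_ALL=C grep '[s-z]$' | rev | LC_ALL=C sort | rev
}

# Sample run: dog and tree do not end in s-z, so they are left out.
printf '%s\n' cats dog box tree play | reverse_sort_s_to_z
```

Only cats, box and play survive the filter, and they are printed in that
order because s sorts before x and x before y.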

Tim [O'Donoghue] <timcanon.co.uk>

Message 4: re: Unix wordsort

Date: Mon, 27 Jan 92 11:20:44 GM
From: Martin Wynne <LNP5MWcms1.leeds.ac.uk>
Subject: re: Unix wordsort

Chris Culy's awk programs do work, but I'd just like to point out that
there are ways of doing wordsorts under Unix without resorting to
awk programs. Since many Unix users won't know awk, here's how to do it.
Unix (usually) provides powerful programs for text processing operations.
However, your system may work slightly differently to mine (running SunOS
4.1.1).

The following will produce a sorted wordlist:

 tr " " "\012" < input.file | sort | uniq > output.file

The sort command has a -r option, that does a reverse sort.
The pipeline:

 tr " " "\012" < input.file | sort | uniq | sort -r > output.file

will take a text file, convert word boundaries (or spaces at least)
into line breaks (so that you get one word on each line), sort, then
erase duplicate lines (so that you get a wordlist), then reverse sort it
(and output to the file output.file).
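
To make the steps concrete, here is a minimal sketch that wraps the two
pipelines above in shell functions and runs them on two made-up lines of
text. The function names wordlist and reverse_wordlist and the sample
text are assumptions for the example; LC_ALL=C is set only to make the
sort order predictable.

```shell
# Hypothetical helpers built from the two pipelines above.
# Each reads text on stdin and writes one word per line to stdout.
wordlist() {
    # spaces -> newlines, then sort and drop duplicate lines
    tr " " "\012" | LC_ALL=C sort | uniq
}

reverse_wordlist() {
    # same wordlist, in reverse alphabetical order
    wordlist | LC_ALL=C sort -r
}

# Sample run on made-up text.
printf 'the cat sat on the mat\nthe dog sat\n' | reverse_wordlist
```

The sample prints the, sat, on, mat, dog, cat, one word per line. On most
systems "sort -u" can stand in for "sort | uniq".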

To make it a bit better (at the expense of making it more complicated),
you can strip out punctuation as well, as in the following:

tr -d ",.:;!?()" < input.file | tr " " "\012" | sort | uniq | sort -r \
> output.file

(NB type a \ if you overspill the command line)

This might not look much simpler than Chris Culy's awk progs, but I
think that if you get to grips with the commands involved (plus
others like grep, wc, cut), you can carry out a wide variety of
operations, without needing to learn the arcane mysteries of awk.

To make the above serviceable, you could create an executable
file (I called mine wl for wordlist) containing this:

tr -d ",.:;!?()" < $1 | tr " " "\012" | sort | uniq > ${1}.wl

and then you could just type 'wl filename' to get a sorted wordlist.
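
One possible variant of wl, written as a shell function so it can be
tried without creating an executable file. This is a sketch, not the
original script: the argument check, the quoting of "$1" and the use of
LC_ALL=C are additions for the example. Single quotes are used around
the punctuation because some shells (bash, for instance) treat ! inside
double quotes specially when used interactively.

```shell
# Sketch of the wl helper as a shell function (hypothetical variant).
# Usage: wl filename   -> writes the sorted wordlist to filename.wl
wl() {
    # Refuse to run without exactly one readable input file.
    if [ "$#" -ne 1 ] || [ ! -r "$1" ]; then
        echo "usage: wl filename" >&2
        return 1
    fi
    # Strip punctuation, put one word per line, sort, drop duplicates.
    tr -d ',.:;!?()' < "$1" | tr ' ' '\012' | LC_ALL=C sort | uniq > "$1.wl"
}
```

After "wl sample.txt" the wordlist is in sample.txt.wl. Note that with
LC_ALL=C capitalised words sort before lowercase ones, so "The" and
"the" are listed separately.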

Message 5: Voice Transcription

Date: Tue, 21 Jan 92 13:08:30 ES
From: Richard Sproat <rwsmbeya.research.att.com>
Subject: Voice Transcription

>Date: Thu, 16 Jan 92 08:53:17 GMT
>From: mesuzuka.u-strasbg.fr (Michel Eytan LILoL)
>Subject: Re: 3.36 Queries: Computer Transcription, Wordstar, Verlan

>if you can work out the system you describe, it would be very useful. As you
> state it, it would be quite complex because of the vocal side of things. I am
> however thinking of another application: writing/transferring files by e-mail
> in non-standard -- ie non English -- alphabets, ranging from Latin but
> accentuated (French, German, in fact all other european) ones to non Latin
> alphabets (eg, Hebrew, Greek, Russian) or even to ideogrammatic ones (Chinese,
> Japanese, etc). I have heard somewhere that there is a proposal to set the
> ASCII standard at 32 bits, but till that comes along (if it does, since it will
> be a heavy load on the net) transcription might be useful -- although it has
> its own set of problems, specially the regional accents, dialects, nay the many
> _different_ languages using a common set of ideograms (cf Chinese again) or a
> common alphabet (eg Hebrew and Yiddish).

> ==michel eytandpt-info.u-strasbg.fr

I'm somewhat confused as to how this relates to the IPA transcription
issue. For languages that use scripts other than Latin letters ---
esp. languages like Chinese or Japanese --- if the point is to find an
alternative way of transcribing those languages in an ascii-encodable
fashion, then there are many such schemes (e.g., pinyin for Mandarin
Chinese, and other schemes for other dialects). I don't think that IPA
transcription would be particularly helpful. For instance, it is true
that it is not very easy to read Chinese transcribed into pinyin, but
transcribing it into IPA instead is not going to help.

(If the point is to send via email the text encoded in two-byte (or,
for European languages, 8-bit) format, then there are also many such
schemes, even without proposed increases in the size of the standard
character set. For example, Unix provides tools such as uuencode and
uudecode for encoding and decoding binary files into/from ascii
format, for transmission via such means as email.)

Richard Sproat
Linguistics Research Department
AT&T Bell Laboratories
600 Mountain Avenue, Room 2d-451
Murray Hill, NJ 07974
tel (908) 582-5296
fax (908) 582-7308
rwsresearch.att.com