Vi/Vim Substitution For Microsoft Word Characters

Bryan Breen Bryan.C.Breen.1 at gsfc.nasa.gov
Mon Oct 28 19:12:54 PST 2002


Hey all,

I'm having a hard time finding any references that might help me do some
very specific editing in vi[m]. I was hoping one of the LFS group might
have run across this in the past. I've got a LOT of reports that were
originally written in Microsoft Word (Windows 97 and Mac 98 versions),
hereafter to be referred to as "M$W". The reports have all been converted
to text files (through M$W's "Save As... Text With Line Breaks") and
uploaded to a webserver. I'm sure I don't have to mention how atrocious M$W
is at using fancy characters and odd formatting.

The problem I am running into is with the apostrophe (') characters that M$W
likes to use. Instead of the standard ASCII character decimal 39
(octal 047, hex 27), it uses two characters that resemble slanted tick marks
such as `, with the slant determined by where in the word the
character is located. The character slants forward if it is at the
beginning of a word, and backward if it occurs anywhere else in
the word (or at the end of a word). This looks nice and pretty in M$W;
however, these don't seem to be among the 128 base ASCII characters that
the rest of the world accepts as the default.

On a *nix computer, when the text files are viewed with more/less/cat,
these characters appear as a lowercase or uppercase character that
looks like the letters A and E squashed together, for the forward- or
backward-leaning tick marks (respectively). When I edit the files in vi[m],
the aforementioned characters appear as:

\346
\306

On the screen, they each look to be 4 characters, but when I move the
cursor across the character, it only stops on the number 6. This is similar
to when you see a ^M: when you move your cursor over it, it only
stops on the M (and not the ^). So, for example, if I had the following
text typed into the M$W document:

select the label marked 'on'.

That line would appear in vi[m] as:

select the label marked \346on\306.
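To spell out what those two escapes are, here is a minimal sketch in Python (not vi[m]) that converts the octal values to decimal and hex and looks up the matching glyph. Treating the bytes as ISO-8859-1 (Latin-1) is an assumption about how the terminal displays them; the variable names are made up for illustration.

```python
import unicodedata

# The two escapes vi[m] displays, written as octal byte values.
forward_tick = 0o346   # shown as \346
backward_tick = 0o306  # shown as \306

for label, value in (("\\346", forward_tick), ("\\306", backward_tick)):
    # Assumption: the terminal interprets the byte as ISO-8859-1 (Latin-1).
    glyph = bytes([value]).decode("latin-1")
    print(f"{label}: decimal {value}, hex {value:#04x}, {unicodedata.name(glyph)}")

# The sample line as raw bytes, the way it would sit in the text file.
sample = b"select the label marked \346on\306."
print(len(sample), sample)
```

Each escape is a single byte above 127, which would fit with the cursor stopping only once on each one.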

Now why is this a big problem? Because when different web browsers view the
file, they see it as more/less/cat does, and not how M$W does (even M$
Internet Explorer... oh the irony!).

So I want to make a simple search and replace line for vi[m]. I have tried
some of the following, with no success:

%s/\306/'/gc
%s/\\306/'/gc
%s/^\306/'/gc

(In that last one, the ^ was made by pressing CTRL-V followed by CTRL-\,
thinking maybe it was looking for an ASCII escape sequence... grasping for
anything at that point).
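As a possible stopgap outside of vi[m], a byte-level swap seems like it should do the job. This is only a sketch in Python, assuming the two bytes in the files are exactly the octal 346 and 306 that vi[m] shows, and that no report contains a genuine AE ligature (which would be replaced as well). The name fix_ticks is hypothetical.

```python
# Byte values taken from the \346 and \306 that vi[m] displays.
TICK_BYTES = (0o346, 0o306)
APOSTROPHE = 0o047  # plain ASCII apostrophe, decimal 39

# 256-entry table: every byte maps to itself except the two tick marks.
TABLE = bytes(APOSTROPHE if b in TICK_BYTES else b for b in range(256))


def fix_ticks(data: bytes) -> bytes:
    """Return data with both tick-mark bytes replaced by an ASCII apostrophe."""
    return data.translate(TABLE)


# Try it on the sample line, written as raw bytes.
sample = b"select the label marked \346on\306."
print(fix_ticks(sample))
```

Working on bytes rather than decoded text sidesteps any guess about the files' character encoding. To apply it to a report, one would read the file in binary mode ("rb"), pass the contents through fix_ticks, and write the result back in binary mode ("wb").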

I can't seem to figure out (or find a reference for, in man pages or online)
a way to get vi[m] to match these high-end ASCII characters that not
everyone recognizes. Googling on this produces 25,000+ pages of vi[m]
manuals that contain barely more than a "vi for dummies" quality of
material, nothing as esoteric as this. And I unfortunately don't have
access to a news reader program to access the relevant newsgroups. So I
thought I might give a shot at picking the minds of some of you folks.

Anybody have a guess at how I might be able to substitute/search for these
odd M$W characters?

- Bryan
