Shallow Thoughts : tags : html
Akkana's Musings on Open Source Computing and Technology, Science, and Nature.
Fri, 26 Nov 2021
Emacs has various options for editing HTML, none of them especially
good. I gave up on html-mode a while back because it had so many
evil tendencies, like not ever letting you type a double dash
without reformatting several lines around the current line as an
HTML comment. I'm using web-mode now, which is better.
But there was one nice thing that html-mode had: quick key bindings
for inserting tags. For instance, C-c i would insert the tag for
italics, <i></i>, and if you had something selected it would make
that italic: <i>whatever you had selected</i>.
It's a nice idea, but it was too smart (for some value of "smart"
that means "dumb and annoying") for its own good. For instance,
it loves to randomly insert things like newlines in places where they
don't make any sense.
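If you want the tag-wrapping part back in web-mode without the rest of
html-mode's baggage, a little elisp will do it. Here's a rough, untested
sketch -- the function name is mine, not anything built into either mode,
and I'm assuming web-mode's keymap is called web-mode-map:

;; Wrap the region in a tag, or insert an empty <tag></tag> pair
;; and leave point between them.
(defun my-insert-tag (tag)
  (interactive "sTag: ")
  (if (use-region-p)
      (let ((beg (region-beginning))
            (end (region-end)))
        (save-excursion
          (goto-char end)
          (insert (format "</%s>" tag))
          (goto-char beg)
          (insert (format "<%s>" tag))))
    (insert (format "<%s></%s>" tag tag))
    (backward-char (+ 3 (length tag)))))

(with-eval-after-load 'web-mode
  (define-key web-mode-map (kbd "C-c i")
    (lambda () (interactive) (my-insert-tag "i"))))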
Read more ...
Tags: editors, emacs, html, html-mode
[ 19:07 Nov 26, 2021 | More linux/editors | permalink to this entry ]
Tue, 29 Apr 2014
Long ago (in 2006!), I blogged on
an
annoying misfeature of Emacs when editing HTML files: you can't type
double dashes.
Emacs sees them as an SGML comment and insists on indenting all
subsequent lines in strange ways.
I wrote about finding a fix for the problem, involving commenting out
four lines in sgml-mode.el. That file had a comment at the very
beginning suggesting that they know about the problem and had guarded
against it, but obviously it didn't work and the variable that was
supposed to control the behavior had been overridden by other
hardwired behaviors.
That fix has worked well for eight years. But just lately, I've been
getting a lot of annoying warnings when I edit HTML files:
"Error: autoloading failed to define function sgml_lexical_context".
Apparently the ancient copy of sgml-mode.el that I'd been using all
these years was no longer compatible with ... something else somewhere
inside emacs. I needed to update it.
Maybe, some time during the intervening 8 years, they'd actually
fixed the problem? I was hopeful. I moved my old patched sgml-mode.el
aside and edited some files. But the first time I tried typing a
double dash -- like this, with text inside that's long enough to
wrap to a new line -- I saw that the problem wasn't fixed at all.
I got a copy of the latest sgml-mode.el -- on Debian, that meant:
apt-get install emacs23-el
cp /usr/share/emacs/23.4/lisp/textmodes/sgml-mode.el.gz ~/.emacs-lisp
gunzip ~/.emacs-lisp/sgml-mode.el.gz
Then I edited the file and started searching for strings like font-lock
and comment.
Unfortunately, the solution I documented in my old blog post is no
longer helpful. The code has changed too much, and now there are many,
many different places where automatic comment handling happens.
I had to comment out each of them bit by bit before I finally found
the section that's now causing the problem. Commenting out these lines
fixed it:
(set (make-local-variable 'indent-line-function) 'sgml-indent-line)
(set (make-local-variable 'comment-start) "")
(set (make-local-variable 'comment-indent-function) 'sgml-comment-indent)
(set (make-local-variable 'comment-line-break-function)
'sgml-comment-indent-new-line)
I didn't have to remove any .elc files, like I did in 2006; just putting
the sgml-mode.el file in my Emacs load-path was enough. I keep all my
customized Emacs code in a directory called .emacs-lisp, and in my .emacs
I make sure it's in my path:
(setq load-path (cons "~/.emacs-lisp/" load-path))
And now I can type double dashes again. Whew!
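An alternative I haven't tried: rather than carrying a patched sgml-mode.el
around forever, it might be enough to undo those same buffer-local settings
from a hook. Something like this in .emacs (an untested sketch, assuming the
four variables above are the only offenders):

;; Untested sketch: revert sgml-mode's comment handling to the global
;; defaults instead of patching sgml-mode.el.
(add-hook 'sgml-mode-hook
          (lambda ()
            (kill-local-variable 'comment-start)
            (kill-local-variable 'comment-indent-function)
            (kill-local-variable 'comment-line-break-function)
            (kill-local-variable 'indent-line-function)))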
Tags: emacs, editors, html
[ 12:42 Apr 29, 2014 | More linux/editors | permalink to this entry ]
Mon, 29 Jul 2013
Increasingly I'm seeing broken sites that send automated HTML mail
with headers claiming it's plain text.
To understand what's happening, you have to know about something called
MIME multipart/alternative.
MIME stands for Multipurpose Internet Mail Extensions:
it's the way mail encodes different types of attachments,
so you can attach images, music, PDF documents or whatever
with your email.
If you send a normal plain text mail message, you don't need MIME.
But as soon as you send anything else -- like an HTML message where
you've made a word bold, changed color or inserted images -- you need it.
MIME adds a Content-Type to the message saying "This is HTML
mail, so you need to display it as HTML when you receive it" or
"Here's a PDF attachment, so you need to display it in a PDF viewer".
The headers for these two cases would look like this:
Content-Type: text/html
Content-Type: application/pdf
A lot of mail programs, for reasons that have never been particularly
clear, like to send two copies of every mail message: one in plain
text, one in HTML. They're two copies of the same message --
it's just that one version has fancier formatting than the other.
The MIME header that announces this is
Content-Type: multipart/alternative
because the two versions, text and HTML, are alternative versions of the
same message. The recipient need only read one, not both.
Inside the multipart/alternative section there will be further
MIME headers: one saying Content-Type: text/plain, where it puts
the text of your message, and one saying Content-Type: text/html,
where it puts the HTML source code.
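A stripped-down multipart/alternative message looks roughly like this
on the wire (the boundary string here is invented for the example):

Content-Type: multipart/alternative; boundary="xyzzy42"

--xyzzy42
Content-Type: text/plain; charset=utf-8

Plain text version of the message.

--xyzzy42
Content-Type: text/html; charset=utf-8

<html><body><p>HTML version of the <b>same</b> message.</p></body></html>

--xyzzy42--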
This mostly works fine for real mail programs (though it's a rather
silly waste of bandwidth, sending double copies of everything for no
particularly good reason, and personally I always configure the mailers
I use to send only one copy at a time). But increasingly I'm
seeing automated mail robots that send multipart/alternative mail,
but do it wrong: they send HTML for both parts, or they send a
valid HTML part and a blank text part.
Why don't the site owners notice the problem?
You wouldn't ever notice a problem if you use the default configuration
of most mailers, which is to show the HTML part if at all possible.
But most mail programs give you an option to show the text part if there is one.
That way, you don't have to worry about those people who like to send
messages in pink blinking text on a plaid background -- all you see is the text.
If your mailer is configured to show plain text, for most messages
you'll see just text -- no colors, no blinking, no annoyances.
But for mail sent by these misconfigured mail robots, what you'll
see is HTML source code.
I've seen this in several places -- lots of spammers do it (who cares?
I was going to delete the message anyway), and one of the local
astronomy clubs does it so I've long since stopped trying to read
their announcements.
But the latest place I've seen this is one that ought to know better:
Coursera. They apparently reconfigured their notification system
recently, and I started getting course notifications that look like this:
/* Client-specific Styles */
#outlook a{padding:0;} /* Force Outlook to provide a "view in browser" button.
*/
body{width:100% !important;} .ReadMsgBody{width:100%;}
.ExternalClass{width:100%;} /* Force Hotmail to display emails at full width */
body{-webkit-text-size-adjust:none;} /* Prevent Webkit platforms from changing
default text sizes. */
/* Reset Styles */
body{margin:0; padding:0;}
img{border:0; height:auto; line-height:100%; outline:none;
text-decoration:none;}
table td{border-collapse:collapse;}
#backgroundTable{height:100% !important; margin:0; padding:0; width:100%
!important;}
p {margin-top: 14px; margin-bottom: 14px;}
/* /\/\/\/\/\/\/\/\/\/\ STANDARD STYLING: PREHEADER /\/\/\/\/\/\/\/\/\/\ */
.preheaderContent div a:link, .preheaderContent div a:visited, /* Yahoo! Mail
Override */ .preheaderContent div a .yshortcuts /* Yahoo! Mail Override */{
color: #3b6e8f;
... and on and on like that. You get the idea.
It's unreadable, even by a geek who knows HTML pretty well.
It would be fine in the HTML part of the message -- but this is
what they're sending in the text/plain part.
I filed a bug, but Coursera doesn't have a lot of staff to respond to
bug reports and it might be quite some time before they fix this.
Meanwhile, I don't want to miss notifications for the algorithms
course I'm currently taking. So I needed a workaround.
How to work around the problem in mutt
I found one for mutt, using alternative_order and folder-hook.
When in my "classes" folder, I use a folder hook to tell mutt to
prefer text/html format over text/plain, even though my default is text/plain.
Then you also need to add a default folder hook to set the default
back for every other folder -- mutt folder hooks are frustrating
in that way.
The two folder hooks look like this:
folder-hook . 'unalternative_order *; alternative_order text/plain text'
# Prefer the HTML part but only for Coursera,
# since it sends HTML in the text part.
folder-hook =in/coursera 'unalternative_order *; alternative_order text/html'
alternative_order specifies which types you'd most like to read.
unalternative_order is a lot less clear; the documentation says
it "removes a mime type from the alternative_order list", but doesn't
say anything more than that. What's the syntax? What's the difference
between using unalternative_order or just re-setting alternative_order?
Why do I have to specify it with * in both places? No one seems to know.
So it's a little unsatisfying, and perhaps not the cleanest way.
But it does work around the bug for sites where you really need
a way to read the mail.
Update: I also found this
discussion
of alternative_order which gives a nice set of key bindings to
toggle interactively between the various formats.
It was missing some backslashes, so I had to fiddle with it slightly
to get it to work. Put this in .muttrc:
macro pager ,@aoh= "\
<enter-command> unalternative_order *; \
alternative_order text/enriched text/html text/plain text;\
macro pager A ,@aot= 'toggle alternative order'<enter>\
<exit><display-message>"
macro pager ,@aot= "\
<enter-command> unalternative_order *; \
alternative_order text/enriched text/plain text/html text;\
macro pager A ,@aoh= 'toggle alternative order'<enter>\
<exit><display-message>"
macro pager A ,@aot= "toggle alternative order"
Then just type A (capital A) to toggle between formats. If it doesn't
change the first time you type A, type another one and it should
redisplay. I've found it quite handy.
Tags: mutt, email, html
[ 15:13 Jul 29, 2013 | More tech/email | permalink to this entry ]
Wed, 05 Jun 2013
After upgrading my OS (in this case, to Debian sid), I noticed that
my browser window kept being replaced with an HTML file I was editing
in emacs. I'd hit Back or close the tab, and the next time I checked,
there it was again, my HTML source.
I'm sure it's a nice feature that emacs can show me my HTML in
a browser. But it's not cool to be replacing my current page without
asking. How do I turn it off? A little searching revealed that this
was html-autoview-mode, which apparently at some point started
defaulting to ON instead of OFF. Running M-x html-autoview-mode
toggles it back off for the current session -- but that's no help if I
want it off every time I start emacs.
I couldn't find any documentation for this, and the obvious
(html-autoview-mode nil) in .emacs didn't work -- first, it
gives a syntax error because the function isn't defined until after
you've loaded html-mode, but even if you put it in your html-mode hook,
it still doesn't work.
I had to read the source of
sgml-mode.el.
(M-x describe-function html-autoview-mode
also would have
told me, if I had already loaded html-mode, but I didn't realize that
until later.)
Turns out html-autoview-mode turns off if its argument is negative,
not nil. So I added it to my html derived mode:
(define-derived-mode html-wrap-mode html-mode "HTML wrap mode"
  (auto-fill-mode)
  ;; Don't call an external browser every time you save an html file:
  (html-autoview-mode -1))
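If the derived mode isn't already hooked up for .html files, one line in
.emacs does it (adjust the pattern to taste):

(add-to-list 'auto-mode-alist '("\\.html?\\'" . html-wrap-mode))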
Tags: emacs, editors, html
[ 22:48 Jun 05, 2013 | More linux/editors | permalink to this entry ]
Tue, 03 Apr 2012
How do you show equations on a web page? Every now and then, I write
an article that involves math, and I wrestle with that problem.
The obvious (but wrong) approach: MathML
It was nearly fifteen years ago that MathML was recommended as a
standard for embedding equations inside an HTML page. I remember being
excited about it back then. There were a few problems -- like the
availability of fonts including symbols for integrals, summations
and so forth -- but they seemed minor. That was 1998.
Now, in 2012, I found myself wanting to write an article involving an
integral, so I looked into the state of MathML. I found that even now,
all these years later, it wasn't widely supported.
In Firefox I could show some simple equations, like a small integral
and the quadratic formula. But when I tried them in Chromium, I
learned that webkit-based browsers don't support MathML. At all.
The exception is Safari: apparently Apple has added some MathML
support into their browser but hasn't contributed that code back
to webkit (yet?).
Besides that, MathML is ridiculously hard to use. Here's the code for
that little integral:
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <semantics>
      <mrow>
        <msubsup>
          <mo>∫</mo>
          <mn>x = 0</mn>
          <mi>∞</mi>
        </msubsup>
        <mfrac>
          <mrow>
            <mo>ⅆ</mo>
            <mi>x</mi>
          </mrow>
          <mi>x</mi>
        </mfrac>
      </mrow>
    </semantics>
  </mrow>
</math>
Ugh! You can't even specify infinity without using an HTML numeric
entity. And the code for the quadratic equation is even worse (use View
Source if you want to see it).
Good ol' tables
Several years ago, I wrote about the
Twelve
Days of Christmas and how to calculate the total number of gifts
represented in the song.
I needed summations, and I was rather proud of working out a way to
use HTML tables to display all the sums and line up everything correctly.
It wasn't exactly publication-quality graphics, but it was readable.
More recently, I worked out a way to do exponentials that way,
and found a hint about
how to do integrals:
[Here, the integral P0 = ∫ from 0 to "now" of P(t) dt / (1 + t),
rendered as an HTML table.]
Looks a little better than the tiny MathML version. But the code isn't
any easier to read:
<table border="0" cellpadding="0" cellspacing="0">
<tr><td><td align="center"><small><i>now</i></small></td><td></td><td></td></tr>
<tr>
<td>
<td rowspan="3" valign="middle"><font size="6" style="font-size:3em" class="bigsym">∫</font>
<td align="center"><i>P</i> (<i>t</i>)</td>
<td rowspan="3" valign="middle"> <i>dt</i></td></tr>
<tr><td>P<sub>0</sub> =<td align="center">————</td></tr>
<tr><td><td align="center">1 + <i>t</i></td></tr>
<tr><td><td valign="top"><small><i>0</i></small></td><td></td><td></td></tr>
</table>
The solution: MathJax
And then I discovered MathJax.
It was added recently to the Udacity
forums, and I think it's also what MITx
is using for their courses.
MathJax is fantastic. It's an open-source library that lets you
specify equations in readable ways -- you can use MathML, but you
can also use LaTeX or even ASCII math like `x = (-b +- sqrt(b^2-4ac))/(2a) .`
It uses Javascript: you put your equations in the text of the page
with delimiters like $$ around them (you can control the delimiters),
then run a function that scans the page content and replaces any
equations it sees with pretty graphics. (Viewers using NoScript
or similar extensions will need to allow mathjax.org to see the
equations, unless you make a local copy of the mathjax.org libraries,
which you probably should anyway if you're using a lot of equations.)
For displaying those graphics,
MathJax might use MathML, HTML and CSS, or whatever, depending on the
user's browser ... but you don't have to worry about that.
(Alas,
even
in Firefox, MathML rendering isn't up to par so MathJax doesn't
use it by default, though you can
specify it as
an option if you know your equations render well.)
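Getting MathJax onto a page is just a couple of script tags in the head.
A typical include looks something like this -- the exact URL and config
name come from MathJax's Getting Started page and may have changed, so
check there; the tex2jax block is only needed if you want to change the
delimiters:

<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    tex2jax: { inlineMath: [['$','$'], ['\\(','\\)']] }
  });
</script>
<script type="text/javascript"
  src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>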
Here's that integral again, using LaTeX format:
$$ P_0 =\int_0^\infty \frac {P(t) dt}{1 + t} $$
and
$$ x = {-b \pm \sqrt{b^2-4ac} \over 2a} $$
It's beautiful! And although I don't know LaTeX at all -- I've been
wanting an excuse to learn it -- I put together that integral with
five minutes of web searching. (The quadratic code came from a
MathJax demo page.)
Here's what the code looks like:
$$ P_0 =\int_0^\infty \frac {P(t) dt}{1 + t} $$
$$ x = {-b \pm \sqrt{b^2-4ac} \over 2a} $$
MathJax is even smart enough to notice the code there is in a
<pre> tag, so I didn't have to find a way to escape it.
I'm sold! The MathJax team has really put together a nice package, and
I think we'll be seeing it on a lot more websites.
If you want to try it, start here:
Getting Started
with MathJAX.
Tags: math, science, html, web, mathml
[ 16:45 Apr 03, 2012 | More science | permalink to this entry ]
Thu, 12 Jan 2012
When I give talks that need slides, I've been using my
Slide
Presentations in HTML and JavaScript for many years.
I uploaded it in 2007 -- then left it there, without many updates.
But meanwhile, I've been giving lots of presentations, tweaking the code,
tweaking the CSS to make it display better. And every now and then I get
reminded that a few other people besides me are using this stuff.
For instance, around a year ago, I gave a talk where nearly all the
slides were just images. Silly to have to make a separate HTML file
to go with each image. Why not just have one file, img.html, that
can show different images? So I wrote some code that lets you go to
a URL like img.html?pix/whizzyphoto.jpg, and it will display
it properly, and the Next and Previous slide links will still work.
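The idea is simple: take everything after the ? as the image path and
point an img tag at it. Here's a stripped-down sketch of that part
(the real img.html also hooks up the Next and Previous links; the
element id here is made up):

<img id="slideimg" src="">
<script type="text/javascript">
  // Everything after the "?" is the image to show,
  // e.g. img.html?pix/whizzyphoto.jpg
  var path = decodeURIComponent(window.location.search.substring(1));
  if (path)
    document.getElementById("slideimg").src = path;
</script>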
Of course, I tweak this software mainly when I have a talk coming up.
I've been working lately on my SCALE talk, coming up on January 22:
Fun
with Linux and Devices (be ready for some fun Arduino demos!)
Sometimes when I overload on talk preparation, I procrastinate
by hacking the software instead of the content of the actual talk.
So I've added some nice changes just in the past few weeks.
For instance, the speaker notes that remind me of where I am in
the talk and what's coming next. I didn't have any way to add notes on
image slides. But I need them on those slides, too -- so I added that.
Then I decided it was silly not to have some sort of automatic
reminder of what the next slide was. Why should I have to
put it in the speaker notes by hand? So that went in too.
And now I've done the less fun part -- collecting it all together and
documenting the new additions. So if you're using my HTML/JS slide
kit -- or if you think you might be interested in something like that
as an alternative to Powerpoint or Libre Office Presenter -- check
out the presentation I have explaining the package, including the
new features.
You can find it here:
Slide
Presentations in HTML and JavaScript
Tags: speaking, javascript, html, web, programming, tech
[ 21:08 Jan 12, 2012 | More speaking | permalink to this entry ]
Sun, 08 Jan 2012
I've been having (mis)adventures learning about Python's various
options for parsing HTML.
Up until now, I've avoided doing any HTML parsing
in my RSS reader FeedMe.
I use regular expressions to find the places where content starts and
ends, and to screen out content like advertising, and to rewrite links.
Using regexps on HTML is generally considered to be a no-no, but it
didn't seem worth parsing the whole document just for those modest goals.
But I've long wanted to add support for downloading images, so you
could view the downloaded pages with their embedded images if you so chose.
That means not only identifying img tags and extracting their src
attributes, but also rewriting the img tag afterward to point to the
locally stored image. It was time to learn how to parse HTML.
Since I'm forever seeing people flamed on the #python IRC channel for
using regexps on HTML, I figured real HTML parsing must be straightforward.
A quick web search led me to
Python's built-in
HTMLParser class. It comes with a nice example for how to use it:
define a class that inherits from HTMLParser, then define
some functions it can call for things like handle_starttag and
handle_endtag; then call self.feed(). Something like this:
import urllib2
from HTMLParser import HTMLParser

class MyFancyHTMLParser(HTMLParser):
    # has_attr(), make_absolute(), self.outfile and self.encoding are
    # helpers defined elsewhere in the full class; this is just an excerpt.

    def fetch_url(self, url):
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        link = response.geturl()
        html = response.read()
        response.close()
        self.feed(html)   # feed() starts the HTMLParser parsing

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            # attrs is a list of tuples, (attribute, value)
            srcindex = self.has_attr('src', attrs)
            if srcindex < 0:
                return    # img with no src tag? skip it
            src = attrs[srcindex][1]
            # Make relative URLs absolute
            src = self.make_absolute(src)
            attrs[srcindex] = (attrs[srcindex][0], src)
        print '<' + tag
        for attr in attrs:
            print ' ' + attr[0]
            if len(attr) > 1 and type(attr[1]) == str:
                # escape any embedded double-quotes in the value
                val = attr[1].replace('"', '&quot;')
                print '="' + val + '"'
        print '>'

    def handle_endtag(self, tag):
        self.outfile.write('</' + tag.encode(self.encoding) + '>\n')
Easy, right? Of course there are a lot more details, but the
basics are simple.
I coded it up and it didn't take long to get it downloading images
and changing img tags to point to them. Woohoo!
Whee!
The bad news about HTMLParser
Except ... after using it a few days, I was hitting some weird errors.
In particular, this one:
HTMLParser.HTMLParseError: bad end tag: ''
It comes from sites that have illegal content. For instance, stories
on Slate.com include Javascript lines like this one inside
<script></script> tags:
document.write("<script type='text/javascript' src='whatever'></scr" + "ipt>");
This is
technically illegal HTML -- but lots of sites do it, so protesting
that it's technically illegal doesn't help if you're trying to read a
real-world site.
Some discussions said setting
self.CDATA_CONTENT_ELEMENTS = ()
would help, but it didn't.
HTMLParser's code is in Python, not C. So I took a look at where the
errors are generated, thinking maybe I could override them.
It was easy enough to redefine parse_endtag()
to make it not throw
an error (I had to duplicate some internal strings too). But then I
hit another error, so I redefined unknown_decl()
and
_scan_name()
.
And then I hit another error. I'm sure you see where this was going.
Pretty soon I had over 100 lines of duplicated code, and I was still
getting errors and needed to redefine even more functions.
This clearly wasn't the way to go.
Using lxml.html
I'd been trying to avoid adding dependencies to additional Python packages,
but if you want to parse real-world HTML, you have to.
There are two main options: Beautiful Soup and lxml.html.
Beautiful Soup is popular for large projects, but the consensus seems
to be that lxml.html is more error-tolerant and lighter weight.
Indeed, lxml.html is much more forgiving. You can't handle start and
end tags as they pass through, like you can with HTMLParser. Instead
you parse the HTML into an in-memory tree, like this:
import lxml.html

tree = lxml.html.fromstring(html)
How do you iterate over the tree? lxml.html is a good parser, but it
has rather poor documentation, so it took some struggling to figure out
what was inside the tree and how to iterate over it.
You can visit every element in the tree with
for e in tree.iter():
    print e.tag
But that's not terribly useful if you need to know which
tags are inside which other tags. Instead, define a function that iterates
over the top level elements and calls itself recursively on each child.
The top of the tree itself is an element -- typically the
<html></html> -- and each element has .tag and .attrib.
If it contains text inside it (like a <p> tag), it also has
.text. So to make something that works similarly to HTMLParser:
def crawl_tree(tree):
    handle_starttag(tree.tag, tree.attrib)
    if tree.text:
        handle_data(tree.text)
    for node in tree:
        crawl_tree(node)
    handle_endtag(tree.tag)
But wait -- we're not quite all there. You need to handle two
undocumented cases.
First, comment tags are special: their tag attribute,
instead of being a string, is <built-in function Comment>
so you have to handle that specially and not assume that tag
is text that you can print or test against.
Second, what about cases like
<p>Here is some <i>italicised</i> text.</p>
? In this case, you have the p tag, and its text is
"Here is some ".
Then the p has a child, the i tag, with text of "italicised".
But what about the rest of the string, " text."?
That's called a tail -- and it's the tail of the adjacent i tag it follows,
not the parent p tag that contains it. Confusing!
So our function becomes:
def crawl_tree(tree):
    if type(tree.tag) is str:
        handle_starttag(tree.tag, tree.attrib)
        if tree.text:
            handle_data(tree.text)
        for node in tree:
            crawl_tree(node)
        handle_endtag(tree.tag)
    if tree.tail:
        handle_data(tree.tail)
See how it works? If it's a comment (tree.tag isn't a string),
we'll skip everything -- except the tail. Even a comment
might have a tail:
<p>Here is some <!-- this is a comment --> text we want to show.</p>
so even if we're skipping a comment, we still need its tail.
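To try it out, the handler functions can be trivial stubs -- these names
just mirror the HTMLParser-style callbacks above and aren't part of lxml:

import lxml.html

def handle_starttag(tag, attrs):
    print 'start:', tag, dict(attrs)

def handle_data(data):
    print 'data:', repr(data)

def handle_endtag(tag):
    print 'end:', tag

crawl_tree(lxml.html.fromstring(
    '<p>Here is some <!-- this is a comment --> text we want to show.</p>'))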
I'm sure I'll find other gotchas I've missed, so I'm not releasing
this version of feedme until it's had a lot more testing. But it
looks like lxml.html is a reliable way to parse real-world pages.
It even has a lot of convenience functions like link rewriting
that you can use without iterating the tree at all. Definitely worth
a look!
Tags: python, programming, html
[ 15:04 Jan 08, 2012 | More programming | permalink to this entry ]
Fri, 25 Jun 2010
Several groups I'm in insist on using LinkedIn for discussions,
instead of a mailing list. No idea why -- it's so much harder to use
-- but for some reason that's where the community went.
Which is fine except for what happens just about every time I try
to view a discussion:
I get a notice of a thread that sounds interesting, click on the link
to view it, read the first posting, hit the space bar to scroll down
... whoops! Focus was in that silly search field at the top right of the page,
so it won't scroll.
It's even more fun if I've already scrolled down a bit with the
mousewheel -- in that case, hitting spacebar jumps back up to the
top of the page, losing any context I have as well as making me
click in the page before I can actually scroll.
Setting focus to search fields is a good thing on some pages.
Google does it, which makes terrific sense -- if you go to
google.com, your main purpose is to type something in that search box.
It doesn't, however, make sense on a page whose purpose is to
let people read through a long discussion thread.
Since I never use that search field anyway, though, I came up with
a solution using Firefox's user css.
It seems there's no way to make an input field un-focusable or
read-only using pure CSS (of course, you could use Javascript and
Greasemonkey for that); but as long as you don't need to use it,
you can make it disappear entirely.
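(For the record, a Greasemonkey version might look something like this --
untested, and it depends on the same element id as the CSS rule below,
so it's just as much at the mercy of LinkedIn's page layout:)

// ==UserScript==
// @name     Keep focus out of LinkedIn's search box
// @include  http://www.linkedin.com/*
// ==/UserScript==
// Blur the search box whenever it grabs focus, instead of hiding it.
var box = document.getElementById("main-search-box");
if (box)
    box.addEventListener("focus", function() { box.blur(); }, true);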
Add this line to chrome/userContent.css inside your Firefox profile
(create it if it doesn't already exist):
form#global-search span#autocomplete-container input#main-search-box {
visibility:hidden;
}
Then restart Firefox and load a discussion page.
The search box should be hidden,
and spacebar should scroll the page just like it does on most web pages.
Of course, this will need to be updated the next time
LinkedIn changes their page layout. And it's vaguely possible that
somewhere else on the web is a page with that hierarchy of element names.
But that's easy enough to fix: run a View Page Source
on the LinkedIn page and add another level or two to the CSS rule.
The concept is the important thing.
Tags: html, web, css, tips, annoyances
[ 17:17 Jun 25, 2010 | More tech/web | permalink to this entry ]
Sun, 20 Jun 2010
Regular readers probably know that I use
HTML
for the slides in my talks, and I present them either with Firefox
in fullscreen mode, or with my own Python
preso
tool based on webkit.
Most of the time it works great. But there's one situation that's
always been hard to deal with: low-resolution projectors.
Most modern projectors are 1024x768, and have been for quite a few years,
so that's how I set up my slides. And then I get asked to give a talk
at a school, or local astronomy club, or some other group that
has a 10-year-old projector that can only handle 800x600. Of course,
you never find out about this ahead of time, only when you plug in
right before the talk. Disaster!
Wait -- before you object that HTML pages shouldn't use pixel values and
should work regardless of the user's browser window size: I completely
agree with you. I don't specify absolute font sizes or absolute
positioning on web pages -- no one should.
But presentation slides are different: they're designed for
a controlled environment where everyone sees the same thing using the
same software and hardware.
I can maintain a separate stylesheet -- that works for making the
font size smaller, but it doesn't address the problem of pictures too
large to fit (and we all like to use lots of pictures in presentations,
right?). Or I can maintain two separate copies of the slides for the two
sizes, but that's a lot of extra work and they're bound to get out of sync.
Here's a solution I should have thought of years ago: full-page zoom.
Most major browsers have offered that capability for years, so the
only trick is figuring out how to specify it in the slides.
IE and the Webkit browsers (Safari, Konqueror, etc.) offer a wonderful
CSS property called zoom. It works like this:
body {
zoom: 78.125%;
}
78.125% is the ratio between an 800-pixel projector and a 1024-pixel one.
Just add this line, and your whole page will be scaled down to the
right size. Lovely!
Lovely, except it doesn't work on Firefox
(bug 390936).
Fortunately, Firefox has another solution: the more general and not yet
standardized CSS transform, which Mozilla has implemented as the
Mozilla-specific property
-moz-transform.
So add these lines:
body {
position: absolute; left: 0px; top: 0px;
-moz-transform: scale(.78125, .78125);
}
The position: absolute is needed because when Firefox scales
with -moz-transform, it also centers whatever it scaled, so the
slide ends up in the top center of the screen.
On my laptop, at least, it's the upper left part of the screen that
gets sent to the projector, so slides must start in the upper left corner.
The good news is that these directives don't conflict; you can put
both zoom and -moz-transform in the same rule and things
will work fine. So I've added this to the body rule in my slides.css:
/* If you get stuck on an 800x600 projector, use these:
zoom: 78.125%;
position: absolute; left: 0px; top: 0px;
-moz-transform: scale(.78125, .78125);
*/
Uncomment in case of emergency and all will be well.
(Unless you use Opera, which doesn't seem to understand either version.)
Tags: speaking, html, css, browsers, firefox, mozilla
[ 12:14 Jun 20, 2010 | More tech/web | permalink to this entry ]
Mon, 29 Mar 2010
I maintain the websites for several clubs. No surprise there -- anyone
with even a moderate knowledge of HTML, or just a lack of fear of
it, invariably gets drafted into that job in any non-computer club.
In one club, the person in charge of scheduling sends out an elaborate
document every three months in various formats -- Word, RTF, Excel, it's
different each time. The only regularity is that it's always full of
crap that makes it hard to turn it into a nice simple HTML table.
This quarter, the formats were Word and RTF. I used unrtf to turn
the RTF version into HTML -- and found a horrifying page full of
lines like this:
<body><font size=3></font><font size=3><br>
</font><font size=3></font><b><font size=4></font></b><b><font size=4><table border=2>
</font></b><b><font size=4><tr><td><b><font size=4><font face="Arial">Club Schedule</font></font></b><b><font size=4></font></b><b><font size=4></font></b></td>
<font size=3></font><font size=3><td><font size=3><b><font face="Arial">April 13</font></b></font><font size=3></font><font size=3><br>
</font><font size=3></font><font size=3><b></b></font></td>
I've put the actual page content in bold; the rest is just junk,
mostly doing nothing, mostly not even legal HTML,
that needs to be eliminated if I want
the page to load and display reasonably.
I didn't want to clean up that mess by hand! So I needed some regular
expressions to clean it up in an editor.
I tried emacs first, but emacs makes it hard to try an expression then
modify it a little when the first try doesn't work, so I switched to vim.
The key to this sort of cleanup is non-greedy regular expressions.
When you have a bad tag sandwiched in the middle of a line containing
other tags, you want to remove everything from the <font
through the next > -- but no farther, or else you'll delete
real content. If you have a line like
<td><font size=3>Hello<font> world</td>
you only want to delete through the <font>, not through the </td>.
In general, you make a regular expression non-greedy by adding a ?
after the wildcard -- e.g. <font.*?>. But that doesn't work
in vim. In vim, you have to use \{M,N} which matches
from M to N repetitions of whatever immediately precedes it.
You can also use the shortcut \{-} to mean the same thing
as *? (0 or more matches) in other programs.
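For instance, on that sample line, the difference looks like this:

" Greedy: matches all the way to the last > on the line,
" wiping out the real content along with the font tags:
:s/<font.*>//
" Non-greedy: stops at the first >, removing only <font size=3>:
:s/<font.\{-}>//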
Using that, I built up a series of regexp substitutes to clean up
that unrtf mess in vim:
:%s/<\/\{0,1}font.\{-}>//g
:%s/<b><\/b>//g
:%s/<\/b><b>//g
:%s/<\/i><i>//g
:%s/<td><\/td>/<td><br><\/td>/g
:%s/<\/\{0,1}span.\{-}>//g
:%s/<\/\{0,1}center>//g
That took care of 90% of the work, leaving me with hardly any cleanup
I needed to do by hand. I'll definitely keep that list around for
the next time I need to do this.
Tags: regexp, html, editors, vim
[ 23:02 Mar 29, 2010 | More linux/editors | permalink to this entry ]