Topic: Wikicha Administravia

This thread is for general ideas wrt advancing tea on the net, not even specifically wikicha; we can build our own pages if things don't fit the model of a wiki. Another idea I have currently is a blog aggregator based around this group: the livejournal page is not blog agnostic, one aggregator is manually maintained by a single person, and the last one is owned by a commercial vendor with a serious signal-to-noise problem. Our first topic for improvement is The Glossary.

Will and I had a chat earlier about building out a glossary of tea terms (think Babelcarp) on wikicha.

Why is this better than Babelcarp?
- Searchable, sortable as a complete set.
- Anyone can add or edit
- Easy to edit in batch
- Indexed by Google, which both lets users find meaningful results directly from their search engine and raises the profile of wikicha in search engines
- You may happen upon something you didn't even know you were looking for!

So we will start by building a simple wiki table with the following columns: pinyin, simplified characters, traditional characters, wiki article link, English description.
When I have a prototype ready to go, I'll link it on this thread so folks can have a look.
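
A rough sketch of how the table described above might be generated from raw data rather than typed by hand. The column order follows the list above; the sample rows, the article titles, and the function name `glossary_table` are made up for illustration, and the `sortable` class is the standard MediaWiki way to get client-side sorting.

```python
# Column headings, in the order proposed for the glossary table.
COLUMNS = ["Pinyin", "Simplified", "Traditional", "Article", "English description"]


def glossary_table(entries):
    """Render glossary entries as wikitext for a sortable MediaWiki table."""
    lines = ['{| class="wikitable sortable"']
    lines.append("! " + " !! ".join(COLUMNS))
    for entry in entries:
        lines.append("|-")  # row separator
        cells = [
            entry["pinyin"],
            entry["simplified"],
            entry["traditional"],
            "[[%s]]" % entry["article"],  # internal wiki link
            entry["english"],
        ]
        lines.append("| " + " || ".join(cells))
    lines.append("|}")
    return "\n".join(lines)


# Hypothetical sample rows; the article titles are assumptions.
entries = [
    {"pinyin": "shuǐ xiān", "simplified": "水仙", "traditional": "水仙",
     "article": "Shui Xian", "english": "Narcissus; an oolong tea"},
    {"pinyin": "tiě guān yīn", "simplified": "铁观音", "traditional": "鐵觀音",
     "article": "Tieguanyin", "english": "Iron Goddess of Mercy; an oolong tea"},
]

print(glossary_table(entries))
```

The printed wikitext can be pasted straight into a wiki page; adding a column later only means touching `COLUMNS` and the `cells` list.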

Re: Wikicha Administravia

Regarding the community aggregator, I set up an example here with some feeds from folks I figured wouldn't mind much (if you do, I'll promptly remove you).
Could be improved a bit aesthetically, but I have always really enjoyed community aggregators in the open source world and I would be happy to give this a good home if it seems useful.
http://brandonhale.us/~brandon/planet-n … es/output/

Re: Wikicha Administravia

ps - I am definitely very for this, though I hope that if a bunch of Teadrunk folks are involved in discussing / editing the glossary that we can work out some cross-promotion and maybe have some of the discussion parts here (rather than a wiki talk page). Just a thought. I think such a glossary could be a really big help to users of the forum, to Wikicha, and also to people who are looking for information about tea.

I think it would be great if we could include sound recordings by native speakers, and links to online dictionaries (like nciku) as well.

Re: Wikicha Administravia

william wrote:

ps - I am definitely very for this, though I hope that if a bunch of Teadrunk folks are involved in discussing / editing the glossary that we can work out some cross-promotion and maybe have some of the discussion parts here (rather than a wiki talk page). Just a thought.

To the end of cross promotion, and related to my rambling post above, I put a link to the forum on my own livechat on the wiki front page. It gets quite a few hits so maybe it will entice a few new users here. The tie-ins / cross promotion are part of my long-lived scheme to have a connected group of websites providing English language tea knowledge. This still sounds vague but the wiki and community blog are examples.

I was also thinking this and livechat are better ways to discuss wiki changes than in the Talk section, generally. I often miss things that happen in Talk anyway, even though I am subscribed to the all changes rss.

william wrote:

I think it would be great if we could include sound recordings by native speakers, and links to online dictionaries (like nciku) as well.

Definitely for this, but I don't have any ideas how to pull this one off. I have no idea how to pronounce most things, like Shui xian for example.

Re: Wikicha Administravia

brandon wrote:
william wrote:

I think it would be great if we could include sound recordings by native speakers, and links to online dictionaries (like nciku) as well.

Definitely for this, but I don't have any ideas how to pull this one off. I have no idea how to pronounce most things, like Shui xian for example.

Nciku has recordings of a lot of individual words as pronounced in fairly standard Mandarin, though because of tone interactions, phrases or compound words might sound slightly different than the words strung together (see the example below of two individual words, then the compound word). Teaspring and some other merchants have recordings of some tea names. If we get a big enough list, we could maybe cajole (or bribe) some native speakers into doing recordings for us, at least in Mandarin. I know Cloud was working on recordings of words for his pu'er glossary, but I'm not sure if they're online anywhere or if we're allowed to use them.

As far as knowing how to pronounce things, at least in terms of stuff written in hanyu pinyin, a good start is looking at some of the excellent guides online, like the one at Sinosplice. One bad thing about having a romanization system that looks like English but isn't designed to be pronounced phonetically as it would be in English is that it's very tempting and easy to pronounce things incorrectly (even if you "know better").

Since you asked about a specific word, I'll try and give you some rough ideas; disclaimer... I'm neither a native speaker of Chinese nor a very good speaker... maybe someone else could explain it better or more accurately. Shuǐ xiān is a little tough; in Mandarin, it's between "shooey" and "shway", and shien (like dog in French), with a tone that starts in the middle and goes up on the first word, and a high flat tone on the second. The difference between sh and x is a little tricky to explain... they're similar sounds, but with sh, the tongue is pretty far back in the mouth, where x is closer to a normal western sh, but with the tongue even a little further forward, so the sound is a little brighter. A lot of Mandarin speakers (esp. folks in South China and Taiwan) don't actually say 'sh' 'zh' and 'ch' sounds "properly" (i.e., in standard Mandarin) - so you might also hear it said with a straight s sound on the first word (sway shien). For a non-native speaker, I think most people would argue it's better to err on the side of standard Mandarin.

http://www.nciku.com/search/zh/detail/%E6%B0%B4/1314250
http://www.nciku.com/search/zh/detail/%E4%BB%99/1316460

You can actually hear the two together at:
http://www.nciku.com/search/zh/detail/% … B%99/39460
(shui xian is a compound word which refers to a particular flower)

6 (edited by brandon 2009-01-24 00:05:08)

Re: Wikicha Administravia

Ok Will, I laid out the bare format to see how it might look. It would be nice if we could get the raw data from somewhere and just do some scripting to put it into the wiki format.

http://wikicha.com/index.php/Glossary

MediaWiki does indeed have client-side sortable tables; I'm currently working my way through this doc:
http://meta.wikimedia.org/wiki/Help:Sorting

Re: Wikicha Administravia

brandon wrote:

Ok Will, I laid out the bare format to see how it might look. It would be nice if we could get the raw data from somewhere and just do some scripting to put it into the wiki format.

http://wikicha.com/index.php/Glossary

Nice. That's pretty much what I had in mind. Maybe an "aliases" column too for common terms used for a tea from other romanization systems, nicknames, etc. (e.g., for tieguanyin: ti kwan yin, ti guan yin, tiekwanyin, TGY, TKY). My other idea was to have top level sections, and maybe sub-sections, like "teas", "teaware" => "yixing clay types", etc. At some point it might be good to be able to have internal references ("see foo" or "see also foo"), preferably based on existing wiki redirects / links.

Is there a way to tie the stuff in the glossary to the article via a template? I.e., have a standard template at the top of the page for a particular tea, and have that tied to the stuff in the glossary and vice-versa? In other words, a self-updating system, so that if someone edits the tea or the glossary, it changes it both places? Probably the simplest way to do this would be to have the actual data in the page for the tea (in a standard format), then have the glossary extract that information dynamically (or periodically). Doing it periodically would have some advantages in terms of caching.
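
A minimal sketch of the "data lives in the tea's page, glossary extracts it" idea, assuming each article starts with a template call in a standard format. The template name `Tea`, its parameter names, and both function names are hypothetical; a real version would more likely be a MediaWiki plugin than an external script, but the extraction step looks roughly the same either way.

```python
import re

# "Tea" is a hypothetical template name. This simple pattern does not handle
# nested templates or piped links inside parameter values.
TEMPLATE_RE = re.compile(r"\{\{\s*Tea\s*\|(.*?)\}\}", re.DOTALL)


def parse_tea_template(wikitext):
    """Return the named parameters of the first {{Tea|...}} call, or None."""
    match = TEMPLATE_RE.search(wikitext)
    if match is None:
        return None
    fields = {}
    for part in match.group(1).split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def build_glossary(pages):
    """Collect one glossary row per article that carries the template."""
    rows = []
    for title, wikitext in sorted(pages.items()):
        fields = parse_tea_template(wikitext)
        if fields is not None:
            rows.append(dict(fields, article=title))
    return rows


# Hypothetical page contents keyed by article title.
pages = {
    "Tieguanyin": "{{Tea\n|pinyin=tiě guān yīn\n|simplified=铁观音\n|traditional=鐵觀音\n}}\nArticle text...",
    "Brewing notes": "No template on this page.",
}

for row in build_glossary(pages):
    print(row)
```

Run periodically over all pages, something like this would regenerate the glossary from the articles, so an edit to the article's template shows up in the glossary on the next pass.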

Scraping the data from (say) Babelcarp would be simple technically; however, I imagine Lew Perin might not appreciate it unless you can get him on board with the project. The big advantage (or disadvantage) is that the data would be updatable by anyone. We would certainly (at some point) want to have a search interface as well as a flat "glossary" type page. The main problem I see with importing external data is what you do if there's already an existing wikicha page for a term (either with the exact same name, or with a similar, but different name).

Getting a big chunk of names in simplified and auto-translating to traditional is also easy (via Google Translate or the like). Generating pinyin on the fly might be semi-easy if you wrote your own plugin - Chinese Perapera-kun uses a built-in dictionary (which I think can be used by anyone) that maps characters to their pinyin.
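
To illustrate the dictionary approach, here is a toy sketch of character-by-character lookup. The two tables below are tiny hand-written stand-ins (an assumption, not the dictionary that extension ships with), and the function names are made up; a real run would load a full character dictionary instead.

```python
# Tiny stand-in tables for illustration only; a real dictionary would have
# thousands of entries.
PINYIN = {"水": "shuǐ", "仙": "xiān", "铁": "tiě", "观": "guān", "音": "yīn"}
TRADITIONAL = {"铁": "鐵", "观": "觀"}  # characters not listed are unchanged


def to_pinyin(text, table=PINYIN):
    """Look up each character; unknown characters become '?' for manual review."""
    return " ".join(table.get(ch, "?") for ch in text)


def to_traditional(text, table=TRADITIONAL):
    """Map simplified characters to traditional ones, leaving the rest as-is."""
    return "".join(table.get(ch, ch) for ch in text)


for name in ["水仙", "铁观音"]:
    print(name, to_traditional(name), to_pinyin(name))
```

A per-character lookup cannot pick the right reading for characters with more than one pronunciation, and simplified-to-traditional is not always one-to-one, so the generated output would still need a pass by someone who reads Chinese.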

I'm not much of a PHP programmer, but I could write some Perl crap to do scraping / data-wrangling of stuff if no one else volunteers.

Re: Wikicha Administravia

ps - Don't mean to sound too greedy all at once. I know everything can't be perfect from the start, and something is better than nothing. I'm just brainstorming on how this thing worked in my head when I was thinking about it.

Content is obviously the most important thing.

Re: Wikicha Administravia

What you suggest is actually quite possible by modifying the plugin I used to categorize the Chinese greens. It will be a while before I start hacking on that, though.

Re: Wikicha Administravia

William wrote:

Maybe an "aliases" column too for common terms used for a tea from other romanization systems, nicknames, etc. (e.g., for tieguanyin: ti kwan yin, ti guan yin, tiekwanyin, TGY, TKY).

I was messing with the Wikicha site, and I agree with the above suggestion. Also I would like to suggest adding English names as some teas have a lot of them which are often confusing to people.

I would also like to ask what people think about adding pinyin tones to the glossary and about how we should standardize the pinyin in the glossary. In some places each syllable is separated by a space, and in others it isn't. It might be better to stick the words together, otherwise it could affect searches.

红焙浅瓯新火活,龙团小碾斗晴窗

Re: Wikicha Administravia

I think definitely hanyu pinyin (with tone markers); as far as sticking words together or not, not so sure. I guess compound words together, individual words apart? I would actually think everything separate would be most consistent.

As I was saying before, using the computer to generate the pinyin (and then correcting any minor inaccuracies) is probably the quickest way to do this.
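
On the worry that spacing and tone marks could affect searches: one option is to display the pinyin however we settle on, and match searches against a normalized key instead. A small sketch, where the function name `search_key` is made up and the rule (drop tone marks, spaces and apostrophes, lowercase) is just one possible convention:

```python
import unicodedata


def search_key(pinyin):
    """Reduce pinyin to a plain lowercase key without tones or separators."""
    # NFD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", pinyin)
    # Drop the combining marks (tone marks; this also turns ü into u).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters and digits only, so spaces and apostrophes disappear.
    return "".join(ch for ch in stripped.lower() if ch.isalnum())


for term in ["shuǐ xiān", "Tiě Guān Yīn", "tieguanyin", "pu'er"]:
    print(term, "->", search_key(term))
```

With a key like this, "Tiě Guān Yīn", "tie guan yin" and "tieguanyin" all match each other, so the spacing question becomes purely a matter of how the glossary looks.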

Re: Wikicha Administravia

I was also wondering what people thought about making more subcategories. There are already separate areas for Yixing and Chinese teas, but should we separate Chinese teas into the six categories (black, red, green, white, yellow, oolong) so they would be sortable?

红焙浅瓯新火活,龙团小碾斗晴窗

Re: Wikicha Administravia

Yeah, to mock it up, but before we go too far in putting data into the page I need to work out how to dynamically generate the tables based on metadata on related pages.
Or we can cut and paste the data later.

14 (edited by LaoChaGui 2009-01-31 10:21:46)

Re: Wikicha Administravia

brandon wrote:

Yeah, to mock it up, but before we go too far in putting data into the page I need to work out how to dynamically generate the tables based on metadata on related pages.
Or we can cut and paste the data later.

Good idea. I would offer to help, but I am effectively programming illiterate. You should post somewhere (here, maybe) or shoot me an email when you have the formatting all set. I would like to help adding stuff to the glossary, but adding a bunch of disorganized data probably isn't much of a help.

红焙浅瓯新火活,龙团小碾斗晴窗

Re: Wikicha Administravia

Well, adding the data is helpful as long as it's formatted in a way that will let us (later) format the whole glossary in a more attractive way.

As long as each individual article contains that information in a way that's consistent and extractable, it's still helpful to put the information in... I do feel, though, that it would be helpful if it could be edited glossary-style too... that would be much more time efficient in terms of building a comprehensive glossary (with an associated set of stub articles).