ApacheYahoo
my notes from ApacheCon 2005
yahoo is really big
- 411M unique visitors per month
- 191M active registered users
- 11.4M fee-paying customers
- 3.4B average daily pageviews
yapache
- yahoo's modified version of apache
- pronounced why-apache
- based on Apache 1.3 - port to 2.2 planned for 2006
suppressing the HTTP Server header
- Server: Apache/1.3.3 (UNIX) ......
- header is generally useful for debugging
- NOTE: windows media server - won't play stream without server header
- security - security through obscurity
- bandwidth conservation - really adds up
- apache license forbids calling derived projects "apache"
- the real reason - history - "netscape guide by Yahoo"
- business agreement with netscape
why still using 1.3?
- yahoo added gzip support in 1998
- performs really well
- very stable
- competent with the code base
- don't need no stinkin' threads anyways
what's wrong with threads
- too hard for most programmers to use
- even for experts, development is painful
- prefork MPM R00LZ!
- prefer processes over threads
- better fault isolation
- one child crashes, only a single user gets disconnected
- better programming model for C/C++
- private data by default
- shared data requires extra work - ( mmap + synchronization )
logfiles
- common log format problems
- no standard place to put cookies, ad ids, request duration
- time spent on formatting
- escaping unsafe chars (\")
- formatting timestamps into human-readable form
- when parsing, convert back to time_t
- don't bother logging 200 status code - wasted bytes
- HTTP protocol version in %r - do we really care if 1.0 or 1.1?
yapache Access Log
- IP
- request time
- request duration
- bytes sent
- uri + http host
- http method (+content-length if POST/PUT)
- response status (only if not 200 OK)
- Cookies
- User-Agent
- Referer
- Ad IDs
- User defined... see slide
- one request per line
- first 32 bytes
- numeric values in hex followed by URI
- no delimiter
- others followed by ^E delimited named fields
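
A rough sketch of what a line in this style could look like. The notes only say "first 32 bytes", hex numbers, no delimiter before the URI, and ^E-delimited named fields after it. Everything else here is an assumption for illustration: four 8-digit hex values (IPv4 address, request time, duration, bytes sent) as the 32-byte prefix, and a one-character tag in front of each named field. The function names are hypothetical and this is not the actual yapache layout.

```python
import ipaddress

FIELD_SEP = "\x05"  # ^E (Ctrl-E), the delimiter named in the notes


def encode_line(ip, req_time, duration_us, bytes_sent, uri, named=None):
    """Build one log line: 32 hex chars, then the URI, then ^E-tagged fields.

    Assumed layout (not from the talk): four 8-digit hex numbers.
    Each value must be below 2**32 or the fixed width breaks.
    """
    fixed = "%08x%08x%08x%08x" % (
        int(ipaddress.IPv4Address(ip)), req_time, duration_us, bytes_sent)
    # Hypothetical tagging: a one-character tag followed by the value.
    extras = "".join(FIELD_SEP + tag + value
                     for tag, value in (named or {}).items())
    return fixed + uri + extras


def parse_line(line):
    """Invert encode_line(); no timestamp parsing or quote unescaping needed."""
    fixed, rest = line[:32], line[32:]
    nums = [int(fixed[i:i + 8], 16) for i in range(0, 32, 8)]
    uri, *extras = rest.split(FIELD_SEP)
    return {
        "ip": str(ipaddress.IPv4Address(nums[0])),
        "time": nums[1],          # already an integer, no time_t conversion
        "duration_us": nums[2],
        "bytes": nums[3],
        "uri": uri,
        "named": {field[0]: field[1:] for field in extras if field},
    }
```

The point of the design is that the writer never formats a timestamp or escapes quotes, and the parser slices at fixed offsets instead of tokenizing.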
Signal-free Log Rotation
- no signals or pipes
- rotate logfiles by renaming them
- stat() logfile every 60 seconds
- if inode changed, close and re-open
- during 60 second interval, child procs may write to either logfile
- log directory must be writable by User
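
The stat-and-reopen idea can be sketched in a few lines. This is only an illustration in Python, not yapache's C code; the class name `RotatingLog` and the `check_interval` parameter are made up for the example.

```python
import os
import time


class RotatingLog:
    """Append-only log that notices a rename by comparing inode numbers."""

    def __init__(self, path, check_interval=60.0):
        self.path = path
        self.check_interval = check_interval
        self._open()

    def _open(self):
        # Mode "a" creates the file if the old one was renamed away.
        self.fh = open(self.path, "a")
        self.inode = os.fstat(self.fh.fileno()).st_ino
        self.next_check = time.monotonic() + self.check_interval

    def _reopen_if_rotated(self):
        try:
            current = os.stat(self.path).st_ino
        except FileNotFoundError:
            current = None  # renamed away and nothing created yet
        if current != self.inode:
            self.fh.close()
            self._open()
        else:
            self.next_check = time.monotonic() + self.check_interval

    def write(self, line):
        # Only stat() once per interval, so the check costs almost nothing.
        if time.monotonic() >= self.next_check:
            self._reopen_if_rotated()
        self.fh.write(line + "\n")
        self.fh.flush()
```

Rotation is then just `mv access.log access.log.1`. Until each process hits its next check it keeps writing to the renamed file, which is why the rotated file should not be processed for at least one interval. The process creates the new file itself, which is why the log directory has to be writable by the server's User.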
bandwidth reduction
- 30x response bodies significantly reduced in size
- the user will never see them, the user agent will never render them
- can't have a blank body - some user agents can't handle it
- on-the-fly gzip
- similar in spirit to mod_deflate
- prereq - http/1.1, Accept-Encoding: gzip, IE 6+ or Mozilla 5+
- Disabled when CPU < 10% idle
- gzip level 6 - a little more cpu, but worth it
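
The gzip prerequisites above boil down to a small decision function. A minimal sketch, assuming the browser check is done on the User-Agent string and the caller supplies the current CPU idle percentage; `browser_ok`, `should_gzip` and `maybe_compress` are hypothetical names, and q-values in Accept-Encoding are ignored for brevity.

```python
import gzip
import re

GZIP_LEVEL = 6            # level quoted in the notes
MIN_IDLE_PERCENT = 10.0   # compression is skipped below this much idle CPU


def browser_ok(user_agent):
    """Hypothetical check for 'IE 6+ or Mozilla 5+' based on the UA string."""
    msie = re.search(r"MSIE (\d+)", user_agent)
    if msie:
        return int(msie.group(1)) >= 6
    moz = re.match(r"Mozilla/(\d+)", user_agent)
    return bool(moz) and int(moz.group(1)) >= 5


def should_gzip(http_version, headers, cpu_idle_percent):
    """Apply the prerequisites from the notes, in order."""
    if http_version != "HTTP/1.1":
        return False
    encodings = [token.split(";")[0].strip().lower()
                 for token in headers.get("Accept-Encoding", "").split(",")]
    if "gzip" not in encodings:
        return False
    if not browser_ok(headers.get("User-Agent", "")):
        return False
    return cpu_idle_percent >= MIN_IDLE_PERCENT


def maybe_compress(body, http_version, headers, cpu_idle_percent):
    """Return (body, extra_headers), compressing only when allowed."""
    if should_gzip(http_version, headers, cpu_idle_percent):
        compressed = gzip.compress(body, compresslevel=GZIP_LEVEL)
        return compressed, {"Content-Encoding": "gzip"}
    return body, {}
```

The CPU check is what makes this safe to leave on: under load the server falls back to sending uncompressed bytes instead of getting slower.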
how many servers?
- StartServers, MinSpareServers, MaxSpareServers, etc.
- MaxClients - the only one that matters
- everything else is relative
- constant pool size is good
- start all MaxClients servers at once
- put host in load-balancer rotation
- never kill off idle servers
- any servers killed by MaxRequestsPerChild still get replaced
- for 99% of sites, MaxClients is sufficient
- yapache - Min/Max/StartServers are disabled
- MaxClients usually set < 100
waiting for the client sucks - blocking
- let the kernel do the buffering
- large SendBufferSize - 224k
- NO_LINGCLOSE
- SO_ACCEPTFILTER with "httpready"
- apache won't wake up from accept() until a full HTTP GET request has been buffered by kernel
- Entire request present in first read()
- Apache child processes able to do useful work immediately
- more efficient use of server pool
- SendBufferSize - 229376
- to go higher, adjust kernel tunable - kern.ipc.maxsockbuf (FreeBSD)
- set to max response size ( HTML + headers )
- tradeoff
- avoids blocking on write() to socket
- more kernel memory consumed
- NO_LINGCLOSE
- don't wait for the client to read the response
- write full response into socket buffer
- close the socket
- apache child returns to pool - kernel worries about completing data xfer to client
- no idea if client read whole response
- if client bails out halfway through or goes away, apache won't log it
- no change to TCP window sizes
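
The send-buffer half of this can be shown with ordinary socket calls. A sketch only: it uses Python's `socket` module rather than Apache's C code, the helper names are made up, and it leaves out the FreeBSD accept filter. The kernel may clamp or round the requested size, so the function returns what was actually granted.

```python
import socket

SEND_BUFFER_SIZE = 229376  # 224 * 1024, the SendBufferSize from the notes


def configure_send_buffer(sock, size=SEND_BUFFER_SIZE):
    """Request a large kernel send buffer and return what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def send_and_close(sock, response):
    """Hand the whole response to the kernel, then close without lingering.

    If the response fits in the send buffer, sendall() returns without
    waiting for the client; a larger response would still block here.
    """
    sock.sendall(response)
    # Plain close(): the kernel finishes delivery in the background and
    # the worker never learns whether the client read everything.
    sock.close()
```

This is the tradeoff listed above in miniature: the worker is free as soon as the bytes are in kernel memory, at the cost of that memory and of not knowing whether the transfer completed.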
YahooHostHtmlComment
- comments at end of HTML pages
- for debugging page or cache problems
- users save html, send to customer care
- engineers examine error log on server using timestamp/servername
- ap_finalize_request_protocol() patch
- http://foo.yahoo.com/bin/hostname
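
To make the idea concrete, here is a sketch of appending such a comment to a page. The comment layout (hostname followed by a UTC timestamp) and the function names are assumptions; the notes do not give the real format.

```python
import socket
import time


def host_comment(hostname=None, now=None):
    """Build a trailing HTML comment naming the serving host and time."""
    hostname = hostname or socket.gethostname()
    # gmtime(None) means "current time"; pass a number to fix the time.
    stamp = time.strftime("%a %b %d %H:%M:%S UTC %Y", time.gmtime(now))
    return "<!-- %s %s -->" % (hostname, stamp)


def append_host_comment(html, hostname=None, now=None):
    """Return the page with the host comment added after the body."""
    return html + "\n" + host_comment(hostname, now) + "\n"
```

With the hostname and timestamp in a saved copy of the page, an engineer knows which server's error log to open and where in it to look.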
SSL Acceleration
- Cavium Nitrox CN1120 - 14k RSA ops/s - OpenSSL 0.9.7 engine API
- can handle as much SSL traffic as a non-SSL server w/o card
- stunnel on port 443 - delegates RSA to hardware
- forwards to apache port 80
- mod_stunnel - apache + stunnel glue
- overrides getpeername()
- returns ip address of actual client
- emulates mod_ssl environment
- sets an environment variable if running under SSL
- tell the difference between ssl and non-ssl traffic in web applications
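
From the application side, the environment variable is all that is needed. The notes do not name the variable; the sketch below assumes the `HTTPS` variable that mod_ssl sets to `on`, since mod_stunnel is described as emulating the mod_ssl environment. The function names are hypothetical.

```python
def is_ssl_request(environ):
    """True if the request arrived over SSL, judged by the HTTPS variable."""
    return environ.get("HTTPS", "").lower() == "on"


def self_url(environ, host, path):
    """Build an absolute URL with the scheme the client actually used."""
    scheme = "https" if is_ssl_request(environ) else "http"
    return "%s://%s%s" % (scheme, host, path)
```

Without this, an application behind stunnel would see every request arrive on port 80 and could redirect SSL users to plain http URLs.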
Kicking the Bucket
- avoid mod_whatkilledus.c
- stay away from this, but interesting
- trashed stacks frequently cause SEGV or BUS
- fatal signal handlers can get into an infinite coredump loop
- system calls made to generate the core cause additional coredumps
- our set_signals() never uses sig_coredump()
- let child core quickly and in-context
- corefiles w/o CoreDumpDirectory
- FreeBSD - sysctl -w kern.coredump=1 kern.sugid_coredump=1 kern.corefile="/var/crash/%N.core.%U"
- linux - see slide
- can fill up disks if you're dropping core like crazy
- don't multi-signal in reclaim_child_process
- default - parent process sends SIGHUP, then another SIGHUP, then SIGTERM, then SIGKILL
- yapache skips second SIGHUP and SIGTERM
Include directive
- yahoo's httpd.conf ends with Include conf/include/*.conf
- wildcard safer than entire directory
- avoids picking up Emacs abc.conf~ backup files - they sort after abc.conf and would override it
- Yahoo sites install their own $SR/conf/include/foobar.conf
- override settings such as ServerAdmin or MaxClients
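
The backup-file problem is easy to demonstrate. This sketch uses Python's `fnmatch` as a stand-in for the server's wildcard matching and assumes matched files are read in sorted order; `included_files` is a made-up helper.

```python
import fnmatch


def included_files(directory_listing, pattern="*.conf"):
    """Return the names a wildcard Include would load, in sorted order."""
    return sorted(name for name in directory_listing
                  if fnmatch.fnmatchcase(name, pattern))
```

Given `["foobar.conf", "abc.conf", "abc.conf~", "README"]`, only `abc.conf` and `foobar.conf` match. Including the whole directory would also read `abc.conf~` right after `abc.conf`, letting the stale backup override the file that was just edited.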
setproctitle() in child_main()
- ip address and url shown in child process name
- only works on freebsd
ysar - inspired by SysV sar
- requests per second, cpu, mem, sysc/pkt, outputbps
take aways
- every byte counts
- every cpu cycle counts
- use the right tools for the job
- apache - dynamic content generation
- os - buffering content in and out
- dedicated chips for crypto
- when it's time to die
- fail fast and in-context
- use multi-process for fault isolation
questions
- yapache source is not available
- tons of different dynamic content solutions - lots of PHP, dso, etc.
- load balancers - Alteon and Foundry hardware - layer 4
Updated Sun Jul 23, 2006 12:10 PM