
Log Format

Rafael JPD edited this page Jan 12, 2025 · 1 revision

Expected Log File Content Format

Several line formats are supported. Each line must contain the real IP address (IPv4 or IPv6) of the client that made the request, the date and time of the access, the HTTP method (e.g., GET), the HTTP status code (200, 204, 301, among others), the accessed URL, and the user agent used to obtain the content (e.g., a mobile browser). Other information, such as content size and response time, may also be present in the log.

It is essential that the IP in each log line is real, because it is used to determine a user session, a concept used in later steps of the access calculator to remove double clicks and other noise that artificially inflate the results. If the IP is local (127.0.0.1, 192.168.0.1, 172.16.0.1, 10.0.0.1, among others) or absent, the line is discarded because no user session can be defined, which invalidates the access count under the Project COUNTER R5 standard.
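The check described above could be sketched with Python's standard `ipaddress` module. This is only an illustration under stated assumptions: the function name `is_usable_client_ip` is hypothetical, and the access calculator may apply different or additional rules.

```python
import ipaddress


def is_usable_client_ip(value):
    """Return True if value parses as an IPv4/IPv6 address that is not local.

    Hypothetical helper: absent or malformed values (None, "", "-") and
    loopback, private, link-local, or unspecified addresses are rejected.
    """
    try:
        ip = ipaddress.ip_address(value.strip())
    except (ValueError, AttributeError):
        # ValueError: not an IP address; AttributeError: value is None.
        return False
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_unspecified
    )
```

A line whose IP fails this check would be discarded before any session is built.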

If the access is made by a user agent that is not a web browser, such as a robot or crawler, the line is also discarded, as are lines whose URLs point to static files (e.g., images or style sheets). Below are some examples of valid log lines, i.e., lines that represent accesses to article pages in abstract or full-text format:

  1. scielo.isciii.es 117.64.147.191 - - [12/Feb/2024:04:23:09 +0100] "GET /scielo.php?lng=es&nrm=i&pid=S0213-91112023000100500&script=sci_abstract HTTP/1.1" 200 18575 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3432.118 Safari/537.36" 90571 364 18950
  2. 45.65.189.47 45.65.189.47, 198.41.230.129 - [06/Oct/2024:00:00:16 -0300] "GET /scielo.php?lng=es&nrm=i&pid=S0213-91112023000100500&script=sci_abstract HTTP/1.1" 200 166 "https://www.scielo.cl/scielo.php?pid=S0718-50732020000300308&script=sci_arttext&tlng=pt" "Mozilla/5 .0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15"
  3. 186.130.151.215 186.130.151.215 172.69.138.111 [10/Dec/2024:00:00:12 0300] "GET /scielo.php?pid=S0718-07642017000400014&script=sci_arttext HTTP/1.1" 304 166 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
  4. 186.130.151.215 186.130.151.215 172.69.138.111 [10/Dec/2024:00:00:120300] "GET /scielo.php?pid=S0718-07642017000400014&script=sci_arttext HTTP/1.1" 304 166 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
  5. 45.65.189.47 - [06/Oct/2024:00:00:16 -0300] "GET /scielo.php?lng=es&nrm=i&pid=S0213-91112023000100500&script=sci_abstract HTTP/1.1" 200 166 "https://www.scielo.cl/scielo.php?pid=S0718-50732020000300308&script=sci_arttext&tlng=pt" "Mozilla/5 .0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15"
  6. 45.65.189.47 - [06/Oct/2024:00:00:16 -0300] "GET /scielo.php?lng=es&nrm=i&pid=S0213-91112023000100500&script=sci_abstract HTTP/1.1" 200 166 "https://www.scielo.cl/scielo.php?pid=S0718-50732020000300308&script=sci_arttext&tlng=pt" "Mozilla/5 .0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15"
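
The discard rules for robots and static files, described before the examples, could be sketched as follows. The extension set, the marker list, and all function names are hypothetical and chosen only for illustration; a real deployment would more likely rely on a maintained robots list and its own static-file rules.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical lists, for illustration only.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg"}
ROBOT_MARKERS = ("bot", "crawler", "spider", "curl", "wget")


def is_static_resource(url):
    """Return True if the URL path ends in a known static-file extension."""
    path = urlsplit(url).path  # drops the query string before checking
    extension = posixpath.splitext(path)[1].lower()
    return extension in STATIC_EXTENSIONS


def looks_like_robot(user_agent):
    """Return True if the user agent contains a known robot marker."""
    lowered = user_agent.lower()
    return any(marker in lowered for marker in ROBOT_MARKERS)


def should_discard(url, user_agent):
    """Combine both rules: discard static files and non-browser agents."""
    return is_static_resource(url) or looks_like_robot(user_agent)
```

Substring matching on the user agent is deliberately naive here; it is enough to show where the rule sits in the pipeline, not how robots should be identified in production.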

The following table lists the fields detected in each log line of the previous list, in the same order:

| Line | IP | HTTP Method | HTTP Code | URL | User Agent |
|---|---|---|---|---|---|
| 1 | 117.64.147.191 | GET | 200 | /scielo.php?lng=es&nrm=i&pid=S0213-91112023000100500&script=sci_abstract | Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3432.118 Safari/537.36 |
| 2 | 45.65.189.47 | GET | 200 | /scielo.php?lng=es&nrm=i&pid=S0213-91112023000100500&script=sci_abstract | Mozilla/5 .0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15 |
| 3 | 186.130.151.215 | GET | 304 | /scielo.php?pid=S0718-07642017000400014&script=sci_arttext | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 |
| 4 | 186.130.151.215 | GET | 304 | /scielo.php?pid=S0718-07642017000400014&script=sci_arttext | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 |
| 5 | 45.65.189.47 | GET | 200 | /scielo.php?lng=es&nrm=i&pid=S0213-91112023000100500&script=sci_abstract | Mozilla/5 .0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15 |
| 6 | 45.65.189.47 | GET | 200 | /scielo.php?lng=es&nrm=i&pid=S0213-91112023000100500&script=sci_abstract | Mozilla/5 .0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15 |
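
As a rough illustration of how these fields might be extracted from the example lines, the sketch below anchors on the bracketed timestamp and the quoted request, then takes the first token before the timestamp that parses as an IP address. The names `REQUEST_PATTERN`, `first_ip`, and `parse_line` are hypothetical, this is not the access calculator's actual parser, and two simplifications are assumed: the timezone offset is left unparsed (some of the examples write it without a sign or fused to the seconds), and month names are parsed with `%b`, which expects an English locale.

```python
import ipaddress
import re
from datetime import datetime

# Matches: [timestamp] "METHOD URL HTTP/x.y" status size "referrer" "user agent"
REQUEST_PATTERN = re.compile(
    r'\[(?P<timestamp>[^\]]+)\]\s+'
    r'"(?P<method>[A-Z]+)\s+(?P<url>\S+)(?:\s+HTTP/[\d.]+)?"\s+'
    r'(?P<status>\d{3})\s+\S+\s+'
    r'"(?P<referrer>[^"]*)"\s+"(?P<user_agent>[^"]*)"'
)


def first_ip(prefix):
    """Return the first token in prefix that is a valid IP address, or None."""
    for token in prefix.replace(",", " ").split():
        try:
            return str(ipaddress.ip_address(token))
        except ValueError:
            continue  # host names and "-" placeholders are skipped
    return None


def parse_line(line):
    """Return a dict with the detected fields, or None if the line does not fit."""
    match = REQUEST_PATTERN.search(line)
    if match is None:
        return None
    ip = first_ip(line[:match.start()])
    if ip is None:
        return None
    try:
        # The first 20 characters hold "dd/Mon/yyyy:HH:MM:SS"; the offset is ignored.
        when = datetime.strptime(match["timestamp"][:20], "%d/%b/%Y:%H:%M:%S")
    except ValueError:
        return None
    return {
        "ip": ip,
        "datetime": when,
        "method": match["method"],
        "status": int(match["status"]),
        "url": match["url"],
        "user_agent": match["user_agent"],
    }
```

Taking the first valid IP is a choice made for this sketch because it reproduces the IP column of the table for the six examples; logs that place a proxy address first would need a different rule.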
