Go ahead!

Memoization for Everything

Collect invalid documents for bulk-insert in mongo-ruby-driver


Mongo gem 1.6.0 includes my pull request (see also HISTORY).

Background

These days, a production service consists of many systems. As a result, some of those systems insert broken or invalid data into MongoDB.

Here is the problem.

mongo-ruby-driver’s bulk insert is all or nothing. If the docs being inserted contain even one invalid document, the whole insert operation fails. In addition, we can’t find out which documents were invalid.

This behavior is hard to use in practice. We want to handle invalid documents ourselves, e.g. write them to a local file or simply ignore them.

My pull request resolves this problem.

Usage

I introduced the :collect_on_error option to insert.

insert without :collect_on_error:

# docs is [{}, {}, ...]
result = collection.insert(docs)

result is an array of the _ids of the inserted documents.

insert with :collect_on_error:

result, invalid_docs = collection.insert(docs, :collect_on_error => true)

result is the same as for insert without :collect_on_error. invalid_docs is an array of the invalid documents, with the ObjectId field removed. We can handle invalid_docs manually. For an example, see fluent-plugin-mongo.
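One way to handle invalid_docs manually is to write them to a local file. The following is only a minimal sketch: dump_invalid_docs is a hypothetical helper, not part of the driver, and invalid_docs is stubbed with sample hashes so that the code runs without a MongoDB server.

```ruby
require "tempfile"

# Hypothetical helper (not part of mongo-ruby-driver): append each rejected
# document to a local file, one inspected Hash per line, and return how many
# documents were written.
def dump_invalid_docs(invalid_docs, path)
  File.open(path, "a") do |file|
    invalid_docs.each { |doc| file.puts(doc.inspect) }
  end
  invalid_docs.size
end

# With a real collection, the call from this post would come first:
#   result, invalid_docs = collection.insert(docs, :collect_on_error => true)
# Here invalid_docs is stubbed with sample hashes standing in for rejected documents.
invalid_docs = [{ "a.b" => 1 }, { "$bad" => 2 }]

log = Tempfile.new("invalid_docs")
written = dump_invalid_docs(invalid_docs, log.path)
puts "#{written} invalid documents written to #{log.path}"
```

inspect is used instead of a JSON dump because a rejected document may contain values that cannot be serialized cleanly.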

Enjoy MongoDB with Ruby!

Released fluent-logger-d


I released fluent-logger-d, a Fluentd client library for D.

This client library still has some TODOs, but it works fine. I tested posting 5,000,000 messages from 5 threads without any problem.

To be honest, the D community doesn’t need such a library right now, because it is not yet dealing with production systems.

I hope this library, together with Fluentd, helps your system design in the future.

Fluentd meetup in Japan


Fluentd meetup in Japan was held on Feb 4th.

More than 120 hackers attended this meetup. It was really exciting and I had a great time. Thanks to the Fluentd developers and users!

My presentation

I gave a talk titled “Dive into Fluent plugin”. The purpose of this presentation was to share know-how about Fluentd plugins. I hope it helps you develop your own Fluentd plugins.

I didn’t talk about some undocumented features, e.g. the EventStream family, ObjectBufferedOutput, and the details of DetachMultiProcessMixin. Let’s dive into the Fluentd source code if you want to know about these features.

Enjoy Fluentd and plugins :)

  • Slideshare(en)
  • Ustream(ja)

Sorry, the first few minutes of the recording are lost.

YAML engine mismatch problem in RubyGems


Overview

I hit this problem with fluent-plugin-mongo 0.6.0. For some users, installing fluent-plugin-mongo 0.6.0 failed with the following message.

# fluent-gem install fluent-plugin-mongo
ERROR:  While executing gem ... (NoMethodError)
    undefined method `call' for nil:NilClass

In short, you should use the range form, e.g.:

gem.add_dependency "mongo", [">= 1.5.2", "<= 1.5.2"]

for an exact match dependency when building a gem that must also install on old Ruby or other Syck environments.

Detail

YAML has a ‘=’ keyword for the default key.

RubyGems uses ‘=’ for an exact match dependency. The YAML engine of current Ruby is Psych, and Psych handles an unquoted ‘=’ correctly.

For example, if you add an exact match dependency to the gem:

gem.add_dependency "mongo", "= 1.5.2"

then Psych generates the following metadata:

- !ruby/object:Gem::Dependency
  name: mongo
  requirement: &id002 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - =
      - !ruby/object:Gem::Version
        version: 1.5.2
  type: :runtime
  prerelease: false
  version_requirements: *id002

The problem is here: see line 6 of the metadata, where ‘=’ is emitted unquoted.

The YAML engine of old Ruby is Syck, and Syck can’t handle an unquoted ‘=’ correctly. In Psych, the result of loading this metadata is correct:

"= 1.5.2"

but Syck mistakenly parses the unquoted ‘=’ as YAML’s default key, so the loaded result is broken:

"#<Syck::DefaultKey:0x0000010380cc40> 1.5.2"

RubyGems doesn’t recognize #<Syck::DefaultKey:0x0000010380cc40> as an operator. As a result, the gem installation fails in a Syck environment.
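The Psych side of this behavior can be checked from Ruby itself. This is a minimal sketch, assuming a current Ruby where the yaml library is backed by Psych. Syck is not available there, so the broken case is not reproduced. The YAML below is a simplified stand-in for the requirements entry, with the version quoted so that only ‘=’ is under test.

```ruby
require "yaml"

# Simplified stand-in for the "requirements" entry of the gem metadata.
metadata = <<~EOS
  - - =
    - "1.5.2"
EOS

operator, version = YAML.load(metadata).first

# Under Psych the unquoted '=' comes back as an ordinary String,
# so RubyGems can rebuild the "= 1.5.2" requirement from it.
puts operator.class
puts "#{operator} #{version}"
```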

Solution

Use the range form for an exact match dependency

Both Psych and Syck handle ‘<=’ and ‘>=’ correctly, so we can use the following form to avoid this problem.

gem.add_dependency "mongo", [">= 1.5.2", "<= 1.5.2"]

This is an ad-hoc approach, but it works fine :)
I strongly recommend this approach.
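To confirm that the range form really pins a single version, the requirement can be evaluated with RubyGems directly. This is a small sketch: the gemspec name "example-plugin" and its version are hypothetical and exist only for illustration.

```ruby
require "rubygems"

# Hypothetical gemspec used only to build the dependency object.
spec = Gem::Specification.new do |gem|
  gem.name    = "example-plugin"
  gem.version = "0.0.1"
  gem.add_dependency "mongo", [">= 1.5.2", "<= 1.5.2"]
end

requirement = spec.dependencies.first.requirement

# Only the version that satisfies both bounds is accepted.
%w[1.5.1 1.5.2 1.5.3].each do |v|
  ok = requirement.satisfied_by?(Gem::Version.new(v))
  puts "mongo #{v}: #{ok ? 'accepted' : 'rejected'}"
end
```

Because neither bound uses ‘=’, the generated metadata contains only operators that both YAML engines read as strings.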

Update RubyGems (for users)

The latest RubyGems fixes this problem. See the repository.

Build a gem using Syck

With Syck, the generated metadata is as follows:

- !ruby/object:Gem::Dependency
  name: mongo
  requirement: &id002 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - "="
      - !ruby/object:Gem::Version
        version: 1.5.2
  type: :runtime
  prerelease: false
  version_requirements: *id002

Syck emits ‘=’ quoted and treats the quoted ‘"="’ as a string, not as the default key. So a gem built this way can use ‘=’ for an exact match dependency.

If you use Bundler, put YAML::ENGINE.yamler = "syck" at the top of the gemspec to enable the Syck engine. However, some environments don’t have the Syck engine, so this approach doesn’t work there…